Modeling Content from Human-Verified Blacklists for Accurate Zero-Hour Phish Detection
Phishing attacks are a signiﬁcant security threat to users of the Internet, causing tremendous economic loss every year. Past work in academia has not been adopted by industry in part due to concerns about liability over false positives. However, blacklist-based methods heavily used in industry are slow in responding to new phish attacks, and tend to be easily overwhelmed by phishing techniques such as fast-ﬂux and the proliferation of toolkits. In this paper, we present the design and evaluation of two blacklist-enhanced content-based algorithms. The key insight behind our algorithms is to leverage existing human-veriﬁed whitelists and blacklists, and relax them via probabilistic methods to attain high true positive rates while maintaining extremely low false positive rates. Comprehensive experiments over a diverse spectrum of data sources show that our approach currently achieves a false positive rate of 0.0434% with a true positive rate of 87.42%. Our algorithms are able to adapt quickly to new phishing attacks by incremental retraining, and present a new framework that will generalize to evolving attacks.