Identifying suspicious URLs: an application of large-scale online learning



This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. We show that this application is particularly appropriate for online algorithms as the size of the training data is larger than can be efficiently processed in batch and because the distribution of features that typify malicious URLs is changing continuously. Using a real-time system we developed for gathering URL features, combined with a real-time source of labeled URLs from a large Web mail provider, we demonstrate that recently developed online algorithms can be as accurate as batch techniques, achieving classification accuracies up to 99% over a balanced data set.
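
To make the idea of online learning over URL features concrete, here is a minimal sketch, not the paper's system. It assumes a plain mistake-driven perceptron (chosen only for illustration; the abstract does not say which online algorithms were evaluated) and a simple bag-of-words tokenization of the hostname and path as the lexical features. Host-based features are omitted. The names `lexical_features` and `OnlinePerceptron` and the example URLs and labels are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def lexical_features(url):
    """Sparse bag-of-words features from the hostname and path of a URL."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    feats = {"bias": 1.0}
    for tok in re.split(r"[^A-Za-z0-9]+", parsed.hostname or ""):
        if tok:
            feats["host:" + tok.lower()] = 1.0
    for tok in re.split(r"[^A-Za-z0-9]+", parsed.path):
        if tok:
            feats["path:" + tok.lower()] = 1.0
    return feats


class OnlinePerceptron:
    """Illustrative online learner: one example at a time, no stored data set."""

    def __init__(self):
        self.w = defaultdict(float)  # sparse weight vector keyed by feature name

    def score(self, feats):
        return sum(self.w.get(name, 0.0) * value for name, value in feats.items())

    def predict(self, feats):
        # +1 means "malicious", -1 means "benign" (labeling is an assumption)
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats, label):
        """Predict first, then adjust weights only if the prediction was wrong."""
        pred = self.predict(feats)
        if pred != label:
            for name, value in feats.items():
                self.w[name] += label * value
        return pred


# Hypothetical labeled stream standing in for a real-time feed of URLs.
stream = [
    ("http://login.example-bank.test/verify/account", 1),
    ("http://www.example.test/docs/index.html", -1),
    ("http://secure.example-pay.test/verify/login", 1),
    ("http://blog.example.test/docs/news", -1),
]

model = OnlinePerceptron()
mistakes = 0
for url, label in stream:
    if model.update(lexical_features(url), label) != label:
        mistakes += 1
```

Because each URL is scored and then immediately used for an update, the model never needs the full training set in memory, and new tokens that appear later in the stream simply become new weights. That is the property the abstract points to when it cites data size and continuously changing feature distributions.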

Justin Ma, Lawrence K. Saul, Stefan Savage, Geoffrey M. Voelker

Real-time Traffic

Associated Urls | ICML 2009 | Machine Learning | Malicious Urls | Online Algorithms |

Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2009
Where	ICML
Authors	Justin Ma, Lawrence K. Saul, Stefan Savage, Geoffrey M. Voelker
