Identifying the true type of a computer file can be a difficult problem. Previous methods of file type recognition include fixed file extensions, fixed “magic numbers” stored ...
We report on an automated runtime anomaly detection method at the application layer of multi-node computer systems. Although several network management systems are available in th...
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
Classifying and mining noise-free web pages will improve on accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...