— Web script crashes and malformed dynamically-generated web pages are common errors, and they seriously impact the usability of web applications. Current tools for web-page vali...
Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, ...
The web contains lots of interesting factual information about entities, such as celebrities, movies or products. This paper describes a robust bootstrapping approach to corrobora...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific known types of Web spa...
Yiqun Liu, Rongwei Cen, Min Zhang, Shaoping Ma, Li...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...