A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
The main text content of an HTML document on the WWW is typically surrounded by additional contents, such as navigation menus, advertisements, link lists or design elements. Conte...
Learning to rank is a new statistical learning technology on creating a ranking model for sorting objects. The technology has been successfully applied to web search, and is becom...
Tao Qin, Tie-Yan Liu, Xu-Dong Zhang, De-Sheng Wang...
We present a novel model for validating and improving the content and structure organization of a website. This model studies the website as a graph and evaluates its interconnect...