We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through...
Adriano Veloso, Wagner Meira Jr., Marco Cristo, Ma...
Abstract. As XML diffusion keeps increasing, it is today common practice for most developers to deal with XML parsing and transformation. XML is used as format to e.g. render data,...
Web directory hierarchy is critical to serve user’s search request. Creating and maintaining such directories without human experts involvement requires good classification of we...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "fa...