Supervised text classification is the task of automatically assigning a category label to a previously unlabeled text document. We start with a collection of pre-labeled examples ...
In addition to the actual content, Web pages consist of navigational elements, templates, and advertisements. This boilerplate text is typically not related to the main content, ma...
Abstract. We present the Tom language, which extends Java to provide high-level constructs inspired by the rewriting community. Tom thus bridges the gap between a ...
Horatiu Cirstea, Pierre-Etienne Moreau, Antoine Re...
Category ranking provides a way to classify plain text documents into a pre-determined set of categories. This work proposes to examine typical document collections and ana...
Often, independent organizations define and advocate different XML formats for a similar purpose and, as a result, application programs need to mutually convert between such forma...