Semi-structured documents (e.g. journal articles, electronic mail, television programs, mail order catalogs, ...) are often not explicitly typed; the only available type inform...
Motivated by current efforts to construct more realistic spam filtering experimental corpora, we present a newly assembled, publicly available corpus of genuine and unsolicited (s...
The semi-structured information available in HTML and similar documents provides valuable information that can be used for information extraction applications. This information tog...
Arabic is a highly inflectional and morphologically rich language. It presents serious challenges to the automatic classification of documents, one of which is deter...
Functional image classification is the assignment of different image types to separate classes to optimize their rendering for reading or another specific end task, and is an import...
Rafael Dueire Lins, Gabriel Pereira e Silva, Steve...