Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
Classifying and mining noise-free web pages will improve the accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...
Developing Web pages following established standards can make the information more accessible, their rendering more efficient, and their processing by computer applications easier...
Ontologies are becoming increasingly common on the World Wide Web as the building block for a future Semantic Web. In this Web, ontologies will be responsible for making the semant...
Karin Koogan Breitman, Carolina Howard Felicí...
In this paper we present CUTER, a system that processes HTML pages in order to extract the useful text from them. The mechanism focuses on HTML pages that include news articl...
George Adam, Christos Bouras, Vassilis Poulopoulos