Sciweavers

85 search results - page 4 / 17
» ECON: An Approach to Extract Content from Web News Page
Sort
View
SIGIR
2005
ACM
14 years 1 months ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...
SIGIR
2000
ACM
13 years 12 months ago
OCELOT: a system for summarizing Web pages
Abstract We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has ...
Adam L. Berger, Vibhu O. Mittal
SOFSEM
2007
Springer
14 years 1 months ago
Creating Permanent Test Collections of Web Pages for Information Extraction Research
In the research area of automatic web information extraction, there is a need for permanent and annotated web page collections enabling objective performance evaluation of differen...
Bernhard Pollak, Wolfgang Gatterbauer
DEXA
2006
Springer
197views Database» more  DEXA 2006»
13 years 9 months ago
Cleaning Web Pages for Effective Web Content Mining
Classifying and mining noise-free web pages will improve on accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...
Jing Li, Christie I. Ezeife
DOCENG
2009
ACM
14 years 2 months ago
Web document text and images extraction using DOM analysis and natural language processing
: © Web Document Text and Images Extraction using DOM Analysis and Natural Language Processing Parag Mulendra Joshi, Sam Liu HP Laboratories HPL-2009-187 Web page text extraction,...
Parag Mulendra Joshi, Sam Liu