Sciweavers

131 search results - page 10 / 27
SIGIR
2009
ACM
14 years 1 month ago
Web-derived resources for web information retrieval: from conceptual hierarchies to attribute hierarchies
A weakly-supervised extraction method identifies concepts within conceptual hierarchies, at the appropriate level of specificity (e.g., Bank vs. Institution), to which attribute...
Marius Pasca, Enrique Alfonseca
WWW
2005
ACM
14 years 7 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
ISMIS
2003
Springer
13 years 12 months ago
MetaNews: An Information Agent for Gathering News Articles on the Web
This paper presents MetaNews, an information gathering agent for news articles on the Web. MetaNews reads HTML documents from online news sites and extracts article information fro...
Dae-Ki Kang, Joongmin Choi
WWW
2006
ACM
14 years 7 months ago
Robust web content extraction
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
DOCENG
2003
ACM
13 years 12 months ago
Creating reusable well-structured PDF as a sequence of component object graphic (COG) elements
Portable Document Format (PDF) is a page-oriented, graphically rich format based on PostScript semantics and it is also the format interpreted by the Adobe Acrobat viewers. Althou...
Steven R. Bagley, David F. Brailsford, Matthew R. ...