Sciweavers

224 search results - page 17 / 45
» Syntactic Folding and its Application to the Information Ext...
WWW
2005
ACM
14 years 8 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
ICDM
2008
IEEE
14 years 1 months ago
Exploiting Data Semantics to Discover, Extract, and Model Web Sources
We describe DEIMOS, a system that automatically discovers and models new sources of information. The system exploits four core technologies developed by our group that make an en...
José Luis Ambite, Craig A. Knoblock, Kristi...
KDD
2009
ACM
14 years 8 months ago
A generalized Co-HITS algorithm and its application to bipartite graphs
Recently many data types arising from data mining and Web search applications can be modeled as bipartite graphs. Examples include queries and URLs in query logs, and authors and ...
Hongbo Deng, Michael R. Lyu, Irwin King
WWW
2001
ACM
14 years 8 months ago
Crawling the Hidden Web
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
Sriram Raghavan, Hector Garcia-Molina
WWW
2011
ACM
13 years 2 months ago
OXPath: little language, little memory, great value
Data about everything is readily available on the web—but often only accessible through elaborate user interactions. For automated decision support, extracting that data is esse...
Andrew Jon Sellers, Tim Furche, Georg Gottlob, Gio...