Sciweavers

543 search results - page 8 / 109
» Exploiting content redundancy for web information extraction
Sort
View
WWW
2009
ACM
14 years 8 months ago
Estimating web site readability using content extraction
Nowadays, information is primarily searched on the WWW. From a user perspective, the readability is an important criterion for measuring the accessibility and thereby the quality ...
Thomas Gottron, Ludger Martin
WSDM
2010
ACM
265views Data Mining» more  WSDM 2010»
14 years 4 months ago
Data-oriented Content Query System: Searching for Data into Text on the Web
As the Web provides rich data embedded in the immense contents inside pages, we witness many ad-hoc efforts for exploiting fine granularity information across Web text, such as We...
Kevin Chen-Chuan Chang, Mianwei Zhou, Tao Cheng
WWW
2005
ACM
14 years 8 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
ICDM
2008
IEEE
143views Data Mining» more  ICDM 2008»
14 years 2 months ago
Exploiting Data Semantics to Discover, Extract, and Model Web Sources
We describe DEIMOS, a system that automatically discovers and models new sources of information. The system exploits four core technologies developed by our group that makes an en...
José Luis Ambite, Craig A. Knoblock, Kristi...
WWW
2003
ACM
14 years 8 months ago
Annotating Web pages for the needs of Web Information Extraction Applications
This paper outlines our approach to the creation of annotated corpora for the purposes of Web Information Extraction, and presents the Web Annotation tool. This tool enables the a...
Georgios Sigletos, Dimitra Farmakiotou, Konstantin...