Sciweavers

Search results for "Robust web content extraction" (498 results, page 4 of 100)
WWW 2009, ACM
Estimating web site readability using content extraction
Nowadays, information is primarily searched for on the WWW. From a user perspective, readability is an important criterion for measuring the accessibility and thereby the quality ...
Thomas Gottron, Ludger Martin
WWW 2003, ACM
DOM-based content extraction of HTML documents
Web pages often surround the body of an article with clutter (such as pop-up ads, unnecessary images and extraneous links) that distracts the user from the actual content. Extraction o...
Suhit Gupta, Gail E. Kaiser, David Neistadt, Peter...
WWW 2005, ACM
Extracting semantic structure of web documents using content and visual information
This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...
Rupesh R. Mehta, Pabitra Mitra, Harish Karnick
WWW 2010, ACM
Exploiting content redundancy for web information extraction
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
CIKM 2009, Springer
OfCourse: web content discovery, classification and information extraction for online course materials
Yuhong Xiong, Ping Luo, Yong Zhao, Fen Lin, Shicong Feng, Baoyao Zhou, Liw...