Sciweavers

» Robust web content extraction
SIGIR
2006
ACM
14 years 21 days ago
Getting work done on the web: supporting transactional queries
Many searches on the web have a transactional intent. We argue that pages satisfying transactional needs can be distinguished from the more common pages that have some information...
Yunyao Li, Rajasekar Krishnamurthy, Shivakumar Vai...
CIKM
2006
Springer
13 years 10 months ago
A fast and robust method for web page template detection and removal
The widespread use of templates on the Web is considered harmful for two main reasons. Not only do they compromise the relevance judgment of many web IR and web mining methods suc...
Karane Vieira, Altigran Soares da Silva, Nick Pint...
SEMWEB
2009
Springer
14 years 1 month ago
Policy-Aware Content Reuse on the Web
The Web allows users to share their work very effectively leading to the rapid re-use and remixing of content on the Web including text, images, and videos. Scientific research d...
Oshani Seneviratne, Lalana Kagal, Tim Berners-Lee
LAWEB
2003
IEEE
14 years 7 hours ago
On the Image Content of the Chilean Web
In this paper we perform a study of the image contents of the Chilean web (.cl domain) using automatic feature extraction, content-based analysis and face detection algorithms. In...
Alejandro Jaimes, Javier Ruiz-del-Solar, Rodrigo V...
ICDE
2010
IEEE
14 years 1 month ago
On supporting effective web extraction
Commercial tuple extraction systems have enjoyed some success to extract tuples by regarding HTML pages as tree structures and exploiting XPath queries to find attributes of t...
Wook-Shin Han, Wooseong Kwak, Hwanjo Yu