Sciweavers

498 search results - page 6 / 100
» Robust web content extraction
MAICS
2004
13 years 8 months ago
Intelligent Content Based Title and Author Name Extraction from Formatted Documents
This paper describes the development of algorithms for extracting the title and the names of the authors from documents available on the World Wide Web. In this paper we describe ...
Eric G. Berkowitz, Mohamed Reda Elkhadiri, Tim Sah...
WWW
2010
ACM
14 years 1 month ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
ICWE
2004
Springer
14 years 2 days ago
Personalizing Web Sites for Mobile Devices Using a Graphical User Interface
Despite recent advances in wireless and portable hardware technologies, mobile access to the Web is often laborious. For this reason, several solutions have been proposed to custom...
Leonardo Teixeira Passos, Marco Tulio de Oliveira ...
CIKM
2008
Springer
13 years 8 months ago
CoreEx: content extraction from online news articles
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Jyotika Prasad, Andreas Paepcke
WISA
2004
Springer
14 years 1 day ago
Content-Based Synchronization Using the Local Invariant Feature for Robust Watermarking
This paper addresses the problem of content-based synchronization for robust watermarking. Synchronization is a process of extracting the location to embed and detect the signature...
Hae-Yeoun Lee, Jong-Tae Kim, Heung-Kyu Lee, Young-...