Sciweavers

498 search results - page 11 / 100
» Robust web content extraction
ICMCS
2007
IEEE
140 views, Multimedia
14 years 1 month ago
Audio Signature Extraction Based on Projections of Spectrograms
Content-based signatures are designed to be a robust bitstream representation of the content so as to enable content identification even though the original content may go through ...
Regunathan Radhakrishnan, Claus Bauer, Corey Cheng...
IIWAS
2008
13 years 8 months ago
Combining content extraction heuristics: the CombinE system
The main text content of an HTML document on the WWW is typically surrounded by additional contents, such as navigation menus, advertisements, link lists or design elements. Conte...
Thomas Gottron
DEXA
2006
Springer
197 views, Database
13 years 8 months ago
Cleaning Web Pages for Effective Web Content Mining
Classifying and mining noise-free web pages will improve on accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...
Jing Li, Christie I. Ezeife
AUSDM
2006
Springer
160 views, Data Mining
13 years 10 months ago
Extraction of Flat and Nested Data Records from Web Pages
This paper studies the problem of identification and extraction of flat and nested data records from a given web page. With the explosive growth of information sources ...
Siddu P. Algur, P. S. Hiremath
SOFSEM
2007
Springer
14 years 25 days ago
Creating Permanent Test Collections of Web Pages for Information Extraction Research
In the research area of automatic web information extraction, there is a need for permanent and annotated web page collections enabling objective performance evaluation of differen...
Bernhard Pollak, Wolfgang Gatterbauer