Sciweavers

» Learning to Extract Text-Based Information from the World Wi...
WWW
2006
ACM
14 years 9 months ago
Robust web content extraction
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
WWW
2009
ACM
14 years 3 months ago
Near real time information mining in multilingual news
This paper presents a near real-time multilingual news monitoring and analysis system that forms the backbone of our research work. The system integrates technologies to address t...
Martin Atkinson, Erik Van der Goot
WEBDB
1999
Springer
14 years 27 days ago
Web Ecology: Recycling HTML Pages as XML Documents Using W4F
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
Arnaud Sahuguet, Fabien Azavant
MTA
2008
13 years 8 months ago
Tactile web browsing for blind people
Information on the World Wide Web becomes more and more important for our society. For blind people this is a chance to access more information for their everyday life. In this pap...
Martin Rotard, Christiane Taras, Thomas Ertl
WWW
2010
ACM
14 years 3 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han