Sciweavers

498 search results - page 8 / 100
» Robust web content extraction
AAAI
2006
13 years 8 months ago
Table Extraction Using Spatial Reasoning on the CSS2 Visual Box Model
Tables on web pages contain a huge amount of semantically explicit information, which makes them a worthwhile target for automatic information extraction and knowledge acquisition...
Wolfgang Gatterbauer, Paul Bohunsky
WWW
2011
ACM
13 years 1 month ago
HyLiEn: a hybrid approach to general list extraction on the web
We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
DEXAW
2008
IEEE
14 years 1 months ago
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
COLING
2008
13 years 8 months ago
Emotion Classification Using Massive Examples Extracted from the Web
In this paper, we propose a data-oriented method for inferring the emotion of a speaker conversing with a dialog system from the semantic content of an utterance. We first fully a...
Ryoko Tokuhisa, Kentaro Inui, Yuji Matsumoto
ICWS
2009
IEEE
14 years 3 months ago
Deactivation of Unwelcomed Deep Web Extraction Services through Random Injection
Websites serve content both through Web Services as well as through user-viewable webpages. While the consumers of web-services are typically ‘machines’, webpages are meant fo...
Varun Bhagwan, Tyrone Grandison