DEXAW
2008
IEEE
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
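
The abstract above is truncated, so the following is only a minimal sketch of what a per-line text-to-tag ratio could look like, not the authors' implementation. It assumes the ratio is the count of non-tag characters on a line divided by the count of tags on that line, with a tag-free line divided by 1. The names `TAG_RE`, `text_to_tag_ratio`, and `ttr_per_line` are hypothetical, and the regular expression is a rough stand-in for a real HTML parser.

```python
import re

# Assumption: anything between '<' and '>' counts as one tag.
# This is a rough approximation, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]*>")


def text_to_tag_ratio(line: str) -> float:
    """Return non-tag character count divided by tag count for one line."""
    tag_count = len(TAG_RE.findall(line))
    # Remove the tags, then ignore leading/trailing whitespace.
    text_length = len(TAG_RE.sub("", line).strip())
    # Assumption: a line with no tags is divided by 1 to avoid division by zero.
    return text_length / max(tag_count, 1)


def ttr_per_line(html: str) -> list[float]:
    """Compute the ratio for every line of an HTML document."""
    return [text_to_tag_ratio(line) for line in html.splitlines()]


if __name__ == "__main__":
    sample = "<div><a href='x'>a</a></div>\n<p>Hello world</p>"
    for ratio in ttr_per_line(sample):
        print(ratio)
```

Under these assumptions, lines dominated by markup (navigation, link lists) get low ratios and lines carrying running text get high ones, which is the kind of signal the abstract describes using in place of page-specific HTML cues.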