Sciweavers

708 search results - page 12 / 142
» Identifying Content Blocks from Web Documents
Sort
View
ESORICS
2011
Springer
12 years 8 months ago
Protecting Private Web Content from Embedded Scripts
Many web pages display personal information provided by users. The goal of this work is to protect that content from untrusted scripts that are embedded in host pages. We present a...
Yuchen Zhou, David Evans
DEXAW
2008
IEEE
123views Database» more  DEXAW 2008»
14 years 3 months ago
Text Extraction from the Web via Text-to-Tag Ratio
– We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
CIKM
2008
Springer
13 years 10 months ago
Identifying table boundaries in digital documents via sparse line detection
Most prior work on information extraction has focused on extracting information from text in digital documents. However, often, the most important information being reported in an...
Ying Liu, Prasenjit Mitra, C. Lee Giles
CIKM
2010
Springer
13 years 7 months ago
Automatic metadata extraction from multilingual enterprise content
Enterprises provide professionally authored content about their products/services in different languages for use in web sites and customer care. For customer care, personalization...
Melike Sah, Vincent Wade
KDD
2003
ACM
161views Data Mining» more  KDD 2003»
14 years 9 months ago
Eliminating noisy information in Web pages for data mining
A commercial Web page typically contains many information blocks. Apart from the main content blocks, it usually has such blocks as navigation panels, copyright and privacy notice...
Lan Yi, Bing Liu, Xiaoli Li