Sciweavers

ICDAR
2009
IEEE
14 years 5 months ago
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques
Several established products on the market support wrapping (the semi-automatic navigation and extraction of data) from web pages. These solutions make use of the inherent...
Tamir Hassan
IEEE ICCI
2002
IEEE
14 years 3 months ago
An Agent-Assisted Document Storage for Software Process Environments
Traditional software process environments store documents using either a centralized or a distributed approach. With the assistance of a web agent, this paper presents a new document st...
Jason Jen-Yen Chen, Chun-Yi Lin
DEXAW
2008
IEEE
14 years 4 months ago
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document's Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
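A minimal sketch of the Text-to-Tag Ratio idea, assuming the ratio is computed per source line as (non-tag characters) / (number of tags). Stripping script/style blocks first, scoring tag-free lines by their text length, and keeping lines at or above the mean ratio are simplifying assumptions for illustration. The abstract is truncated, so this is not the authors' exact procedure, and the names `text_to_tag_ratios` and `extract_content` are hypothetical.

```python
import re
from statistics import mean
from typing import List, Optional

# Matches any HTML tag such as <p>, </div> or <a href='x'>.
TAG_RE = re.compile(r"<[^>]*>")
# Script and style bodies are dropped before scoring (an assumption of this sketch).
SCRIPT_RE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)


def text_to_tag_ratios(html: str) -> List[float]:
    """Return one Text-to-Tag Ratio per line of the cleaned HTML source."""
    ratios = []
    for line in SCRIPT_RE.sub("", html).splitlines():
        tag_count = len(TAG_RE.findall(line))
        text_len = len(TAG_RE.sub("", line).strip())
        # A line without tags scores its text length, avoiding division by zero.
        ratios.append(text_len / tag_count if tag_count else float(text_len))
    return ratios


def extract_content(html: str, threshold: Optional[float] = None) -> List[str]:
    """Keep the tag-stripped text of lines whose ratio meets the cut-off."""
    lines = SCRIPT_RE.sub("", html).splitlines()
    ratios = text_to_tag_ratios(html)
    if not ratios:
        return []
    # Hypothetical cut-off: the mean ratio, unless a threshold is supplied.
    cutoff = mean(ratios) if threshold is None else threshold
    return [
        TAG_RE.sub("", line).strip()
        for line, ratio in zip(lines, ratios)
        if ratio >= cutoff and ratio > 0
    ]
```

Navigation lines carry many tags around little text, so their ratio stays low, while article paragraphs carry long text between few tags. For a page whose lines score 0, 1, 23 and 0, only the 23-scoring paragraph clears the mean of 6.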
ICDE
2008
IEEE
14 years 11 months ago
AxPRE Summaries: Exploring the (Semi-)Structure of XML Web Collections
The nature of semistructured data in web collections is evolving. Increasingly, XML web documents (or documents exchanged via web services) are valid with regard to a schema, yet ...
Mariano P. Consens, Flavio Rizzolo, Alejandro A. V...
CIKM
2009
Springer
14 years 5 months ago
Improving web page classification by label-propagation over click graphs
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
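A minimal sketch of semi-supervised label propagation over a click graph, assuming a bipartite query-URL graph weighted by click counts, seed URLs clamped to 1.0 (in class) or 0.0 (out of class), and alternating click-weighted averaging. This is generic label propagation in the style the abstract describes; the authors' actual graph construction and update rules may differ, and `propagate_labels` is a hypothetical name.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, iterations=50):
    """Propagate class scores over a bipartite query-URL click graph.

    clicks: {(query, url): click_count}, counts assumed positive.
    seeds:  {url: 1.0 for in-class, 0.0 for out-of-class}; held fixed.
    Returns a score in [0, 1] for every URL in the graph.
    """
    query_nbrs = defaultdict(dict)
    url_nbrs = defaultdict(dict)
    for (query, url), count in clicks.items():
        query_nbrs[query][url] = count
        url_nbrs[url][query] = count

    # Unlabeled URLs start at a neutral 0.5 (an assumption of this sketch).
    url_score = {u: seeds.get(u, 0.5) for u in url_nbrs}
    for _ in range(iterations):
        # Each query takes the click-weighted average of its URLs' scores.
        query_score = {
            q: sum(c * url_score[u] for u, c in nbrs.items()) / sum(nbrs.values())
            for q, nbrs in query_nbrs.items()
        }
        # Unlabeled URLs take the click-weighted average of their queries;
        # seed URLs keep their given labels.
        url_score = {
            u: seeds[u] if u in seeds
            else sum(c * query_score[q] for q, c in nbrs.items()) / sum(nbrs.values())
            for u, nbrs in url_nbrs.items()
        }
    return url_score
```

On a toy graph where URL `a` is a positive seed, `c` is a negative seed, and `b` shares query `q1` with `a` (3 vs. 1 clicks) and query `q2` with `c` (2 vs. 2 clicks), the score of `b` converges to 3/7 (about 0.43), so a 0.5 cut-off would leave it outside the class.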