Sciweavers

502 search results - page 34 / 101
» Extracting Partial Structures from HTML Documents
ECIR
2010
Springer
13 years 10 months ago
Extracting Multilingual Topics from Unaligned Comparable Corpora
Topic models have been studied extensively in the context of monolingual corpora. Though there are some attempts to mine topical structure from cross-lingual corpora, they require ...
Jagadeesh Jagarlamudi, Hal Daumé III
VLDB
2011
ACM
13 years 3 months ago
Harvesting relational tables from lists on the web
A large number of web pages contain data structured in the form of “lists”. Many such lists can be further split into multi-column tables, which can then be used in more seman...
Hazem Elmeleegy, Jayant Madhavan, Alon Y. Halevy
ERCIMDL
2005
Springer
14 years 2 months ago
A No-Compromises Architecture for Digital Document Preservation
Abstract. The Multivalent Document Model offers a practical, proven, no-compromises architecture for preserving digital documents of potentially any data format. We have implemented...
Thomas A. Phelps, Paul B. Watry
DOCENG
2007
ACM
14 years 12 days ago
Extracting reusable document components for variable data printing
Variable Data Printing (VDP) has brought new flexibility and dynamism to the printed page. Each printed instance of a specific class of document can now have different degrees of ...
Steven R. Bagley, David F. Brailsford, James A. Ol...
RIAO
2000
13 years 9 months ago
Combining linguistic and spatial information for document analysis
We present a framework to analyze color documents of complex layout. In addition, no assumption is made on the layout. Our framework combines in a content-driven bottom-up approac...
Marco Aiello, Christof Monz, Leon Todoran