Sciweavers

» Extracting Partial Structures from HTML Documents
WWW
2009
ACM
14 years 9 months ago
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth
DOCENG
2009
ACM
14 years 2 months ago
Web document text and images extraction using DOM analysis and natural language processing
HP Laboratories, HPL-2009-187. Web page text extraction,...
Parag Mulendra Joshi, Sam Liu
EACL
2006
ACL Anthology
13 years 9 months ago
Multilingual Term Extraction from Domain-specific Corpora Using Morphological Structure
Morphologically complex terms composed from Greek or Latin elements are frequent in scientific and technical texts. Word forming units are thus relevant cues for the identificatio...
Delphine Bernhard
WEBDB
2009
Springer
14 years 3 months ago
Extracting Route Directions from Web Pages
Linguists and geographers are more and more interested in route direction documents because they contain interesting motion descriptions and language patterns. A large number of s...
Xiao Zhang, Prasenjit Mitra, Sen Xu, Anuj R. Jaisw...
ICDAR
2003
IEEE
14 years 1 months ago
Document Transformation System from Papers to XML Data Based on Pivot XML Document Method
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transfor...
Yasuto Ishitani