Sciweavers

» Extracting Partial Structures from HTML Documents
WEBI
2004
Springer
14 years 1 months ago
Semi-Structured Complex List Extraction
The semi-structured information available in HTML and similar documents provides valuable information that can be used for information extraction applications. This information tog...
Anders Arpteg
TREC
2000
13 years 9 months ago
Information Space Based on HTML Structure
The main goal for the Information Space system for TREC9 was early precision. To facilitate this, an emphasis was placed on seeking matches from only the TITLE, H1, H2 and H3 tags...
Gregory B. Newby
IJCAI
2003
13 years 9 months ago
Information Extraction from Web Documents Based on Local Unranked Tree Automaton Inference
Information extraction (IE) aims at extracting specific information from a collection of documents. A lot of previous work on IE from semi-structured documents (in XML or HTML) us...
Raymond Kosala, Maurice Bruynooghe, Jan Van den Bu...
ER
2003
Springer
14 years 1 months ago
Extracting Relations from XML Documents
XML is becoming a prevalent format for data exchange. Many XML documents have complex schemas that are not always known, and can vary widely between information sources and applica...
Eugene Agichtein, C. T. Howard Ho, Vanja Josifovsk...
DOCENG
2009
ACM
14 years 2 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan