Search Sciweavers | Sciweavers

85 search results - page 1 / 17

CIKM
2003
Springer

129 views, Information Technology

Extracting unstructured data from template generated web documents

15 years 12 months ago

Download www.ir.iit.edu

We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...

Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...

CIKM
1998
Springer

120 views, Information Technology

Ontology-Based Extraction and Structuring of Information from Data-Rich Unstructured Documents

15 years 11 months ago

Download pages.cs.wisc.edu

We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontolog...

David W. Embley, Douglas M. Campbell, Randy D. Smi...

AAAI
1997

162 views, Intelligent Agents

Template-Based Information Mining from HTML Documents

15 years 8 months ago

Download research.microsoft.com

Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...

Jane Yung-jen Hsu, Wen-tau Yih

COMAD
2009

190 views, Knowledge Management

Business Insight from Collection of Unstructured Formatted Documents with IBM Content Harvester

15 years 7 months ago

Download www.cse.iitb.ac.in

In this paper, we report the development and experiments of IBM Content Harvester (CH), a tool to analyze and recover templates and content from word processor created text docume...

Biplav Srivastava, Yuan-Chi Chang

SIGIR
2004
ACM

135 views, Information Technology

Query-related data extraction of hidden web documents

16 years 3 days ago

Download dis.shef.ac.uk

The larger amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dyna...

Yih-Ling Hedley, Muhammad Younas, Anne E. James, M...
