Sciweavers

» Web Document Modeling
WWW
2002
ACM
13 years 10 months ago
Improvement of HITS-based algorithms on web documents
In this paper, we present two ways to improve the precision of HITS-based algorithms on Web documents. First, by analyzing the limitations of current HITS-based algorithms, we pro...
Longzhuang Li, Yi Shang, Wei Zhang
CIKM
2003
Springer
14 years 4 months ago
Extracting unstructured data from template generated web documents
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...
EMNLP
2008
14 years 6 days ago
An Exploration of Document Impact on Graph-Based Multi-Document Summarization
The graph-based ranking algorithm has recently been exploited for multi-document summarization by making use only of the sentence-to-sentence relationships in the documents, under...
Xiaojun Wan
LREC
2008
14 years 6 days ago
Unsupervised Relation Extraction From Web Documents
The IDEX system is a prototype of an interactive dynamic Information Extraction (IE) system. A user of the system expresses an information request in the form of a topic descripti...
Kathrin Eichler, Holmer Hemsen, Günter Neuman...
WWW
2005
ACM
14 years 11 months ago
Extracting semantic structure of web documents using content and visual information
This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...
Rupesh R. Mehta, Pabitra Mitra, Harish Karnick