Sciweavers

» Identifying Content Blocks from Web Documents
CIKM
1999
Springer
14 years 27 days ago
Word Segmentation and Recognition for Web Document Framework
It is observed that a better approach to Web information understanding is to base it on the document framework, which mainly consists of (i) the title and the URL name of the pag...
Chi-Hung Chi, Chen Ding, Andrew Lim
WWW
2006
ACM
14 years 9 months ago
Cat and mouse: content delivery tradeoffs in web access
Web pages include extraneous material that may be viewed as undesirable by a user. Increasingly many Web sites also require users to register to access either all or portions of t...
Balachander Krishnamurthy, Craig E. Wills
ACL
2006
13 years 10 months ago
A DOM Tree Alignment Model for Mining Parallel Data from the Web
This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree align...
Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao
DEXAW
2010
IEEE
13 years 9 months ago
Towards a Search System for the Web Exploiting Spatial Data of a Web Document
In this paper, we describe our work in progress in the scope of information retrieval exploiting the spatial data extracted from web documents. We discuss problems of a search for ...
Stefan Dlugolinsky, Michal Laclavik, Ladislav Hluc...
RIAO
2007
13 years 10 months ago
Content-Based Peer-to-Peer Network Overlay for Full-Text Federated Search
Peer-to-peer network overlays have mostly been designed to support search over document names, identifiers, or keywords from a small or controlled vocabulary. In this paper we pro...
Jie Lu, Jamie Callan