Sciweavers

» Identifying Content Blocks from Web Documents
HT
2010
ACM
14 years 1 month ago
Is this a good title?
Missing web pages, URIs that return the 404 “Page Not Found” error or the HTTP response code 200 but dereference unexpected content, are ubiquitous in today’s browsing exper...
Martin Klein, Jeffery L. Shipman, Michael L. Nelso...
SIGIR
2005
ACM
14 years 2 months ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...
SIGIR
2003
ACM
14 years 1 month ago
Building a web thesaurus from web link structure
Thesaurus has been widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approa...
Zheng Chen, Shengping Liu, Liu Wenyin, Geguang Pu,...
HT
2003
ACM
14 years 1 month ago
Untangling compound documents on the web
Most text analysis is designed to deal with the concept of a “document”, namely a cohesive presentation of thought on a unifying subject. By contrast, individual nodes on the ...
Nadav Eiron, Kevin S. McCurley
ICWE
2010
Springer
13 years 6 months ago
Linking Related Documents: Combining Tag Clouds and Search Queries
Nowadays, Web encyclopedias suffer from a high bounce rate. Typically, users come to an encyclopedia from a search engine and upon reading the first page on the site they leave it...
Christoph Trattner, Denis Helic