Sciweavers

708 search results - page 9 / 142
» Identifying Content Blocks from Web Documents
WWW
2008
ACM
14 years 9 months ago
Mining the search trails of surfing crowds: identifying relevant websites from user activity
The paper proposes identifying relevant information sources from the history of combined searching and browsing behavior of many Web users. While it has been previously shown that...
Mikhail Bilenko, Ryen W. White
ANLP
1994
13 years 10 months ago
Modeling Content Identification from Document Images
A new technique to locate content-representing words for a given document image using representation of character shapes is described. A character shape code representation define...
Takehiro Nakayama
COLING
1996
13 years 10 months ago
Identifying the Coding System and Language of On-line Documents on the Internet
This paper proposes a new algorithm that simultaneously identifies the coding system and language of a code string fetched from the Internet, especially the World-Wide Web. The algori...
Gen-itiro Kikui
DOCENG
2009
ACM
14 years 3 months ago
Web article extraction for web printing: a DOM+visual based approach
Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong, Jerry Liu. HP Laboratories, HPL-2009-185. Article extrac...
Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong...
ENTCS
2006
13 years 8 months ago
How Recent is a Web Document?
One of the most important aspects of a Web document is its up-to-dateness or recency. Up-to-dateness is particularly relevant to Web documents because they usually contain content...
Bo Hu, Florian Lauck, Jan Scheffczyk