Search Sciweavers

708 search results for "Identifying Content Blocks from Web Documents" (page 9 of 142)

WWW
2008
ACM

Internet Technology

Mining the search trails of surfing crowds: identifying relevant websites from user activity

Source: www2008.org

The paper proposes identifying relevant information sources from the history of combined searching and browsing behavior of many Web users. While it has been previously shown that...

Mikhail Bilenko, Ryen W. White

ANLP
1994

Modeling Content Identification from Document Images

Source: acl.ldc.upenn.edu

A new technique to locate content-representing words for a given document image using representation of character shapes is described. A character shape code representation define...

Takehiro Nakayama

COLING
1996

Computational Linguistics

Identifying the Coding System and Language of On-line Documents on the Internet

Source: acl.ldc.upenn.edu

This paper proposes a new algorithm that simultaneously identifies the coding system and language of a code string fetched from the Internet, especially the World-Wide Web. The algori...

Gen-itiro Kikui

DOCENG
2009
ACM

Document Analysis

Web article extraction for web printing: a DOM+visual based approach

Source: www.hpl.hp.com

Web Article Extraction for Web Printing: a DOM+Visual based Approach. Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong, Jerry Liu. HP Laboratories HPL-2009-185. Article extrac...

Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong...

ENTCS
2006

How Recent is a Web Document?

Source: www.icsi.berkeley.edu

One of the most important aspects of a Web document is its up-to-dateness or recency. Up-to-dateness is particularly relevant to Web documents because they usually contain content...

Bo Hu, Florian Lauck, Jan Scheffczyk
