Search Sciweavers | Sciweavers

708 search results - page 10 / 142

» Identifying Content Blocks from Web Documents

ACL
2006

Examining the Content Load of Part of Speech Blocks for Information Retrieval

15 years 8 months ago

Download acl.ldc.upenn.edu

We investigate the connection between part of speech (POS) distribution and content in language. We define POS blocks to be groups of parts of speech. We hypothesise that there ex...

Christina Lioma, Iadh Ounis

WWW
2003
ACM

DOM-based content extraction of HTML documents

16 years 7 months ago

Download www.psl.cs.columbia.edu

Web pages often contain clutter (such as pop-up ads, unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction o...

Suhit Gupta, Gail E. Kaiser, David Neistadt, Peter...

JUCS
2008

Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction

15 years 6 months ago

Download www.jucs.org

Abstract: As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the diffi...

Jinbeom Kang, Joongmin Choi

ICDAR
2009
IEEE

PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents

16 years 1 months ago

Download www.cvc.uab.es

This paper presents PDF-TREX, a heuristic approach for table recognition and extraction from PDF documents. The heuristic starts from an initial set of basic content elements an...

Ermelinda Oro, Massimo Ruffolo

WWW
2004
ACM

Web page summarization using dynamic content

16 years 7 months ago

Download www.iw3c2.org

Summarizing web pages has recently gained much attention from researchers. Until now, two main types of approaches have been proposed for this task: content- and context-based met...

Adam Jatowt
