Search Sciweavers | Sciweavers

232 search results - page 11 / 47

» Query-related data extraction of hidden web documents


WWW
2009
ACM

189 views, Internet Technology

Extracting data records from the web using tag path clustering

15 years 8 months ago

Download www2009.org

Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...

Gengxin Miao, Jun'ichi Tatemura, Wang-Pin Hsiung, ...


NAACL
2010

182 views, Computational Linguistics

Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment

15 years 1 month ago

Download research.microsoft.com

The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...

Jason R. Smith, Chris Quirk, Kristina Toutanova


IPM
2007

149 views

Web page title extraction and its application

15 years 3 months ago

Download research.microsoft.com

This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...

Yewei Xue, Yunhua Hu, Guomao Xin, Ruihua Song, Shu...


WEBI
2005
Springer

216 views, Internet Technology

A Semi-Supervised Document Clustering Algorithm Based on EM

15 years 9 months ago

Download www.dii.unisi.it

Document clustering is a very hard task in Automatic Text Processing since it requires to extract regular patterns from a document collection without a priori knowledge on the cat...

Leonardo Rigutini, Marco Maggini


ICASSP
2009
IEEE

137 views, Signal Processing

Data hiding in hard-copy text documents robust to print, scan and photocopy operations

15 years 10 months ago

Download www.merl.com

This paper describes a method for hiding data inside printed text documents that is resilient to print/scan and photocopying operations. Using the principle of channel coding with...

Avinash L. Varna, Shantanu Rane, Anthony Vetro
