Sciweavers

232 search results - page 11 / 47
» Query-related data extraction of hidden web documents
WWW
2009
ACM
14 years 8 days ago
Extracting data records from the web using tag path clustering
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Gengxin Miao, Jun'ichi Tatemura, Wang-Pin Hsiung, ...
NAACL
2010
13 years 5 months ago
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...
Jason R. Smith, Chris Quirk, Kristina Toutanova
IPM
2007
13 years 7 months ago
Web page title extraction and its application
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
Yewei Xue, Yunhua Hu, Guomao Xin, Ruihua Song, Shu...
WEBI
2005
Springer
14 years 1 months ago
A Semi-Supervised Document Clustering Algorithm Based on EM
Document clustering is a very hard task in Automatic Text Processing since it requires extracting regular patterns from a document collection without a priori knowledge on the cat...
Leonardo Rigutini, Marco Maggini
ICASSP
2009
IEEE
14 years 2 months ago
Data hiding in hard-copy text documents robust to print, scan and photocopy operations
This paper describes a method for hiding data inside printed text documents that is resilient to print/scan and photocopying operations. Using the principle of channel coding with...
Avinash L. Varna, Shantanu Rane, Anthony Vetro