Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
Document clustering is a very hard task in Automatic Text Processing since it requires to extract regular patterns from a document collection without a priori knowledge on the cat...
This paper describes a method for hiding data inside printed text documents that is resilient to print/scan and photocopying operations. Using the principle of channel coding with...