For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information...
Unit selection text-to-speech systems currently produce very natural synthesized phrases by concatenating speech segments from a large database. Recently, increasing demand for de...
Machine-learning algorithms are employed in a wide variety of applications to extract useful information from data sets, and many are known to suffer from superlinear increases in ...
Karthik Nagarajan, Brian Holland, Alan D. George, ...
: With the increasing popularity of semi-structured documents (particularly in the form of XML) for knowledge management, it is important to create tools that use the additional in...
The blogosphere has unique structural and temporal properties since blogs are typically used as communication media among human individuals. In this paper, we propose a novel tech...