For the task of near-duplicate document detection, both traditional fingerprinting techniques used in the database community and bag-of-words comparison approaches used in information...
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
In this article we present an evaluation of text clustering and classification methods for creating digital library browse interfaces, focusing on the particular case of collecti...
Two-dimensional and three-dimensional coordinate systems are basic graphical symbols in many graphical documents. A robust coordinate system detection scheme is needed in order...
We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the resu...
Deng Cai, Xiaofei He, Zhiwei Li, Wei-Ying Ma, Ji-R...