Topic models provide a powerful tool for analyzing large text collections by representing high dimensional data in a low dimensional subspace. Fitting a topic model given a set of...
Automatic annotation of documents with controlled vocabulary terms (descriptors) from a conceptual thesaurus is not only useful for document indexing and retrieval. The mapping of...
This paper presents a new use of the Curvelet transform as a multiscale method for indexing linear singularities and curved handwritten shapes in documents images. As it belongs t...
This paper describes the development of a new document ranking system based on layout similarity. The user has a need represented by a set of ”wanted” documents, and the syste...
May Huang, Daniel DeMenthon, David S. Doermann, Ly...
We are presenting a text analysis tool set that allows analysts in various fields to sieve through large collections of multilingual news items quickly and to find information that...