The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact ...
We propose a hybrid, unsupervised document clustering approach that combines a hierarchical clustering algorithm with Expectation Maximization. We developed several heuristics to ...
Cheap and versatile cameras make it possible to easily and quickly capture a wide variety of documents. However, low resolution cameras present a challenge to OCR because it is vi...
Charles E. Jacobs, Patrice Y. Simard, Paul A. Viol...
Digital libraries can take advantage of documents that have their content (semantics) explicitly represented as knowledge structures. These knowledge-rich documents can be created ...
Abstract. We consider how the construction of multi-structured documents implies the definition of structuration vocabularies. In a multiusers context, the growth of these vocabula...