The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric mixed membership model--each data point is modeled with a collection of components of different proportions. T...
Sinead Williamson, Chong Wang, Katherine A. Heller...
Abstract. We propose a scheme for automatically generating compressors for XML documents from Document Type Definition(DTD) specifications. Our algorithm is a lossless adaptive a...
Multinomial distributions are often used to model text documents. However, they do not capture well the phenomenon that words in a document tend to appear in bursts: if a word app...
Rasmus Elsborg Madsen, David Kauchak, Charles Elka...
: This paper reports on the Joaquim Nabuco Project, a pioneering work in Latin America on document digitalization, enhancement, compression, indexing, retrieval and network transmi...