As the web expands exponentially, the need to put some order to its content becomes apparent. Hypertext categorization, that is the automatic classification of web documents into ...
(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical patt...
This paper is concerned with the problem of definition search. Specifically, given a term, we are to retrieve definitional excerpts of the term and rank the extracted excerpts acc...
Multinomial distributions are often used to model text documents. However, they do not capture well the phenomenon that words in a document tend to appear in bursts: if a word app...
Rasmus Elsborg Madsen, David Kauchak, Charles Elka...
Many real-world datasets can be clustered along multiple dimensions. For example, text documents can be clustered not only by topic, but also by the author's gender or sentim...