The document-length normalization problem has been widely studied in the field of Information Retrieval. The Cosine Normalization [2], the Maximum tf Normalization [1] and the By...
Sylvain Lamprier, Tassadit Amghar, Bernard Levrat,...
—We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the...
In this paper we study the problem of collecting training samples for building enterprise taxonomies. We develop a computer-aided tool named InfoAnalyzer, which can effectively as...
Human-quality text summarization systems are di cult to design, and even more di cult to evaluate, in part because documents can di er along several dimensions, such as length, wri...
Jade Goldstein, Mark Kantrowitz, Vibhu O. Mittal, ...
This research characterizes the spontaneous spoken disfluencies typical of human-computer interaction, and presents a predictive model accounting for their occurrence. Data were c...