In this paper, we introduce a visualization method that couples a trend chart with word clouds to illustrate temporal content evolutions in a set of documents. Specifically, we us...
In this paper, we explore a CLIR-based approach to construct large-scale Chinese-English comparable corpora, which is valuable for translation knowledge mining. The initial source...
Documents often contain inherently many concepts reflecting specific and generic aspects. To automatically generate a short summary text of documents on similar topics, it is im...
Document recognition involves many kinds of hypotheses: segmentation hypotheses, classification hypotheses, spatial relationship hypotheses, and so on. Many recognition strategie...
Richard Zanibbi, Dorothea Blostein, James R. Cordy
This paper presents an extensive experimental study of the state-of-the-art of XML compression tools. The study reports the behavior of nine XML compressors using a large corpus o...