Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity – resulting from summariza...
Donald Metzler, Yaniv Bernstein, W. Bruce Croft, A...
In addition to the actual content Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
We present a new statistical compression method, which we call Phrase Based Dense Code (PBDC), aimed at compressing large digital libraries. PBDC compresses the text collection to ...
Abstract. Automated Text Categorization has reached the levels of accuracy of human experts. Provided that enough training data is available, it is possible to learn accurate autom...
In this paper, targeting del.icio.us tag data, we propose a method, FolksoViz, for deriving subsumption relationships between tags by using Wikipedia texts, and visualizing a folk...
Kangpyo Lee, Hyunwoo Kim, Chungsu Jang, Hyoung-Joo...