Large unstructured text collections demand full-text search capabilities from IR systems. Current systems typically allow users to connect only to a single database (or site) ...
Finding definitions in huge text collections is a challenging problem, not only because of the many ways in which definitions can be conveyed in natural language texts but also be...
We present a term recognition approach to extract acronyms and their definitions from a large text collection. Parenthetical expressions appearing in a text collection are identif...
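As a rough illustration of the parenthetical-expression idea in the abstract above, the following Python sketch pairs a parenthesized acronym with the words immediately preceding it. The abstract is truncated, so the matching rule used here (one preceding word per acronym letter, matched on initials) is an assumption made for illustration and not the authors' method; the names `PAREN` and `extract_acronyms` are hypothetical.

```python
import re

# Assumption: an acronym candidate is a short alphabetic token in parentheses
# that starts with an uppercase letter.
PAREN = re.compile(r"\(([A-Z][A-Za-z]{1,9})\)")

def extract_acronyms(text):
    """Return (acronym, definition) pairs found via a simple initials heuristic."""
    pairs = []
    for match in PAREN.finditer(text):
        acronym = match.group(1)
        # Take as many preceding words as the acronym has letters.
        words = text[:match.start()].split()
        candidate = words[-len(acronym):]
        initials = "".join(w[0].upper() for w in candidate)
        # Accept only if the initials spell out the acronym exactly.
        if len(candidate) == len(acronym) and initials == acronym.upper():
            pairs.append((acronym, " ".join(candidate)))
    return pairs

sample = "a method which we call Phrase Based Dense Code (PBDC) for text"
pairs = extract_acronyms(sample)
```

This heuristic misses definitions that contain stop words or skip letters; a fuller term-recognition approach would need to score several candidate spans.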
We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm takes into account an environment of a high bandwidth netw...
Berthier A. Ribeiro-Neto, Joao Paulo Kitajima, Gon...
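For readers unfamiliar with inverted files, the sketch below shows the general build-then-merge pattern in Python: each partition of the collection is indexed independently, and the local indexes are then merged into a global one. It runs sequentially in a single process and is only an assumed simplification; it does not reproduce the network-aware algorithm described in the abstract above, and the function names `build_local_index` and `merge_indexes` are hypothetical.

```python
from collections import defaultdict

def build_local_index(docs):
    """Index one partition. docs maps doc_id -> text; returns term -> sorted doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def merge_indexes(local_indexes):
    """Merge per-partition indexes into one global inverted file."""
    merged = defaultdict(set)
    for local in local_indexes:
        for term, ids in local.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in merged.items()}

# Two partitions stand in for documents held on two different machines.
partitions = [
    {1: "large text collections", 2: "inverted files for text"},
    {3: "parallel inverted files"},
]
global_index = merge_indexes(build_local_index(p) for p in partitions)
```

In a real parallel setting the local builds would run on separate workers, and the merge step is where network bandwidth and disk I/O dominate the cost.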
We present three distributed algorithms to build global inverted files for very large text collections. The distributed environment we use is a high bandwidth network of workstati...
Berthier A. Ribeiro-Neto, Edleno Silva de Moura, M...
We present a method of searching text collections that takes advantage of hierarchical information within documents and integrates searches of structured and unstructured data. W...
M. Catherine McCabe, Jinho Lee, Abdur Chowdhury, D...
We present a new statistical compression method, which we call Phrase Based Dense Code (PBDC), aimed at compressing large digital libraries. PBDC compresses the text collection to ...
We present a corpus-based approach to the class expansion task. For a given set of seed entities we use co-occurrence statistics taken from a text collection to define a membersh...