This paper offers a novel look at using a dimensionality reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
This paper presents a geo-temporal gazetteer Web service that provides access to names of places and historical periods, together with the associated geo-temporal information. With...
Textual patterns have been used effectively to extract information from large text collections. However, they rely heavily on textual redundancy, in the sense that facts have to be m...
The National Digital Information Infrastructure and Preservation Program will demonstrate a pilot tools platform called Recollection that supports access to distributed NDIIPP coll...
Background: Identifying disease genes from a list of candidate genes is an important task in bioinformatics. The main strategy is to prioritize candidate genes based on their simil...