Amberfish at the TREC 2004 Terabyte Track


Download trec.nist.gov

The TREC 2004 Terabyte Track evaluated information retrieval in large-scale text collections, using a set of 25 million documents (426 GB). This paper gives an overview of our experiences with this collection and describes Amberfish, the text retrieval software used for the experiments.

1 Preface

This is the first year of the Terabyte Track and the first time we have participated directly in the TREC conference. It is also the first time that the Amberfish software (etymon.com/tr.html) has been used in TREC. Our goals for this ambitious track were simply to complete the task and to gain some rudimentary experience with evaluation and the Terabyte collection. This paper presents a summary of the Amberfish software followed by a brief discussion of this year's Terabyte Track.

2 Amberfish

Amberfish is open source text retrieval software developed by the author starting in 1998 and distributed by Etymon Systems, Inc. The project was based on lessons learned from previous implementatio...

Nassib Nassar
