MapReduce for Information Retrieval Evaluation: "Let's Quickly Test This on 12 TB of Data"



We propose using MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at http://mirex.sourceforge.net
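As a rough illustration of the sequential-scanning idea described in the abstract, the sketch below simulates a map step that scores every document against every query and a reduce step that keeps the top-k documents per query. It is not the authors' code from the URL above: the function names, the in-memory "shuffle", and the length-normalised term-frequency score are all assumptions chosen to keep the example small.

```python
import heapq
from collections import Counter, defaultdict


def map_document(doc_id, text, queries):
    """Map step: score one document against every query.

    Emits (query_id, (score, doc_id)) pairs. The score is a toy
    length-normalised term frequency (an assumption for illustration,
    not the retrieval model used in the paper).
    """
    term_counts = Counter(text.lower().split())
    doc_len = sum(term_counts.values())
    if doc_len == 0:
        return
    for query_id, query in queries.items():
        score = sum(term_counts[t] for t in query.lower().split()) / doc_len
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(pairs, k):
    """Reduce step: keep the k highest-scoring documents for one query."""
    return heapq.nlargest(k, pairs)


def sequential_scan(docs, queries, k=10):
    """Run the map step over every document, group by query, then reduce.

    On a real cluster the grouping is done by the MapReduce framework;
    here a dictionary stands in for that shuffle phase.
    """
    grouped = defaultdict(list)
    for doc_id, text in docs:
        for query_id, pair in map_document(doc_id, text, queries):
            grouped[query_id].append(pair)
    return {q: reduce_query(pairs, k) for q, pairs in grouped.items()}


if __name__ == "__main__":
    docs = [
        ("d1", "map reduce cluster"),
        ("d2", "web crawl pages pages"),
        ("d3", "reduce reduce map"),
    ]
    queries = {"q1": "reduce", "q2": "pages"}
    for query_id, ranked in sorted(sequential_scan(docs, queries, k=2).items()):
        print(query_id, [doc_id for _, doc_id in ranked])
```

Because no index is built, trying a new ranking function in this setup only means changing the scoring line in the map step and rescanning the collection.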

Djoerd Hiemstra, Claudia Hauff


CLEF 2010 | Information Technology | Large-scale Information Retrieval | Low Cost Machines | Small Case Study


» PRISM: Privacy-Preserving Search in MapReduce

» Efficient partial-duplicate detection based on sequence matching

» Large-scale music tag recommendation with explicit multiple attributes

» Learning URL patterns for webpage deduplication

Post Info

Added	08 Nov 2010
Updated	08 Nov 2010
Type	Conference
Year	2010
Where	CLEF
Authors	Djoerd Hiemstra, Claudia Hauff
