Sciweavers

WEBDB
2009
Springer

Bridging the Terminology Gap in Web Archive Search

Web archives play an important role in preserving our cultural heritage for future generations. Searching them is complicated by the fact that terminology evolves constantly: since today’s users formulate queries using current terminology, old but relevant documents are often not retrieved. The query "saint petersburg museum", for instance, does not retrieve documents from the 1970s about museums in Leningrad (the former name of Saint Petersburg). We address this problem by determining query reformulations that paraphrase the user’s information need using terminology prevalent in the past. We propose a measure of across-time semantic similarity that assesses the degree of relatedness between two terms when used at different times. Using this measure as a crucial building block, we propose a novel query reformulation technique based on a hidden Markov model (HMM). Experiments on twenty years’ worth of New York Times articles demonstrate the usefulness and efficienc...
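The abstract does not say how the similarity measure or the HMM is defined, so the following is only an illustrative sketch and not the authors' method. It assumes that across-time similarity can be approximated by the cosine between co-occurrence context vectors taken from two time periods, and that reformulation can be treated as Viterbi-style decoding in which past terms are hidden states and query terms are observations. The names `CONTEXTS`, `TRANSITIONS`, `across_time_sim`, and `reformulate`, and all counts and probabilities, are hypothetical toy values.

```python
import math

# Hypothetical toy co-occurrence counts: (term, period) -> {context word: count}.
CONTEXTS = {
    ("saint petersburg", "2000s"): {"neva": 4, "hermitage": 5, "russia": 6},
    ("museum", "2000s"): {"exhibition": 5, "art": 6, "gallery": 3},
    ("leningrad", "1970s"): {"neva": 3, "hermitage": 4, "soviet": 5, "russia": 2},
    ("moscow", "1970s"): {"kremlin": 6, "soviet": 5, "russia": 3},
    ("museum", "1970s"): {"exhibition": 4, "art": 5, "collection": 2},
    ("theater", "1970s"): {"stage": 5, "art": 1, "play": 4},
}

# Hypothetical transition weights between consecutive past terms.
TRANSITIONS = {
    ("leningrad", "museum"): 0.6,
    ("leningrad", "theater"): 0.4,
    ("moscow", "museum"): 0.5,
    ("moscow", "theater"): 0.5,
}
DEFAULT_TRANSITION = 0.01  # assumed smoothing value for unseen term pairs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def across_time_sim(term_now, now, term_then, then):
    """Assumed stand-in measure: compare the contexts of two terms across periods."""
    return cosine(CONTEXTS[(term_now, now)], CONTEXTS[(term_then, then)])


def reformulate(query_terms, now, then):
    """Viterbi-style search for the best-scoring sequence of past terms."""
    states = sorted(t for (t, period) in CONTEXTS if period == then)
    # best[s] holds (score, path) for the best path ending in state s.
    best = {s: (across_time_sim(query_terms[0], now, s, then), [s]) for s in states}
    for obs in query_terms[1:]:
        step = {}
        for s in states:
            emit = across_time_sim(obs, now, s, then)
            step[s] = max(
                (
                    (best[p][0] * TRANSITIONS.get((p, s), DEFAULT_TRANSITION) * emit,
                     best[p][1] + [s])
                    for p in states
                ),
                key=lambda candidate: candidate[0],
            )
        best = step
    return max(best.values(), key=lambda candidate: candidate[0])[1]


if __name__ == "__main__":
    print(reformulate(["saint petersburg", "museum"], "2000s", "1970s"))
```

With these toy numbers, the query from the abstract's example maps to the 1970s terms "leningrad" and "museum", because "leningrad" shares the most context with "saint petersburg" and pairs most strongly with "museum".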
Klaus Berberich, Srikanta J. Bedathur, Mauro Sozio, Gerhard Weikum
Added 25 May 2010
Updated 25 May 2010
Type Conference
Year 2009
Where WEBDB
Authors Klaus Berberich, Srikanta J. Bedathur, Mauro Sozio, Gerhard Weikum