Sciweavers

AIRS
2010
Springer

Advanced Training Set Construction for Retrieval in Historic Documents

Retrieval in historic documents with non-standard spelling requires a mapping from search terms onto the historic terms in the document. To describe this mapping, we have developed a rule-based approach. The bottleneck of this method has been the construction of the training set for the algorithm, in which an expert has to manually assign current word forms to historic spelling variants. As a better solution, we apply a spell checker to a corpus of historic texts, which gives us a list of candidate terms and associated suggestions. The new method generates possible rules for the suggestions and accepts the most frequent rules. Experimental results with German and English texts from different centuries demonstrate the feasibility of our approach. Thus, a training set can be constructed with much less initial effort.
Keywords: spelling variation, training set construction, historic documents
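The procedure outlined in the abstract (spell-checker suggestions, then candidate rules, then a frequency cut-off) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `derive_rules` and `frequent_rules`, the `min_count` threshold, the context-free rule format, and the sample suggestion list are all hypothetical, and the spell checker's output is hard-coded rather than produced by a real tool.

```python
from collections import Counter
from difflib import SequenceMatcher


def derive_rules(historic, modern):
    """Return (modern_part, historic_part) pairs where the two forms differ.

    Assumption: a rule is a plain substring rewrite without context;
    the rule format used in the paper may well be richer.
    """
    matcher = SequenceMatcher(None, modern, historic, autojunk=False)
    return [
        (modern[i1:i2], historic[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def frequent_rules(suggestions, min_count=2):
    """Count candidate rules over all suggestions and keep the frequent ones."""
    counts = Counter()
    for historic, candidates in suggestions.items():
        for modern in candidates:
            counts.update(derive_rules(historic, modern))
    return {rule: n for rule, n in counts.items() if n >= min_count}


# Hypothetical spell-checker output: historic form -> suggested modern forms.
suggestions = {
    "theil": ["teil"],
    "thun": ["tun", "thon"],  # "thon" stands in for a misleading suggestion
    "roth": ["rot"],
}

accepted = frequent_rules(suggestions, min_count=2)
for (modern_part, historic_part), n in accepted.items():
    print(f"{modern_part!r} -> {historic_part!r} (seen {n} times)")
```

In this toy setting, a rule produced by a single misleading suggestion does not reach the threshold, while a rule supported by several word pairs does. That is the intuition behind accepting only the most frequent rules.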
Andrea Ernst-Gerlach, Norbert Fuhr
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where AIRS