AIRS 2010, Springer

Advanced Training Set Construction for Retrieval in Historic Documents

Download www.is.informatik.uni-duisburg.de

Retrieval in historic documents with non-standard spelling requires a mapping from search terms onto the historic terms in the document. For describing this mapping, we have developed a rule-based approach. The bottleneck of this method has been the training set construction for the algorithm, where an expert has to manually assign current word forms to historic spelling variants. As a better solution, we apply a spell checker to a corpus of historic texts, which gives us a list of candidate terms and associated suggestions. The new method generates possible rules for the suggestions and accepts the most frequent rules. Experimental results with German and English texts from different centuries demonstrate the feasibility of our approach. Thus, a training set can be constructed with much less initial effort.

Key words: Spelling variation, training set construction, historic documents
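The pipeline outlined in the abstract (spell-checker suggestions, then candidate rules, then keeping the most frequent rules) might be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' algorithm: the `SUGGESTIONS` dictionary is a hypothetical stand-in for real spell-checker output, the character alignment via Python's `difflib` is a substitute for the paper's own rule generation, and the names `candidate_rules`, `frequent_rules`, and the `min_count` threshold are invented for this example.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical spell-checker output: historic form -> suggested modern forms.
# "seen" is included as a deliberately wrong suggestion.
SUGGESTIONS = {
    "theil": ["teil"],
    "thun": ["tun"],
    "seyn": ["sein", "seen"],
    "bey": ["bei"],
}


def candidate_rules(modern, historic):
    """Yield (modern_part, historic_part) for each span where the forms differ.

    One character of left context is included when available, so that an
    inserted "h" after "t" becomes the rule ("t", "th") rather than ("", "h").
    """
    matcher = SequenceMatcher(a=modern, b=historic, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        ctx = 1 if i1 > 0 else 0  # a differing span not at the start follows an equal span
        yield modern[i1 - ctx:i2], historic[j1 - ctx:j2]


def frequent_rules(suggestions, min_count=2):
    """Count candidate rules over all suggestion pairs; keep the frequent ones."""
    counts = Counter()
    for historic, candidates in suggestions.items():
        for modern in candidates:
            counts.update(candidate_rules(modern, historic))
    return {rule: n for rule, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    for (modern_part, historic_part), n in frequent_rules(SUGGESTIONS).items():
        print(f"{modern_part!r} -> {historic_part!r} (seen {n} times)")
```

The frequency filter is what lets incorrect suggestions drop out in this sketch: a rule derived from a wrong suggestion tends to occur only once, while rules reflecting a genuine spelling pattern recur across many word pairs.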

Andrea Ernst-Gerlach, Norbert Fuhr

Post Info

Added	10 Feb 2011
Updated	10 Feb 2011
Type	Journal
Year	2010
Where	AIRS
Authors	Andrea Ernst-Gerlach, Norbert Fuhr
