ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Sponsored search is one of the major sources of revenue for search engines on the World Wide Web. It has been observed that while showing ads for every query maximizes shortterm r...
Sandeep Pandey, Kunal Punera, Marcus Fontoura, Van...
Stochastic context-free grammars (SCFGs) have long been recognized as useful for a large variety of tasks including natural language processing, morphological parsing, speech reco...
The annotation of web sites in social bookmarking systems has become a popular way to manage and find information on the web. The community structure of such systems attracts spam...
Beate Krause, Christoph Schmitz, Andreas Hotho, Ge...
ParaMor, our unsupervised morphology induction system performed well at Morpho Challenge 2008. When ParaMor's morphological analyses, which specialize at identifying inflecti...
Christian Monson, Jaime G. Carbonell, Alon Lavie, ...