Abstract. This paper describes the technique for translation of character n-grams we developed for our participation in CLEF 2006. This solution avoids the need for word normalizat...
Abstract. The automatic detection of shared content in written documents –which includes text reuse and its unacknowledged commitment, plagiarism– has become an important probl...
An important factor for discriminating between webpages is their genre (e.g., blogs, personal homepages, e-shops, online newspapers, etc). Webpage genre identification has a great...
—In this paper, we present a novel approach to search and retrieve from document image collections, without explicit recognition. Existing recognition-free approaches such as wor...
This paper integrates five linguistic resources, including Cilin, a Chinese-English dictionary, ASBC corpus, SemCor, and WordNet, to construct a Chinese-English WordNet. The resul...