Similarity search and similarity join on strings are important for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences....
We report on the initial stages of development of a robust parsing system, to be used as part of The Editor's Assistant, a program that detects and corrects textual errors an...
An organization's data records are often noisy because of transcription errors, incomplete information, lack of standard formats for textual data or combinations thereof. A f...
Luis Gravano, Panagiotis G. Ipeirotis, Nick Koudas...
Free-text botanical descriptions contained in printed floras can provide a wealth of valuable scientific information. In spite of this richness, these texts have seldom been anal...
This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focus...