The word error rate of any optical character recognition system (OCR) is usually substantially below its component or character error rate. This is especially true of Indic langua...
Venkat Rasagna, Anand Kumar 0002, C. V. Jawahar, R...
— The number of XML documents produced and available on the Internet is steadily increasing. It is thus important to devise automatic procedures to extract useful information fro...
Francesca Trentini, Markus Hagenbuchner, Alessandr...
The processing and management of XML data are popular research issues. However, operations based on the structure of XML data have not received strong attention. These operations ...
Theodore Dalamagas, Tao Cheng, Klaas-Jan Winkel, T...
We present a novel approach for multilingual document clustering using only comparable corpora to achieve cross-lingual semantic interoperability. The method models document colle...
This paper presents a system that combines two text mining techniques; information extraction and clustering. A rulebased approach is used to perform the information extraction tas...