This paper presents a new method based on Fourier and moments features to extract words and characters from a video text line in any direction for recognition. Unlike existing ...
Automatic document classification is an important step in organizing and mining documents. Information in documents is often conveyed using both text and images that complement ea...
It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper,...
We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a s...
Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregat...