Sciweavers

» Scalable Feature Extraction from Noisy Documents
ICDAR
2003
IEEE
14 years 3 months ago
Localization, Extraction and Recognition of Text in Telugu Document Images
In this paper we present a system to locate, extract and recognize Telugu text. The circular nature of Telugu script is exploited for segmenting text regions using the Hough Trans...
Atul Negi, K. Nikhil Shanker, Chandra Kanth Chered...
SIGIR
2006
ACM
14 years 3 months ago
Feature diversity in cluster ensembles for robust document clustering
The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one ...
Xavier Sevillano, Germán Cobo, Francesc Al...
COLING
2010
13 years 4 months ago
Shallow Information Extraction from Medical Forum Data
We study a novel shallow information extraction problem that involves extracting sentences of a given set of topic categories from medical forum data. Given a corpus of medical fo...
Parikshit Sondhi, Manish Gupta, ChengXiang Zhai, J...
SIGIR
2003
ACM
14 years 3 months ago
Text categorization by boosting automatically extracted concepts
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard le...
Lijuan Cai, Thomas Hofmann
ANLP
1994
13 years 11 months ago
Language Determination: Natural Language Processing from Scanned Document Images
Many documents are available to a computer only as images from paper. However, most natural language processing systems expect their input as character-coded text, which may be di...
Penelope Sibun, A. Lawrence Spitz