Sciweavers

466 search results - page 18 / 94
» Scalable Feature Extraction from Noisy Documents
DAS
2006
Springer
14 years 1 month ago
Script Identification from Indian Documents
Abstract. Automatic identification of a script in a given document image facilitates many important applications such as automatic archiving of multilingual documents, searching on...
Gopal Datt Joshi, Saurabh Garg, Jayanthi Sivaswamy
TAL
2010
Springer
13 years 8 months ago
Summarization as Feature Selection for Document Categorization on Small Datasets
Abstract. Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and d...
Emmanuel Anguiano-Hernández, Luis Villaseñ...
WWW
2009
ACM
14 years 2 months ago
Extracting data records from the web using tag path clustering
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Gengxin Miao, Jun'ichi Tatemura, Wang-Pin Hsiung, ...
CIKM
2009
Springer
14 years 4 months ago
The impact of document structure on keyphrase extraction
Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic appro...
Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten...
ICDAR
2003
IEEE
14 years 3 months ago
Extraction, layout analysis and classification of diagrams in PDF documents
Diagrams are a critical part of virtually all scientific and technical documents. Analyzing diagrams will be important for building comprehensive document retrieval systems. This ...
Robert P. Futrelle, Mingyan Shao, Chris Cieslik, A...