Sciweavers

332 search results - page 1 / 67
ICDAR
2009
IEEE
13 years 6 months ago
Document Content Extraction Using Automatically Discovered Features
We report an automatic feature discovery method that achieves results comparable to a manually chosen, larger feature set on a document image content extraction problem: the locat...
Sui-Yu Wang, Henry S. Baird, Chang An
KDD
2002
ACM
14 years 9 months ago
Discovering informative content blocks from Web documents
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
Shian-Hua Lin, Jan-Ming Ho
ICDAR
2009
IEEE
14 years 3 months ago
Scalable Feature Extraction from Noisy Documents
We cope with metadata recognition in layout-oriented documents. We address the problem as a classification task and propose a method for automatic extraction of relevant featu...
Loïc Lecerf, Boris Chidlovskii
MLDM
2005
Springer
14 years 2 months ago
CorePhrase: Keyphrase Extraction for Document Clustering
The ability to discover the topic of a large set of text documents using relevant keyphrases is usually regarded as a very tedious task if done by hand. Automatic keyphra...
Khaled M. Hammouda, Diego N. Matute, Mohamed S. Ka...
SIGIR
2003
ACM
14 years 1 months ago
Text categorization by boosting automatically extracted concepts
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard le...
Lijuan Cai, Thomas Hofmann