Sciweavers

» Generalized inverse document frequency
ICDAR
2009
IEEE
14 years 2 months ago
Scalable Feature Extraction from Noisy Documents
We deal with metadata recognition in layout-oriented documents. We address the problem as a classification task and propose a method for automatic extraction of relevant featu...
Loïc Lecerf, Boris Chidlovskii
KCAP
2005
ACM
14 years 1 month ago
Extracting significant words from corpora for ontology extraction
This paper reports a technique for Knowledge Extraction using Natural Language Processing for the purposes of semi-automatic Ontology learning. Determination of significant words ...
Dileep G. Damle, Victoria S. Uren
ICDAR
2011
IEEE
12 years 7 months ago
Chinese Keyword Spotting Using Knowledge-Based Clustering
Content-based document image retrieval is a new and promising research area. Without OCR, document indexing directly based on image content is more general and convenient. Howev...
Yong Xia, Kuanquan Wang, Mingwei Li
FLAIRS
2007
13 years 9 months ago
Indexing Documents by Discourse and Semantic Contents from Automatic Annotations of Texts
The basic aim of the model proposed here is to automatically build semantic metatext structure for texts that would allow us to search and extract discourse and semantic informati...
Brahim Djioua, Jean-Pierre Desclés
SPIRE
2010
Springer
13 years 5 months ago
Dual-Sorted Inverted Lists
Several IR tasks rely, to achieve high efficiency, on a single pervasive data structure called the inverted index. This is a mapping from the terms in a text collection to the docu...
Gonzalo Navarro, Simon J. Puglisi