Sciweavers

466 search results - page 12 / 94
» Scalable Feature Extraction from Noisy Documents
COLING
2010
13 years 4 months ago
EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora
In this paper, we present an unsupervised hybrid model which combines statistical, lexical, linguistic, contextual, and temporal features in a generic EM-based framework to harvest...
Lianhau Lee, AiTi Aw, Min Zhang, Haizhou Li
ACSC
2003
IEEE
14 years 3 months ago
A Comparative Study for Domain Ontology Guided Feature Extraction
We introduced a novel method employing a hierarchical domain ontology structure to extract features representing documents in our previous publication (Wang 2002). All raw words i...
Bill B. Wang, Robert I. McKay, Hussein A. Abbass, ...
KDD
2007
ACM
14 years 10 months ago
Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus
We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search ...
Deepavali Bhagwat, Kave Eshghi, Pankaj Mehra
ICDAR
2003
IEEE
14 years 3 months ago
A Model-based Line Detection Algorithm in Documents
In this paper we present a novel model-based approach to detect severely broken parallel lines in noisy textual documents. It is important to detect and remove these lines so the ...
Yefeng Zheng, Huiping Li, David S. Doermann
ICDAR
2011
IEEE
12 years 9 months ago
Localization of Digit Strings in Farsi/Arabic Document Images Using Structural Features and Syntactical Analysis
This paper presents a new method for localization of digit strings with a specific syntax in Farsi/Arabic document images. First, some features are extracted from all connected...
Ali Abedi, Karim Faez