Sciweavers

» Data Cleaning for Word Alignment
SIGMOD
2001
ACM
14 years 8 months ago
Automatic Segmentation of Text into Structured Records
In this paper we present a method for automatically segmenting unformatted text records into structured elements. Several useful data sources today are human-generated as continuo...
Vinayak R. Borkar, Kaustubh Deshmukh, Sunita Saraw...
JMLR
2012
11 years 11 months ago
Bounding the Probability of Error for High Precision Optical Character Recognition
We consider a model for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low recall. If some variables can be ...
Gary B. Huang, Andrew Kae, Carl Doersch, Erik G. L...
CVPR
2009
IEEE
14 years 10 days ago
ImageNet: A large-scale hierarchical image database
The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images a...
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai...
CAISE
2007
Springer
14 years 2 months ago
A Context-based Approach for Complex Semantic Matching
Semantic matching is a fundamental step in implementing data sharing applications. Most systems automating this task however limit themselves to finding simple (one-to-one) match...
Youssef Bououlid Idrissi, Julie Vachon
LREC
2008
13 years 10 months ago
New Resources for Document Classification, Analysis and Translation Technologies
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
Stephanie Strassel, Lauren Friedman, Safa Ismael, ...