Sciweavers

554 search results - page 96 / 111
» Stylistic text segmentation
Sort
View
WWW
2006
ACM
14 years 8 months ago
Using graph matching techniques to wrap data from PDF documents
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...
Tamir Hassan, Robert Baumgartner
KDD
2001
ACM
196views Data Mining» more  KDD 2001»
14 years 8 months ago
Efficient discovery of error-tolerant frequent itemsets in high dimensions
We present a generalization of frequent itemsets allowing the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifie...
Cheng Yang, Usama M. Fayyad, Paul S. Bradley
ARTCOM
2009
IEEE
14 years 2 months ago
Chunker for Tamil
This paper presents the Part Of Speech tagger and Chunker for Tamil using Machine learning techniques. Part Of Speech tagging and chunking are the fundamental processing steps for...
V. Dhanalakshmi, P. Padmavathy, M. Anand Kumar, K....
DOCENG
2009
ACM
14 years 2 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
WCRE
2008
IEEE
14 years 2 months ago
Reverse Engineering CAPTCHAs
CAPTCHAs are automated Turing tests used to determine if the end-user is human and not an automated program. Users are asked to read and answer Visual CAPTCHAs, which often appear...
Abram Hindle, Michael W. Godfrey, Richard C. Holt