Sciweavers

808 search results - page 67 / 162
» Keyword-based document clustering
Sort
View
ICDAR
2009
IEEE
13 years 5 months ago
Document Content Extraction Using Automatically Discovered Features
We report an automatic feature discovery method that achieves results comparable to a manually chosen, larger feature set on a document image content extraction problem: the locat...
Sui-Yu Wang, Henry S. Baird, Chang An
ICPR
2008
IEEE
14 years 2 months ago
A robust technique for text extraction in mixed-type binary documents
A crucial preprocessing stage in applications such as OCR is text extraction from mixed-type documents. The present work, in contrast to most until now, successfully faces the pro...
Charalambos Strouthopoulos, Athanasios Nikolaidis
INEX
2005
Springer
14 years 1 months ago
A Flexible Structured-Based Representation for XML Document Mining
This paper reports on the INRIA group’s approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allo...
Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yve...
EMNLP
2010
13 years 5 months ago
NLP on Spoken Documents Without ASR
There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and info...
Mark Dredze, Aren Jansen, Glen Coppersmith, Ken Wa...
VLDB
2007
ACM
93views Database» more  VLDB 2007»
14 years 8 months ago
Measuring the Structural Similarity of Semistructured Documents Using Entropy
We propose a technique for measuring the structural similarity of semistructured documents based on entropy. After extracting the structural information from two documents we use ...
Sven Helmer