Sciweavers

832 search results - page 68 / 167
» Document clustering with committees
SIGIR
2003
ACM
14 years 2 months ago
An information-theoretic measure for document similarity
Recent work has demonstrated that the assessment of pairwise object similarity can be approached in an axiomatic manner using information theory. We extend this concept specifica...
Javed A. Aslam, Meredith Frost
HICSS
2006
IEEE
14 years 3 months ago
Being Literate with Large Document Collections: Observational Studies and Cost Structure Tradeoffs
How do people work with large document collections? We studied the effects of different kinds of analysis tools on the behavior of people doing rapid large-volume data assessment,...
Daniel M. Russell, Malcolm Slaney, Yan Qu, Mave Ho...
ICDAR
2009
IEEE
13 years 6 months ago
Document Content Extraction Using Automatically Discovered Features
We report an automatic feature discovery method that achieves results comparable to a manually chosen, larger feature set on a document image content extraction problem: the locat...
Sui-Yu Wang, Henry S. Baird, Chang An
ICPR
2008
IEEE
14 years 3 months ago
A robust technique for text extraction in mixed-type binary documents
A crucial preprocessing stage in applications such as OCR is text extraction from mixed-type documents. The present work, in contrast to most until now, successfully faces the pro...
Charalambos Strouthopoulos, Athanasios Nikolaidis
INEX
2005
Springer
14 years 2 months ago
A Flexible Structure-Based Representation for XML Document Mining
This paper reports on the INRIA group’s approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allo...
Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yve...