Sciweavers

808 search results - page 66 / 162
Keyword-based document clustering
WWW
2005
ACM
14 years 1 month ago
Finding the boundaries of information resources on the web
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Pavel Dmitriev, Carl Lagoze, Boris Suchkov
DAS
2008
Springer
13 years 9 months ago
A Complete Optical Character Recognition Methodology for Historical Documents
In this paper a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. This methodology cons...
Georgios Vamvakas, Basilios Gatos, Nikolaos Stamat...
TMM
2002
13 years 7 months ago
Narrowing the semantic gap - improved text-based web document retrieval using visual features
In this paper, we present the results of our work that seeks to negotiate the gap between low-level features and high-level concepts in the domain of web document retrieval. This wo...
Rong Zhao, William I. Grosky
SIGIR
2003
ACM
14 years 1 month ago
An information-theoretic measure for document similarity
Recent work has demonstrated that the assessment of pairwise object similarity can be approached in an axiomatic manner using information theory. We extend this concept specifica...
Javed A. Aslam, Meredith Frost
HICSS
2006
IEEE
14 years 1 month ago
Being Literate with Large Document Collections: Observational Studies and Cost Structure Tradeoffs
How do people work with large document collections? We studied the effects of different kinds of analysis tools on the behavior of people doing rapid large-volume data assessment,...
Daniel M. Russell, Malcolm Slaney, Yan Qu, Mave Ho...