A vast number of documents on the Web have duplicates, which makes it challenging to develop efficient methods for computing clusters of similar documents. In this paper we use ...
We propose a novel conception language for exploring the results retrieved by several Internet search services (such as search engines) that cluster the retrieved documents. The goal is ...
Gloria Bordogna, Alessandro Campi, Giuseppe Psaila...
A vast number of historical and badly degraded document images can be found in libraries and in public and national archives. Due to the complex nature of different artifacts, such p...
To preserve our cultural heritage and to enable automated document processing, libraries and national archives have started digitizing historical documents. In the case of degra...
Florian Kleber, Robert Sablatnig, Melanie Gau, Hei...
In this paper, we present a wavelet-based approach that tries to automatically find the number of clusters present in a data set, along with their positions and statistical proper...