We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of r...
Machine learning techniques are increasingly being applied to problems in the domain of information retrieval and text mining. In this paper we present an application of ...
We consider the problem of image representation and clustering. Traditionally, an n1 × n2 image is represented by a vector in the Euclidean space R^(n1×n2). Some learning algorith...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
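To make the tag-ratio idea in the CETR abstract concrete, here is a minimal sketch in Python. It assumes a tag ratio is computed per line of HTML as the number of non-tag text characters divided by the number of tags on that line, with a tag count of zero treated as one. That definition, the regex-based tag matching, and the names `TAG_RE` and `tag_ratios` are illustrative assumptions, not details taken from the abstract.

```python
import re
from typing import List

# Assumption: anything between '<' and '>' counts as one tag.
# A real HTML parser would handle comments, scripts, and attributes
# containing '>' more carefully.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> List[float]:
    """Return one text-to-tag ratio per line of the HTML source."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        # Text length is measured after removing tags and outer whitespace.
        text_chars = len(TAG_RE.sub("", line).strip())
        # Assumption: lines with no tags divide by 1 instead of 0.
        ratios.append(text_chars / max(tag_count, 1))
    return ratios


sample = (
    "<div><a href='/'>Home</a></div>\n"
    "<p>A long paragraph of article text lives here.</p>"
)
ratios = tag_ratios(sample)
```

In this sample the navigation line (four tags, four text characters) gets a ratio of 1.0, while the paragraph line scores much higher. Under these assumptions, lines with high ratios are candidates for content text.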
We propose a partitioning scheme for similarity search indexes called Maximal Metric Margin Partitioning (MMMP). MMMP divides the data on the basis of its distribution pat...