Replicas and near-replicas of documents are very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc.), or the same resource can be associated with different URLs (aliases, dynamically generated pages, etc.). While replication can improve the accessibility of information for users, near-replicated documents can hinder the effectiveness of search engines. For example, users are likely to be annoyed when a search engine returns many similar pages in the result list for a query. We propose a method to detect similar pages, in particular replicas and near-replicas, based on a pair of signatures. Both signatures are low-dimensional vectors, which reduces the computational cost of comparing pairs of documents. The first signature is obtained by a random projection of the bag-of-words vector representing the page contents. The second signature, referred to as the Hyperlink Map, is...
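
As an illustration of the first signature only, the following minimal Python sketch projects a bag-of-words vector onto a small number of random directions. Several choices here are assumptions made for the sake of the example and are not taken from the method itself: the Gaussian projection matrix, the signature dimension, whitespace tokenization, and cosine similarity as the comparison measure. The function names (`bag_of_words`, `projection_matrix`, `signature`, `cosine`) are likewise hypothetical.

```python
import math
import random
from collections import Counter


def bag_of_words(text, vocabulary):
    """Term-count vector of `text` over a fixed, ordered vocabulary."""
    counts = Counter(text.lower().split())  # assumed tokenization: whitespace
    return [counts[term] for term in vocabulary]


def projection_matrix(n_terms, n_dims, seed=0):
    """Random Gaussian matrix with n_dims rows and n_terms columns.

    The seed is fixed so that every page is projected with the same matrix;
    otherwise signatures of different pages would not be comparable.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_terms)]
            for _ in range(n_dims)]


def signature(vector, matrix):
    """Low-dimensional signature: the matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two signatures (assumed comparison measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    page_a = "search engines index web pages and rank web pages"
    page_b = "search engines index web pages and rank the web pages"
    vocabulary = sorted(set((page_a + " " + page_b).split()))
    matrix = projection_matrix(len(vocabulary), n_dims=4)
    sig_a = signature(bag_of_words(page_a, vocabulary), matrix)
    sig_b = signature(bag_of_words(page_b, vocabulary), matrix)
    print(cosine(sig_a, sig_b))
```

In this sketch, two pages are compared through their short signatures rather than through the full term vectors, so the cost of each comparison depends on the signature dimension and not on the vocabulary size.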