Sciweavers

77 search results - page 11 / 16
» Pairwise Document Similarity in Large Collections with MapRe...
WSDM
2010
ACM
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
CPM
2000
Springer
13 years 11 months ago
Identifying and Filtering Near-Duplicate Documents
Abstract. The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size “sketch...
Andrei Z. Broder
SEKE
2010
Springer
13 years 5 months ago
Incremental Construction of Topic Hierarchies using Hierarchical Term Clustering
Topic hierarchies are very useful for managing, searching and browsing large repositories of text documents. The hierarchical clustering methods are used to support the constructi...
Ricardo M. Marcacini, Solange O. Rezende
ACL
1992
13 years 8 months ago
SEXTANT: Exploring Unexplored Contexts for Semantic Extraction from Syntactic Analysis
For a very long time, it has been considered that the only way of automatically extracting similar groups of words from a text collection for which no semantic information exists ...
Gregory Grefenstette
CIKM
2005
Springer
14 years 1 months ago
Query expansion using term relationships in language models for information retrieval
Language Modeling (LM) has been successfully applied to Information Retrieval (IR). However, most of the existing LM approaches only rely on term occurrences in documents, queries...
Jing Bai, Dawei Song, Peter Bruza, Jian-Yun Nie, G...