Sciweavers

841 search results - page 141 / 169
» Information Retrieval with Cluster Genetic
Sort
View
WWW
2008
ACM
14 years 8 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
WWW
2007
ACM
14 years 8 months ago
Generative models for name disambiguation
Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with ot...
Yang Song, Jian Huang 0002, Isaac G. Councill, Jia...
WWW
2006
ACM
14 years 8 months ago
GoGetIt!: a tool for generating structure-driven web crawlers
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
WWW
2006
ACM
14 years 8 months ago
The impact of online music services on the demand for stars in the music industry
The music industry's business model is to produce stars. In order to do so, musicians producing music that fits into well defined clusters of factors explaining the demand of...
Ian Pascal Volz
KDD
2005
ACM
118views Data Mining» more  KDD 2005»
14 years 8 months ago
On the use of linear programming for unsupervised text classification
We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as underlying process of generating corpus and utilize a novel,...
Mark Sandler