Sciweavers

Search results for: Mining for Empty Rectangles in Large Data Sets
KDD 2008, ACM
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
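The snippet does not say how the rewrite rules are found, so the minimal sketch below only shows how such rules could be applied once they are known: each URL is rewritten to a canonical form, and URLs that share a canonical form are grouped as candidate duplicates. The three regex rules (a `www.` prefix, a trailing `index.html`, a trailing `sid` parameter) and the names `RULES`, `canonicalize`, and `group_duplicates` are hypothetical illustrations, not rules or code from the paper.

```python
import re
from collections import defaultdict

# Hypothetical hand-written rules (pattern, replacement), applied in order.
# They only illustrate the idea; they are not rules from the paper.
RULES = [
    (re.compile(r"^(https?://)www\."), r"\1"),  # treat www.host and host alike
    (re.compile(r"/index\.html?$"), "/"),        # a trailing index page = its directory
    (re.compile(r"[?&]sid=\w+$"), ""),           # drop a session id when it is the last parameter
]


def canonicalize(url):
    """Rewrite a URL with every rule in turn and return its canonical form."""
    for pattern, replacement in RULES:
        url = pattern.sub(replacement, url)
    return url


def group_duplicates(urls):
    """Group URLs whose canonical forms are identical (candidate duplicates)."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    urls = [
        "http://www.example.com/index.html",
        "http://example.com/",
        "http://example.com/page?sid=abc123",
        "http://example.com/page",
    ]
    for canonical, members in group_duplicates(urls).items():
        print(canonical, members)
```

With these rules, `http://www.example.com/index.html` and `http://example.com/` collapse to `http://example.com/`, and the `sid` variant of `/page` collapses to `http://example.com/page`. Rules that are too aggressive would merge pages that are not duplicates, which is why the rule set matters more than the rewriting loop.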
KDD 2004, ACM
Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs
Given a user-specified minimum correlation threshold and a market basket database with N items and T transactions, an all-strong-pairs correlation query finds all item pairs with...
Hui Xiong, Shashi Shekhar, Pang-Ning Tan, Vipin Ku...
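The snippet breaks off before the query is fully defined. Assuming it returns every item pair whose Pearson correlation (the phi coefficient, for binary items) is at least the threshold, a support-based upper bound lets many pairs be discarded without counting their joint support. The sketch below derives such a bound from supp(A,B) ≤ min(supp(A), supp(B)), which gives upper(phi) = sqrt(supp(B)/supp(A)) · sqrt((1 - supp(A))/(1 - supp(B))) when supp(A) ≥ supp(B). It is a brute-force illustration of that pruning idea only; the function names and toy baskets are hypothetical, and the paper's own candidate-generation strategy may differ.

```python
from itertools import combinations
from math import sqrt


def phi(supp_a, supp_b, supp_ab):
    """Pearson's correlation for two binary items (the phi coefficient)."""
    denom = sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    return (supp_ab - supp_a * supp_b) / denom


def phi_upper_bound(supp_a, supp_b):
    """Upper bound on phi from single-item supports only, assuming supp_a >= supp_b.

    Substituting supp(A,B) <= min(supp(A), supp(B)) = supp_b into phi gives
    sqrt(supp_b / supp_a) * sqrt((1 - supp_a) / (1 - supp_b)).
    """
    return sqrt(supp_b / supp_a) * sqrt((1 - supp_a) / (1 - supp_b))


def strong_pairs(transactions, theta):
    """Brute-force all-strong-pairs query with upper-bound pruning (a sketch)."""
    n = len(transactions)
    all_items = sorted({i for t in transactions for i in t})
    supp = {i: sum(i in t for t in transactions) / n for i in all_items}
    # phi is undefined for items present in every transaction, so skip them.
    items = [i for i in all_items if supp[i] < 1]
    result = []
    for a, b in combinations(items, 2):
        if supp[a] < supp[b]:
            a, b = b, a  # make a the more frequent item so the bound applies
        if phi_upper_bound(supp[a], supp[b]) < theta:
            continue  # pruned without scanning for the pair's joint support
        supp_ab = sum(a in t and b in t for t in transactions) / n
        if phi(supp[a], supp[b], supp_ab) >= theta:
            result.append((a, b))
    return result


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "milk", "eggs"},
               {"eggs"}, {"eggs", "jam"}]
    print(strong_pairs(baskets, 0.5))  # prints [('bread', 'milk')]
```

In this toy run every pair involving `jam` is rejected by the bound alone (about 0.41 < 0.5), so its joint support is never counted; only pairs whose bound clears the threshold pay for the exact computation.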
IDA 2002, Springer
Classification with sparse grids using simplicial basis functions
Recently we presented a new approach [20] to the classification problem arising in data mining. It is based on the regularization network approach but in contrast to other methods...
Jochen Garcke, Michael Griebel
SDM 2009, SIAM
High Performance Parallel/Distributed Biclustering Using Barycenter Heuristic.
Biclustering refers to simultaneous clustering of objects and their features. Use of biclustering is gaining momentum in areas such as text mining, gene expression analysis and co...
Alok N. Choudhary, Arifa Nisar, Waseem Ahmad, Wei-...
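The barycenter heuristic itself is a classic technique from layered graph drawing. Treating the data as a bipartite graph between objects (rows) and features (columns), it repeatedly sorts rows by the mean position of their connected columns and columns by the mean position of their connected rows, which tends to gather densely connected rows and columns into contiguous blocks. The sequential sketch below shows only that reordering step on a 0/1 matrix; the snippet does not describe how the paper parallelizes or distributes the work, and the function name `barycenter_order`, the fixed iteration count, and pushing all-zero rows or columns to the end are assumptions.

```python
def barycenter_order(matrix, iterations=10):
    """Reorder rows and columns of a non-empty 0/1 matrix with the barycenter heuristic.

    A row's barycenter is the mean current position of the columns where it
    has a 1 (and symmetrically for columns). Sorting by barycenters pulls rows
    and columns that share many 1s next to each other.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_order = list(range(n_rows))
    col_order = list(range(n_cols))

    def barycenter(positions):
        # Rows/columns with no 1s get +inf so they sort to the end.
        return sum(positions) / len(positions) if positions else float("inf")

    for _ in range(iterations):
        col_pos = {c: p for p, c in enumerate(col_order)}
        row_order.sort(key=lambda r: barycenter(
            [col_pos[c] for c in range(n_cols) if matrix[r][c]]))
        row_pos = {r: p for p, r in enumerate(row_order)}
        col_order.sort(key=lambda c: barycenter(
            [row_pos[r] for r in range(n_rows) if matrix[r][c]]))
    return row_order, col_order


if __name__ == "__main__":
    m = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
    rows, cols = barycenter_order(m)
    for r in rows:
        print([m[r][c] for c in cols])
    # prints [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1] on four lines
```

Python's `list.sort` is stable, so ties keep their previous relative order and the ordering settles once no barycenter changes rank; here the two interleaved groups become two diagonal blocks, each a candidate bicluster.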
IJSI 2008
Co-Training by Committee: A Generalized Framework for Semi-Supervised Learning with Committees
Many data mining applications have a large amount of data but labeling data is often difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supe...
Mohamed Farouk Abdel Hady, Friedhelm Schwenker