Sciweavers

743 search results - page 88 / 149
» Performance Measurements for Privacy Preserving Data Mining
KDD
2004
ACM
195 views, Data Mining
16 years 5 months ago
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
KDD
2006
ACM
120 views, Data Mining
16 years 5 months ago
Hierarchical topic segmentation of websites
In this paper, we consider the problem of identifying and segmenting topically cohesive regions in the URL tree of a large website. Each page of the website is assumed to have a t...
Ravi Kumar, Kunal Punera, Andrew Tomkins
AI
2005
Springer
15 years 6 months ago
Integrating Web Content Clustering into Web Log Association Rule Mining
One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which...
Jiayun Guo, Vlado Keselj, Qigang Gao
ICDM
2002
IEEE
122 views, Data Mining
15 years 9 months ago
Using Category-Based Adherence to Cluster Market-Basket Data
In this paper, we devise an efficient algorithm for clustering market-basket data. Different from those of the traditional data, the features of market-basket data are known to b...
Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen
KDD
2005
ACM
80 views, Data Mining
16 years 5 months ago
Wavelet synopsis for data streams: minimizing non-euclidean error
We consider the wavelet synopsis construction problem for data streams where, given n numbers, we wish to estimate the data by constructing a synopsis whose size, say B, is much sma...
Sudipto Guha, Boulos Harb