In this paper, a two-stage block hypothesis test, following the idea of Fan, Lin and Cheng (2004), is proposed for massive data regression analysis. Variable selection criteria ...
We show through an analysis of a massive data set from YouTube that productivity in crowdsourcing exhibits a strong positive dependence on attention, measured by the...
Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special p...
Data declustering speeds up large data set retrieval by partitioning the data across multiple disks or sites and performing retrievals in parallel. Performance is determi...
Hak-Cheol Kim, Mario A. Lopez, Scott T. Leutenegge...
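The declustering snippet above is cut off before it says how performance is measured. The sketch below makes the idea concrete under some stated assumptions. It places the buckets on a 2-D grid and assigns them to disks with the textbook Disk Modulo scheme, where cell (i, j) goes to disk (i + j) mod M. This scheme is swapped in for illustration and is not the method of the paper abstracted above. The sketch also assumes that a parallel range query costs as much as the busiest disk, measured in bucket reads, and compares that cost with the ideal ceil(|Q| / M). All function names are hypothetical.

```python
from collections import Counter

def disk_modulo(cell, num_disks):
    """Assumed allocation (Disk Modulo): grid cell (i, j) -> disk (i + j) mod M."""
    i, j = cell
    return (i + j) % num_disks

def query_cost(cells, num_disks, allocate=disk_modulo):
    """Parallel retrieval cost in bucket reads: the most loaded disk dominates."""
    load = Counter(allocate(c, num_disks) for c in cells)
    return max(load.values()) if load else 0

def optimal_cost(cells, num_disks):
    """Ideal cost if buckets were spread perfectly: ceil(|Q| / M)."""
    return -(-len(cells) // num_disks)

if __name__ == "__main__":
    M = 4  # hypothetical number of disks
    square_2x2 = [(i, j) for i in range(2) for j in range(2)]
    square_3x3 = [(i, j) for i in range(3) for j in range(3)]
    # 2x2 query: two cells share disk 1, so cost 2 vs. ideal 1.
    print(query_cost(square_2x2, M), optimal_cost(square_2x2, M))
    # 3x3 query: the busiest disk holds 3 cells, matching ceil(9 / 4) = 3.
    print(query_cost(square_3x3, M), optimal_cost(square_3x3, M))
```

The 2x2 case shows why the choice of allocation matters. The same number of disks can yield twice the ideal response time when neighbouring buckets collide on one disk.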
The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or...