Background: High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This sc...
Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by proje...
Kamalika Chaudhuri, Sham M. Kakade, Karen Livescu,...
Text classification poses some specific challenges. One such challenge is its high dimensionality where each document (data point) contains only a small subset of them. In this pap...
Several algorithms based on link analysis have been developed to measure the importance of nodes on a graph such as pages on the World Wide Web. PageRank and HITS are the most pop...
The Markov Random Walk model has been recently exploited for multi-document summarization by making use of the link relationships between sentences in the document set, under the ...