We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very lon...
In some classification problems, like the detection of illnesses in patients, classes are very unbalanced and the misclassification costs for different classes vary significantly....
Spatially enabled government requires the development of effective SDIs that will support the vast majority of society, who are not spatially aware, in a transparent manner. This ...
We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication...
Low-dimensional topic models have been proven very useful for modeling a large corpus of documents that share a relatively small number of topics. Dimensionality reduction tools s...