We present a framework for automatically summarizing social group activity over time. The problem is important in understanding large scale online social networks, which have dive...
This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify...
We propose a new method to build persistent suffix trees for indexing the genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous ...
Marina Barsky, Ulrike Stege, Alex Thomo, Chris Upt...
If Kolmogorov complexity [25] measures information in one object and Information Distance [4, 23, 24, 42] measures information shared by two objects, how do we measure information...
We introduce a multi-stage ensemble framework, ErrorDriven Generalist+Expert or Edge, for improved classification on large-scale text categorization problems. Edge first trains a ...