We consider the dictionary problem in external memory and improve the update time of the well-known buffer tree by roughly a logarithmic factor. For any λ ≥ max{lg lg n, log_{M/B}(...
Similarity search in time series data is an active area of research in data mining. In this paper we introduce a new approach for performing similarity search over time series dat...
A considerable amount of clean semistructured data is internally available to companies in the form of business reports. However, business reports are untapped for data mining, da...
Stephen W. Liddle, Douglas M. Campbell, Chad Crawf...
A major obstacle to the construction of a probabilistic translation model is the lack of large parallel corpora. In this paper we first describe a parallel text mining system that...
Many scalable data mining tasks rely on active learning to provide the most useful, accurately labeled instances. However, what if there are multiple labeling sources ("oracles...