Similarity search in time series data is an active area of research in data mining. In this paper we introduce a new approach for performing similarity search over time series dat...
A considerable amount of clean semistructured data is internally available to companies in the form of business reports. However, business reports are untapped for data mining, da...
Stephen W. Liddle, Douglas M. Campbell, Chad Crawf...
A major obstacle to the construction of a probabilistic translation model is the lack of large parallel corpora. In this paper we first describe a parallel text mining system that...
Many scalable data mining tasks rely on active learning to provide the most useful accurately labeled instances. However, what if there are multiple labeling sources (`oracles...
The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. As an essential task toward integrating these m...