BIBM
2010
IEEE

Scalable, updatable predictive models for sequence data

The emergence of data-rich domains has led to exponential growth in the size and number of data repositories, offering exciting opportunities to learn from the data using machine learning algorithms. In particular, sequence data is being made available at a rapid rate. In many applications, the learning algorithm may not have direct access to the entire dataset, for reasons such as massive data size or bandwidth limitations. In such settings, there is a need for techniques that can learn predictive models (e.g., classifiers) from large datasets without direct access to the data. We describe an approach to learning from massive sequence datasets using statistical queries. Specifically, we show how Markov Models and Probabilistic Suffix Trees (PSTs) can be constructed from sequence databases that answer only a class of count queries. We analyze the query complexity (a measure of the number of queries needed) for constructing classifiers in such settings and outline some ...
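To make the count-query setting concrete, here is a minimal sketch of estimating a fixed-order Markov model when the learner can only ask how often a substring occurs. It is an illustration under stated assumptions, not the authors' algorithm: the `CountQueryOracle` class, its `count` method, and `learn_markov_model` are hypothetical names, and the in-memory list of strings merely stands in for a remote sequence database.

```python
class CountQueryOracle:
    """Hypothetical stand-in for a sequence database that answers only count queries."""

    def __init__(self, sequences):
        self._sequences = list(sequences)  # hidden from the learner
        self.queries_issued = 0            # crude tally of queries asked

    def count(self, pattern):
        """Return the number of (possibly overlapping) occurrences of a non-empty pattern."""
        self.queries_issued += 1
        total = 0
        for seq in self._sequences:
            start = seq.find(pattern)
            while start != -1:
                total += 1
                start = seq.find(pattern, start + 1)
        return total


def learn_markov_model(oracle, alphabet, order):
    """Estimate P(symbol | context) for contexts of length `order` via count queries only."""
    # Grow candidate contexts one symbol at a time, dropping any that never occur,
    # so that unseen contexts are not extended further.
    frontier = [""]
    for _ in range(order):
        frontier = [c + a for c in frontier for a in alphabet
                    if oracle.count(c + a) > 0]

    model = {}
    for context in frontier:
        # One query per possible next symbol.
        counts = {a: oracle.count(context + a) for a in alphabet}
        total = sum(counts.values())
        if total > 0:  # skip contexts that only appear at the end of a sequence
            model[context] = {a: n / total for a, n in counts.items()}
    return model


oracle = CountQueryOracle(["ABAB", "ABBA"])
model = learn_markov_model(oracle, alphabet="AB", order=1)
print(model)
print("queries issued:", oracle.queries_issued)
```

The `queries_issued` counter gives a rough feel for the query complexity discussed in the abstract: the learner never sees the sequences themselves, only the answers to its count queries. Normalizing by the sum of continuation counts rather than by the count of the context itself is a design choice made here so that each conditional distribution sums to one.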
Neeraj Koul, Ngot Bui, Vasant Honavar
Added 28 Feb 2011
Updated 28 Feb 2011
Type Journal
Year 2010
Where BIBM