Sciweavers

DCC
2006
IEEE

Compression and Machine Learning: A New Perspective on Feature Space Vectors

15 years 1 day ago
The use of compression algorithms in machine learning tasks such as clustering and classification has appeared in a variety of fields, sometimes with the promise of reducing problems of explicit feature selection. The theoretical justification for such methods has been founded on an upper bound on Kolmogorov complexity and an idealized information space. An alternate view shows that compression algorithms implicitly map strings into feature space vectors, and that compression-based similarity measures compute similarity within these feature spaces. Thus, compression-based methods are not a "parameter free" magic bullet for feature selection and data representation, but are instead concrete similarity measures within defined feature spaces, and are therefore akin to explicit feature vector models used in standard machine learning algorithms. To underscore this point, we find theoretical and empirical connections between traditional machine learning vector models and compression...
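As an illustration of what a compression-based similarity measure can look like, here is a minimal sketch of the normalized compression distance (NCD), one widely used measure of this kind. The abstract does not say which measures the paper analyzes, so the choice of NCD, the use of zlib as the compressor, and the names `compressed_size` and `ncd` are all assumptions made for this example.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib is an assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample data: a repetitive text and a pseudo-random byte string.
text = b"the quick brown fox jumps over the lazy dog " * 20
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(1000))

print("text vs. text :", ncd(text, text))
print("text vs. noise:", ncd(text, noise))
```

The distance between a string and itself should come out much smaller than the distance between the text and the unrelated bytes. How small depends on the compressor, which is in keeping with the abstract's point that such measures are tied to a concrete representation rather than being parameter free.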
D. Sculley, Carla E. Brodley
Added 25 Dec 2009
Updated 25 Dec 2009
Type Conference
Year 2006
Where DCC
Authors D. Sculley, Carla E. Brodley