The Coding Divergence for Measuring the Complexity of Separating Two Sets

In this paper we integrate two essential processes, discretization of continuous data and learning of a model that explains them, towards fully computational machine learning from continuous data. Discretization is fundamental for machine learning and data mining, since every continuous datum (e.g., a real-valued datum obtained by observation in the real world) must be discretized and converted from analog (continuous) to digital (discrete) form to be stored in databases. However, most machine learning methods pay no attention to this situation; i.e., they use digital data in actual applications on a computer while assuming analog data (usually vectors of real numbers) in theory. To bridge the gap, we propose a novel measure of the difference between two sets of data, called the coding divergence, and computationally unify the two processes of discretization and learning. Discretization of continuous data is realized by a topological mapping (in the sense of mathematics) from the d-dimens...

Mahito Sugiyama, Akihiro Yamamoto

Continuous Data | Discretization | JMLR 2010 | Machine |

Post Info

Added	19 May 2011
Updated	19 May 2011
Type	Journal
Year	2010
Where	JMLR
Authors	Mahito Sugiyama, Akihiro Yamamoto
