In this paper we describe an interdisciplinary collaboration between researchers in machine learning and oceanography, formed to study the problem of open ocean biome classification. Biomes are regions on Earth with similar climate (e.g., temperature and rainfall) and vegetation structure (e.g., grasslands, coniferous forests, and deserts). To discover biomes in the open ocean, we apply leading methods in high-dimensional data analysis, clustering, and visualization to oceanographic measurements culled from multiple existing databases. We compare traditional approaches, such as k-means clustering and principal component analysis, with newer approaches, such as Isomap and maximum variance unfolding. Our work provides the first quantitative classification of open ocean biomes from an automated statistical analysis of multivariate data. It also provides a valuable case study in the use (and misuse) of recently developed algorithms for high-dimensional data analysis.
Joshua M. Lewis, Pincelli M. Hull, Kilian Q. Weinb