To approximate complex data, we propose a new type of low-dimensional “principal object”: the principal cubic complex. This complex generalizes linear and nonlinear principal manifolds and includes them as particular cases. To construct such an object, we combine the method of topological grammars with the minimization of an elastic energy defined for its embedding into the multidimensional data space. The whole complex is represented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs); the grammars describe how these continua are transformed as the optimal complex is constructed. The simplest topological grammar (“add a node or bisect an edge”) produces “principal trees”, which are useful in many practical applications. We demonstrate how this approach can be applied to the analysis of bacterial genomes and to the visualization of microarray data using a “metro map” visual representation.
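As a rough illustration of the simplest grammar (“add a node or bisect an edge”), the sketch below represents a graph as a list of node positions and a list of edges and scores it with a simplified energy. The function names (`elastic_energy`, `add_node`, `bisect_edge`), the weight `lam`, and the particular two-term energy (mean squared distance from each data point to its nearest node, plus a spring term over edges) are assumptions made for this example and are not taken from the text above. Re-optimizing node positions after each operation, which an energy-minimizing construction would require, is omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def elastic_energy(data, nodes, edges, lam=0.1):
    """Simplified energy (an assumption of this sketch):
    approximation term + lam * edge-stretching term."""
    # Mean squared distance from each data point to its nearest node.
    approx = sum(min(sq_dist(x, y) for y in nodes) for x in data) / len(data)
    # Springs along edges: sum of squared edge lengths.
    stretch = sum(sq_dist(nodes[i], nodes[j]) for i, j in edges)
    return approx + lam * stretch


def add_node(nodes, edges, parent, position):
    """Grammar operation 'add a node': attach a new node to `parent`."""
    new_index = len(nodes)
    return nodes + [tuple(position)], edges + [(parent, new_index)]


def bisect_edge(nodes, edges, k):
    """Grammar operation 'bisect an edge': split edge k at its midpoint."""
    i, j = edges[k]
    mid = tuple((a + b) / 2 for a, b in zip(nodes[i], nodes[j]))
    new_index = len(nodes)
    new_edges = edges[:k] + edges[k + 1:] + [(i, new_index), (new_index, j)]
    return nodes + [mid], new_edges


# Toy usage: three collinear data points, a graph with one edge.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
nodes = [(0.0, 0.0), (2.0, 0.0)]
edges = [(0, 1)]

before = elastic_energy(data, nodes, edges)
nodes2, edges2 = bisect_edge(nodes, edges, 0)
after = elastic_energy(data, nodes2, edges2)
print(before, after)
```

In this toy case, bisecting the single edge places a node on the middle data point and also shortens the springs, so the simplified energy decreases. A construction procedure in this spirit would try each admissible operation and keep the one that lowers the energy most.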
Alexander N. Gorban, Neil R. Sumner, Andrei Yu. Zi