BMCBI
2008

Simple integrative preprocessing preserves what is shared in data sources

Background: The bioinformatics data analysis toolbox needs general-purpose, fast, and easily interpretable preprocessing tools that perform data integration during exploratory data analysis. Our focus is on vector-valued data sources, each consisting of measurements of the same entity but on different variables, and on tasks where source-specific variation is considered noise or otherwise uninteresting. Principal component analysis of all sources combined is an obvious choice when it is not important to distinguish between source-specific and shared variation. Canonical Correlation Analysis (CCA) focuses on mutual dependencies and discards source-specific "noise", but it produces a separate set of components for each source.
Results: It turns out that the components given by CCA can be combined easily to produce a feature extraction method that is linear, and hence fast and easily interpretable. The method fuses several sources together such that the properties they share are preserved. ...
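The abstract does not spell out how the CCA components are combined, so the following is only a minimal sketch under stated assumptions: two sources, classical CCA computed by whitening each source and taking the SVD of the whitened cross-covariance, and fusion by summing each pair of canonical variates. The function names `whiten` and `cca_fuse`, the loadings, and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def whiten(X, eps=1e-8):
    """Center X and transform it so its sample covariance is (nearly) the identity."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

def cca_fuse(X, Y, n_components):
    """Two-source CCA; returns summed canonical variates and canonical correlations.

    Assumption: the fused representation is the sum of the paired
    canonical variates of the two sources.
    """
    Xw, Yw = whiten(X), whiten(Y)
    cross = Xw.T @ Yw / (X.shape[0] - 1)
    U, s, Vt = np.linalg.svd(cross, full_matrices=False)
    A = Xw @ U[:, :n_components]      # canonical variates of source X
    B = Yw @ Vt.T[:, :n_components]   # canonical variates of source Y
    return A + B, s[:n_components]

# Synthetic data: one latent signal z shared by both sources, plus a strong
# source-specific latent in each source and a little independent noise.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal((n, 1))    # shared variation
sx = rng.standard_normal((n, 1))   # variation specific to X
sy = rng.standard_normal((n, 1))   # variation specific to Y

X = (z @ np.array([[1.0, -1.0, 0.5, 2.0]])
     + 3.0 * sx @ np.array([[1.0, 1.0, 1.0, 1.0]])
     + 0.3 * rng.standard_normal((n, 4)))
Y = (z @ np.array([[2.0, -1.0, 1.0]])
     + 3.0 * sy @ np.array([[1.0, 1.0, 0.0]])
     + 0.3 * rng.standard_normal((n, 3)))

fused, corrs = cca_fuse(X, Y, n_components=2)
shared_corr = abs(np.corrcoef(fused[:, 0], z[:, 0])[0, 1])

print("canonical correlations:", corrs)
print("|corr(first fused component, shared signal)|:", shared_corr)
```

In this toy setting the first fused component should track the shared signal `z` even though the source-specific latents have larger variance, which is the behaviour the abstract describes. Because every step is a linear map of the centered data, the fused features stay linear in the original variables.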
Abhishek Tripathi, Arto Klami, Samuel Kaski
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2008
Where BMCBI