Simple integrative preprocessing preserves what is shared in data sources

Full text available at www.biomedcentral.com

Background: The bioinformatics data analysis toolbox needs general-purpose, fast and easily interpretable preprocessing tools that perform data integration during exploratory data analysis. Our focus is on vector-valued data sources, each consisting of measurements of the same entity but on different variables, and on tasks where source-specific variation is considered noise or otherwise uninteresting. Principal component analysis (PCA) of all sources combined is an obvious choice if it is not important to distinguish between source-specific and shared variation. Canonical correlation analysis (CCA) focuses on mutual dependencies and discards source-specific "noise", but it produces a separate set of components for each source.

Results: It turns out that the components given by CCA can easily be combined to produce a linear, and hence fast and easily interpretable, feature extraction method. The method fuses several sources together such that the properties they share are preserved. ...
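The abstract does not spell out how the per-source CCA components are combined. The sketch below is a minimal illustration under one assumption: it uses a standard construction that links CCA to a single set of components. Each source is whitened separately, the whitened sources are concatenated, and PCA is run on the result. For two sources the variances of the resulting components equal 1 + ρ_i, where ρ_i are the canonical correlations, so the leading components capture shared variation while source-specific variation is flattened to unit variance. The function names (`whiten`, `fuse_sources`, `canonical_correlations`) and the synthetic data are hypothetical and are not the authors' code.

```python
import numpy as np


def whiten(X, eps=1e-8):
    """Center X (n_samples x d) and rescale it so its sample covariance is ~identity."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)
    # Symmetric inverse square root of the covariance; eps guards tiny eigenvalues.
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return Xc @ W


def fuse_sources(sources, k):
    """Hypothetical fusion: whiten each source, concatenate, keep the top-k PCA components."""
    Z = np.hstack([whiten(X) for X in sources])  # already centered
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    features = Z @ Vt[:k].T                      # k-dimensional fused representation
    variances = s[:k] ** 2 / (Z.shape[0] - 1)    # variance along each kept component
    return features, variances


def canonical_correlations(X, Y):
    """Canonical correlations = singular values of the whitened cross-covariance."""
    Zx, Zy = whiten(X), whiten(Y)
    C = Zx.T @ Zy / (Zx.shape[0] - 1)
    return np.linalg.svd(C, compute_uv=False)


# Synthetic example: two sources driven by a common 2-dimensional latent signal
# plus independent source-specific noise.
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=(n, 2))
X = shared @ rng.normal(size=(2, 5)) + rng.normal(size=(n, 5))
Y = shared @ rng.normal(size=(2, 4)) + rng.normal(size=(n, 4))

features, variances = fuse_sources([X, Y], k=2)
rho = canonical_correlations(X, Y)
print(variances, 1 + rho[:2])  # the two should agree closely
```

The whitening step is what distinguishes this from plain PCA on the raw concatenation: after it, source-specific noise has unit variance in every direction within each source, so only directions that are correlated across sources can rise above variance 1 and dominate the leading components.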

Abhishek Tripathi, Arto Klami, Samuel Kaski

Analysis | BMCBI 2008 | Exploratory Data Analysis | Vector-valued Data Sources |

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2008
Where	BMCBI
Authors	Abhishek Tripathi, Arto Klami, Samuel Kaski

Sciweavers
