Background: Metabolomics, petroleum and biodiesel chemistry, biomarker discovery, and other fields which rely on high-resolution profiling of complex chemical mixtures generate datasets which contain millions of detector intensity readings, each uniquely addressed along dimensions of time (e.g., retention time of chemicals on a chromatographic column), a spectral value (e.g., mass-to-charge ratio of ions derived from chemicals), and the analytical run number. They also must rely on data preprocessing techniques. In particular, inter-run variance in the retention time of chemical species poses a significant hurdle that must be cleared before feature extraction, data reduction, and knowledge discovery can ensue. Alignment methods, for calibrating retention reportedly (and in our experience) can misalign matching chemicals, falsely align distinct ones, be unduly sensitive to chosen values of input parameters, and result in distortions of peak shape and area. Results: We present an iterat...
Minho Chae, Robert J. Shmookler Reis, John J. Thad