Background: Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but it tends to remove small features as well as random noise. Results: We interpreted the PCs as a mere signal-rich coordinate system and sorted the squared PC-coordinates of each row in descending order. The sorted squared PC-coordinates were compared with the distribution of the ordered squared random noise, and PC-coordinates for insignificant contributions were treated as random noise and nullified. The processed data were transformed back to the initial coordinates as noise-reduced data. To increase the sensitivity of signal capture and reduce the effects of stochastic noise, this procedure was applied to multiple small subsets of rows randomly sampled from a large data set, and the results corresponding to each row of the data set from multiple subsets were averaged. We call this procedure Row-specific, So...
Joseph W. Foley, Fumiaki Katagiri