Sciweavers

ENGL
2007

Biological Data Mining for Genomic Clustering Using Unsupervised Neural Learning

The paper aims to design a scheme for automatically identifying a species from its genome sequence. A set of 64 three-tuple keywords is first generated from the four bases A, T, C and G. These keywords are searched for in N randomly sampled genome sequences, each of a given length (10,000 elements), and the frequency of each of the 4^3 = 64 keywords is counted to obtain a DNA-descriptor for each sample. Principal component analysis is then applied to the DNA-descriptors of the N sampled instances, yielding a unique feature descriptor for identifying the species from its genome sequence. Because the variance of the descriptors for a given genome sequence is negligible, the proposed scheme finds extensive applications in automatic species identification. An alternative approach to automatic species classification and identification using a Self-Organizing Feature Map is also discussed in the paper. The computational map is trained by...
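The descriptor pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of overlapping three-base windows, the skipping of non-ATCG symbols, and the power-iteration shortcut for finding only the first principal component are all assumptions made for illustration.

```python
import random
from itertools import product

BASES = "ATCG"
# All 4^3 = 64 three-letter keywords, in a fixed order.
KEYWORDS = ["".join(p) for p in product(BASES, repeat=3)]
INDEX = {kw: i for i, kw in enumerate(KEYWORDS)}


def descriptor(segment):
    """Count each of the 64 keywords in one segment (overlapping windows assumed)."""
    counts = [0] * len(KEYWORDS)
    for i in range(len(segment) - 2):
        j = INDEX.get(segment[i:i + 3])
        if j is not None:  # assumption: skip windows with non-ATCG symbols
            counts[j] += 1
    return counts


def sample_descriptors(genome, n_samples, length=10_000, seed=0):
    """Build descriptors for n_samples randomly placed segments of a genome."""
    rng = random.Random(seed)
    last_start = len(genome) - length
    starts = [rng.randint(0, last_start) for _ in range(n_samples)]
    return [descriptor(genome[s:s + length]) for s in starts]


def first_principal_component(rows, iterations=200, seed=0):
    """Approximate the leading covariance eigenvector by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # Random start: a uniform vector would be orthogonal to the centered rows
    # whenever every descriptor has the same total count.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        proj = [sum(c[k] * v[k] for k in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][k] for i in range(n)) for k in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance left in the data
            break
        v = [x / norm for x in w]
    return v
```

With a genome held in a string `genome`, `first_principal_component(sample_descriptors(genome, n_samples=50))` would return a 64-element direction vector; a full analysis would normally use a library PCA routine and retain several components.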
Shreyas Sen, Seetharam Narasimhan, Amit Konar
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2007
Where ENGL
Authors Shreyas Sen, Seetharam Narasimhan, Amit Konar