Sciweavers

BIBE
2005
IEEE

Using Data Mining Techniques to Learn Layouts of Flat-File Biological Datasets

One of the major problems in biological data integration is that many data sources are stored as flat-files, with a variety of different layouts. Integrating data from such sources can be an extremely time-consuming task. We have been developing data mining techniques to help learn the layout of a dataset in a semi-automatic way. In this paper, we focus on the problem of identifying delimiters for optional fields. Since these fields do not occur in every record, frequency-based methods are not able to identify the corresponding delimiters. We present a method which uses contrast analysis on the frequency of sequences to identify such delimiters and help complete the layout descriptions. We demonstrate the effectiveness of this technique using three flat-file biological datasets.
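To make the distinction between frequency-based detection and a contrast-style test concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes line-oriented records in which a delimiter is the first whitespace-separated token on a line, and it treats "contrast" as the share of a token's occurrences that fall in that leading position. The function name `classify_delimiters`, the thresholds `min_contrast` and `min_records`, and the toy records are all hypothetical.

```python
from collections import Counter


def classify_delimiters(records, min_contrast=0.9, min_records=2):
    """Split leading tokens into mandatory and optional delimiter candidates.

    records: list of records, each a list of text lines.
    Assumption (not from the paper): a delimiter is the first token on a line.
    """
    lead_lines = Counter()    # lines on which the token comes first
    occurrences = Counter()   # occurrences of the token in any position
    lead_records = Counter()  # records where the token starts at least one line

    for record in records:
        leaders = set()
        for line in record:
            tokens = line.split()
            if not tokens:
                continue
            lead_lines[tokens[0]] += 1
            occurrences.update(tokens)
            leaders.add(tokens[0])
        lead_records.update(leaders)

    mandatory, optional = [], []
    for token, n_records in lead_records.items():
        # Contrast: how strongly the token prefers the leading position.
        contrast = lead_lines[token] / occurrences[token]
        if contrast < min_contrast:
            continue  # the token mostly appears inside field values
        if n_records == len(records):
            mandatory.append(token)  # a pure frequency test finds these
        elif n_records >= min_records:
            optional.append(token)   # missed by a "present in every record" test
    return sorted(mandatory), sorted(optional)


# Toy records: "KW" is an optional field, and "protein" sometimes starts a
# continuation line but mostly occurs inside values.
records = [
    ["ID P1", "DE kinase protein", "KW ATP binding", "SQ MKTAY"],
    ["ID P2", "DE transport protein", "protein fragment", "SQ MALWT"],
    ["ID P3", "DE binding protein", "KW membrane", "SQ MKKLL"],
    ["ID P4", "DE ATP synthase", "protein complex", "SQ MSTNP"],
]
mandatory, optional = classify_delimiters(records)
print(mandatory, optional)
```

In this toy data, `ID`, `DE`, and `SQ` lead a line in every record, so a plain frequency test would find them. `KW` appears in only half of the records, and the positional contrast is what separates it from `protein`, which is equally frequent as a line leader but mostly occurs inside field values.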

Kaushik Sinha, Xuan Zhang, Ruoming Jin, Gagan Agrawal

BIBE 2005 | Bioinformatics | Biological Data Integration | Data Mining Techniques | Frequency Based Methods

Related Content

» Cross-species and cross-platform gene expression studies with the Bioconductor-compliant R pa...

» Mining gene expression data by interpreting principal components

» Biological Data Mining for Genomic Clustering Using Unsupervised Neural Learning

» Learning a complex metabolomic dataset using random forests and support vector machines

» The Use of Various Data Mining and Feature Selection Methods in the Analysis of a Populati...

» Are Zero-suppressed Binary Decision Diagrams Good for Mining Frequent Patterns in High Dime...

» Microarray data mining using landmark gene-guided clustering

» Ownership protection of shape datasets with geodesic distance preservation

» Mining Gene Expression Data using Domain Knowledge

Post Info

Added	24 Jun 2010
Updated	24 Jun 2010
Type	Conference
Year	2005
Where	BIBE
Authors	Kaushik Sinha, Xuan Zhang, Ruoming Jin, Gagan Agrawal
