Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span

Background: The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caenorhabditis Genetic Center (CGC) Bibliography using techniques from statistical information retrieval. Items in the CGC biomedical text corpus were modeled using the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian model that represents a document as a random mixture over latent topics; each topic is characterized by a distribution over words.

Results: An LDA model estimated from CGC items had better predictive performance than two standard models (unigram and mixture of unigrams) trained using the same data. To illustrate the practical utility of LDA models of biomedical corpora, a trained CGC LDA model was used for a retrospective study of nemato...
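
As a rough illustration of the generative story described in the abstract (each document is a random mixture over latent topics, and each topic is a distribution over words), the following minimal Python sketch samples one toy document. The topic vocabularies, the Dirichlet parameters, and the function names are invented for illustration and are not taken from the paper or the CGC corpus. The sketch covers sampling only, not the estimation procedure the authors used.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_document(alpha, topics, n_words, rng):
    """Generate one document under the LDA generative process.

    alpha  : Dirichlet parameters, one per topic
    topics : list of dicts mapping word -> probability (one dict per topic)
    """
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    words = []
    for _ in range(n_words):
        # Pick a topic for this word position according to theta.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Pick a word from the chosen topic's word distribution.
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Toy topics, invented for illustration only (not from the CGC corpus).
TOY_TOPICS = [
    {"lifespan": 0.5, "aging": 0.3, "longevity": 0.2},
    {"neuron": 0.6, "synapse": 0.3, "axon": 0.1},
]

rng = random.Random(0)
theta, words = sample_document([0.5, 0.5], TOY_TOPICS, 8, rng)
print("topic mixture:", theta)
print("document:", " ".join(words))
```

Small Dirichlet parameters such as the 0.5 values assumed here tend to produce documents dominated by a single topic, while larger values produce more even mixtures.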

David M. Blei, K. Franks, Michael I. Jordan, I. Saira Mian

Biomedical Corpora | BMCBI 2006 | CGC LDA Model | LDA Model |

Added	10 Dec 2010
Updated	10 Dec 2010
Type	Journal
Year	2006
Where	BMCBI
Authors	David M. Blei, K. Franks, Michael I. Jordan, I. Saira Mian
