
ICMLA
2007

Semi-Supervised Active Learning for Modeling Medical Concepts from Free Text

We apply a new active learning formulation to the problem of learning medical concepts from unstructured text. The new formulation is based on maximizing the mutual information that a sample labeling provides about the retrieval/classification model. This methodology is related to and extends the Query-by-Committee (QBC) approach [12] by exploiting unlabeled data in novel ways, beyond their common use only as potential query points. Unlike QBC, this method uses unlabeled data in addition to labeled data to select more appropriate samples for labeling. The samples thus chosen are both informative and relevant according to a distribution of interest. This flexibility also allows us to tailor the model to arbitrary distributions relevant to the task at hand, in particular to the distribution of the test data. This formulation has implications in scenarios where the training and test distributions differ, or when a general model is adapted to a more s...
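The abstract does not state the objective in closed form, so the following is only a generic sketch of the idea, not the authors' formulation. It assumes a committee of models (as in QBC) whose disagreement estimates the mutual information between a candidate's label and the model, I(y; θ | x) = H[mean_k p(y | x, θ_k)] - mean_k H[p(y | x, θ_k)]. It also assumes that "relevant according to a distribution of interest" can be approximated by weighting each candidate with a Gaussian kernel density estimated from unlabeled test points. All function names, the kernel, the bandwidth, and the multiplicative weighting are hypothetical choices.

```python
import math
from typing import Sequence

# Generic sketch only: NOT the formulation from the paper. All names are hypothetical.


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def committee_mutual_information(member_probs: Sequence[Sequence[float]]) -> float:
    """Committee estimate of I(y; theta | x).

    member_probs[k] is p(y | x, theta_k) for committee member k.
    Returns H[mean_k p_k] - mean_k H[p_k]. This is zero when all members agree
    and grows as they disagree (the QBC intuition).
    """
    k = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / k for c in range(n_classes)]
    return entropy(mean) - sum(entropy(p) for p in member_probs) / k


def kernel_density(x: Sequence[float], unlabeled: Sequence[Sequence[float]],
                   bandwidth: float = 1.0) -> float:
    """Unnormalized Gaussian-kernel density of x under unlabeled (e.g. test) points."""
    total = 0.0
    for u in unlabeled:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, u))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / len(unlabeled)


def select_query(committee_preds: Sequence[Sequence[Sequence[float]]],
                 relevance: Sequence[float]) -> int:
    """Index of the candidate with the highest relevance-weighted mutual information."""
    scores = [r * committee_mutual_information(preds)
              for preds, r in zip(committee_preds, relevance)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Three unlabeled candidates (1-D features), each scored by three committee members.
    features = [[0.0], [5.0], [0.2]]
    committee_preds = [
        [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],  # members agree: uninformative
        [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],  # strong disagreement
        [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]],  # mild disagreement
    ]
    # Unlabeled test points cluster near 0, so candidate 1 (at 5.0) is atypical.
    test_points = [[0.0], [0.1], [-0.1]]

    uniform = [1.0] * len(features)
    density = [kernel_density(x, test_points) for x in features]
    print(select_query(committee_preds, uniform))  # 1: most informative overall
    print(select_query(committee_preds, density))  # 2: informative and near the test data
```

With uniform weights the candidate with the strongest committee disagreement wins. Weighting by the density of the unlabeled test points shifts the choice toward a candidate that is still informative but lies where the test data are. The paper's actual objective may combine these quantities differently.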
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where ICMLA
Authors Rómer Rosales, Praveen Krishnamurthy, R. Bharat Rao