KDD
2003
ACM

Understanding captions in biomedical publications

From the standpoint of automated extraction of scientific knowledge, an important but little-studied part of a scientific publication is its figures and their accompanying captions. Captions are dense in information but also contain many extra-grammatical constructs, which makes them awkward to process with standard information extraction methods. We propose a scheme for "understanding" captions in biomedical publications by extracting and classifying "image pointers" (references to the accompanying image). We evaluate a number of automated methods for this task, including hand-coded methods, methods based on existing learning techniques, and methods based on novel learning techniques. The best of these methods yields a usefully accurate tool for caption understanding, with both recall and precision in excess of 94% on the most important single class in a combined extraction/classification task.
General Terms: Information extraction
Keywords: Information extraction, bioi...
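To make the idea of an "image pointer" concrete, here is a minimal sketch of what a hand-coded extractor could look like. It is not the authors' method. It assumes, purely for illustration, that image pointers appear as parenthesized panel labels such as "(A)" or "(B-D)"; the pattern, the function name `extract_image_pointers`, the "single"/"range" labels, and the sample caption are all hypothetical.

```python
import re

# Assumed pattern (not from the paper): a parenthesized panel letter,
# optionally followed by a hyphen and a second letter, e.g. "(A)" or "(B-D)".
PANEL_LABEL = re.compile(r"\(([A-Za-z])(?:\s*-\s*([A-Za-z]))?\)")


def extract_image_pointers(caption):
    """Return (text, start_offset, kind) for each candidate image pointer.

    `kind` is a hypothetical two-way label: "range" when the pointer spans
    several panels, "single" otherwise.
    """
    pointers = []
    for match in PANEL_LABEL.finditer(caption):
        kind = "range" if match.group(2) else "single"
        pointers.append((match.group(0), match.start(), kind))
    return pointers


# Invented caption, used only to exercise the sketch.
caption = "Protein localization. (A) Untreated cells. (B-D) Cells after treatment."
for pointer in extract_image_pointers(caption):
    print(pointer)
```

A rule like this only covers one surface form; the abstract's comparison with learned methods suggests that real captions are more varied than a single pattern can capture.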
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2003
Where KDD
Authors William W. Cohen, Richard C. Wang, Robert F. Murphy