Biomedical images are invaluable in medical education and in establishing clinical diagnoses. Clinical decision support (CDS) can be improved by combining biomedical text with automatically annotated images extracted from relevant biomedical publications. In a previous study, we reported 76.6% accuracy in automatically classifying images using supervised machine learning that combined figure captions and image content to find clinical evidence. Image content extraction is traditionally applied to entire images or to predetermined image regions. Figure images in articles vary greatly in modality and content, which limits the benefit of whole-image extraction beyond gross categorization for CDS. However, image text and overlaid annotations identify the regions of interest (ROIs) on the image that are referenced in the caption or in the discussion in the article text. We have previously reported 72.02% accuracy in text and symbol localization, but in that experiment we did not exploit the reference...