Advances in medical imaging techniques and devices have led to the increased use of imaging to monitor disease progression in patients. However, extracting decision-enabling information from the resulting longitudinal multi-modal image sets poses a challenge. Radiologists often have to manually identify and quantify regions of interest in the longitudinal image sets that bear upon the patient's condition. As the number of patients grows, so does the number of longitudinal multi-modal images, and the manual annotation and quantification of pathological concepts quickly becomes impractical. In this paper we explore how minimal annotations provided by the user at a few time points can be effectively leveraged to automatically annotate the entire multi-modal longitudinal image set. In particular, we investigate how many annotated images are required, per time point and across time, to obtain reasonable results for the entire image set, and what multi-modal cu...