In our prior work, we introduced a generalization of the multiple-instance learning (MIL) model in which a bag's label is not determined by a single instance's proximity to a single target point. Instead, a bag is positive if and only if it contains a collection of instances, each near one of a set of target points. This generalized model is much more expressive than the conventional multiple-instance model, and our first algorithm in this model achieved significantly lower generalization error on several applications than algorithms in the conventional MIL model. However, that learning algorithm required substantial time and memory to run. Here we present and empirically evaluate a new algorithm, testing it on data from drug discovery and content-based image retrieval. Our experimental results show that it has the same generalization ability as our previous algorithm while requiring much less computation time and memory.
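To make the difference between the two labeling rules concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that "near" means Euclidean distance at most a fixed radius, and that a bag is positive only when every target point has at least one instance near it. The function names and the `radius` parameter are hypothetical and chosen for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def is_near(instance: Point, target: Point, radius: float) -> bool:
    # Assumption: "near" = Euclidean distance no greater than `radius`.
    return math.dist(instance, target) <= radius


def conventional_mil_label(bag: Sequence[Point], target: Point, radius: float) -> bool:
    # Conventional MIL: positive iff at least one instance is near the single target.
    return any(is_near(x, target, radius) for x in bag)


def generalized_mil_label(bag: Sequence[Point], targets: Sequence[Point], radius: float) -> bool:
    # Generalized MIL (simplified assumption): positive iff, for every target
    # point, the bag contains some instance near that target.
    return all(any(is_near(x, t, radius) for x in bag) for t in targets)


targets = [(0.0, 0.0), (5.0, 5.0)]
bag_a = [(0.1, -0.2), (4.9, 5.1), (9.0, 9.0)]  # covers both targets
bag_b = [(0.1, -0.2), (9.0, 9.0)]              # covers only the first target

print(generalized_mil_label(bag_a, targets, radius=0.5))   # True
print(generalized_mil_label(bag_b, targets, radius=0.5))   # False
print(conventional_mil_label(bag_b, targets[0], radius=0.5))  # True
```

Here `bag_b` is positive under the conventional rule (one instance is near the first target) but negative under the generalized rule, which also requires an instance near the second target. This illustrates, in a simplified setting, why the generalized model can express concepts the conventional one cannot.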
Qingping Tao, Stephen D. Scott