Fusing partial estimates is a critical and common problem
in many computer vision tasks such as part-based detection
and tracking. It generally becomes intractable when the
partial estimates are numerous and multimodal, so an effective
and scalable method for integrating them is desirable.
This paper presents a novel and effective approach
to fusing multimodal partial estimates in a principled
way. In this new approach, fusion is related to the
computational geometry problem of finding the minimum-volume
orthotope, and an effective and scalable branch-and-bound
search algorithm is designed to obtain the globally optimal
solution. Experiments on tracking articulated objects
and occluded objects show the effectiveness of the proposed
approach.