This paper presents a framework that directly addresses the self-occlusions and ambiguities arising from the lack of depth information in vector-based representations. Visual data observed in an image are used to indirectly recover the parameters of an underlying dynamic model of an articulated object. The proposed framework allows us to learn the ambiguities of a representation from training examples. The resulting model is then used to measure the ambiguity of each estimated model parameter given the available visual information, providing an indication of how much we can "trust" the visual data when estimating certain parts of the model. We then provide a working example of multi-view data fusion for tracking the 3D skeletons of articulated objects in a multi-camera environment.
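The text above does not specify a particular fusion rule, so the following is only a minimal sketch of the general idea of weighting each parameter by how much the visual evidence can be trusted: per-view parameter estimates are combined by inverse-variance weighting, with the variances standing in for learned per-parameter ambiguities. The function name `fuse_views` and the array shapes are illustrative assumptions, not part of the paper.

```python
import numpy as np

def fuse_views(estimates, variances):
    """Fuse per-view parameter estimates by inverse-variance weighting.

    estimates: (n_views, n_params) array, one row of parameter estimates per camera.
    variances: (n_views, n_params) array of per-parameter ambiguities
               (larger variance = less trustworthy visual evidence).
    Returns the fused (n_params,) estimate and its fused variance.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)   # trust = inverse ambiguity
    fused_var = 1.0 / weights.sum(axis=0)                # combined per-parameter uncertainty
    fused = fused_var * (weights * estimates).sum(axis=0)
    return fused, fused_var

# Hypothetical example: two camera views estimating three joint parameters,
# where view 1 is highly ambiguous about the second parameter.
est = np.array([[0.10, 1.20, -0.30],
                [0.05, 0.90, -0.25]])
var = np.array([[0.01, 0.50, 0.02],
                [0.04, 0.05, 0.03]])
fused, fused_var = fuse_views(est, var)
print(fused, fused_var)
```

Under this weighting, a view whose learned ambiguity for a given parameter is large contributes little to the fused estimate of that parameter, which is one simple way to realise the "trust" notion described above.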