This paper presents a complete solution to estimating a scene’s 3D geometry and appearance from multiple 2D images by using a statistical inverse ray tracing method. Instead of matching image features/pixels across images, the inverse ray tracing approach models the image generation process directly and searches for the best 3D geometry and surface reflectance model to explain all the observations. Here the image generation process is modeled through volumetric ray tracing, where the occlusion/visibility is exactly modeled. All the constraints (including ray constraints and prior knowledge about the geometry) are put into the Ray Markov Random Field (Ray MRF) formulation, developed in [10]. Differently from [10], where the voxel colors are estimated independently of the voxel occupancies, in this work, both voxel occupancies and colors (i.e., both geometry and appearance) are modeled and estimated jointly in the same inversey ray tracing framework (Ray MRF + deep belief propagation...