This paper addresses the problem of 3D sound representation without sound source localization and proposes a theory based on the ray-space representation of light rays, which is independent of the objects' specifications. An array of beamforming microphone arrays (MAs) is deployed, and each MA generates a sound image (SImage) by scanning the viewing range of a camera placed at the same location. An SImage has the same size as an image and consists of blocks of sound wave, each lasting one image frame. The SImages captured by the array of MAs form the sound-wave ray-space. To make the SImage ray-space dense, we propose using geometry compensation of the corresponding images at the location of each MA. From a dense sound ray-space, a virtual SImage corresponding to an arbitrary listening point can be generated. The sound at the listening point is then obtained by averaging the sound waves in the pixels, or groups of pixels, of the virtual SImage.
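The two ends of this pipeline can be illustrated with a short sketch: building one SImage frame from a single MA, and reducing a (virtual) SImage to listening-point sound by averaging over pixels. The abstract does not name a beamforming method, so this sketch assumes far-field delay-and-sum beamforming with integer-sample delays, a pinhole camera looking along +z, and one steering direction per pixel. All function names (`pixel_directions`, `shift`, `sound_image`, `listening_point_sound`) are hypothetical. The ray-space construction, geometry compensation, and virtual-view synthesis steps are not modeled here.

```python
import numpy as np


def pixel_directions(height, width, hfov_deg, vfov_deg):
    """Unit steering vectors (H, W, 3), one per camera pixel.

    Assumption: pinhole camera at the MA location, optical axis along +z.
    """
    fx = (width / 2) / np.tan(np.radians(hfov_deg) / 2)
    fy = (height / 2) / np.tan(np.radians(vfov_deg) / 2)
    u = np.arange(width) - (width - 1) / 2
    v = np.arange(height) - (height - 1) / 2
    uu, vv = np.meshgrid(u, v)  # both shaped (H, W)
    d = np.stack([uu / fx, vv / fy, np.ones_like(uu)], axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)


def shift(x, n):
    """Delay x by n samples (negative n advances it), zero-filling the gap."""
    out = np.zeros_like(x)
    if abs(n) >= len(x):
        return out
    if n >= 0:
        out[n:] = x[:len(x) - n]
    else:
        out[:n] = x[-n:]
    return out


def sound_image(frame, mic_pos, dirs, fs, c=343.0):
    """Delay-and-sum SImage for one image frame (assumed beamformer).

    frame:   (M, T) samples from the M microphones of one MA for one frame
    mic_pos: (M, 3) microphone positions in metres, relative to the camera
    dirs:    (H, W, 3) unit steering vectors from pixel_directions
    Returns an (H, W, T) array: one block of sound wave per pixel.
    """
    H, W, _ = dirs.shape
    M, T = frame.shape
    simage = np.zeros((H, W, T))
    for i in range(H):
        for j in range(W):
            # A plane wave from direction d reaches mic m earlier by (p_m . d) / c,
            # so delaying each channel by that amount aligns the channels.
            delays = np.rint(mic_pos @ dirs[i, j] / c * fs).astype(int)
            aligned = [shift(frame[m], delays[m]) for m in range(M)]
            simage[i, j] = np.mean(aligned, axis=0)
    return simage


def listening_point_sound(virtual_simage, rows=slice(None), cols=slice(None)):
    """Average the per-pixel sound blocks over all pixels or a chosen pixel group."""
    return virtual_simage[rows, cols].mean(axis=(0, 1))
```

For example, `listening_point_sound(img)` averages over the whole virtual SImage, while `listening_point_sound(img, slice(0, 2), slice(0, 2))` averages only a 2×2 pixel group. Integer-sample delays keep the sketch simple; a practical system would likely use fractional-delay or frequency-domain steering instead.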