For many audio-visual applications, the integration and synchronization of audio and video signals is essential. The objective of this paper is to develop a system that displays the active objects in the captured video signal, integrated with their respective audio signals in the form of text. The video and audio signals are captured and processed separately. The signals are buffered and integrated and synchronized using a time-stamping technique. Time-stamps provide the timing information for each of the audio and video processes, the speech recognition and the object detection, respectively. This information is necessary to correlate the audio packets to the video frames. Hence, integration is achieved without the use of video information, such as lip movements. The results obtained are based on a specific implementation of the speech recognition module, which is determined to be the bottleneck process in the proposed system. Keywords— Video processing, audio processing, signals...