This paper presents a complete framework for attention-based video streaming over low-bandwidth networks. First, motivated by the fovea-periphery distinction in biological vision systems, each incoming image is partitioned into foveal and peripheral regions using an application-specific attention function. The attention function is constructed a priori using combinatorial optimization integrated with a back-propagation neural network. Next, a spatio-temporal coding algorithm exploits this fovea-periphery differentiation: foveal regions are encoded at high spatial resolution, while peripheral regions are encoded at lower spatial resolution and, in addition, at lower temporal resolution. Finally, the encoded video sequence is streamed using a standard streaming server. As an application, we consider human face video transmission. Our experimental results indicate that even with a very limited amount of tra...
Çagatay Dikici, H. Isil Bozma
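
The fovea-periphery coding idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' codec: it only simulates what a receiver would reconstruct when the fovea is kept at full resolution in every frame, while the periphery is block-averaged and refreshed less often. The fixed boolean `fovea_mask` stands in for the learned attention function, and the names `downsample_upsample`, `foveated_sequence`, `factor`, and `period` are hypothetical, chosen for illustration only.

```python
def downsample_upsample(frame, factor):
    """Block-average a grayscale frame by `factor`, then expand each
    block mean back to full size (a crude low-spatial-resolution proxy)."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, factor):
        for bx in range(0, w, factor):
            ys = range(by, min(by + factor, h))
            xs = range(bx, min(bx + factor, w))
            mean = sum(frame[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def foveated_sequence(frames, fovea_mask, factor=2, period=2):
    """Simulate the reconstructed sequence.

    Foveal pixels (mask True) are copied at full resolution from every
    frame. Periphery pixels come from a coarse version of the frame that
    is refreshed only every `period` frames (assumed values, not from
    the paper).
    """
    reconstructed = []
    periphery = None
    for t, frame in enumerate(frames):
        if periphery is None or t % period == 0:
            periphery = downsample_upsample(frame, factor)
        h, w = len(frame), len(frame[0])
        reconstructed.append([
            [frame[y][x] if fovea_mask[y][x] else periphery[y][x]
             for x in range(w)]
            for y in range(h)
        ])
    return reconstructed
```

With `period=2`, the periphery of every second frame reuses the previous coarse image, so only the foveal pixels need to be updated for that frame. A real encoder would apply the same split inside its transform and motion-compensation stages rather than in the pixel domain.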