Abstract. This paper explores techniques in the pipeline of image description based on visual codebooks, with a focus on their suitability for on-line video processing. The pipeline components are (i) extraction and description of local image features, (ii) translation of each high-dimensional feature descriptor into the several most appropriate visual words selected from a discrete codebook, and (iii) combination of the visual words into a bag-of-words using a hard or soft assignment weighting scheme. For each component, several state-of-the-art techniques are analyzed and discussed, and their usability for on-line video processing is addressed. The experiments are evaluated on the standard Kentucky and Oxford Buildings datasets within an image retrieval framework. The results show how much pipeline precision is lost in exchange for a lower time cost, which is crucial for real-time video processing.
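To make components (ii) and (iii) concrete, the following is a minimal sketch of hard versus soft assignment, not the method evaluated in the paper. The function name `bag_of_words`, the parameters `k` and `sigma`, the Gaussian weighting of the k nearest words, and the L1 normalization of the histogram are all assumptions chosen for illustration. It uses a brute-force nearest-word search in pure Python for readability; a real-time pipeline would likely use an approximate search structure instead.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words(descriptors, codebook, k=1, sigma=1.0):
    """Build an L1-normalized bag-of-words histogram (illustrative sketch).

    k == 1 is hard assignment: each descriptor votes for its nearest word.
    k > 1 is soft assignment: each descriptor spreads one unit of weight
    over its k nearest words, using Gaussian weights (an assumed scheme).
    """
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        # Brute-force search: (squared distance, word index) of the k nearest words.
        nearest = sorted(
            (squared_distance(desc, word), idx) for idx, word in enumerate(codebook)
        )[:k]
        if k == 1:
            weights = [1.0]
        else:
            # Subtracting the smallest distance avoids underflow and does not
            # change the weights after normalization.
            d_min = nearest[0][0]
            raw = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d, _ in nearest]
            total = sum(raw)
            weights = [w / total for w in raw]
        for (_, idx), w in zip(nearest, weights):
            hist[idx] += w
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

With hard assignment a descriptor lying halfway between two visual words contributes to only one of them, whereas soft assignment with `k=2` splits its vote between both. This costs extra distance computations per descriptor, which is the kind of precision versus time trade-off the abstract refers to.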