In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection etc. We propose a new text frame classification method that introduces a combination of wavelet and median-moment with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max-Min Clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a so-called Mutual Nearest Neighbor based Symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to know whether a block is a true text block or not. If a frame produces at least one true text block then it is considered as a text frame otherwise it is a non-text frame. Experimental results on different text and non-text datasets including two public datasets and our own created...