The inability to read visual text has a huge impact on the quality of life of visually impaired people. One of the most anticipated assistive devices is a wearable camera capable of finding text regions in natural scenes and translating the text into another representation such as speech or braille. Developing such a device requires text tracking in video sequences as well as text detection: the device must group homogeneous text regions to avoid multiple, redundant speech syntheses or braille conversions, and it must automatically select text images for better character recognition and timely presentation of text messages. We have developed a prototype system equipped with a head-mounted video camera. A particle filter is employed for fast and robust text tracking. We have tested the performance of our system on 1,730 video frames of hallways containing 27 signboards. The number of text candidate regions
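To illustrate the tracking step only, the sketch below shows one way a particle filter could follow a single text region's centre from frame to frame. It is a minimal sketch under our own assumptions, not the paper's implementation: the state is just the (x, y) centre, the motion model is a random walk, and the helper names (`measure_likelihood`, `track`) and the placeholder observation score are hypothetical; in a real system the score would come from the text detector's response around each particle.

```python
# Minimal particle-filter sketch for tracking one text region's centre.
# Assumptions (not from the paper): random-walk motion, placeholder
# likelihood, NumPy only.

import numpy as np

rng = np.random.default_rng(0)

N_PARTICLES = 200
MOTION_STD = 5.0  # assumed per-frame motion noise (pixels)


def measure_likelihood(frame, xy):
    """Hypothetical score of how 'text-like' the point xy looks.

    A real system would evaluate the text detector around the particle
    position; here a pixel intensity stands in for that score.
    """
    h, w = frame.shape[:2]
    x, y = xy
    if not (0 <= x < w and 0 <= y < h):
        return 1e-6  # particles outside the frame get ~zero weight
    return 1e-6 + frame[int(y), int(x)]


def track(frames, init_xy):
    """Track a single text region; returns one estimated centre per frame."""
    particles = init_xy + rng.normal(0, MOTION_STD, size=(N_PARTICLES, 2))
    weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)
    estimates = []

    for frame in frames:
        # 1. Predict: random-walk motion model.
        particles += rng.normal(0, MOTION_STD, size=particles.shape)

        # 2. Update: weight particles by the (placeholder) observation model.
        weights = np.array([measure_likelihood(frame, p) for p in particles])
        weights /= weights.sum()

        # 3. Estimate: weighted mean of the particle cloud.
        estimates.append(weights @ particles)

        # 4. Resample: multinomial resampling to counter weight degeneracy.
        idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
        particles = particles[idx]
        weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

    return estimates


# Toy usage: a bright 5x5 patch drifting across synthetic frames.
if __name__ == "__main__":
    frames = []
    for t in range(10):
        img = np.zeros((120, 160))
        img[48 + t:53 + t, 38 + 2 * t:43 + 2 * t] = 1.0
        frames.append(img)
    print(track(frames, init_xy=np.array([40.0, 50.0]))[-1])
```

In practice the same predict-weight-resample loop would be run per tracked text region, with the grouping of homogeneous regions deciding which detections feed which filter.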