This paper proposes a novel hybrid method to robustly and accurately localize texts in natural scene images. A text region detector is designed to generate a text confidence map,...
We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words instead of lexical words to represent acous...
Active learning is well-suited to many problems in natural language processing, where unlabeled data may be abundant but annotation is slow and expensive. This paper aims to shed ...
Document representation and indexing is a key problem for document analysis and processing, such as clustering, classification and retrieval. Conventionally, Latent Semantic Index...
An N-gram language model aims at capturing statistical word order dependency information from corpora. Although the concept of language models has been applied extensively to handl...