Efficient out-of-vocabulary term detection by n-gram array indices with distance from a syllable lattice



For spoken document retrieval, it is very important to account for out-of-vocabulary (OOV) words and misrecognized spoken words. Sub-word-unit-based recognition and retrieval methods have therefore been proposed. This paper describes a Japanese spoken document retrieval system that is robust to OOV words and to misrecognition of sub-word units. We used individual syllables as the sub-word unit in continuous speech recognition, and n-gram sequences of syllables in the resulting syllable lattice. We propose an n-gram indexing/retrieval method that uses distance in the syllable lattice to address OOV words and recognition errors while keeping retrieval fast. Applied to a 44-hour database of academic lecture presentations, the method detected OOV words with an F-value of 0.58 in less than 2.5 milliseconds.
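The abstract does not spell out the index layout, the n-gram order, or how distance is measured, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes that each lattice is simplified to a sequence of slots holding alternative syllable hypotheses, that syllable trigrams are used, and that "distance" is read as a tolerance on the positional offset between matched query n-grams. The names `build_index`, `search`, `max_dist` and `min_hits` are hypothetical, a dict stands in for the n-gram array, and none of this is the authors' implementation.

```python
from collections import defaultdict
from itertools import product

N = 3  # assumption: syllable trigrams; the paper's n may differ


def build_index(lattices, n=N):
    """Map each syllable n-gram to the (doc_id, slot) positions where it can start.

    Assumption: each lattice is simplified to a list of slots, and each slot
    is a list of alternative syllable hypotheses (a confusion-network view).
    A dict keyed by n-gram stands in for an array indexed by n-gram ID.
    """
    index = defaultdict(list)
    for doc_id, slots in lattices.items():
        for start in range(len(slots) - n + 1):
            # Enumerate every path of length n through consecutive slots,
            # so competing hypotheses (possible misrecognitions) are indexed too.
            for gram in product(*slots[start:start + n]):
                index[gram].append((doc_id, start))
    return index


def search(index, query, n=N, max_dist=1, min_hits=None):
    """Return (doc_id, start, support) for candidate occurrences of a query term.

    A query n-gram at offset k found at lattice slot p votes for a term start
    of p - k. Votes whose starts lie within max_dist slots of each other are
    pooled (hypothetical distance tolerance), and a candidate is kept when at
    least min_hits distinct query n-grams support it.
    """
    grams = [tuple(query[i:i + n]) for i in range(len(query) - n + 1)]
    if min_hits is None:
        min_hits = max(1, len(grams) // 2)
    votes = defaultdict(set)  # (doc_id, start) -> offsets of supporting query n-grams
    for k, gram in enumerate(grams):
        for doc_id, pos in index.get(gram, []):
            votes[(doc_id, pos - k)].add(k)
    results = []
    for doc_id, start in sorted(votes):
        support = set()
        for d in range(-max_dist, max_dist + 1):
            support |= votes.get((doc_id, start + d), set())
        if len(support) >= min_hits:
            results.append((doc_id, start, len(support)))
    return results


if __name__ == "__main__":
    # Toy lattices with romanized syllables (illustrative data only).
    lattices = {
        "lec1": [["a"], ["ka", "ga"], ["ta"], ["ka"], ["na", "ma"], ["o"]],
        "lec2": [["sa"], ["ka"], ["na"]],
    }
    idx = build_index(lattices)
    print(search(idx, ["ka", "ta", "ka", "na"]))  # [('lec1', 1, 2)]
    print(search(idx, ["ga", "ta", "ka", "ma"]))  # [('lec1', 1, 2)]
```

Because every path through the alternative hypotheses is indexed, the second query is found even though its syllables differ from the first-best reading; this trades a larger index for robustness to recognition errors, while lookup remains a direct key access per query n-gram.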

Keisuke Iwami, Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa


ICASSP 2011 | OOV Words | Signal Processing | Spoken Document Retrieval | Sub-word Unit


Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Keisuke Iwami, Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa


Sciweavers