Comparison of Word and Subword Indexing Techniques for Mandarin Chinese Spoken Document Retrieval

In this paper, we investigate the use of words and subwords (both characters and syllables) as indexing units for Mandarin Chinese spoken document retrieval. Two retrieval approaches are used in the present work: the well-known vector space model approach and the newly proposed HMM/N-gram-based approach. We focus on using an entire Chinese textual story (from a newspaper) as a query to retrieve Mandarin Chinese spoken documents (from news broadcasts). Experiments are based on the Topic Detection and Tracking Corpora.
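To make the idea of subword indexing under a vector space model concrete, here is a minimal sketch. It is not the authors' implementation: the choice of overlapping character bigrams, the smoothed TF-IDF weighting, and the function names `char_ngrams`, `tfidf_vectors`, `cosine`, and `rank` are all assumptions made for illustration, and the example strings are toy text rather than data from the Topic Detection and Tracking Corpora. In a real spoken document retrieval setting the documents would be speech recognition transcripts, not clean text.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return overlapping character n-grams, ignoring whitespace (assumed indexing unit)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def tfidf_vectors(docs_terms):
    """Build sparse TF-IDF vectors (term -> weight) for a list of term lists."""
    n_docs = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # document frequency counts each term once per document
    # Smoothed IDF; this particular formula is an assumption, not taken from the paper.
    idf = {t: math.log((n_docs + 1) / (d + 1)) + 1.0 for t, d in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(terms).items()}
        for terms in docs_terms
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs, n=2):
    """Rank documents against a whole-text query using character n-gram TF-IDF."""
    docs_terms = [char_ngrams(d, n) for d in docs]
    vectors, idf = tfidf_vectors(docs_terms)
    # Query terms unseen in the collection get zero weight.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(char_ngrams(query, n)).items()}
    scores = [cosine(q, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    docs = ["台北市今天下雨", "股市今天大漲", "台北捷運延誤"]  # toy documents
    order, scores = rank("台北市下雨", docs)
    print(order, scores)
```

Swapping `char_ngrams` for a word segmenter or a syllable-sequence extractor would give the word-level and syllable-level variants of the same index, which is the kind of comparison the abstract describes.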

Hsin-Min Wang, Berlin Chen

Chinese Spoken Documents | Mandarin Chinese | Multimedia | PCM 2001 | Spoken Document Retrieval |

Added	30 Jul 2010
Updated	30 Jul 2010
Type	Conference
Year	2001
Where	PCM
Authors	Hsin-Min Wang, Berlin Chen
