This paper presents a two-stream approach to processing audio in order to index its content for Spoken Web search. The first stream indexes the metadata associated with an audio document. This metadata is usually very sparse but accurate, and therefore yields a high-precision, low-recall index. The second stream uses a novel language-independent speech recognition technique to generate text for indexing. Owing to the multiplicity of languages and the noise in user-generated content on the Spoken Web, the recognition accuracy of such systems is not high, so this stream yields a low-precision, high-recall index. The paper combines these two complementary streams into a single index to improve the precision-recall performance of audio content search. The problem of audio content search is motivated by the real-world importance of the Web in developing regions, where, owing to literacy and affordability constraints, people use the Spoken Web, which consists of interconnected VoiceSites whose content is in audio.
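
As an illustration of how the two complementary indexes might be combined (the abstract does not specify the fusion method; a linear interpolation of per-index match scores is one common, hypothetical choice), the combined relevance score of an audio document $d$ for a query $q$ could take the form

$$
S(d, q) \;=\; \lambda\, S_{\text{meta}}(d, q) \;+\; (1 - \lambda)\, S_{\text{asr}}(d, q), \qquad 0 \le \lambda \le 1,
$$

where $S_{\text{meta}}$ is the score from the high-precision metadata index, $S_{\text{asr}}$ is the score from the high-recall speech-recognition index, and $\lambda$ trades precision against recall.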