ICASSP
2010
IEEE

Syntactic and sub-lexical features for Turkish discriminative language models

This paper investigates syntactic and sub-lexical features in Turkish discriminative language models (DLMs). DLM is a feature-based language modeling approach that reranks the ASR output with discriminatively trained feature parameters. Syntactic information is incorporated into DLM as part-of-speech (PoS) tag n-gram features and head-to-head dependency relations. Sub-lexical units are first utilized as language modeling units in the baseline recognizer. Then, sub-lexical features are used to rerank the sub-lexical hypotheses. We explore features, similar to syntactic features, on sub-lexical units to reveal the implicit morpho-syntactic information conveyed by these units. We find that DLM yields more improvement for sub-lexical units than for words. Basic sub-lexical n-gram features result in a 0.6% reduction over the baseline, and morpho-syntactic features yield an additional 0.4% reduction on the test set.
Added 06 Dec 2010
Updated 06 Dec 2010
Type Conference
Year 2010
Where ICASSP
Authors Ebru Arisoy, Murat Saraclar, Brian Roark, Izhak Shafran