Sciweavers

63 search results - page 4 / 13
» Large Linguistically-Processed Web Corpora for Multiple Lang...
ACL
2008
13 years 9 months ago
Smoothing a Tera-word Language Model
Frequency counts from very large corpora, such as the Web 1T dataset, have recently become available for language modeling. Omission of low frequency n-gram counts is a practical ...
Deniz Yuret
EMNLP
2009
13 years 5 months ago
Polylingual Topic Models
Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive colle...
David M. Mimno, Hanna M. Wallach, Jason Naradowsky...
ICASSP
2009
IEEE
14 years 2 months ago
Leveraging multiple query logs to improve language models for spoken query recognition
A voice search system requires a speech interface that can correctly recognize spoken queries uttered by users. The recognition performance strongly relies on a robust language mo...
Xiao Li, Patrick Nguyen, Geoffrey Zweig, Dan Bohus
LREC
2010
13 years 9 months ago
Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic
Broad-coverage language resources which provide prior linguistic knowledge must improve the accuracy and the performance of NLP applications. We are constructing a broad-coverage ...
Majdi Sawalha, Eric Atwell
VLDB
1999
ACM
13 years 12 months ago
Capturing and Querying Multiple Aspects of Semistructured Data
Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and tech...
Curtis E. Dyreson, Michael H. Böhlen, Christi...