Annotation of large multilingual corpora remains a challenge to the data-driven approach to speech research, especially for under-resourced languages. This paper presents crosslan...
Multilingual corpora are valuable resources for cross-language information retrieval and are available in many language pairs. However the Persian language does not have rich multi...
We describe a web-based corpus query system, Glossa, which combines the expressiveness of regular query languages with the user-friendliness of a graphical interface. Since corpus...
Parallel corpora are indispensable resources for a variety of multilingual natural language processing tasks. This paper presents a technique for fully automatic construction of c...
The original Mixer corpus was designed to satisfy developing commercial and forensic needs. The resulting Mixer corpora, Phases 1 through 5, have evolved to support and increasing...
Linda Brandschain, Christopher Cieri, David Graff,...