Data models and encoding formats for syntactically annotated text corpora need to deal with syntactic ambiguity; underspecified representations are particularly well suited for th...
This paper presents a context sensitive spell checking system that uses mixed trigram models, and introduces a new empirically grounded method for building confusion sets. The pro...
This paper deals with the treatment of constructed neologisms in a machine translation system. It focuses on a particular issue in Romance languages: relational adjectives and the...
Word sketches are part of the Sketch Engine corpus query system. They represent automatic, corpus-derived summaries of the words' grammatical and collocational behaviour. Bes...
Kremena Ivanova, Ulrich Heid, Sabine Schulte im Wa...
In this paper, a large-scale real-world speech database is introduced along with other multimedia driving data. We designed a data collection vehicle equipped with various sensors...
Tokenization is one of the initial steps done for almost any text processing task. It is not particularly recognized as a challenging task for English monolingual systems but it r...
A core ontology is a mid-level ontology which bridges the gap between an upper ontology and a domain ontology. Automatic Chinese core ontology construction can help quickly model ...
Reported speech in the form of direct and indirect reported speech is an important indicator of evidentiality in traditional newspaper texts, but also increasingly in the new medi...
A number of forensic studies published during the last 50 years report that intoxication with alcohol influences speech in a way that is made manifest in certain features of the s...
Florian Schiel, Christian Heinrich, Sabine Barf&uu...