Automatic restoration of punctuation from unpunctuated text has application in improving the fluency and applicability of speech recognition systems. We explore the possibility t...
In many applications, good ranking is a highly desirable property of a classifier. The criterion commonly used to measure the ranking quality of a classification algorithm is ...
We introduce an implementation of a plain trigram part-of-speech tagger which appears to work well on Polish texts. At this moment the tagger achieves a 9.4% error rate, wh...
This paper investigates the correlation between acoustic confidence scores returned by speech recognizers and recognition quality. We report the results of two machine learni...
Short vowels and other diacritics are not part of written Arabic scripts. Exceptions are made for important political and religious texts and in scripts for beginning students of ...
In this paper, we describe our approach and results for the high-level feature extraction (HLF) task at TRECVID2008. This year, our focus is to develop a framework which fuses a numbe...
This paper discusses automatic determination of case in Arabic. This task is an important part of, and a major source of errors in, full diacritization of Arabic. We use a gold-standard s...
Nizar Habash, Ryan Gabbard, Owen Rambow, Seth Kuli...
In this paper, we present several ways to measure and evaluate the annotation and annotators, proposed and used during the building of the Czech part of the Prague Czech-English D...
If the dataset available to machine learning results from cluster sampling (e.g. patients from a sample of hospital wards), the usual cross-validation error rate estimate can lead...
When data collection is costly and/or takes a significant amount of time, an early prediction of the classifier performance is extremely important for the design of the data minin...