The paper presents Bulgarian National Corpus project (BulNC) - a large-scale, representative, online available corpus of Bulgarian. The BulNC is also a monolingual general corpus,...
While phrase-based statistical machine translation systems currently deliver state-of-theart performance, they remain weak on word order changes. Current phrase reordering models ...
Providing punctuation in speech transcripts not only improves readability, but it also helps downstream text processing such as information extraction or machine translation. In t...
To understand the subjective documents, for example, public comments on the government’s proposed regulation, opinion identification and classification is required. Rather than ...
Namhee Kwon, Liang Zhou, Eduard H. Hovy, Stuart W....
Sense tagged corpus plays a very crucial role to Natural Language Processing, especially on the research of word sense disambiguation and natural language understanding. Having a l...