Bulgarian National Corpus Project


The paper presents the Bulgarian National Corpus project (BulNC), a large-scale, representative corpus of Bulgarian that is available online. The BulNC is a monolingual general corpus, fully morpho-syntactically (and partially semantically) annotated, and manually provided with detailed metadata descriptions. At present the Bulgarian National Corpus consists of about 320 000 000 graphical words and includes more than 10 000 samples. The corpus structure and the criteria adopted for representativeness and balance are briefly presented. The query language for advanced search of collocations and concordances is demonstrated with some examples: it allows the retrieval of word combinations, ordered queries, inflectionally and semantically related words, and part-of-speech tags, using Boolean operations and grouping as well. The BulNC already plays a significant role in natural language processing of Bulgarian, contributing to scientific advances in spelling and grammar checking, word sense dis...

Svetla Koeva, Diana Blagoeva, Siya Kolkovska

Bulgarian National Corpus | Education | LREC 2010 | Monolingual General Corpus | National Corpus Project |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Svetla Koeva, Diana Blagoeva, Siya Kolkovska
