LREC
2008

Process Model for Composing High-quality Text Corpora

Download www.lrec-conf.org

The Teko corpus composing model offers a decentralized, dynamic way of collecting high-quality text corpora for linguistic research. The resulting corpus consists of independent text sets. The sets are composed in cooperation with linguistic research projects, so each of them responds to a specific research need. The corpora are morphologically annotated and XML-based, with built-in compatibility with the Kaino user interface used on the corpus server of the Research Institute for the Languages of Finland. Furthermore, software for extracting standard quantitative reports from the text sets was created during the project. The paper describes the project and estimates its benefits and problems. It also gives an overview of the technical qualities of the corpora and of the corpus interface connected to the Teko project.

Mikko Lounela

Education | Linguistic Research | LREC 2008 | Teko Corpus | Text Sets

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Mikko Lounela