Language-Independent Methods for Compiling Monolingual Lexical Data

Abstract: In this paper we describe a flexible, portable and language-independent infrastructure for setting up large monolingual language corpora. The approach is based on collecting a large amount of monolingual text from various sources. The input data is processed on the basis of a sentence-based text segmentation algorithm. We describe the entry structure of the corpus database as well as various query types and tools for information extraction. Among them, the extraction and usage of sentence-based word collocations is discussed in detail. Finally, we give an overview of different applications for this language resource. A WWW interface allows for public access to most of the data and information extraction tools (http://wortschatz.uni-leipzig.de).
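
The abstract names sentence segmentation and sentence-based word collocations but does not give the algorithms. As a rough illustration only, the sketch below splits text at sentence-final punctuation and counts how many sentences each pair of words shares. The function names `split_sentences` and `sentence_cooccurrences`, the punctuation rule, and the use of raw counts are all assumptions made for this example, not the authors' method, which may use different segmentation rules and a statistical significance measure.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive segmentation (an assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_cooccurrences(sentences):
    """Count, for each unordered word pair, the number of sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        # Lowercased word types, sorted so each pair has one canonical order;
        # a pair is counted at most once per sentence.
        words = sorted(set(re.findall(r"\w+", sentence.lower())))
        counts.update(combinations(words, 2))
    return counts


text = "The cat sat. The cat ran! A dog sat."
pairs = sentence_cooccurrences(split_sentences(text))
print(pairs.most_common(3))
```

Raw counts favor frequent words, so a real system would presumably rank pairs by some association score rather than by frequency alone.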

Christian Biemann, Stefan Bordag, Gerhard Heyer, Uwe Quasthoff, Christian Wolff

CICLING 2004 | Information Extraction | Large Monolingual Language | Natural Language Processing | Text Segmentation Algorithm

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2004
Where	CICLING
Authors	Christian Biemann, Stefan Bordag, Gerhard Heyer, Uwe Quasthoff, Christian Wolff
