Language resources extracted from Wikipedia

Wikipedia provides a considerable amount of text for more than a hundred languages, including languages for which no reference corpora or other linguistic resources are easily available. We have extracted background language models built from the content of Wikipedia in various languages. The models generated from the Simple English Wikipedia and the English Wikipedia are compared to language models derived from other established corpora. The differences between the models with regard to term coverage, term distribution, and correlation are described and discussed. We provide access to the full dataset and create visualizations of the language models that can be used for exploratory analysis. The paper describes the newly released dataset for 33 languages and the services that we provide on top of it.
Categories and Subject Descriptors: I.2.7 [Natural Language Processing]: Language models; I.2.6 [Learning]: Knowledge acquisition
General Terms: Languages, Measurement
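
The comparisons the abstract mentions, term coverage and rank correlation between a Wikipedia-derived model and a reference corpus, can be illustrated with a small sketch. This is not the authors' pipeline: the file names, the whitespace tokenizer, and the use of Spearman's formula without tie handling are illustrative assumptions.

```python
from collections import Counter

def unigram_model(path):
    """Count term frequencies in a whitespace-tokenized plain-text corpus."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.lower().split())
    return counts

def term_coverage(background, reference):
    """Fraction of the reference vocabulary covered by the background model."""
    if not reference:
        return 0.0
    covered = sum(1 for term in reference if term in background)
    return covered / len(reference)

def spearman_rank_correlation(model_a, model_b):
    """Spearman correlation of term-frequency ranks on the shared vocabulary.
    Ties in frequency are not averaged, so this is only an approximation."""
    shared = [t for t in model_a if t in model_b]
    n = len(shared)
    if n < 2:
        return 0.0
    # Rank the shared terms within each model by descending frequency.
    rank_a = {t: r for r, t in enumerate(sorted(shared, key=lambda w: -model_a[w]))}
    rank_b = {t: r for r, t in enumerate(sorted(shared, key=lambda w: -model_b[w]))}
    d_squared = sum((rank_a[t] - rank_b[t]) ** 2 for t in shared)
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))

if __name__ == "__main__":
    # Hypothetical file names; any two plain-text corpora will do.
    wiki = unigram_model("simple_wikipedia.txt")
    reference = unigram_model("reference_corpus.txt")
    print("term coverage:", term_coverage(wiki, reference))
    print("rank correlation:", spearman_rank_correlation(wiki, reference))
```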
Added 16 Sep 2011
Updated 16 Sep 2011
Type Conference
Year 2011
Where KCAP
Authors Denny Vrandecic, Philipp Sorg, Rudi Studer