Sciweavers

ACL
2012

Syntactic Annotations for the Google Books NGram Corpus

12 years 2 months ago
Syntactic Annotations for the Google Books NGram Corpus
We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and headmodifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.
Yuri Lin, Jean-Baptiste Michel, Erez Aiden Lieberm
Added 29 Sep 2012
Updated 29 Sep 2012
Type Journal
Year 2012
Where ACL
Authors Yuri Lin, Jean-Baptiste Michel, Erez Aiden Lieberman, Jon Orwant, Will Brockman, Slav Petrov
Comments (0)