Syntactic Annotations for the Google Books NGram Corpus


Download www.petrovi.de

We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and head-modifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.
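To make the idea of part-of-speech-tagged counts over time concrete, here is a minimal Python sketch. It assumes a tab-separated record layout of ngram, year, and match count, with each tag attached to its word after an underscore (for example book_NOUN). The function names parse_record and yearly_tag_counts are hypothetical, and the sample rows are invented toy values, not corpus data; the layout of the released files may differ.

```python
from collections import defaultdict

def parse_record(line):
    """Split one assumed record: 'ngram<TAB>year<TAB>match_count'."""
    ngram, year, count = line.rstrip("\n").split("\t")[:3]
    tokens = []
    for token in ngram.split(" "):
        # Assumed convention: the tag follows the last underscore, e.g. 'book_NOUN'.
        word, sep, tag = token.rpartition("_")
        tokens.append((word, tag) if sep else (token, None))
    return tokens, int(year), int(count)

def yearly_tag_counts(lines, word):
    """Sum counts per (year, tag) over unigram records whose word matches."""
    totals = defaultdict(int)
    for line in lines:
        tokens, year, count = parse_record(line)
        if len(tokens) == 1 and tokens[0][0] == word:
            totals[(year, tokens[0][1])] += count
    return dict(totals)

# Invented toy rows for illustration only.
sample = [
    "book_NOUN\t1900\t120",
    "book_VERB\t1900\t7",
    "book_NOUN\t2000\t300",
    "the_DET book_NOUN\t2000\t50",
]
print(yearly_tag_counts(sample, "book"))
```

Keeping the tag as part of the aggregation key is what lets the noun and verb uses of the same word be tracked as separate trends.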

Yuri Lin, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman, Slav Petrov


ACL 2012 | Computational Linguistics | Part Of Speech | Statistical Models


Post Info

Added	29 Sep 2012
Updated	29 Sep 2012
Type	Journal
Year	2012
Where	ACL
Authors	Yuri Lin, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman, Slav Petrov
