The two most important tasks in information extraction from the Web are webpage structure understanding and natural language sentences processing. However, little work has been don...
Chunyu Yang, Yong Cao, Zaiqing Nie, Jie Zhou, Ji-R...
This paper describes a hybrid tagger/parser for French (FrAG), and presents results from ongoing development work, corpus annotation and evaluation. The core of the system is a se...
CzEng 0.9 is the third release of a large parallel corpus of Czech and English. For the current release, CzEng was extended by significant amount of texts from various types of so...
This paper describes a hybrid model that combines machine learning with linguistic heuristics for integrating unknown word identification with Chinese word segmentation. The model...
Temporal reasoners for document understanding typically assume that a document’s creation date is known. Algorithms to ground relative time expressions and order events often re...