Traditional wisdom holds that once documents are turned into bag-of-words (unigram count) vectors, word order is completely lost. We introduce an approach that, perhaps surprisi...
Xiaojin Zhu, Andrew B. Goldberg, Michael Rabbat, R...
A new wrapper model for extracting text data from HTML documents is introduced. Kushmerick's wrapper class (Kushmerick 2000) may be unsuccessful in the case that suff...
We introduce a new method for automatically constructing concept hierarchies where the concept nodes follow a generalization / specialization relation. Starting from a set of conc...
A traditional goal of Artificial Intelligence research has been a system that can read unrestricted natural language texts on a given topic, build a model of that topic and reason...
Ken Barker, Bhalchandra Agashe, Shaw Yi Chaw, Jame...
Some discourse structures, such as enumerative structures, have typographical, punctuational, and layout characteristics which (1) make them easily identifiable and (2) convey hi...