We present a robust parser which is trained on a treebank of ungrammatical sentences. The treebank is created automatically by modifying Penn treebank sentences so that they conta...
Jennifer Foster, Joachim Wagner, Josef van Genabit...
Human-quality text summarization systems are di cult to design, and even more di cult to evaluate, in part because documents can di er along several dimensions, such as length, wri...
Jade Goldstein, Mark Kantrowitz, Vibhu O. Mittal, ...
The importance of text mining stems from the availability of huge volumes of text databases holding a wealth of valuable information that needs to be mined. Text categorization is...
A new dictionary-based text categorization approach is proposed to classify the chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-re...
Chunyan Liang, Li Guo, Zhaojie Xia, Feng-Guang Nie...
Due to the globalization on the Web, many companies and institutions need to efficiently organize and search repositories containing multilingual documents. The management of the...