A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension



Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to create an extensive training corpus beforehand, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based on the analysis of co-occurring features in unlabeled data, which is then used to extend the feature vectors given to a classifier. Our experimental results on the RST Discourse Treebank corpus and Penn Discourse Treebank indicate that the proposed method brings a significant improvement in classification accuracy and macro-average F-score when small training datasets are used. For instance, with training sets of ca. 1000 labeled instances, the proposed method brings improvements in accuracy and macro-average F-score of up to 50% compared to a baseline classifier. We believe that...
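
The idea of extending feature vectors with features that co-occur in unlabeled data can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature strings, the `top_k` cutoff, and the weighting scheme (relative co-occurrence frequency scaled by a fixed `weight`) are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(unlabeled):
    """Count how often each pair of features appears in the same unlabeled instance."""
    counts = defaultdict(Counter)
    for features in unlabeled:
        for a, b in combinations(sorted(set(features)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def extend_vector(vector, counts, top_k=2, weight=0.5):
    """Return a copy of `vector` with co-occurring features added.

    For each original feature, its `top_k` most frequent co-occurring
    features are considered; those not already in `vector` are added,
    weighted by their relative co-occurrence frequency (an assumed
    weighting, not the one from the paper).
    """
    extended = dict(vector)
    for feature, value in vector.items():
        neighbours = counts.get(feature, Counter())
        total = sum(neighbours.values())
        for other, n in neighbours.most_common(top_k):
            if other in vector:
                continue  # keep the observed value of features already present
            extended[other] = extended.get(other, 0.0) + weight * value * n / total
    return extended


# Hypothetical unlabeled instances, each a set of feature names.
unlabeled = [
    {"word=but", "pos=CC", "word=however"},
    {"word=but", "pos=CC"},
    {"word=because", "pos=IN"},
]
counts = cooccurrence_counts(unlabeled)

# A sparse labeled instance with a single observed feature.
extended = extend_vector({"word=but": 1.0}, counts)
print(extended)
```

In this toy example the sparse vector gains entries for `pos=CC` and `word=however`, so a classifier trained on few labeled instances sees related features it would otherwise miss. Unrelated features such as `word=because` are not added.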

Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuka


Discourse | Discourse Treebank | EMNLP 2010 | Macro-average F-score | Natural Language Processing


Added	11 Feb 2011
Updated	11 Feb 2011
Type	Journal
Year	2010
Where	EMNLP
Authors	Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuka

Sciweavers
