EMNLP
2009

A Rich Feature Vector for Protein-Protein Interaction Extraction from Multiple Corpora

Because of the importance of protein-protein interaction (PPI) extraction from text, many corpora have been proposed with slightly differing definitions of proteins and PPI. Since no single corpus is large enough to saturate a machine learning system, it is necessary to learn from multiple different corpora. In this paper, we propose a solution to this challenge. We designed a rich feature vector, and we applied a support vector machine modified for corpus weighting (SVM-CW) to the task of multi-corpus PPI extraction. The rich feature vector, built from multiple useful kernels, expresses the important information for PPI extraction, and the system with our feature vector was shown to be both faster and more accurate than the original kernel-based system, even when using just a single corpus. SVM-CW learns from one corpus, while using other corpora for support. SVM-CW is simple, but it is more effective than other methods that have been successfully applied to other ...
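The idea of learning from one corpus while using other corpora for support can be illustrated with a small sketch. This is not the authors' SVM-CW implementation. It is a generic linear SVM trained by subgradient descent on a hinge loss in which every example carries a per-corpus cost, so that errors on the target corpus count more than errors on a supporting corpus. The function names (`train_weighted_linear_svm`, `predict`), the corpus weights (1.0 and 0.2), the hyperparameters, and the two-dimensional toy data are all assumptions made for illustration.

```python
import random


def train_weighted_linear_svm(examples, epochs=200, lr=0.01, reg=0.01, seed=0):
    """Train a linear SVM with per-example costs (hypothetical helper).

    examples: list of (features, label, cost), with label in {-1, +1}.
    The cost scales the hinge-loss penalty of that example, so examples
    from a down-weighted corpus influence the model less.
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    b = 0.0
    order = list(range(len(examples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y, cost = examples[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Shrink the weights slightly (L2 regularisation step).
            w = [wj * (1.0 - lr * reg) for wj in w]
            # Hinge-loss subgradient step, scaled by the example's cost.
            if margin < 1.0:
                w = [wj + lr * cost * y * xj for wj, xj in zip(w, x)]
                b += lr * cost * y
    return w, b


def predict(model, x):
    """Return +1 or -1 for feature vector x."""
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy "target corpus": the examples we ultimately care about (cost 1.0).
target = [
    ([2.0, 1.0], 1), ([1.5, 2.0], 1),
    ([-2.0, -1.0], -1), ([-1.0, -2.0], -1),
]
# Toy "supporting corpus": mostly agrees with the target corpus, but one
# example follows a different annotation convention (cost 0.2).
support = [
    ([1.0, 1.0], 1), ([-1.0, -1.0], -1),
    ([1.0, 0.5], -1),  # conflicts with the target corpus labelling
]

TARGET_COST, SUPPORT_COST = 1.0, 0.2  # assumed values, not from the paper
examples = [(x, y, TARGET_COST) for x, y in target]
examples += [(x, y, SUPPORT_COST) for x, y in support]

model = train_weighted_linear_svm(examples)
print([predict(model, x) for x, _ in target])
```

Because the supporting corpus is down-weighted, its conflicting example shifts the decision boundary only slightly, and the model still follows the labelling of the target corpus.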
Makoto Miwa, Rune Sætre, Yusuke Miyao, Jun-ichi Tsujii
Added 17 Feb 2011
Updated 17 Feb 2011
Type Journal
Year 2009
Where EMNLP
Authors Makoto Miwa, Rune Sætre, Yusuke Miyao, Jun-ichi Tsujii