Sciweavers

NAACL
2010

Not All Seeds Are Equal: Measuring the Quality of Text Mining Seeds

Open-class semantic lexicon induction is of great interest for current knowledge harvesting algorithms. We propose a general framework that uses patterns in a bootstrapping fashion to learn open-class semantic lexicons for different kinds of relations. These patterns require seeds. To estimate the goodness (the potential yield) of new seeds, we introduce a regression model that considers the connectivity behavior of the seed during bootstrapping. The generalized regression model is evaluated on six different kinds of relations with over 10,000 different seeds for English and Spanish patterns. When predicting the goodness of seeds, our approach reaches a robust correlation coefficient of 90% with a 15% error rate for any of the patterns.
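The idea of scoring a seed by its connectivity during bootstrapping can be illustrated with a toy sketch. The abstract does not specify the patterns, the connectivity features, or the regression model, so everything below is an assumption made for illustration: the pattern matches are invented data, a seed's out-degree stands in as a single hypothetical connectivity feature, and ordinary least squares stands in for the paper's regression model.

```python
from collections import deque
from math import sqrt
from statistics import mean

# Hypothetical toy data (not from the paper): each pair (a, b) stands for
# one pattern match in text where seed `a` led to the new term `b`.
MATCHES = [
    ("lion", "tiger"), ("tiger", "leopard"), ("leopard", "jaguar"),
    ("lion", "cheetah"), ("cheetah", "puma"), ("lion", "wolf"),
    ("wolf", "fox"), ("cat", "dog"), ("dog", "hamster"), ("emu", "ostrich"),
]


def build_graph(matches):
    """Directed graph: an edge a -> b means seed a harvested term b."""
    graph = {}
    for left, right in matches:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set())
    return graph


def bootstrap_yield(graph, seed):
    """Breadth-first bootstrapping: each harvested term is reused as a seed.

    Returns the number of terms harvested, excluding the seed itself.
    """
    seen = {seed}
    queue = deque([seed])
    while queue:
        term = queue.popleft()
        for harvested in graph.get(term, ()):
            if harvested not in seen:
                seen.add(harvested)
                queue.append(harvested)
    return len(seen) - 1


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


graph = build_graph(MATCHES)
seeds = sorted(graph)
# Assumed connectivity feature: terms the seed reaches in one iteration.
degrees = [len(graph[s]) for s in seeds]
# "Goodness" of a seed: its total yield after bootstrapping to exhaustion.
yields = [bootstrap_yield(graph, s) for s in seeds]

slope, intercept = fit_line(degrees, yields)
r = pearson(degrees, yields)

for s, d, y in zip(seeds, degrees, yields):
    print(f"{s:8s} out-degree={d} yield={y} predicted={slope * d + intercept:.2f}")
print(f"correlation between predicted and actual yield: {r:.2f}")
```

With a single feature and a linear fit, the correlation between the feature and the yield equals the correlation between predicted and actual yield whenever the slope is positive, which is why one number is reported for both here.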
Zornitsa Kozareva, Eduard H. Hovy
Added 14 Feb 2011
Updated 14 Feb 2011
Type Conference
Year 2010
Where NAACL