Automatic Term Extraction Using Log-Likelihood Based Comparison with General Reference Corpus


Download www.g-sidorov.org

Abstract. In this paper, we present a method for extracting single-word terms for a specific domain. At the next stage, these terms can be used as candidates for multi-word term extraction. The proposed method is based on comparison with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm and the cosine similarity measure. We conducted experiments on texts from the computer science domain, and the resulting term list is analyzed in detail.
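The abstract names two steps, log-likelihood comparison against a reference corpus and k-means clustering with cosine similarity, without giving formulas. The sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes the common corpus-comparison form of log-likelihood (G2), where a and b are a word's frequencies in the domain and reference corpora and c and d are the corpus sizes, and it keeps only words relatively more frequent in the domain corpus. For clustering it uses spherical k-means (unit-length vectors, assignment by highest cosine similarity) and assumes each term already has some feature vector, for example context co-occurrence counts. The function names `log_likelihood`, `extract_terms`, and `spherical_kmeans` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a domain corpus of c tokens and
    b times in a reference corpus of d tokens (assumed formulation)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the domain corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # 0 * ln(0) is treated as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def extract_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank words over-represented in the domain corpus by log-likelihood.
    Assumes both token lists are non-empty."""
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    c, d = sum(dom.values()), sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # Keep only words with a higher relative frequency in the domain corpus.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [w for _, w in scored[:top_n]]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def spherical_kmeans(vectors, k, iterations=20):
    """k-means that assigns each point to the centroid with the highest
    cosine similarity. Initial centroids are simply the first k points."""
    points = [_normalize(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment: for unit vectors the dot product equals cosine similarity.
        labels = [
            max(range(k), key=lambda j: sum(x * y for x, y in zip(p, centroids[j])))
            for p in points
        ]
        # Update: mean of the assigned points, re-normalized to unit length.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = _normalize([sum(col) / len(members) for col in zip(*members)])
    return labels


if __name__ == "__main__":
    domain = "the compiler parses the source code and the compiler emits code".split()
    reference = "the cat sat on the mat and the dog ate the code".split()
    print(extract_terms(domain, reference, top_n=3))
    print(spherical_kmeans([[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]], k=2))
```

On the toy corpora, "compiler" ranks first because it occurs twice in the domain text and never in the reference text, while "the" is dropped because it is relatively more frequent in the reference corpus. Taking the first k points as initial centroids keeps the example deterministic; a real run would more likely use random or k-means++ initialization.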

Alexander F. Gelbukh, Grigori Sidorov, Eduardo Lavin-Villa, Liliana Chanona-Hernández


General Reference Corpus | Multi-word Term Extraction | Natural Language Processing | NLDB 2010 | Single-word Terms |


» Automatic extraction of bilingual terms from a Chinese-Japanese parallel corpus

» Automatic Term Recognition Based on the Statistical Differences of Relative Frequencies in...

» A comparison of grapheme- and phoneme-based units for Spanish spoken term detection

» Sentence Reduction for Automatic Text Summarization

» Harvesting Multi-Word Expressions from Parallel Corpora

» Classification of raster maps for automatic feature extraction

» Abbreviation definition identification based on automatic precision estimates

» BOEMIE Ontology-Based Text Annotation Tool

Post Info

Added	20 Jul 2010
Updated	20 Jul 2010
Type	Conference
Year	2010
Where	NLDB
Authors	Alexander F. Gelbukh, Grigori Sidorov, Eduardo Lavin-Villa, Liliana Chanona-Hernández


Sciweavers
