This paper reports a technique for knowledge extraction using natural language processing to support semi-automatic ontology learning. Identifying the significant words in a relevant collection of text is an important first step in building ontologies from natural language text. However, terminology identification is a slow and expensive process that requires both terminological and domain expertise. We report experiments with three different document collections, comparing word frequency distributions over documents against those of a reference corpus representing more general subject matter.
Dileep G. Damle, Victoria S. Uren
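The comparison described in the abstract, word frequencies over domain documents set against a more general reference corpus, can be illustrated with a minimal sketch. The abstract does not name the statistic used, so the smoothed document-frequency ratio below is an assumption chosen for simplicity, and the function names `tokenize`, `document_frequency`, and `significance_scores` are hypothetical rather than taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(docs):
    """Count, for each word, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def significance_scores(domain_docs, reference_docs):
    """Score each domain word by how much more often it occurs across
    domain documents than across reference documents.

    Assumption: the score is a ratio of add-one smoothed document
    proportions; the paper may use a different statistic.
    """
    domain_df = document_frequency(domain_docs)
    reference_df = document_frequency(reference_docs)
    n_domain, n_ref = len(domain_docs), len(reference_docs)

    scores = {}
    for word, count in domain_df.items():
        p_domain = (count + 1) / (n_domain + 2)
        p_ref = (reference_df.get(word, 0) + 1) / (n_ref + 2)
        scores[word] = p_domain / p_ref
    return scores


if __name__ == "__main__":
    # Toy collections, invented purely for illustration.
    domain = ["The ontology defines classes.", "An ontology maps the concepts."]
    reference = ["The weather is mild.", "The train was late.", "A dog barked."]
    ranked = sorted(significance_scores(domain, reference).items(),
                    key=lambda item: item[1], reverse=True)
    for word, score in ranked:
        print(f"{word}\t{score:.2f}")
```

Under this scoring, words spread across many domain documents but rare in the reference corpus rank highest, while common function words score close to 1. Such a ranking would only provide candidate terms for an expert to review, in keeping with the semi-automatic setting the abstract describes.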