The real difficulty in developing practical NLP systems comes from the fact that we do not have effective means for gathering "knowledge". In this paper, we propose an algorithm which automatically acquires knowledge of semantic collocations among "words" from sample corpora. The algorithm uses a statistical approach to discover semantic collocations which will be useful for disambiguating structurally ambiguous sentences. The algorithm requires a corpus and minimal linguistic knowledge (parts-of-speech of words, simple inflection rules, and a small number of general syntactic rules). We conducted two experiments in which we applied the algorithm to different corpora to extract different types of semantic collocations. Though there are some unsolved problems, the results showed the effectiveness of the proposed algorithm.
Satoshi Sekine, Jeremy J. Carroll, Sophia Ananiado