This work extends the semi-automatic grammar induction approach previously proposed in [1]. We investigate the use of Information Gain (IG) in place of Mutual Information (MI) for grammar induction from an unannotated training corpus. Experiments using the ATIS-3 training corpus indicate that, compared with MI, IG yields better precision and recall of the desired semantic categories, and does so at earlier stages of the grammar induction process. We also investigate methods to automatically terminate the iterative grammar induction algorithm and output a grammar. We define the stopping criterion as the point where the relative increment in grammar coverage falls below 1%. Grammar coverage is measured in terms of coverage of the training corpus vocabulary. We obtain an output grammar from this extended semi-automatic grammar induction algorithm with automatic termination. This grammar compares favorably with the handcrafted and semi-automatic grammars from [1] based on NLU performance using the ATIS-3...
Chin-Chung Wong, Helen M. Meng, Kai-Chung Siu
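The stopping criterion described in the abstract can be illustrated with a minimal sketch. It makes several assumptions the abstract does not confirm: that coverage is the fraction of distinct training-vocabulary words appearing as grammar terminals, and that the relative increment is computed against the previous iteration's coverage. The function names (`vocabulary_coverage`, `relative_increment`, `stopping_iteration`) and the sample coverage values are hypothetical and chosen only for illustration.

```python
def vocabulary_coverage(grammar_terminals, vocabulary):
    """Fraction of distinct vocabulary words covered by the grammar.

    Assumption: coverage is counted over word types, not tokens.
    """
    vocabulary = set(vocabulary)
    if not vocabulary:
        return 0.0
    return len(vocabulary & set(grammar_terminals)) / len(vocabulary)


def relative_increment(prev_coverage, curr_coverage):
    """Relative change in coverage between two consecutive iterations.

    Assumption: the increment is normalised by the previous coverage.
    """
    if prev_coverage <= 0:
        return float("inf")  # no baseline yet, so never stop here
    return (curr_coverage - prev_coverage) / prev_coverage


def stopping_iteration(coverages, threshold=0.01):
    """Index of the first iteration whose relative increment is below
    the threshold (1% by default), or None if that never happens."""
    for i in range(1, len(coverages)):
        if relative_increment(coverages[i - 1], coverages[i]) < threshold:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical per-iteration coverage values, not results from the paper.
    coverages = [0.20, 0.30, 0.36, 0.362]
    print(stopping_iteration(coverages))
```

With the made-up sequence above, the relative increments are 50%, 20%, and roughly 0.56%, so the sketch would terminate at the fourth iteration (index 3).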