This paper reports on a study of the automatic extraction of Chinese legal terms. We applied standard collocation learning techniques to a word-segmented corpus of Chinese court judgments to extract salient legal expressions, using a method that takes the characteristics of Chinese legal terms into account. The extracted terms were evaluated by human markers and compared against a legal term glossary manually compiled from the same data. Results show that at least 50% of the extracted terms are legally salient. Automatic extraction may therefore supplement the output of manual compilation and reduce its inconsistency. Moreover, various types of significant knowledge in the legal context are mined from the data as a by-product.
Oi Yee Kwong, Benjamin K. Tsou
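As an illustration of the kind of collocation learning the abstract refers to, the following is a minimal sketch that ranks adjacent word pairs in a word-segmented corpus by pointwise mutual information (PMI). The abstract does not say which association measure was used, so PMI is an assumption here, and the function name `bigram_pmi`, the `min_count` threshold, and the toy sentences are hypothetical and not taken from the study's data.

```python
import math
from collections import Counter


def bigram_pmi(sentences, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    `sentences` is a list of already-segmented sentences (lists of words).
    Pairs seen fewer than `min_count` times are dropped, because PMI
    overrates rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first: these are the term candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    ["法院", "駁回", "上訴", "許可", "申請"],
    ["原告", "提出", "上訴", "許可", "申請"],
    ["被告", "提出", "申請"],
    ["法院", "批准", "申請"],
]

ranked = bigram_pmi(toy_corpus, min_count=2)
for pair, score in ranked:
    print("".join(pair), round(score, 3))
```

In a realistic setting the ranked candidates would still need filtering, for example by part of speech or term length, and review by human markers, as the evaluation described in the abstract suggests.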