Linguistic summarization (LS) is a data mining or knowledge discovery approach to extract patterns from databases. Many authors have used this technique to generate summaries like “Most senior workers have high salary,” which can be used to better understand and communicate about data; however, few of them have used it to generate IF-THEN rules like “IF X is large and Y is medium, THEN Z is small,” which not only facilitate understanding and communication of data, but can also be used in decision-making. In this paper, an LS approach to generate IF-THEN rules for causal databases is proposed. Both type-1 and interval type-2 fuzzy sets are considered. Five quality measures – the degrees of truth, sufficient coverage, reliability, outlier and simplicity – are defined. Among them, the degree of reliability is especially valuable for finding the most reliable and representative rules, and the degree of outlier can be used to identify outlier rules and data for close-up inv...
Dongrui Wu, Jerry M. Mendel
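As a rough illustration of what a degree of truth can look like for a summary such as “Most senior workers have high salary,” the sketch below uses the classical Yager-style calculus with type-1 fuzzy sets: the truth is the membership of the fuzzy proportion of senior workers who also have a high salary in the quantifier “most.” This is not necessarily the definition used in the paper. The membership functions, their breakpoints, the function names, and the worker records are all assumptions made up for this example.

```python
def ramp_up(x, lo, hi):
    """Piecewise-linear membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical type-1 fuzzy sets (breakpoints are illustrative assumptions).
def senior(age):
    return ramp_up(age, 45.0, 55.0)


def high_salary(salary_k):
    return ramp_up(salary_k, 60.0, 90.0)


def most(ratio):
    # A commonly used shape for "most": 0 up to 0.3, 1 from 0.8 onward.
    return ramp_up(ratio, 0.3, 0.8)


def degree_of_truth(records, qualifier, summarizer, quantifier):
    """Yager-style truth of 'Q objects that are S are P' (type-1 sets).

    records: iterable of (qualifier_input, summarizer_input) pairs.
    """
    pairs = [(qualifier(a), summarizer(b)) for a, b in records]
    support = sum(s for s, _ in pairs)
    if support == 0.0:
        return 0.0  # no object matches the qualifier at all
    overlap = sum(min(s, p) for s, p in pairs)
    return quantifier(overlap / support)


# Hypothetical data: (age in years, salary in thousands).
workers = [(58, 95), (60, 92), (50, 75), (30, 40), (56, 60)]
truth = degree_of_truth(workers, senior, high_salary, most)
print(f"T('Most senior workers have high salary') = {truth:.3f}")
```

The same ratio-then-quantifier pattern could be applied to an IF-THEN rule by treating the antecedent as the qualifier and the consequent as the summarizer; the other four quality measures named in the abstract are not sketched here because their definitions are not given in this excerpt.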