Current data mining techniques usually lack a mechanism for automatically inferring the semantic features inherent in the data being “mined”. The semantics are either injected in the initial stages (through feature construction) or supplied afterward by interpreting the results that the algorithms produce. Both approaches have proved effective but require considerable human effort. In many domains, semantic information is implicitly available and can be extracted automatically to improve data mining systems. In this paper, we present a case study of a system that is trained to extract semantic features for apparel products and to populate a knowledge base with these products and features. We show that semantic features of these items can be successfully extracted by applying text learning techniques to product descriptions obtained from retailers' websites. We also describe several applications of the knowledge base of product semantics that we have built, including recommender systems and competitive int...
Rayid Ghani, Andrew E. Fano