While much of the data on the web is unstructured, there is also a significant amount of embedded structured data, such as product information on e-commerce sites or sto...
The rapid growth of the Internet over the last decade has been startling. However, efforts to track its growth have often fallen afoul of bad data -- for instance, how much traffi...
Multinomial distributions over words are frequently used to model topics in text collections. A common, major challenge in applying all such topic models to any text mining proble...
Organizing textual documents into a hierarchical taxonomy is a common practice in knowledge management. Besides textual features, the hierarchical structure of directories reflect...
Yi Huang, Kai Yu, Matthias Schubert, Shipeng Yu, V...
Associative classification is a rule-based approach that classifies data using association rule mining, discovering associations between a set of features and a class label. Su...