Mining informative patterns from very large, dynamically changing databases poses numerous interesting challenges. Data summarizations (e.g., data bubbles) have been proposed to c...
With the increased abilities for automated data collection made possible by modern technology, the typical sizes of data collections have continued to grow in recent years. In suc...
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) t...
Many real-world datasets can be clustered along multiple dimensions. For example, text documents can be clustered not only by topic, but also by the author's gender or sentim...
chool of Business Research – FY 2007 Research Abstracts Dare to Share: Protecting Sensitive Knowledge with Data Sanitization Data sanitization is a process that is used to promot...