This paper presents a new method for building domain-specific web search engines. Previous methods eliminate irrelevant documents from the pages accessed using heuristics based on...
Experiments were conducted to test several hypotheses on methods for improving document classification for the malicious insider threat problem within the Intelligence Community. ...
Interaction with large unstructured datasets is difficult because existing approaches, such as keyword search, are not always suited to describing concepts corresponding to the di...
Saleema Amershi, James Fogarty, Ashish Kapoor, Des...
We present a study of new word identification (NWI) to improve the performance of a Chinese word segmenter. In this paper the distribution and types of new words are discussed emp...
Sparse coding--that is, modelling data vectors as sparse linear combinations of basis elements--is widely used in machine learning, neuroscience, signal processing, and statistics...
Julien Mairal, Francis Bach, Jean Ponce, Guillermo...