Learning models for recognizing objects with few or no training examples is important, due to the intrinsic long-tailed distribution of objects in the real world. In this paper, we...
Signature-driven spam detection provides an alternative to machine learning approaches and can be very effective when near-duplicates of essentially the same message are sent in h...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
We study dimensionality reduction or feature selection in the text document categorization problem. We focus on the first step in building text categorization systems, that is the cho...
Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classificatio...
Collections are a fundamental tool for reproducible evaluation of information retrieval techniques. We describe a new method for distributing the document lengths and term counts ...