Web Page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...
Proactive learning is a generalization of active learning designed to relax unrealistic assumptions and thereby reach practical applications. Active learning seeks to select the m...
Document understanding techniques such as document clustering and multi-document summarization have been receiving much attention in recent years. Current document clustering meth...
Dingding Wang, Shenghuo Zhu, Tao Li, Yun Chi, Yiho...
Similarity search in metric spaces is a general paradigm that can be used in several application fields. It can also be effectively exploited in content-based image retrieval syst...
Today, valuable business information is increasingly stored as unstructured data (documents, emails, etc.). For example, documents exchanged between business partners capture info...