Topic distillation aims to find key resources, that is, high-quality pages for given topics. Based on an analysis of non-content features of key resources, a pre-selection method is ...
We report on a study undertaken to better identify users' goals behind web search queries using click-through data. Based on user logs that contain over 80 millio...
Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unse...
Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are discrete, and the very peculiar shape...
Nicolas Chapados, Yoshua Bengio, Pascal Vincent, J...
Identifying approximately duplicate objects in databases is an essential step in the information integration process. Most existing approaches have relied on gener...