Duplicate URLs cause serious trouble across the whole pipeline of a search engine, from crawling and indexing to result serving. URL normalization aims to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Many algorithm visualizations have been created, but little is known about which features are most important to their success. We believe that pedagogically useful visualizations ...
Purvi Saraiya, Clifford A. Shaffer, D. Scott McCri...
Labeling text data is time-consuming but essential for automatic text classification. In particular, manually creating multiple labels for each document may become impractical ...
This paper addresses a task of variable selection which consists in choosing a subset of variables that is sufficient to predict the target label well. Here instead of tr...
We attack the task of predicting which news stories are more appealing to a given audience by comparing ‘most popular stories’, gathered from various online news outlets, over ...
Elena Hensinger, Ilias N. Flaounas, Nello Cristian...