Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
Text streams are becoming increasingly ubiquitous in the form of news feeds, weblog archives and so on, resulting in a large volume of data. An effective way to explore the...
Xiang Wang 0002, Kai Zhang, Xiaoming Jin, Dou Shen
This paper presents a statistical framework based on Principal Component Analysis (PCA) for discovering the contextual factors which most strongly influence user behavio...
As the academic world moves away from physical journals and proceedings towards online document repositories, the ability to efficiently locate work of interest among the torr...
Jayanthkumar Kannan, Beverly Yang, Scott Shenker, ...
In this paper, we discuss the findings of an in-depth observational study of reading and within-document navigation and add to these findings the results of a second analysis of h...