The retrieval of similar documents from large scale datasets has been the one of the main concerns in knowledge management environments, such as plagiarism detection, news impact a...
Felipe Bravo-Marquez, Gaston L'Huillier, Sebasti&a...
: In TREC2004, Dublin City University took part in three tracks, Terabyte (in collaboration with University College Dublin), Genomic and Novelty. In this paper we will discuss each...
Stephen Blott, Fabrice Camous, Paul Ferguson, Geor...
Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific known types of Web spa...
Yiqun Liu, Rongwei Cen, Min Zhang, Shaoping Ma, Li...
Abstract. Since current search engines employ link-based ranking algorithms as an important tool to decide a ranking of sites, Web spammers are making a significant effort to man...
Millions of users retrieve information from the Internet using search engines. Mining these user sessions can provide valuable information about the quality of user experience and...