In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Images are amongst the most widely proliferated form of digital information due to affordable imaging technologies and the Web. In such an environment, the use of digital watermar...
Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediat...
Web advertising (Online advertising), a form of advertising that uses the World Wide Web to attract customers, has become one of the world’s most important marketing channels. Th...
Combating Web spam is one of the greatest challenges for Web search engines. State-of-the-art anti-spam techniques focus mainly on detecting varieties of spam strategies, such as ...
Chao Wei, Yiqun Liu, Min Zhang, Shaoping Ma, Liyun...