Web spam research has been hampered by a lack of statistically significant collections. In this paper, we perform the first large-scale characterization of web spam using conten...
The Web is a hypertextual environment in permanent evolution. There are new technologies and Web publishing behaviors emerging everyday. This study presents trends on the evolutio...
There exist many interrelated information sources on the Internet that can be categorized into structured (database) and semistructured (documents). A key challenge is to integrat...
Long-term search history contains rich information about a user's search preferences. In this paper, we study statistical language modeling based methods to mine contextual i...
We present a simple linguistically-motivated method for characterizing the semantic relations that hold between two nouns. The approach leverages the vast size of the Web in order...