This paper describes an intelligent agent to facilitate bitext mining from the Web via automatic discovery of URL pairing patterns (or keys) for retrieving parallel web pages. The...
In this paper we describe the semantic partitioner algorithm, which uses the structural and presentation regularities of Web pages to automatically transform them into hierarchi...
The emergence of personalized homepage services, e.g., personalized Google Homepage and Microsoft Windows Live, has enabled Web users to select Web content of interest and to aggr...
Jie Han, Dingyi Han, Chenxi Lin, Hua-Jun Zeng, Zhe...
In this paper, we continue our investigations of "web spam": the injection of artificially created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
Dividing web pages into fragments has been shown to provide significant benefits for both content generation and caching. In order for a web site to use fragment-based content gen...
Lakshmish Ramaswamy, Arun Iyengar, Ling Liu, Fred ...