We mine a large taxonomic dataset for subject classification rules. We then use these rules to perform an extensive analysis of the subject matter of the largest general purpose in...
Templates in web sites hurt search engine retrieval performance, especially in content relevance and link analysis. Current template removal methods suffer from processing speed ...
For a given set of search engines, a search engine is redundant if its searchable contents can be found from other search engines in this set. In this paper, we propose a method t...
We profile a system for search and analysis of largescale email archives. The system builds around four facets: Content-based search engine, statistical topic model, automaticall...
Although Web search engines are targeted towards helping people find new information, people regularly use them to re-find Web pages they have seen before. Researchers have noted ...