Web page classification is important to many tasks in information retrieval and web mining. However, applying traditional textual classifiers on web data often produces unsatisfyi...
This paper addresses the challenging problem of similarity search over widely distributed ultra-high dimensional data. Such an application is retrieval of the top-k most similar d...
The development of natural language processing (NLP) components is resource-intensive and therefore justifies exploring ways of reducing development time and effort when building ...
Sonja E. Bosch, Laurette Pretorius, Kholisa Podile...
Systems consisting of multiple proxy servers are a popular solution to deal with performance and network resource utilization problems related to the growth of the Web numbers. Af...
Our participation in TREC 2003 aims to adapt the use of the DFR (Divergence From Randomness) models with Query Expansion (QE) to the robust track and the topic distillation task o...
Giambattista Amati, Claudio Carpineto, Giovanni Ro...