Sciweavers

WWW
2001
ACM

On integrating catalogs

15 years 3 days ago
On integrating catalogs
We address the problem of integrating documents from different sources into a master catalog. This problem is pervasive in web marketplaces and portals. Current technology for automating this process consists of building a classifier that uses the categorization of documents in the master catalog to construct a model for predicting the category of unknown documents. Our key insight is that many of the data sources have their own categorization, and classification accuracy can be improved by factoring in the implicit information in these source categorizations. We show how a Naive Bayes classification can be enhanced to incorporate the similarity information present in source catalogs. Our analysis and empirical evaluation show substantial improvement in the accuracy of catalog integration.
Rakesh Agrawal, Ramakrishnan Srikant
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2001
Where WWW
Authors Rakesh Agrawal, Ramakrishnan Srikant
Comments (0)