Sciweavers

368 search results - page 62 / 74
» Template-Based Information Mining from HTML Documents
WSDM
2010
ACM
204 views, Data Mining
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
ICDM
2003
IEEE
119 views, Data Mining
14 years 19 days ago
A Dynamic Adaptive Self-Organising Hybrid Model for Text Clustering
Clustering by document concepts is a powerful way of retrieving information from a large number of documents. This task in general does not make any assumption on the data distrib...
Chihli Hung, Stefan Wermter
KDD
1999
ACM
99 views, Data Mining
13 years 11 months ago
On the Merits of Building Categorization Systems by Supervised Clustering
This paper investigates the use of supervised clustering in order to create sets of categories for classification of documents. We use information from a pre-existing taxonomy in o...
Charu C. Aggarwal, Stephen C. Gates, Philip S. Yu
KDD
2010
ACM
235 views, Data Mining
13 years 11 months ago
The topic-perspective model for social tagging systems
In this paper, we propose a new probabilistic generative model, called Topic-Perspective Model, for simulating the generation process of social annotations. Different from other g...
Caimei Lu, Xiaohua Hu, Xin Chen, Jung-ran Park, Ti...
IADIS
2004
13 years 8 months ago
Relevance feedback using semantic association between indexing terms in large free text corpuses
Relevance feedback has been considered as a means of incorporating learning into information retrieval systems for quite some time now. This paper discusses the research results of...
Shahzad Khan, Kenan Azam