A Novelty-based Clustering Method for On-line Documents

Download www.db.itc.nagoya-u.ac.jp

In this paper, we describe a document clustering method called novelty-based document clustering. This method clusters documents based on similarity and novelty. The method assigns higher weights to recent documents than to old ones and generates clusters that focus on recent topics. The similarity function is derived probabilistically, extending the conventional cosine measure of the vector space model by incorporating a document forgetting model to produce novelty-based clusters. The clustering procedure is a variation of the K-means method. An additional feature of our clustering method is an incremental update facility, which is applied when new documents are incorporated into a document repository. Performance of the clustering method is examined through experiments. Experimental results show the efficiency and effectiveness of our method.

Key words: document clustering, forgetting factor, incremental processing, novelty, on-line documents
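
The abstract does not give the similarity formula, so the following is only a rough sketch of the general idea: a cosine measure scaled by weights that decay with document age. The exponential half-life decay, the function names, and the `half_life_days` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import math

def forgetting_weight(age_days, half_life_days=7.0):
    # Assumed decay model: weight 1.0 for a brand-new document,
    # halving every `half_life_days` (hypothetical parameter).
    return 0.5 ** (age_days / half_life_days)

def cosine(u, v):
    # Conventional cosine measure over sparse term-weight dictionaries.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def novelty_similarity(u, v, age_u, age_v, half_life_days=7.0):
    # Illustrative combination: cosine similarity scaled by both
    # documents' forgetting weights, so pairs of old documents score lower.
    weight = (forgetting_weight(age_u, half_life_days)
              * forgetting_weight(age_v, half_life_days))
    return weight * cosine(u, v)
```

Under these assumptions, two identical documents published today score close to 1, while the same pair published several half-lives ago scores near 0, which would steer a K-means style procedure toward clusters of recent topics.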

Sophoin Khy, Yoshiharu Ishikawa, Hiroyuki Kitagawa

Clustering Method | Conventional Cosine Measure | Documents | Internet Technology | WWW 2008 |

Added	16 Dec 2010
Updated	16 Dec 2010
Type	Journal
Year	2008
Where	WWW
Authors	Sophoin Khy, Yoshiharu Ishikawa, Hiroyuki Kitagawa
