An On-Line Document Clustering Method Based on Forgetting Factors

Abstract. With the rapid development of on-line information services, information technologies for on-line information processing have been receiving much attention recently. Clustering plays important roles in various on-line applications, such as extracting useful information from news feeding services and selecting relevant documents from the scientific articles arriving in digital libraries. In on-line environments, users are generally more interested in newer documents than in older ones and have no interest in obsolete documents. Based on this observation, we propose an on-line document clustering method, F2ICM (Forgetting-Factor-based Incremental Clustering Method), that incorporates the notion of a forgetting factor to calculate document similarities. The idea is that every document gradually loses its weight (or memory) as time passes, according to this factor. Since F2ICM generates clusters using a document similarity measure based on the forgetting factor, newer documents h...
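
To make the forgetting-factor idea concrete, here is a minimal sketch in Python. The abstract does not state the decay formula or the similarity measure, so everything below is an assumption made for illustration: the exponential decay form, the half-life parameterization, the choice to scale cosine similarity by both document weights, and the hypothetical names `document_weight` and `weighted_similarity`. It should not be read as the F2ICM definition.

```python
import math


def document_weight(age_days: float, half_life_days: float = 7.0) -> float:
    """Weight of a document `age_days` after it was acquired.

    Assumption: exponential forgetting, where the weight halves every
    `half_life_days`. The per-day forgetting factor lies in (0, 1).
    """
    forgetting_factor = 0.5 ** (1.0 / half_life_days)
    return forgetting_factor ** age_days


def cosine(vec_a: dict, vec_b: dict) -> float:
    """Plain cosine similarity between two sparse term-frequency vectors."""
    dot = sum(value * vec_b.get(term, 0.0) for term, value in vec_a.items())
    norm_a = math.sqrt(sum(value * value for value in vec_a.values()))
    norm_b = math.sqrt(sum(value * value for value in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_similarity(vec_a: dict, vec_b: dict,
                        age_a: float, age_b: float,
                        half_life_days: float = 7.0) -> float:
    """Cosine similarity scaled by both documents' weights.

    Illustrative choice only: older documents contribute less, so a pair
    of old documents scores lower than an identical pair of new ones.
    """
    w_a = document_weight(age_a, half_life_days)
    w_b = document_weight(age_b, half_life_days)
    return w_a * w_b * cosine(vec_a, vec_b)


if __name__ == "__main__":
    doc_1 = {"cluster": 2.0, "online": 1.0}
    doc_2 = {"cluster": 1.0, "online": 1.0, "news": 1.0}
    print(weighted_similarity(doc_1, doc_2, age_a=0, age_b=1))
    print(weighted_similarity(doc_1, doc_2, age_a=30, age_b=31))
```

Under these assumptions, the same pair of documents scores lower when both are a month old than when both are fresh, which matches the abstract's intuition that obsolete documents should stop influencing the clustering.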

Yoshiharu Ishikawa, Yibing Chen, Hiroyuki Kitagawa
