Web usage mining plays an important role in the personalization of Web services, the adaptation of Web sites, and the improvement of Web server performance. It applies data mining techniques to discover Web access patterns from Web usage data. To discover access patterns, Web usage data must first be reconstructed into sessions, with or without user identification. However, not all Web server logs contain the complete information needed to construct user sessions. One approach to this problem is to use time-oriented heuristics to reconstruct user sessions. This paper describes improved, statistically based time-oriented heuristics for reconstructing user sessions from a server log. Comparative analyses are carried out using two similarity measures. The results for the proposed heuristics are promising and, in some cases, show reasonable improvements.
Jie Zhang, Ali A. Ghorbani