People display regularities in almost everything they do. This paper proposes characteristics of an idealized algorithm that would allow an automatic extraction of web user profil based on user navigation paths. We describe a simple predictive approach with these characteristics and show its predictive accuracy on a large dataset from KDDCup web logs (a commercial web site), while using fewer computational and memory resources. To achieve this objective, our approach is articulated around three notions: (1) Applying probabilistic exploration using Markov models. (2) Avoiding the problem of Markov model highdimensionality and sparsity by clustering web documents, based on their content, before applying the Markov analysis. (3) Clustering Markov models, and extraction of their gravity centers. On the basis of these three notions, the approach makes possible the prediction of future states to be visited in k steps and navigation sessions monitoring, based on both content and traversed pat...
Younes Hafri, Chabane Djeraba, Peter L. Stanchev,