We present a large-scale analysis of the content of weblogs dating back to the release of the Blogger program in 1999. Over one million blogs were analyzed from their inception through June 2006. These data were submitted to the Text Analysis: Word Counts program [12], which conducted a word-count analysis using Linguistic Inquiry and Word Count (LIWC) dictionaries [20] to provide and analyze a representative sample of blogger word usage. Covariation among LIWC dictionaries suggests that blogs vary along five psychologically relevant linguistic dimensions: Melancholy, Socialness, Ranting, Metaphysicality, and Work-Relatedness. These variables and others were subjected to a cluster analysis in an attempt to extract natural usage groups to inform the design of blogging systems; the results were mixed.

AUTHOR KEYWORDS
Blogs, Personas, Cluster Analysis, PCA, Unobtrusive, Word usage, LIWC, User Modeling.

ACM CLASSIFICATION KEYWORDS
H5.m. Information interfaces and presentation (e.g.,...
Adam D. I. Kramer, Kerry Rodden
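
The dictionary-based word-count step named in the abstract can be illustrated with a minimal sketch. It is not the authors' Text Analysis: Word Counts program and does not use the actual LIWC dictionaries; the three toy categories, their word lists, and the function name `category_rates` are hypothetical, chosen only to show the general idea of scoring a text by the percentage of its tokens that fall into each category.

```python
import re
from collections import Counter

# Hypothetical toy categories for illustration only; these are NOT the
# LIWC dictionaries, which are far larger and distributed separately.
TOY_DICTIONARIES = {
    "sadness": {"sad", "cry", "lonely"},
    "social": {"friend", "talk", "we"},
    "work": {"job", "boss", "meeting"},
}


def category_rates(text, dictionaries=TOY_DICTIONARIES):
    """Return, per category, the percentage of tokens in `text` found in it."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)

    # Count how many tokens fall into each category's word list.
    counts = Counter()
    for token in tokens:
        for category, words in dictionaries.items():
            if token in words:
                counts[category] += 1

    # Express counts as percentages of all tokens (0.0 for empty text).
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in dictionaries
    }


rates = category_rates("We had a meeting and my boss made me cry")
print(rates)
```

Per-blog rates of this kind, computed for every dictionary category, are the sort of variables whose covariation a PCA or cluster analysis could then examine.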