In today’s data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow reidentification. This re-identification violates people’s intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address. This paper examines this general problem in a specific setting: reidentification of users from a public web movie forum in a private movie ratings dataset. We present three major...
Dan Frankowski, Dan Cosley, Shilad Sen, Loren G. T