Privacy policies serve to inform consumers about a company’s data practices and to protect the company from legal risk due to undisclosed uses of consumer data. In addition, US and EU regulators require companies to accurately describe their practices in these policies, and some laws prescribe how companies should write them. Despite these aims, privacy policies are frequently criticized for being vague and uninformative. To support and improve the analysis of privacy policies, we report results from constructing an information type lexicon from manual human annotations and from an entity extractor based on part-of-speech tagging. The lexicon was constructed from 3,850 annotations obtained from crowd workers analyzing 15 privacy policies. An entity extractor was designed to extract entities from these annotations. The extractor succeeds at finding entities in 92% of annotations, and the lexicon consists of 725 unique entities. Finally, we measured the terminological reuse ac...
Jaspreet Bhatia, Travis D. Breaux
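
As a rough illustration of the kind of part-of-speech-based entity extraction the abstract describes, the sketch below tags the tokens of an annotation and keeps maximal adjective/noun runs that end in a noun. It is a minimal sketch under stated assumptions, not the authors' extractor: the tag lexicon `TOY_TAGS`, the chunking rule, and the helper names `tag`, `extract_entities`, and `coverage` are all hypothetical, and a real pipeline would likely use a trained tagger instead of a lookup table.

```python
# Hypothetical toy tag lexicon (Penn Treebank-style tags); an assumption
# for illustration only, standing in for a trained part-of-speech tagger.
TOY_TAGS = {
    "we": "PRP", "collect": "VB", "your": "PRP$", "and": "CC",
    "email": "NN", "address": "NN", "mobile": "JJ",
    "device": "NN", "identifier": "NN",
}

NOUN_TAGS = {"NN", "NNS"}


def tag(tokens):
    """Tag each token by lexicon lookup; unknown words default to NN (assumption)."""
    return [(t.lower(), TOY_TAGS.get(t.lower(), "NN")) for t in tokens]


def _flush(run, entities):
    """Close the current adjective/noun run and record it if it ends in a noun."""
    while run and run[-1][1] == "JJ":  # drop trailing adjectives
        run.pop()
    if run:
        entities.append(" ".join(word for word, _ in run))


def extract_entities(text):
    """Return noun-phrase-like entities found in one annotation."""
    tokens = text.replace(",", " ").replace(".", " ").split()
    entities, run = [], []
    for word, pos in tag(tokens):
        if pos == "JJ" or pos in NOUN_TAGS:
            run.append((word, pos))
        else:
            _flush(run, entities)
            run = []
    _flush(run, entities)
    return entities


def coverage(annotations):
    """Fraction of annotations in which at least one entity was found."""
    if not annotations:
        return 0.0
    hits = sum(1 for a in annotations if extract_entities(a))
    return hits / len(annotations)


if __name__ == "__main__":
    print(extract_entities("We collect your email address and mobile device identifier."))
```

The `coverage` helper mirrors, in spirit only, the abstract's measure of how often the extractor finds an entity in an annotation, and the set of distinct strings returned by `extract_entities` over all annotations would play the role of the lexicon.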