Identifying the most influential documents in a corpus is an important problem in many fields, from information science and historiography to text summarization and news aggregation. Unfortunately, traditional bibliometrics such as citations are often not available. We propose using changes in the thematic content of documents over time to measure the importance of individual documents within the collection. We describe a dynamic topic model for both quantifying and qualifying the impact of these documents. We validate the model by analyzing three large corpora of scientific articles. Our measurement of a document's impact correlates significantly with its number of citations.
Sean Gerrish, David M. Blei