Information genealogy: uncovering the flow of ideas in non-hyperlinked document databases



We now have incrementally grown databases of text documents reaching back more than a decade, in areas ranging from personal email to news articles and conference proceedings. While accessing individual documents is easy, methods for overviewing and understanding these collections as a whole are lacking in number and in scope. In this paper, we address one such global analysis task, namely the problem of automatically uncovering how ideas spread through the collection over time. We refer to this problem as Information Genealogy. In contrast to bibliometric methods that are limited to collections with explicit citation structure, we investigate content-based methods requiring only the text and timestamps of the documents. In particular, we propose a language-modeling approach and a likelihood ratio test to detect influence between documents in a statistically well-founded way. Furthermore, we show how this method can be used to infer citation graphs and to identify the most influential do...
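To make the idea of a likelihood ratio test over language models concrete, here is a minimal sketch in Python. It is not the formulation from the paper, which the abstract does not spell out. It assumes smoothed unigram models, a two-component mixture for the alternative hypothesis, and a grid search over the mixing weight. All function names and parameters (`unigram_model`, `influence_statistic`, `alpha`, `grid`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def log_likelihood(tokens, model):
    """Log-probability of a token sequence under a unigram model."""
    return sum(math.log(model[w]) for w in tokens)


def mixture(model_a, model_b, lam):
    """Pointwise mixture (1 - lam) * model_a + lam * model_b."""
    return {w: (1.0 - lam) * model_a[w] + lam * model_b[w] for w in model_a}


def influence_statistic(new_doc, old_doc, background_docs, grid=20):
    """Likelihood ratio statistic for 'old_doc influenced new_doc'.

    Null hypothesis: new_doc is drawn from the background model alone.
    Alternative: new_doc is drawn from a mixture of the background model
    and the older document's model, with the mixing weight chosen on a grid.
    This setup is an assumption of the sketch, not the paper's definition.
    """
    background_tokens = [w for d in background_docs for w in d]
    vocab = set(new_doc) | set(old_doc) | set(background_tokens)
    background = unigram_model(background_tokens, vocab)
    candidate = unigram_model(old_doc, vocab)

    ll_null = log_likelihood(new_doc, background)
    # The grid includes lam = 0, so the statistic is never negative.
    ll_alt = max(
        log_likelihood(new_doc, mixture(background, candidate, k / grid))
        for k in range(grid + 1)
    )
    return 2.0 * (ll_alt - ll_null)


# Toy data: tokenised documents, oldest first (invented for this example).
background_docs = [
    ["the", "model", "data", "learning"],
    ["the", "graph", "network", "data"],
]
old_related = ["kernel", "margin", "support", "vector"]
old_unrelated = ["graph", "network", "the", "data"]
new_doc = ["kernel", "margin", "vector", "data"]

stat_related = influence_statistic(new_doc, old_related, background_docs)
stat_unrelated = influence_statistic(new_doc, old_unrelated, background_docs)
```

In this toy setup the statistic is larger for the older document that shares vocabulary with the new one. Using it to infer a citation graph would mean comparing the statistic against a threshold for every pair of documents ordered by timestamp, and how to choose that threshold is left open here.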

Benyah Shaparenko, Thorsten Joachims


Data Mining | Explicit Citation Structure | KDD 2007 | NIPS Conference Proceedings | Text Documents |


Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2007
Where	KDD
Authors	Benyah Shaparenko, Thorsten Joachims
