Longitudinal Analytics on Web Archive Data: It's About Time!



Organizations like the Internet Archive have been capturing Web contents over decades, building up huge repositories of time-versioned pages. The timestamp annotations and the sheer volume of multi-modal content constitute a gold mine for analysts of all sorts, across different application areas, from political analysts and marketing agencies to academic researchers and product developers. In contrast to traditional data analytics on click logs, the focus is on longitudinal studies over very long horizons. This longitudinal aspect affects all data and metadata, from the content itself to the indices and the statistical metadata maintained for it. Moreover, advanced analysts prefer to deal with semantically rich entities like people, places, and organizations, and ideally with relationships such as company acquisitions, instead of, say, Web pages containing such references. For example, tracking and analyzing a politician’s public appearances over a decade is much harder than...
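The abstract does not say how time-versioned content is stored or queried, so the following is only a minimal sketch of the two ideas it names: looking up a page "as of" a point in time, and counting references to an entity year by year over a long horizon. The names `VersionedPage`, `as_of`, and `mentions_per_year` are hypothetical, and matching an entity by a fixed set of alias strings is an assumed stand-in for real named-entity recognition and disambiguation; none of this is presented as the authors' method.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VersionedPage:
    """Hypothetical container for all captured versions of one URL."""
    url: str
    # Parallel lists kept sorted by date: capture_dates[i] belongs to contents[i].
    capture_dates: list = field(default_factory=list)
    contents: list = field(default_factory=list)

    def add_capture(self, when: date, text: str) -> None:
        # Insert at the sorted position so as-of lookups can use binary search.
        i = bisect_right(self.capture_dates, when)
        self.capture_dates.insert(i, when)
        self.contents.insert(i, text)

    def as_of(self, when: date):
        """Return the latest version captured on or before `when`, else None."""
        i = bisect_right(self.capture_dates, when)
        return self.contents[i - 1] if i > 0 else None


def mentions_per_year(pages, entity_aliases):
    """Count, per year, the captures that mention any alias of one entity.

    Assumption: an entity is just a set of surface strings matched
    case-insensitively (a placeholder for proper entity recognition).
    """
    aliases = [a.lower() for a in entity_aliases]
    counts = Counter()
    for page in pages:
        for when, text in zip(page.capture_dates, page.contents):
            lowered = text.lower()
            if any(a in lowered for a in aliases):
                counts[when.year] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    page = VersionedPage("http://example.org/news")
    page.add_capture(date(2005, 3, 1), "Senator Smith visits the harbor.")
    page.add_capture(date(2009, 7, 4), "Parade downtown.")
    page.add_capture(date(2007, 1, 15), "J. Smith announces campaign.")
    print(page.as_of(date(2008, 1, 1)))  # → J. Smith announces campaign.
    print(mentions_per_year([page], ["Senator Smith", "J. Smith"]))  # → {2005: 1, 2007: 1}
```

Keeping captures sorted by timestamp makes each as-of lookup a binary search rather than a scan; a real archive at Internet Archive scale would need versioned indices and precomputed statistics instead of per-query scans, which is the kind of longitudinal metadata the abstract alludes to.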

Gerhard Weikum, Nikos Ntarmos, Marc Spaniol, Peter Triantafillou, András A. Benczúr, Scott Kirkpatrick, Philippe Rigaux, Mark Williamson


Algorithms | CIDR 2011 | Company Acquisitions | Product Developers | Time Axis


» Modeling of concurrent web sessions with bounded inconsistency in shared data

» Employing Inductive Databases in Concrete Applications

Post Info

Added	25 Aug 2011
Updated	25 Aug 2011
Type	Journal
Year	2011
Where	CIDR
Authors	Gerhard Weikum, Nikos Ntarmos, Marc Spaniol, Peter Triantafillou, András A. Benczúr, Scott Kirkpatrick, Philippe Rigaux, Mark Williamson


Sciweavers
