Twarql is an infrastructure that translates microblog posts from Twitter into Linked Open Data in real time. The approach employed in Twarql can be summarized as follows: (1) extract content (e.g., entity mentions, hashtags and URLs) from microposts streamed from Twitter; (2) encode that content in RDF using shared, well-known vocabularies (FOAF, SIOC, MOAT, etc.); (3) enable structured querying of microposts with SPARQL; (4) enable subscription to a stream of microposts that match a given query; and (5) enable scalable real-time delivery of streaming annotated data using sparqlPuSH. In this paper, we use a brand-tracking scenario to demonstrate how Twarql gives users flexibility in handling the information overload they face when collectively analyzing microblog data for sensemaking. The dataset produced is shared as Linked Data. Twarql is available as open source and can be easily deployed or extended for monitoring Twitter data in various contexts such as brand tracking, disaster relief ...
Pablo N. Mendes, Alexandre Passant, Pavan Kapanipa
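
To make steps (1) and (2) of the abstract concrete, the following is a minimal sketch, not Twarql's actual implementation. It extracts hashtags and URLs from a micropost with regular expressions and emits N-Triples-style statements using SIOC terms (`sioc:Post`, `sioc:content`, `sioc:topic`, `sioc:links_to`). The `example.org` base URIs, the function names, and the choice of `sioc:topic` for hashtags are assumptions made for illustration; the abstract does not specify how Twarql models these annotations, and entity-mention extraction is omitted.

```python
import re

SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URIs, chosen for illustration only.
POST_BASE = "http://example.org/post/"
TAG_BASE = "http://example.org/tag/"

# Deliberately simple patterns; real tweets need more careful tokenization
# (e.g. trailing punctuation after a URL is not handled here).
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract(text):
    """Step (1): pull hashtags and URLs out of the micropost text."""
    urls = URL_RE.findall(text)
    # Blank out URLs first so that URL fragments (#section) are not
    # mistaken for hashtags.
    without_urls = URL_RE.sub(" ", text)
    hashtags = HASHTAG_RE.findall(without_urls)
    return {"hashtags": hashtags, "urls": urls}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(post_id, text):
    """Step (2): encode the post and its extracted content as triples."""
    subject = f"<{POST_BASE}{post_id}>"
    content = extract(text)
    triples = [
        f"{subject} <{RDF_TYPE}> <{SIOC}Post> .",
        f'{subject} <{SIOC}content> "{escape_literal(text)}" .',
    ]
    for tag in content["hashtags"]:
        triples.append(f"{subject} <{SIOC}topic> <{TAG_BASE}{tag.lower()}> .")
    for url in content["urls"]:
        triples.append(f"{subject} <{SIOC}links_to> <{url}> .")
    return triples


if __name__ == "__main__":
    for line in to_ntriples("1", "Loving the new #iPad http://example.com/review"):
        print(line)
```

Triples of this shape could then be loaded into any RDF store and filtered with SPARQL, for instance by matching posts whose `sioc:topic` is a given tag, which is the kind of query that steps (3) and (4) build on.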