Coping with information overload is a major challenge of the 21st century. Huge volumes and varieties of multilingual data must be processed to extract salient information. Previous research has addressed the automatic characterization of streaming content. However, information comprises both content and associated metadata, which humans perceive as a gestalt but which computer systems often treat separately. Random attributed graphs provide an effective means to characterize, and draw inferences from, large volumes of language content together with its associated metadata. This paper describes these methods and their utility, with experimental proof-of-concept on the Switchboard and Enron corpora.
Allen L. Gorin, Carey E. Priebe, John Grothendieck
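As a hypothetical illustration of the attributed-graph idea described in the abstract (not the authors' implementation), the sketch below represents communication data as a graph whose vertices and edges both carry attribute dictionaries, so that metadata (e.g., an actor's role) and content features (e.g., a message topic) live in one structure. All attribute names and values here are invented for the example:

```python
from collections import defaultdict

def make_attributed_graph():
    """Return an empty attributed graph: both nodes and edges
    carry attribute dictionaries (metadata plus content features)."""
    return {"nodes": {}, "edges": defaultdict(dict)}

def add_node(g, node, **attrs):
    """Add a node (or update an existing one) with metadata attributes."""
    g["nodes"].setdefault(node, {}).update(attrs)

def add_edge(g, u, v, **attrs):
    """Add a directed edge carrying content attributes; endpoints are
    created if missing."""
    for n in (u, v):
        g["nodes"].setdefault(n, {})
    g["edges"][(u, v)].update(attrs)

# Hypothetical email exchange: node metadata plus edge content features.
g = make_attributed_graph()
add_node(g, "alice", role="trader")                        # node metadata
add_node(g, "bob", role="analyst")
add_edge(g, "alice", "bob", topic="energy", n_messages=3)  # edge content

print(g["nodes"]["alice"]["role"])            # trader
print(g["edges"][("alice", "bob")]["topic"])  # energy
```

Inference over such a structure could then treat the graph jointly, rather than modeling the text and the metadata in separate pipelines.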