This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
The Linguistic Data Consortium (LDC) seeks to provide its members with quality linguistic resources and services. In order to pursue these ideals and to remain current, LDC monito...
In languages that use diacritical characters, if these special signs are stripped from a word, the resulting string of characters may not exist in the language, and therefore i...
This paper describes foundational work investigating the protection requirements of sensitive medical information, which is being stored more routinely in repository systems for ...
Nathan Lea, Stephen Hailes, Tony Austin, Dipak Kal...
We study distributed content replication networks formed voluntarily by selfish autonomous users, seeking access to information objects that originate from distant servers. Each us...
Gerasimos G. Pollatos, Orestis Telelis, Vassilis Z...