The huge amount of data available from Internet information sources has focused much attention on the sharing of distributed information through Peer Data Management Systems (PDMS...
Protecting users' privacy is becoming an increasingly important issue for the success of future communications. The Internet in particular, with its open architecture, presents sever...
Abstract. Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Scientific data are posing new challenges to data management due to the large volume, complexity and heterogeneity of the data. Meanwhile, scientific collaboration becomes increas...
Fusheng Wang, Pierre-Emmanuel Bourgue, Georg Hacke...
In the China-US Million Book Digital Library, the output of the digitization process is more than one terabyte of text in OEB and PDF formats. To access these data quickly and accurately,...