In many data sharing settings, such as within the biological and biomedical communities, global data consistency is not always attainable: different sites' data may be dirty,...
In this paper we propose a new approach to improve electronic editions of human science corpus, providing an efficient estimation of manuscripts pages structure. In any handwriti...
Lustre is a parallel file system that presents high aggregated IO bandwidth by striping file extents across many storage devices. However, our experiments indicate excessively wid...
Weikuan Yu, Jeffrey S. Vetter, Shane Canon, Song J...
We compare two algorithms for sorting out-of-core data on a distributed-memory cluster. One algorithm, Csort, is a 3-pass oblivious algorithm. The other, Dsort, makes three passes...
We observed that for multimedia data – especially music - collaborative similarity measures perform much better than similarity measures derived from content-based sound feature...