This paper describes a high-performance sampling architecture for inference of latent topic models on a cluster of workstations. Our system is faster than previous work by over an...
The location-aware keyword query returns ranked objects that are near a query location and that have textual descriptions that match query keywords. This query occurs inherently i...
With the advances in processing, memory, and connectivity technologies, applications are becoming increasingly distributed, data-centric, and web-based. These applications demand ...
We introduce the proximity rank join problem, where we are given a set of relations whose tuples are equipped with a score and a real-valued feature vector. Given a target feature...
An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizens themselves often count on in...
Tristan Allard, Nicolas Anciaux, Luc Bouganim, Yan...
We present FlashStore, a high-throughput persistent key-value store that uses flash memory as a non-volatile cache between RAM and hard disk. FlashStore is designed to store the ...
Large-scale data analysis has become increasingly important for many enterprises. Recently, a new distributed computing paradigm, called MapReduce, and its open source implementat...
Ranking is a fundamental operation in data analysis and decision support, and plays an even more crucial role if the dataset being explored exhibits uncertainty. This has led to m...
This is a demonstration of data coordination in a peer data management system through the employment of distributed triggers. The latter express in a declarative manner individual...
Entity Resolution (ER) is the process of identifying groups of records that refer to the same real-world entity. Various measures (e.g., pairwise F1, cluster F1) have been used fo...
David Menestrina, Steven Whang, Hector Garcia-Moli...