We present the design, implementation, and evaluation of ArrayStore, a new storage manager for complex, parallel array processing. ArrayStore builds on prior work in the area of m...
Emad Soroush, Magdalena Balazinska, Daniel L. Wang
Today’s one-pass analytics applications are data-intensive and must process high volumes of data efficiently. MapReduce is a popular programm...
Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGreg...
Joins are essential for many data analysis tasks, but are not supported directly by the MapReduce paradigm. While there has been progress on equi-joins, implementation of join alg...
Modern enterprise, web, and multimedia applications are generating unstructured content at unforeseen volumes in the form of documents, texts, and media files. Such content is gen...
Krishna Kunchithapadam, Wei Zhang, Amit Ganesh, Ni...
Life sciences researchers perform scientific literature search as part of their daily activities. Many such searches are executed against PubMed, a central repository of life sci...
Julia Stoyanovich, Mayur Lodha, William Mee, Kenne...
In spite of the omnipresence of parallel (multi-core) systems, the predominant strategy to evaluate window-based stream joins is still strictly sequential, mostly just straightfor...
Scheduling data processing workflows (dataflows) on the cloud is a complex and challenging task. It is essentially an optimization problem, very similar to query optimizati...
Herald Kllapi, Eva Sitaridi, Manolis M. Tsangaris,...
In this demonstration we present BRRL, a library for making distributed main-memory applications fault tolerant. BRRL is optimized for cloud applications with frequent points of c...
Tuan Cao, Benjamin Sowell, Marcos Antonio Vaz Sall...
We consider extending decision support facilities toward large sophisticated networks, in which multidimensional attributes are associated with network entities, thereby forming...