Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce

Large-scale data analysis has become increasingly important for many enterprises. Recently, a new distributed computing paradigm called MapReduce, together with its open source implementation Hadoop, has been widely adopted due to its impressive scalability and flexibility in handling structured as well as unstructured data. In this paper, we describe our data warehouse system, called Cheetah, built on top of MapReduce. Cheetah is designed specifically for our online advertising application to allow various simplifications and custom optimizations. First, we take a fresh look at the data warehouse schema design. In particular, we define a virtual view on top of the common star or snowflake data warehouse schema. This virtual view abstraction not only allows us to design a SQL-like but much more succinct query language, but also makes it easier to support many advanced query processing features. Next, we describe a stack of optimization techniques ranging from data compression and access method to m...

Songting Chen

Data Warehouse | Data Warehouse Schema | MapReduce | PVLDB 2010 |

Added	30 Jan 2011
Updated	30 Jan 2011
Type	Journal
Year	2010
Where	PVLDB
Authors	Songting Chen
