Hive - a petabyte scale data warehouse using Hadoop



The size of data sets being collected and analyzed in industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [1] is a popular open-source map-reduce implementation used at companies such as Yahoo and Facebook to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs that are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in HiveQL, a SQL-like declarative language; these queries are compiled into map-reduce jobs that are executed using Hadoop. In addition, HiveQL enables users to plug custom map-reduce scripts into queries. The language includes a type system with support for tables containing primitive types, collections like arrays and maps, and nested compositions of ...
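To make the abstract's point about the low-level programming model concrete, here is a minimal single-process sketch of a group-by count written as explicit map, shuffle, and reduce phases. It is an illustration only, not code from the paper and not Hadoop's API: the sample rows, the column names, and every function name are hypothetical. A declarative query along the lines of `SELECT user_id, COUNT(*) FROM page_views GROUP BY user_id` (the table `page_views` is likewise assumed) states the same computation in one line.

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, page) pairs standing in for a log table.
rows = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/search"),
    ("alice", "/home"),
]


def map_phase(records):
    """Emit one (key, 1) pair per record, keyed by the grouping column."""
    for user_id, _page in records:
        yield user_id, 1


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_user(records):
    """Chain the three phases; this is what the developer must hand-write and maintain."""
    return reduce_phase(shuffle_phase(map_phase(records)))


counts = count_by_user(rows)
print(counts)
```

Even for this simple aggregation the hand-written version needs three functions and an explicit grouping step, which is the kind of maintenance burden the abstract attributes to writing map-reduce programs directly.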

Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zhen


Data Sets | Database | Hive | ICDE 2010 | Warehousing Solution


» Clydesdale structured data processing on MapReduce

» Ricardo integrating R and Hadoop

» Data warehousing and analytics infrastructure at Facebook

» Apache Hadoop goes realtime at Facebook

Post Info

Added	17 May 2010
Updated	17 May 2010
Type	Conference
Year	2010
Where	ICDE
Authors	Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Anthony, Hao Liu, Raghotham Murthy
