Ricardo: integrating R and Hadoop


Download www.almaden.ibm.com

Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophisticated statistical analysis methods to this data is becoming essential for marketplace competitiveness. This need to perform deep analysis over huge data repositories poses a significant challenge to existing statistical software and data management systems. On the one hand, statistical software provides rich functionality for data analysis and modeling, but can handle only limited amounts of data; e.g., popular packages like R and SPSS operate entirely in main memory. On the other hand, data management systems, such as MapReduce-based systems, can scale to petabytes of data, but provide insufficient analytical functionality. We report our experiences in building Ricardo, a scalable platform for deep analytics. Ricardo is part of the eXtreme Analytics Platform (XAP) project at the IBM Almaden Research C...



Data Management | Data Management System | Database | IBM Almaden Research Center | SIGMOD 2010


Post Info

Added	18 Jul 2010
Updated	18 Jul 2010
Type	Conference
Year	2010
Where	SIGMOD
Authors	Sudipto Das, Yannis Sismanis, Kevin S. Beyer, Rainer Gemulla, Peter J. Haas, John McPherson
