Active storage clouds are an attractive platform for executing the large data-intensive workloads found in many fields of science. However, active storage presents new system management challenges. A large system of fault-prone machines with local persistent state can easily degenerate into a mess of unreferenced data and runaway computations. Our solution to this problem is DataLab, a software framework for running data-parallel workloads on active storage clusters. DataLab provides a simple language for expressing workloads, works with legacy application codes, and achieves robustness through the use of distributed transactions. Our prototype implementation scales to 250 nodes on a large biometric image processing workload.

Categories and Subject Descriptors
C.4 [Performance]: Fault Tolerance; H.2.4 [Systems]: Parallel Databases

General Terms
Reliability, Performance

Keywords
Active Storage, Transactions, Cloud Computing