CheCUDA: A Checkpoint/Restart Tool for CUDA Applications


PDF available from www.sc.isc.tohoku.ac.jp

Abstract—In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. Existing checkpoint/restart implementations do not support checkpointing the GPU state, so CheCUDA hooks a subset of the basic CUDA driver API calls in order to record state changes in main memory. At checkpoint time, CheCUDA copies all necessary data from video memory to main memory, disables the CUDA runtime, and then stores the recorded state changes in a file. At restart, CheCUDA reads the file, re-initializes the CUDA runtime, and recovers the resources on the GPU so that execution resumes from the stored state. This paper demonstrates that a prototype implementation of CheCUDA can correctly checkpoint and restart a CUDA application written with basic APIs. This also indicates that CheCUDA can migrate a process from one PC to another even if the process uses a GPU. Accordingly, CheCUDA is useful not only to enhance the dependability of CUDA applications but als...
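The record, copy-out, release, and replay cycle described in the abstract can be illustrated with a minimal sketch. It makes no real CUDA calls: `FakeDevice` is a hypothetical stand-in for GPU memory, and `CheckpointLayer` is a hypothetical wrapper that plays the role of the hooked driver API calls (in a real tool, allocation calls such as the driver API's `cuMemAlloc` would be natural candidates for interception). It is not CheCUDA's implementation. It only shows the general technique of logging device resources so they can be saved, released, and rebuilt, possibly on a different machine.

```python
import os
import pickle
import tempfile


class FakeDevice:
    """Hypothetical stand-in for GPU memory; no real CUDA calls are made."""

    def __init__(self):
        self.buffers = {}        # device address -> bytearray
        self.next_addr = 0x1000  # arbitrary starting "device address"

    def mem_alloc(self, size):
        addr = self.next_addr
        self.next_addr += size
        self.buffers[addr] = bytearray(size)
        return addr

    def mem_free(self, addr):
        del self.buffers[addr]


class CheckpointLayer:
    """Hypothetical wrapper that records every allocation it forwards."""

    def __init__(self, device):
        self.device = device
        self.handles = {}  # handle -> (device address, size)
        self.next_handle = 0

    def mem_alloc(self, size):
        # Forward the call, then record the resulting state change.
        addr = self.device.mem_alloc(size)
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = (addr, size)
        return handle

    def mem_free(self, handle):
        addr, _ = self.handles.pop(handle)
        self.device.mem_free(addr)

    def write(self, handle, offset, data):
        addr, size = self.handles[handle]
        if offset + len(data) > size:
            raise ValueError("write past end of buffer")
        self.device.buffers[addr][offset:offset + len(data)] = data

    def read(self, handle):
        addr, _ = self.handles[handle]
        return bytes(self.device.buffers[addr])

    def checkpoint(self, path):
        # 1. Copy all device data to host memory.
        snapshot = {h: (size, bytes(self.device.buffers[addr]))
                    for h, (addr, size) in self.handles.items()}
        # 2. Store the recorded state in a file.
        with open(path, "wb") as f:
            pickle.dump({"next_handle": self.next_handle,
                         "buffers": snapshot}, f)
        # 3. Release device resources (analogous to disabling the runtime).
        for addr, _ in self.handles.values():
            self.device.mem_free(addr)
        self.handles.clear()

    @classmethod
    def restart(cls, path, device):
        # Read the file and replay allocations on a (possibly new) device.
        with open(path, "rb") as f:
            state = pickle.load(f)
        layer = cls(device)
        layer.next_handle = state["next_handle"]
        for handle, (size, data) in state["buffers"].items():
            addr = device.mem_alloc(size)
            device.buffers[addr][:] = data
            layer.handles[handle] = (addr, size)
        return layer


dev = FakeDevice()
layer = CheckpointLayer(dev)
h = layer.mem_alloc(4)
layer.write(h, 0, b"\x01\x02\x03\x04")
path = os.path.join(tempfile.mkdtemp(), "ckpt.bin")
layer.checkpoint(path)
# Restart on a fresh "device", e.g. after migrating the process to another PC.
restored = CheckpointLayer.restart(path, FakeDevice())
print(restored.read(h))  # b'\x01\x02\x03\x04'
```

The sketch hands out integer handles instead of raw addresses because re-allocated device memory is not guaranteed to land at the same address after a restart. The abstract does not say how CheCUDA itself deals with address changes.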

Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi


CUDA | CUDA Applications | CUDA Runtime | Distributed and Parallel Computing | PDCAT 2009


Post Info

Added	27 May 2010
Updated	27 May 2010
Type	Conference
Year	2009
Where	PDCAT
Authors	Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi
