Sciweavers

PDCAT
2009
Springer

CheCUDA: A Checkpoint/Restart Tool for CUDA Applications

14 years 7 months ago
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
Abstract—In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not support checkpointing the GPU status, CheCUDA hooks a part of basic CUDA driver API calls in order to record the status changes on the main memory. At checkpointing, CheCUDA stores the status changes in a file after copying all necessary data in the video memory to the main memory and then disabling the CUDA runtime. At restarting, CheCUDA reads the file, re-initializes the CUDA runtime, and recovers the resources on GPUs so as to restart from the stored status. This paper demonstrates that a prototype implementation of CheCUDA can correctly checkpoint and restart a CUDA application written with basic APIs. This also indicates that CheCUDA can migrate a process from one PC to another even if the process uses a GPU. Accordingly, CheCUDA is useful not only to enhance the dependability of CUDA applications but als...
Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu,
Added 27 May 2010
Updated 27 May 2010
Type Conference
Year 2009
Where PDCAT
Authors Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu, Hiroaki Kobayashi
Comments (0)