Adaptive Time Warp protocols in the literature are usually based on a pre-defined analytic model of the system, expressed as a closed-form function that maps system state to a control parameter. The underlying assumption is that this model itself is optimal. In this paper we present a new approach that utilizes Reinforcement Learning techniques, also known as simulation-based dynamic programming. Rather than assuming an optimal control strategy, Reinforcement Learning seeks to discover the optimal strategy through simulation. A value function captures the history of system feedback, and no prior knowledge of the system is required. We implemented our reinforcement learning techniques in a distributed VLSI simulator with the objective of finding the optimal size of a bounded time window. Experiments on two benchmark circuits indicated that the approach succeeds in finding this optimal window size.
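To make the idea concrete, the sketch below shows one common form such a model-free controller could take: tabular Q-learning over discretized feedback states, with candidate window sizes as actions. This is a minimal illustration, not the paper's actual implementation; the state encoding, candidate window sizes, reward signal, and hyperparameters here are all hypothetical choices.

```python
import random
from collections import defaultdict

# Hypothetical encodings (not from the paper): binned rollback-rate
# observations serve as states, and a small set of candidate bounded
# time-window sizes serve as actions.
STATES = range(10)                        # e.g., rollback rate binned into 10 levels
WINDOW_SIZES = [50, 100, 200, 400, 800]   # candidate window sizes (simulated time units)

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1     # learning rate, discount, exploration rate

# The value function Q(s, a) accumulates estimates of long-run reward
# from observed feedback alone; no analytic model of the system is used.
Q = defaultdict(float)

def choose_window(state):
    """Epsilon-greedy selection of the next time-window size."""
    if random.random() < EPSILON:
        return random.choice(WINDOW_SIZES)          # explore
    return max(WINDOW_SIZES, key=lambda a: Q[(state, a)])  # exploit

def update(state, window, reward, next_state):
    """One Q-learning step after the simulator reports its feedback."""
    best_next = max(Q[(next_state, a)] for a in WINDOW_SIZES)
    Q[(state, window)] += ALPHA * (reward + GAMMA * best_next - Q[(state, window)])
```

In use, the simulator would invoke choose_window at each control interval, run with the selected window, and then call update with a reward reflecting observed performance (for instance, committed events minus a rollback penalty, again a hypothetical choice).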