We identify two fundamental links between CBR and an adaptive agent that learns by trial and error without a model of its environment. The first link concerns the most efficient exploitation of the experience the agent has collected by interacting with its environment, while the second relates to the acquisition and representation of a suitable behavior policy. Combining the two, we develop a state-action value function approximation mechanism that relies on case-based, approximate transition graphs and forms the basis on which the agent improves its behavior. We evaluate our approach empirically on dynamic control tasks.
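The abstract does not detail how the case-based, approximate transition graph backs the state-action value estimates, so the following is only a minimal sketch of one plausible reading: stored transition cases are linked to their nearest stored successors, value-iteration-style sweeps run over that graph, and queries are answered by nearest-neighbour retrieval. All class and method names, the distance measure, and the k-nearest-neighbour linking rule are assumptions made for illustration, not the paper's actual mechanism.

```python
import numpy as np

class CaseBasedQApproximator:
    """Illustrative sketch (not the paper's method): Q-value estimates
    derived from a case base of observed transitions via an approximate
    transition graph and value-iteration-style sweeps."""

    def __init__(self, gamma=0.95, k_neighbors=3):
        self.gamma = gamma
        self.k = k_neighbors
        self.cases = []      # list of (state, action, reward, next_state)
        self.q_values = []   # one Q estimate per stored case

    def add_case(self, state, action, reward, next_state):
        # Each interaction with the environment is stored as a case.
        self.cases.append((np.asarray(state, float), action,
                           float(reward), np.asarray(next_state, float)))
        self.q_values.append(0.0)

    def _successors(self, next_state):
        # Indices of the k cases whose stored state is closest to the
        # successor state; these become the outgoing edges in the graph.
        dists = [np.linalg.norm(s - next_state) for s, _, _, _ in self.cases]
        return np.argsort(dists)[:self.k]

    def build_and_solve(self, sweeps=50):
        # Approximate transition graph: case i -> cases nearest to its successor.
        graph = [self._successors(ns) for _, _, _, ns in self.cases]
        # Value-iteration-style sweeps over the graph update case-level Q values.
        for _ in range(sweeps):
            for i, (_, _, r, _) in enumerate(self.cases):
                best_next = max(self.q_values[j] for j in graph[i])
                self.q_values[i] = r + self.gamma * best_next

    def query(self, state, action):
        # A query is answered by the nearest stored case with the same action.
        candidates = [(np.linalg.norm(s - state), q)
                      for (s, a, _, _), q in zip(self.cases, self.q_values)
                      if a == action]
        return min(candidates)[1] if candidates else 0.0
```

Under these assumptions, the agent would act greedily with respect to `query(state, action)` over its available actions, rebuilding and re-solving the graph as new cases accumulate.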