To accelerate the learning of reinforcement learning, many types of function approximation are used to represent state value. However function approximation reduces the accuracy of state value, and brings difficulty in the convergence. To solve the problems of tradeoff between the generalization and accuracy in reinforcement learning, we represent state-action value by two CMAC networks with different generalization parameters. The accuracy CMAC network can represent values exactly, which achieves precise control in the states around target area. And the generalization CMAC network can extend experiences to unknown area, and guide the learning of accuracy CMAC network. The algorithm proposed in this paper can effectively avoid the dilemma of achieving tradeoff between generalization and accuracy. Simulation results for the control of double inverted pendulum are presented to show effectiveness of the proposed algorithm.