An agent must acquire an internal representation appropriate for its task, environment, and sensors. Reinforcement learning is often used to acquire the relation between sensory input and action. A learning agent operating in the real world with visual sensors, however, faces the critical problem of how to build a state space that is necessary and sufficient for executing its task. In this paper, we propose acquiring the relation between vision and action using a Visual State-Action Map (VSAM), an application of the Self-Organizing Map (SOM). An input image is mapped onto a node of the learned VSAM, which then outputs the appropriate action for that state. We applied VSAM to a real robot; the experimental results show that the robot avoids walls while moving around its environment.
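
The mechanism described above (input images mapped to SOM nodes, each node emitting an action) can be sketched in code. The following is a minimal illustration, assuming each node stores one reference image vector and one action vector, both updated by the standard SOM neighborhood rule; the class name, method names, and all hyperparameters here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

class VSAM:
    """Sketch of a SOM-based visual state-action map (illustrative only)."""

    def __init__(self, grid=(10, 10), input_dim=64, action_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        n = grid[0] * grid[1]
        # One reference (codebook) image vector per node.
        self.w = rng.random((n, input_dim))
        # One action vector per node (e.g. left/right wheel speeds).
        self.a = np.zeros((n, action_dim))
        # 2-D grid coordinates of every node, used by the neighborhood.
        self.coords = np.array([(i, j) for i in range(grid[0])
                                for j in range(grid[1])], dtype=float)

    def bmu(self, x):
        # Best-matching unit: node whose reference vector is closest to x.
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1)))

    def fit(self, images, actions, epochs=20, lr0=0.5, sigma0=3.0):
        # Standard SOM training; the action vectors are pulled toward the
        # teacher actions with the same neighborhood weights as the codebook.
        for t in range(epochs):
            lr = lr0 * (1.0 - t / epochs)
            sigma = sigma0 * (1.0 - t / epochs) + 0.5
            for x, u in zip(images, actions):
                b = self.bmu(x)
                d2 = np.sum((self.coords - self.coords[b]) ** 2, axis=1)
                h = lr * np.exp(-d2 / (2.0 * sigma ** 2))
                self.w += h[:, None] * (x - self.w)
                self.a += h[:, None] * (u - self.a)

    def act(self, image):
        # Map the input image onto the learned map; return that node's action.
        return self.a[self.bmu(image)]
```

Under these assumptions, usage would look like `vsam.fit(images, actions)` on preprocessed image vectors paired with demonstrated action commands, followed by `vsam.act(current_image)` to obtain a motor command at run time.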