Query-oriented routing indices fundamentally change how routing efficiency is improved, because they learn the content distribution during the query routing process itself. Routing efficiency improves gradually, without excessive network overhead for constructing and maintaining the routing index. However, the previously proposed mechanism is not effective in practice because routing efficiency improves too slowly. In this paper, we propose a novel mechanism for query-oriented routing indices that quickly achieves high routing efficiency at low cost. The maintenance method employs reinforcement learning to exploit the behavior of a large number of peers in constructing and maintaining routing indices. It explicitly uses the expected number of returned contents to describe the content distribution, which helps it quickly approximate the real distribution. Meanwhile, the routing method aims to retrieve as many contents as possible, which further speeds up the learning process. T...
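To make the idea concrete, the following is a minimal Python sketch of a routing index that tracks, per topic and per neighbor, a running estimate of the expected number of returned contents. It is an illustration under stated assumptions, not the mechanism proposed in the paper: the class name `RoutingIndex`, the exponential-moving-average update, the `learning_rate` and `fanout` parameters, and the greedy top-k forwarding rule are all hypothetical choices made for this example.

```python
from collections import defaultdict


class RoutingIndex:
    """Hypothetical per-peer index of expected returned-content counts."""

    def __init__(self, neighbors, learning_rate=0.3):
        self.neighbors = list(neighbors)
        # Assumed step size for the running estimate (not from the paper).
        self.learning_rate = learning_rate
        # expected[topic][neighbor] -> estimated number of contents returned
        # when a query on `topic` is forwarded to `neighbor`.
        self.expected = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def update(self, topic, neighbor, returned_count):
        """Move the estimate toward the count observed for one routed query."""
        old = self.expected[topic][neighbor]
        self.expected[topic][neighbor] = old + self.learning_rate * (returned_count - old)

    def route(self, topic, fanout=2):
        """Pick the `fanout` neighbors with the highest expected counts."""
        estimates = self.expected[topic]
        # sorted() is stable, so ties keep the original neighbor order.
        ranked = sorted(self.neighbors, key=lambda n: estimates[n], reverse=True)
        return ranked[:fanout]


if __name__ == "__main__":
    index = RoutingIndex(["a", "b", "c"], learning_rate=0.5)
    index.update("music", "b", 4)   # estimate for b: 0.0 -> 2.0
    index.update("music", "b", 4)   # estimate for b: 2.0 -> 3.0
    index.update("music", "c", 2)   # estimate for c: 0.0 -> 1.0
    print(index.route("music", fanout=2))  # ['b', 'c']
```

In this sketch every answered query acts as a reward signal, so the index is refined as a side effect of normal routing rather than through separate index-construction traffic. Forwarding to the neighbors with the highest estimates both retrieves more contents and produces more observations to learn from.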