Proactive assessment of computer-network vulnerability to unknown future attacks is an important but unsolved computer security problem in which AI techniques have significant potential for impact. In this paper, we investigate the use of reinforcement learning (RL) for proactive security in the context of denial-of-service (DoS) attacks in peer-to-peer (P2P) networks. An RL-based assessment tool would help network administrators and designers assess and compare the vulnerability of various network configurations and security measures, so that they can optimize those choices for maximum security. We first discuss the various dimensions of the problem and how to formulate it as an RL problem. Next, we introduce compact parametric policy representations for both single attackers and botnets and derive a policy-gradient RL algorithm. We evaluate these algorithms under a variety of network configurations that employ recent fair-use DoS security mechanisms. The results show that our RL-based approach is able to signifi...