In ad hoc wireless LANs populated by mutually impenetrable groups of anonymous stations, honest stations are prone to "bandwidth stealing" by selfish stations. The problem is addressed at the MAC level by postulating that (i) honest stations use a carefully designed contention strategy that teaches selfish stations that their best reply is to adopt the same strategy, and (ii) a verifiable winner policy is designed so that such a strategy can indeed be found and yields high bandwidth shares for honest stations. For a class of random token winner policies, a number of cycle-by-cycle reinforcement learning strategies are evaluated via simulation, using a newly introduced notion that is formally akin to evolutionary stability.