A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs