In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. Processing these instructions is challenging: they posit goals to be achieved without specifying the steps required to complete them. We describe a method that fills in this missing information using an automatically derived environment model that encodes states, transitions, and the commands that cause these transitions to happen. We present an efficient approximate approach for learning this environment model as part of a policy-gradient reinforcement learning algorithm for text interpretation. This design enables learning to map high-level instructions, which previous statistical methods cannot handle.
S. R. K. Branavan, Luke S. Zettlemoyer, Regina Barzilay
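As a rough illustration of the approach the abstract outlines, the sketch below pairs a toy environment model (states, transitions, and the commands that trigger them) with a log-linear policy trained by a REINFORCE-style policy-gradient update. Everything named here, the state set, command set, transition table, feature function, and binary goal reward, is an illustrative assumption, not the authors' implementation; in the paper's setting the environment model is itself learned approximately during reinforcement learning rather than hand-written as it is in this sketch.

```python
import math
import random

STATES = ["start", "menu_open", "dialog_open", "done"]      # toy state space (assumed)
COMMANDS = ["open_menu", "open_dialog", "click_ok", "wait"]  # toy command set (assumed)

# Toy environment model: (state, command) -> next state; unlisted pairs are no-ops.
TRANSITIONS = {
    ("start", "open_menu"): "menu_open",
    ("menu_open", "open_dialog"): "dialog_open",
    ("dialog_open", "click_ok"): "done",
}

def step(state, command):
    """Simulate one transition under the (assumed) environment model."""
    return TRANSITIONS.get((state, command), state)

def features(instruction, state, command):
    """Indicator features pairing each instruction token with a (state, command)."""
    return {(tok, state, command): 1.0 for tok in instruction.split()}

def policy_probs(theta, instruction, state):
    """Log-linear policy over commands: pi(command | instruction, state)."""
    scores = [
        sum(theta.get(f, 0.0) * v
            for f, v in features(instruction, state, c).items())
        for c in COMMANDS
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sample_episode(theta, instruction, horizon=4):
    """Roll out the policy in the modeled environment; reward 1 iff the goal is reached."""
    state, history = "start", []
    for _ in range(horizon):
        probs = policy_probs(theta, instruction, state)
        command = random.choices(COMMANDS, weights=probs)[0]
        history.append((state, command))
        state = step(state, command)
    return history, (1.0 if state == "done" else 0.0)

def reinforce_update(theta, instruction, lr=0.1):
    """One REINFORCE step: theta += lr * reward * grad log pi(chosen | state)."""
    history, reward = sample_episode(theta, instruction)
    for state, chosen in history:
        probs = policy_probs(theta, instruction, state)
        for prob, command in zip(probs, COMMANDS):
            # grad log pi = phi(chosen) - E_pi[phi], written out per command.
            coeff = (1.0 if command == chosen else 0.0) - prob
            for f, v in features(instruction, state, command).items():
                theta[f] = theta.get(f, 0.0) + lr * reward * coeff * v
    return reward

theta = {}
for _ in range(2000):
    reinforce_update(theta, "open the dialog and confirm")
print(policy_probs(theta, "open the dialog and confirm", "start"))
```

The key point the sketch makes concrete is why an environment model matters for high-level instructions: the reward depends only on reaching the goal state, so the learner must discover the intermediate commands by simulating rollouts, rather than reading them off the text.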