We focus on the task of interpreting complex natural language instructions to a robot, in which we must ground high-level commands such as microwave the cup to low-level actions such as grasping. Previous approaches that learn a lexicon during training have inadequate coverage at test time, and pure search strategies cannot handle the exponential search space. We propose a new hybrid approach that leverages the environment to induce new lexical entries at test time, even for new verbs. Our semantic parsing model jointly reasons about the text, logical forms, and environment over multi-stage instruction sequences. We introduce a new dataset and show that our approach is able to successfully ground new verbs such as distribute, mix, arrange to complex logical forms, each containing up to four predicates.