Speakers of all cultures and ages use gestures as they speak (i.e., cospeech gestures). Views in the literature differ on whether and how one specific type of gesture, the iconic gesture, interacts with language processing. Here I review evidence showing that iconic gestures are produced not merely from spatial and/or motoric imagery but from an interface representation of imagistic and linguistic representations during online speaking. Similarly, for comprehension, neuroimaging and behavioral studies indicate that speech and gesture influence each other's semantic processing during online comprehension. Overall, these findings show that the processing of information in the two modalities interacts during both comprehension and production of language, arguing against models that propose independent processing of each modality. They also have implications for AI models that aim to simulate cospeech gesture use in conversational agents.