The algorithmic generation of textual descriptions of image sequences requires conceptual knowledge. In our case, a stationary camera recorded image sequences of road traffic scenes. The necessary conceptual knowledge has been provided in the form of a so-called Situation Graph Tree (SGT). Other endeavors, such as the generation of a synthetic image sequence from a textual description or the transformation of machine vision results for use in a driver assistance system, could profit from exploiting the same conceptual knowledge, albeit in a planning (prescriptive) rather than a descriptive context. A recently discussed planning formalism, Hierarchical Task Networks (HTNs), exhibits a number of formal similarities with SGTs. These similarities suggest investigating whether, and to what extent, SGTs can be recast as HTNs in order to reuse the conceptual knowledge about the behavior of vehicles in road traffic scenes for planning purposes.