In this paper we describe a technique that can be used to annotate source code with syntactic tags in XML format. This is achieved by modifying the parser generator bison to emit these tags for an arbitrary LALR grammar. We also discuss an immediate application of this technique, a portable modification of the gcc compiler, that allows for XML output for C, Objective C, C++ and Java programs. While our approach is based on a representation of the parse-tree and does not have the same semantic richness as other approaches, it does have the advantage of being language independent, and thus re-usable in a number of different domains.
James F. Power, Brian A. Malloy