This paper addresses the automatic generation of XML-based descriptions that capture the high-level structure of binary multimedia resources. These structural metadata can be transformed to reflect a desired adaptation of a multimedia resource, and the transformed description can subsequently be used to create a tailored version of that resource. Based on this concept, two technologies are presented: MPEG-21 BSDL and a modified version of XFlavor that is able to produce BSDL-compatible output. Their use is elaborated with respect to the valid exploitation of multi-layered temporal scalability in the base specification of H.264/MPEG-4 AVC, with a particular focus on the combined use of the sub-sequence coding technique and Supplemental Enhancement Information (SEI) messages. Performance measurements in terms of file sizes and computation times are presented as well.