Both public and private organizations have accumulated large volumes of electronically available text documents in recent years. However, to turn these text archives into profitable sources of knowledge, the archives must be transformed into an integrated and efficiently queryable information system. To attain this objective, the DIAsDEM project employs data mining techniques to derive a semantic XML DTD for a text archive and to semantically annotate its documents. In this article, we briefly describe the DIAsDEM framework for semantic tagging and its application in a case study.