A Generic Architecture for the Conversion of Document Collections into Semantically Annotated Digital Archives