An interprocedural code optimization technique for network processors using hardware multi-threading support