Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping