Using Hardware Multithreading to Overcome Broadcast/Reduction Latency in an Associative SIMD Processor