In a search for an algorithm to compute atan(x) that has both low latency and few floating-point instructions, an interesting variant of familiar trigonometric formulas was discovered that allows argument reduction to begin before any tables stored in memory need to be referenced. Low latency makes the method suitable for a closed subroutine, and the small number of floating-point operations makes it advantageous for a software-pipelined implementation.
Peter W. Markstein