The demand for high-speed FPGA compilation tools has arisen for three reasons. First, as FPGA device capacity has grown, the computation time devoted to placement and routing has grown faster than the compute power of the available computers. Second, there is a subset of users who are willing to accept a reduction in the quality of result in exchange for high-speed compilation. Third, high-speed compilation has been a long-standing desire of users of FPGA-based custom computing machines, since their compile-time requirements are ideally closer to those of regular computers. This paper focuses on the placement phase of the compile process and presents an ultra-fast placement algorithm targeted to FPGAs. The algorithm is based on a combination of multiple-level, bottom-up clustering and hierarchical simulated annealing. It provides superior area results over a known high-quality placement tool on a set of large benchmark circuits, when both are restricted to a short run t...