As FPGA devices become larger, more coarse-grain modules coupled with large scale reconfigurable fabric become available, thus enabling new classes of applications to run efficiently, as compared to a general-purpose computer. This paper presents an architecture that benefits from the large number of DSP modules in Xilinx technology to implement massive floating point arithmetic. Our architecture computes the Phylogenetic Likelihood Function (PLF) which accounts for approximately 95% of total execution time in all state-of-the-art Maximum Likelihood (ML) based programs for reconstruction of evolutionary relationships. We validate and assess performance of our architecture against a highly optimized and parallelized software implementation of the PLF that is based on RAxML, which is considered to be one of the fastest and most accurate programs for phylogenetic inference. Both software and hardware implementations use double precision floating point arithmetic.