A highly scalable Restricted Boltzmann Machine FPGA implementation

Restricted Boltzmann Machines (RBMs) — the building block for newly popular Deep Belief Networks (DBNs) — are a promising new tool for machine learning practitioners. However, future research in applications of DBNs is hampered by the considerable computation that training requires. In this paper, we describe a novel architecture and FPGA implementation that accelerates the training of general RBMs in a scalable manner, with the goal of producing a system that machine learning researchers can use to investigate ever-larger networks. Our design uses a highly efficient, fully-pipelined architecture based on 16-bit arithmetic for performing RBM training on an FPGA. We show that only 16-bit arithmetic precision is necessary, and we consequently use embedded hardware multiply-and-add (MADD) units. We present performance results to show that a speedup of 25-30X can be achieved over an optimized software implementation on a high-end CPU.
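The abstract does not spell out the training procedure in detail; the standard way to train an RBM is one-step contrastive divergence (CD-1), whose inner loops are the matrix multiply-accumulate operations that a design like this maps onto embedded MADD units. The minimal floating-point sketch below is illustrative only: the function names, learning rate, and NumPy float arrays are assumptions, and it does not reproduce the paper's 16-bit fixed-point pipeline.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM on a mini-batch v0 of shape (batch, visible)."""
    rng = rng or np.random.default_rng()
    # Positive phase: hidden probabilities and samples given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0_samp = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    v1_prob = sigmoid(h0_samp @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Gradient estimate: difference of positive and negative statistics.
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    W += lr * dW
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid
```

The matrix products in this update are the bulk of the training cost, which is why a fully pipelined multiply-and-add datapath is the natural target for acceleration; the sketch is only meant to make that computation concrete.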
Type: Conference
Year: 2009
Where: FPL (Springer)
Authors: Sang Kyun Kim, Lawrence C. McAfee, Peter L. McMahon, Kunle Olukotun