In this paper we present and analyze an artificial neural network hardware engine, its architecture and implementation. The engine was designed to solve performance problems of the serial software implementations. It is based on a hierarchical parallel and parameterized architecture. Taking into account verification results, we conclude that this engine improves the computational performance, producing speedups from 52.3 to 204.5 and its architectural parameterization provides more flexibility.