We derive cost formulae for three different parallelisation techniques for training supervised networks. These formulae are parameterised by properties of the target computer architecture, so they can be used to determine the best match between a parallel computer and a training technique. One technique, exemplar parallelism, is far superior for almost all parallel computer architectures. The formulae also take into account optimal batch learning as the overall training approach.
R. O. Rogers, David B. Skillicorn