Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures

Title: Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Turchenko, V., G. Bosilca, A. Bouteiller, and J. Dongarra
Conference Name: 7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems
Date Published: 09-2013
Conference Location: Berlin, Germany
Keywords: many-core system, parallel batch pattern training, parallelization efficiency, recirculation neural network

Abstract: This paper presents an experimental study of the parallel batch pattern back-propagation training algorithm, using a recirculation neural network as the example, on many-core high-performance computing systems. The choice of the recirculation neural network over the multilayer perceptron, recurrent, and radial basis function neural networks is justified. The model of a recirculation neural network and the usual sequential batch pattern algorithm for training it are described theoretically. An algorithmic description of the parallel version of the batch pattern training method is presented. The experiments were carried out using the Open MPI, MVAPICH, and Intel MPI message-passing libraries. The results obtained on a many-core AMD system and on Intel MIC are compared with those obtained on a cluster system. Our results show that the parallelization efficiency is about 95% on 12 cores located inside one physical AMD processor for the considered minimum and maximum scenarios, and about 70-75% on 48 AMD cores for the same scenarios. These results are 15-36% higher (depending on the version of the MPI library) than those obtained on 48 cores of a cluster system. The parallelization efficiency obtained on the Intel MIC architecture is surprisingly low and calls for deeper analysis.
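
The two ideas the abstract relies on, splitting a batch of training patterns across workers and measuring parallelization efficiency, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the one-weight linear model, the function names, and the timing values are hypothetical, and a plain Python sum stands in for the sum reduction that an MPI program would typically perform with a call such as MPI_Allreduce. Efficiency is computed with the usual definition E = T_serial / (p * T_parallel).

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[int, int]  # (input x, target t) for a toy one-weight model


def parallel_efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Usual definition: E = S / p, with speedup S = t_serial / t_parallel."""
    if workers < 1 or t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive and workers >= 1")
    return t_serial / (workers * t_parallel)


def split_patterns(patterns: Sequence[Pattern], workers: int) -> List[Sequence[Pattern]]:
    """Split the batch into `workers` contiguous, nearly equal chunks."""
    base, extra = divmod(len(patterns), workers)
    chunks, start = [], 0
    for rank in range(workers):
        size = base + (1 if rank < extra else 0)
        chunks.append(patterns[start:start + size])
        start += size
    return chunks


def partial_delta(weight, chunk: Sequence[Pattern]):
    """Sum of per-pattern gradients of (w*x - t)^2 / 2 over one chunk."""
    return sum((weight * x - t) * x for x, t in chunk)


def batch_delta(weight, patterns: Sequence[Pattern], workers: int):
    """Each simulated worker sums its own chunk; the partial sums are then
    added together (a stand-in for a sum reduction across processes)."""
    partials = [partial_delta(weight, c) for c in split_patterns(patterns, workers)]
    return sum(partials)


if __name__ == "__main__":
    # Hypothetical data and timings, chosen only for illustration.
    data = [(1, 2), (2, 4), (3, 5), (4, 9), (5, 10)]
    print(batch_delta(1, data, workers=1), batch_delta(1, data, workers=3))
    print(parallel_efficiency(t_serial=114.0, t_parallel=10.0, workers=12))
```

Because the weight update in batch mode depends only on the sum of per-pattern contributions, the split result matches the single-worker result; the cost of combining the partial sums is the communication overhead that lowers efficiency as the number of workers grows.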