%0 Conference Paper
%B 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020)
%D 2020
%T Revisiting Dynamic DAG Scheduling under Memory Constraints for Shared-Memory Platforms
%A Gabriel Bathie
%A Loris Marchal
%A Yves Robert
%A Samuel Thibault
%I IEEE Computer Society Press
%C New Orleans, LA
%8 2020-05
%G eng
%0 Conference Paper
%B 49th International Conference on Parallel Processing (ICPP 2020)
%D 2020
%T Robustness of the Young/Daly Formula for Stochastic Iterative Applications
%A Yishu Du
%A Loris Marchal
%A Guillaume Pallez (Aupy)
%A Yves Robert
%I ACM Press
%C Edmonton, AB, Canada
%8 2020-08
%G eng
%0 Journal Article
%J Parallel Computing
%D 2016
%T Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results
%A Julien Herrmann
%A George Bosilca
%A Thomas Herault
%A Loris Marchal
%A Yves Robert
%A Jack Dongarra
%K Data partition
%K linear algebra
%K parsec
%K QR factorization
%K Redistribution
%K Stencil
%X The classical redistribution problem aims at optimally scheduling communications when reshuffling from an initial data distribution to a target data distribution. This target data distribution is usually chosen to optimize some objective for the algorithmic kernel under study (good computational balance or low communication volume or cost), and therefore to provide high efficiency for that kernel. However, the choice of a distribution minimizing the target objective is not unique. This leads to generalizing the redistribution problem as follows: find a re-mapping of data items onto processors such that the data redistribution cost is minimal, and the operation remains as efficient. This paper studies the complexity of this generalized problem. We compute optimal solutions and evaluate, through simulations, their gain over classical redistribution. We also show the NP-hardness of the problem of finding the optimal data partition and processor permutation (defined by new subsets) that minimize the cost of redistribution followed by a simple computational kernel. Finally, experimental validation of the new redistribution algorithms is conducted on a multicore cluster, for both a 1D-stencil kernel and a more compute-intensive dense linear algebra routine.
%B Parallel Computing
%V 52
%P 22-41
%8 2016-02
%G eng
%R doi:10.1016/j.parco.2015.09.005