Publications

Export 11 results:
Filters: Author is Mawussi Zounon  [Clear All Filters]
Conference Paper
Dongarra, J., S. Hammarling, N. Higham, S. Relton, P. Valero-Lara, and M. Zounon, The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems,” International Conference on Computational Science (ICCS 2017), Zürich, Switzerland, Elsevier, June 2017.
Dongarra, J., S. Hammarling, N. Higham, S. Relton, and M. Zounon, Optimized Batched Linear Algebra for Modern Architectures,” Euro-Par 2017, Santiago de Compostela, Spain, Springer, August 2017.
Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” ISC High Performance (ISC'18), Best Poster, Frankfurt, Germany, June 2018.  (3.01 MB)
Conference Proceedings
Haidar, A., A. Abdelfattah, M. Zounon, P. Wu, S. Pranesh, S. Tomov, and J. Dongarra, The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques,” International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018.
Journal Article
Haidar, A., A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, A Guide for Achieving High Performance with Very Small Matrices on GPUs: A Case Study of Batched LU and Cholesky Factorizations,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018.  (832.92 KB)
Yamazaki, I., J. Kurzak, P. Wu, M. Zounon, and J. Dongarra, Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018.  (2.88 MB)
Poster
Valero-Lara, P., J. Dongarra, A. Haidar, S. D. Relton, S. Tomov, and M. Zounon, A Standard for Batched BLAS Routines , Paris, France, 17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16), April 2016.  (1.93 MB)
Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption , Frankfurt, Germany, ISC High Performance (ISC18), Best Poster Award, June 2018.  (3.01 MB)
Tech Report
Abalenkovs, M., N. Bagherpour, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Relton, J. Sistek, D. Stevens, et al., PLASMA 17 Performance Report,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-11: University of Tennessee, June 2017.  (7.57 MB)
Abalenkovs, M., N. Bagherpour, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Relton, J. Sistek, D. Stevens, et al., PLASMA 17.1 Functionality Report,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-10: University of Tennessee, June 2017.  (1.8 MB)