Publications

122 results for author Azzam Haidar
2016
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, Cholesky Factorization on Batches of Matrices with Fixed and Variable Sizes, San Jose, CA, GPU Technology Conference (GTC16), Poster, April 2016.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures,” The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.
Newburn, C. J., G. Bansal, M. Wood, L. Crivelli, J. Planas, A. Duran, P. Souza, L. Borges, P. Luszczek, S. Tomov, et al., “Heterogeneous Streaming,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, J. Falcou, and J. Dongarra, “High-performance Matrix-matrix Multiplications of Very Small Matrices,” 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-738: University of Tennessee, January 2016.
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs,” International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
Haidar, A., S. Tomov, K. Arturov, M. Guney, S. Story, and J. Dongarra, “LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi,” IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.
Haidar, A., B. Brock, S. Tomov, M. Guidry, J. Jay Billings, D. Shyles, and J. Dongarra, “Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” 2016 IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, no. 9697: Springer International Publishing, pp. 21–38, 2016. DOI: 10.1007/978-3-319-41321-1_2
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-739: University of Tennessee, February 2016.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs,” International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
Valero-Lara, P., J. Dongarra, A. Haidar, S. D. Relton, S. Tomov, and M. Zounon, A Standard for Batched BLAS Routines, Paris, France, 17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16), April 2016.
Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Towards Achieving Performance Portability Using Directives for Accelerators,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.
2015
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators,” EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015.
Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Batched matrix computations on hardware accelerators based on GPUs,” International Journal of High Performance Computing Applications, February 2015. DOI: 10.1177/1094342014567546
Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators Based on GPUs,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.
YarKhan, A., A. Haidar, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Cholesky Across Accelerators,” 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.
Gates, M., S. Tomov, and A. Haidar, “Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.
Haidar, A., J. Kurzak, G. Pichon, and M. Faverge, “A Data Flow Divide and Conquer Algorithm for Multicore Architecture,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
Guidry, M., and A. Haidar, On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation, Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015.
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors,” ISC High Performance 2015, Frankfurt, Germany, July 2015.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Efficient Eigensolver Algorithms on Accelerator Based Architectures,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.
Solcà, R., A. Kozhevnikov, A. Haidar, S. Tomov, T. C. Schulthess, and J. Dongarra, “Efficient Implementation Of Quantum Materials Simulations On Distributed CPU-GPU Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.
Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.
Haidar, A., J. Dongarra, K. Kabir, M. Gates, P. Luszczek, S. Tomov, and Y. Jia, “HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” Scientific Programming, vol. 23, issue 1, January 2015. DOI: 10.3233/SPR-140404
Haidar, A., S. Tomov, P. Luszczek, and J. Dongarra, “MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,” 2015 IEEE High Performance Extreme Computing Conference (HPEC'15), (Best Paper Award), Waltham, MA, IEEE, September 2015.
Anzt, H., J. Dongarra, M. Gates, A. Haidar, K. Kabir, P. Luszczek, S. Tomov, and I. Yamazaki, MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Optimization for Performance and Energy for Batched Matrix Computations on GPUs,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8), San Francisco, CA, ACM, February 2015. DOI: 10.1145/2716282.2716288
Abalenkovs, M., A. Abdelfattah, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, and A. YarKhan, “Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015. DOI: 10.14529/jsfi1504
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.
Baboulin, M., V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, and S. Tomov, Towards a High-Performance Tensor Algebra Package for Accelerators, Gatlinburg, TN, Smoky Mountains Computational Sciences and Engineering Conference (SMC15), September 2015.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Towards Batched Linear Solvers on Accelerated Hardware Platforms,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.
Haidar, A., Y. Jia, P. Luszczek, S. Tomov, A. YarKhan, and J. Dongarra, “Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), no. 5, Austin, TX, ACM, November 2015.
2014
Gates, M., A. Haidar, and J. Dongarra, “Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014.
Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Accelerating Numerical Dense Linear Algebra Calculations with GPUs,” Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014. DOI: 10.1007/978-3-319-06548-9_1
Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “A Fast Batched Cholesky Factorization on a GPU,” International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Heterogeneous Acceleration for Linear Algebra in Multi-Coprocessor Environments,” VECPAR 2014, Eugene, OR, June 2014.
Dong, T., A. Haidar, P. Luszczek, J. Harris, S. Tomov, and J. Dongarra, “LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU,” 16th IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.
Dongarra, J., A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and A. YarKhan, “Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,” Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014. DOI: 10.14529/jsfi1401
Haidar, A., P. Luszczek, and J. Dongarra, “New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem,” Workshop on Parallel and Distributed Scientific and Engineering Computing, IPDPS 2014 (Best Paper), Phoenix, AZ, IEEE, May 2014. DOI: 10.1109/IPDPSW.2014.130
Haidar, A., R. Solcà, M. Gates, S. Tomov, T. C. Schulthess, and J. Dongarra, “A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014. DOI: 10.1177/1094342013502097
Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, “Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014. DOI: 10.1109/ScalA.2014.8
Haidar, A., C. Cao, J. Dongarra, P. Luszczek, and S. Tomov, “Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
2013
Haidar, A., P. Luszczek, J. Kurzak, and J. Dongarra, “An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,” University of Tennessee Computer Science Technical Report (also LAWN 283), no. UT-EECS-13-720: University of Tennessee, October 2013.
Haidar, A., P. Luszczek, J. Kurzak, and J. Dongarra, “An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,” Supercomputing 2013, Denver, CO, November 2013.
Haidar, A., S. Tomov, J. Dongarra, R. Solcà, and T. C. Schulthess, “Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations,” International Supercomputing Conference (ISC), Lecture Notes in Computer Science, vol. 7905, Leipzig, Germany, Springer Berlin Heidelberg, pp. 67-80, June 2013. DOI: 10.1007/978-3-642-38750-0_6
Dongarra, J., M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, “Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013.