Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware,” 2009 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09) (to appear), 00 2009.“
Enhancing Parallelism of Tile QR Factorization for Multicore Architectures,” Submitted to Transaction on Parallel and Distributed Systems, December 2009.“
Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,” Journal of Physics: Conference Series, vol. 180, 00 2009.“
Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects , Portland, OR, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,” Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.“
Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.“
Autotuning Dense Linear Algebra Libraries on GPUs , Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
Faster, Cheaper, Better - A Hybridization Methodology to Develop Linear Algebra Software for GPUs,” LAPACK Working Note, no. 230, 00 2010.“
MaPHyS or the Development of a Parallel Algebraic Domain Decomposition Solver in the Course of the Solstice Project,” Sparse Days 2010 Meeting at CERFACS, Toulouse, France, June 2010.“
QCG-OMPI: MPI Applications on Grids,” Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, March 2010.“
QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment,” 24th IEEE International Parallel and Distributed Processing Symposium (also LAWN 224), Atlanta, GA, April 2010.“
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,” Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.“
Scheduling Cholesky Factorization on Multicore Architectures with GPU Accelerators , Knoxville, TN, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Poster, July 2010.
Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” 24th IEEE International Parallel and Distributed Processing Symposium (submitted), 00 2010.“
Towards a Complexity Analysis of Sparse Hybrid Linear Solvers,” PARA 2010, Reykjavik, Iceland, June 2010.“
Algebraic Schwarz Preconditioning for the Schur Complement: Application to the Time-Harmonic Maxwell Equations Discretized by a Discontinuous Galerkin Method.,” The Twentieth International Conference on Domain Decomposition Methods, La Jolla, California, February 2011.“
A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs,” in GPU Computing Gems, Jade Edition, vol. 2: Elsevier, pp. 473-484, 00 2011.“
LU Factorization for Accelerator-Based Systems,” IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.“
Parallel algebraic domain decomposition solver for the solution of augmented systems.,” Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, Corsica, France, 12-15 April, 00 2011.“
QCG-OMPI: MPI Applications on Grids.,” Future Generation Computer Systems, vol. 27, no. 4, pp. 435-369, January 2011.“
Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.“
Task-Based Programming for Seismic Imaging: Preliminary Results,” 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.“