"Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem", VECPAR 2014, Eugene, OR, 06/2014.
"Accelerating Numerical Dense Linear Algebra Calculations with GPUs", Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014.
"Access-averse Framework for Computing Low-rank Matrix Approximations", First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, Washington, DC, 10/2014.
"Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy", ACM Transactions on Parallel Computing (to appear), 2014.
"Analyzing PAPI Performance on Virtual Machines", VMWare Technical Journal, vol. Winter 2013, 01/2014.
"Assessing the Impact of ABFT and Checkpoint Composite Strategies", 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, 05/2014.
"Communication-Avoiding Symmetric-Indefinite Factorization", SIAM Journal on Matrix Analysis and Application, vol. 35, issue 4, pp. 1364-1406, 07/2014.
"Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES", 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, 11/2014.
"Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime", Workshop on Large-Scale Parallel Processing, IPDPS 2014, Phoenix, AZ, IEEE, 05/2014.
"Design for a Soft Error Resilient Dynamic Task-based Runtime", ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, 11/2014.
"Designing LU-QR Hybrid Solvers for Performance and Stability", IPDPS 2014, Phoenix, AZ, IEEE, 05/2014.
"Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster", he International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, 11/2014.
"Efficient checkpoint/verification patterns for silent error detection", Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, 05/2014.
"An Efficient Distributed Randomized Algorithm for Solving Large Dense Symmetric Indefinite Linear Systems", Parallel Computing, vol. 40, issue 7, pp. 213-223, 07/2014.
"A Fast Batched Cholesky Factorization on a GPU", International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, 09/2014.
"Heterogeneous Acceleration for Linear Algebra in Mulit-Coprocessor Environments", VECPAR 2014, Eugene, OR, 06/2014.
"Hybrid Multi-Elimination ILU Preconditioners on GPUs", International Heterogeneity in Computing Workshop (HCW), IPDPS 2014, Phoenix, AZ, IEEE, 05/2014.
"Improving the Energy Efficiency of Sparse Linear System Solvers on Multicore and Manycore Systems", Philosophical Transactions of the Royal Society A -- Mathematical, Physical and Engineering Sciences, vol. 372, issue 2018, 07/2014.
"Improving the performance of CA-GMRES on multicores with multiple GPUs", IPDPS 2014 (Best Paper), Phoenix, AZ, IEEE, 05/2014.
"Looking Back at Dense Linear Algebra Software", Journal of Parallel and Distributed Computing, vol. 74, issue 7, pp. 2548–2560, 07/2014.
"MIAMI: A Framework for Application Performance Diagnosis ", IPASS-2014, Monterey, CA, IEEE, 03/2014.
"Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs", VECPAR 2014 (Best Paper), Eugene, OR, 06/2014.
"New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem", Workshop on Parallel and Distributed Scientific and Engineering Computing, IPDPS 2014, Phoenix, AZ, IEEE, 05/2014.
"A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks", International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, 05/2014.
"Performance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI", ICL Technical Report, no. ICL-UT-14-01: University of Tennessee, 02/2014.
"Performance and Reliability Trade-offs for the Double Checkpointing Algorithm", International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41, 2014.
"Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models", Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications, IEEE Cluster 2014, no. ICL-UT-14-04, Madrid, Spain, IEEE, 09/2014.
"PULSAR Users’ Guide, Parallel Ultra-Light Systolic Array Runtime", University of Tennessee EECS Technical Report, no. UT-EECS-14-733: University of Tennessee, 11/2014.
"Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures", VECPAR 2014, Eugene, OR, 06/2014.
"A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU", IPDPS 2014, Phoenix, AZ, IEEE, 05/2014.
"Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes", 23rd International Heterogeneity in Computing Workshop, IPDPS 2014, Phoenix, AZ, IEEE, 05/2014.
"Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment", IPDPS 2014, Phoenix, AZ, IEEE, 05/2014.
"Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors", Concurrency and Computation: Practice and Experience, 09/2014.
"Utilizing Dataflow-based Execution for Coupled Cluster Methods", IEEE Cluster 2014, no. ICL-UT-14-02, Madrid, Spain, IEEE, 09/2014.
"Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting", Concurrency and Computation: Practice and Experience, 09/2013.
"Analyzing PAPI Performance on Virtual Machines", ICL Technical Report, no. ICL-UT-13-02, 08/2013.
"Assessing the impact of ABFT and Checkpoint composite strategies", University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
"BlackjackBench: Portable Hardware Characterization with Automated Results Analysis", The Computer Journal, 03/2013.
"A Block-Asynchronous Relaxation Method for Graphics Processing Units", Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, 12/2013.
"clMAGMA: High Performance Dense Linear Algebra with OpenCL", University of Tennessee Technical Report (Lawn 275), no. UT-CS-13-706: University of Tennessee, 03/2013.
"On the Combination of Silent Error Detection and Checkpointing", UT-CS-13-710: University of Tennessee Computer Science Technical Report, 06/2013.
"Correlated Set Coordination in Fault Tolerant Message Logging Protocols", Concurrency and Computation: Practice and Experience, vol. 25, issue 4, pp. 572-585, 03/2013.
"CPU-GPU Hybrid Bidiagonal Reduction With Soft Error Resilience", ScalA '13 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Montpellier, France, 11/2013.
"Designing LU-QR hybrid solvers for performance and stability", University of Tennessee Computer Science Technical Report (also LAWN 282), no. ut-eecs-13-719: University of Tennessee, 10/2013.
"Diagnosis and Optimization of Application Prefetching Performance", Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, 06/2013.
"Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs", University of Tennessee Computer Science Technical Report, no. ut-cs-13-713, 07/2013.
"Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures", 7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany, 09/2013.
"Enabling Workflows in GridSolve: Request Sequencing and Service Trading", Journal of Supercomputing, vol. 64, issue 3, pp. 1133-1152, 06/2013.
"Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI", Concurrency and Computation: Practice and Experience, 07/2013.
"Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems", Parallel Computing, vol. 39, issue 4-5, pp. 212-232, 05/2013.