"Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs", (submitted) 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.
"Acceleration of GPU-based Krylov solvers via Data Transfer Reduction", International Journal of High Performance Computing Applications, 2015.
"Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy", ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, February 2015.
"Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs", International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.
"Batched Matrix Computations on Hardware Accelerators", EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015.
"Batched matrix computations on hardware accelerators based on GPUs", International Journal of High Performance Computing Applications, February 2015.
"Cholesky Across Accelerators", 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.
"Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing", International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.
"Computing Low-rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and its Application to Solving a Hierarchically Semiseparable Linear System of Equations", Scientific Programming, 2015.
"A Data Flow Divide and Conquer Algorithm for Multicore Architecture", 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
"Dense Symmetric Indefinite Factorization on GPU accelerated architectures", International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, September 2015.
"On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors", ISC High Performance 2015, Frankfurt, Germany, July 2015.
"Design for a Soft Error Resilient Dynamic Task-based Runtime", 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
"Energy efficiency and performance frontiers for sparse computations on GPU supercomputers", Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15), San Francisco, CA, ACM, February 2015.
"Experiences in autotuning matrix multiplication for energy minimization on GPUs", Concurrency and Computation: Practice and Experience, May 2015.
"Framework for Batched and GPU-resident Factorization Algorithms Applied to Block Householder Transformations", ISC High Performance, Frankfurt, Germany, Springer, July 2015.
"Hierarchical DAG scheduling for Hybrid Distributed Systems", 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
"HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi", Scientific Programming, vol. 23, issue 1, January 2015.
"Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs", (submitted) IEEE Transactions on Parallel and Distributed Systems, November 2015.
"Iterative Sparse Triangular Solves for Preconditioning", EuroPar 2015, Vienna, Austria, August 2015.
"Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs", SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015.
"Native Factorizations on Embedded GPUs", 19th IEEE High Performance Extreme Computing Conference (HPEC 2015), Waltham, MA, IEEE, September 2015.
"Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures", The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.
"Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform", International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.
"Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof", Innovative Computing Laboratory Technical Report, no. ICL-UT-15-01, April 2015.
"Scheduling for fault-tolerance: an introduction", Innovative Computing Laboratory Technical Report, no. ICL-UT-15-02: University of Tennessee, January 2015.
"Towards Batched Linear Solvers on Accelerated Hardware Platforms", 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.
"Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem", VECPAR 2014, Eugene, OR, June 2014.
"Accelerating Numerical Dense Linear Algebra Calculations with GPUs", Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014.
"Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product", University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.
"Access-averse Framework for Computing Low-rank Matrix Approximations", First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, Washington, DC, October 2014.
"Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting", Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014.
"Analyzing PAPI Performance on Virtual Machines", VMware Technical Journal, Winter 2013, January 2014.
"Assembly Operations for Multicore Architectures using Task-Based Runtime Systems", Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014.
"Assessing the Impact of ABFT and Checkpoint Composite Strategies", 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
"clMAGMA: High Performance Dense Linear Algebra with OpenCL", International Workshop on OpenCL, Bristol University, England, May 2014.
"Communication-Avoiding Symmetric-Indefinite Factorization", SIAM Journal on Matrix Analysis and Applications, vol. 35, issue 4, pp. 1364-1406, July 2014.
"Computing Least Squares Condition Numbers on Hybrid Multicore/GPU Systems", International Interdisciplinary Conference on Applied Mathematics, Modeling and Computational Science (AMMCS), Waterloo, Ontario, Canada, August 2014.
"Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES", 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.
"Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime", Workshop on Large-Scale Parallel Processing, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
"Design for a Soft Error Resilient Dynamic Task-based Runtime", ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.
"Designing LU-QR Hybrid Solvers for Performance and Stability", IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
"Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster", The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.
"Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs", Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, May 2014.
"Efficient checkpoint/verification patterns for silent error detection", Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, May 2014.
"An Efficient Distributed Randomized Algorithm for Solving Large Dense Symmetric Indefinite Linear Systems", Parallel Computing, vol. 40, issue 7, pp. 213-223, July 2014.
"A Fast Batched Cholesky Factorization on a GPU", International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.
"Heterogeneous Acceleration for Linear Algebra in Multi-Coprocessor Environments", VECPAR 2014, Eugene, OR, June 2014.
"Hybrid Multi-Elimination ILU Preconditioners on GPUs", International Heterogeneity in Computing Workshop (HCW), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
"Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs", University of Tennessee Computer Science Technical Report, no. UT-EECS-14-727: University of Tennessee, April 2014.