Publications

Export 10 results:
Filters: Author is Rajib Nath  [Clear All Filters]
Book Chapter
Nath, R., S. Tomov, and J. Dongarra, Blas for GPUs,” Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.  (1.05 MB)
Conference Proceedings
Tomov, S., R. Nath, H. Ltaeif, and J. Dongarra, Dense Linear Algebra Solvers for Multicore with GPU Accelerators,” Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on, Atlanta, GA, pp. 1-8, 2010.  (1 MB)
Nath, R., S. Tomov, T. Dong, and J. Dongarra, Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,” ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.  (630.63 KB)
Journal Article
Nath, R., S. Tomov, and J. Dongarra, Accelerating GPU Kernels for Dense Linear Algebra,” Proc. of VECPAR'10, Berkeley, CA, June 2010.  (615.07 KB)
Tomov, S., R. Nath, and J. Dongarra, Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms through Hybrid GPU-Based Computing,” Parallel Computing, vol. 36, no. 12, pp. 645-654, 00-2010.  (1.39 MB)
Ltaeif, H., S. Tomov, R. Nath, and J. Dongarra, Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators,” IEEE Transaction on Parallel and Distributed Systems (submitted), March 2010.  (3.75 MB)
Kurzak, J., R. Nath, P. Du, and J. Dongarra, An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs,” Applied Parallel and Scientific Computing, vol. 7133, pp. 248-257, 00-2012.  (623.5 KB)
Nath, R., S. Tomov, and J. Dongarra, An Improved MAGMA GEMM for Fermi GPUs,” International Journal of High Performance Computing, vol. 24, no. 4, pp. 511-515, 00-2010.
Ltaeif, H., S. Tomov, R. Nath, P. Du, and J. Dongarra, A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators,” Proc. of VECPAR'10 (to appear), Berkeley, CA, June 2010.  (870.46 KB)
Tech Report
Nath, R., S. Tomov, and J. Dongarra, An Improved MAGMA GEMM for Fermi GPUs,” University of Tennessee Computer Science Technical Report, no. UT-CS-10-655 (also LAPACK working note 227), July 2010.  (486.71 KB)