Export 227 results:
Filters: Author is Stanimire Tomov [Clear All Filters]
Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures,” PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim Norway, May 2008.“
Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-08-615 (also LAPACK Working Note 200), January 2008.“
Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU,” ACM Transactions on Mathematical Software (TOMS), vol. 43, issue 2, October 2016.“
A Standard for Batched BLAS Routines , Paris, France, 17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16), April 2016.
State-of-the-Art Eigensolvers for Electronic Structure Calculations of Large Scale Nano-Systems,” Journal of Computational Physics, vol. 227, no. 15, pp. 7113-7124, January 2008.“
A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.“
Structure-aware Linear Solver for Realtime Convex Optimization for Embedded Systems,” IEEE Embedded Systems Letters, vol. 9, issue 3, pp. 61–64, May 2017.“
Tensor Contractions using Optimized Batch GEMM Routines , San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.“
Towards a High-Performance Tensor Algebra Package for Accelerators , Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC15), September 2015.
Towards Achieving Performance Portability Using Directives for Accelerators,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.“
Towards Batched Linear Solvers on Accelerated Hardware Platforms,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.“
Towards bulk based preconditioning for quantum dot computations,” IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.“
Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,” University of Tennessee Computer Science Technical Report, UT-CS-08-632 (also LAPACK Working Note 210), January 2008.“
Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,” Parallel Computing, vol. 36, no. 5-6, pp. 232-240, 00-2010.“
Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems,” Concurrency and Computation: Practice and Experience, October 2013.“
Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster,” The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2013.“
Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.“
A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.“
The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot,” Journal of Computational Physics, vol. 223, pp. 774-782, 00-2007.“
The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot,” Journal of Computational Physics (submitted), January 2006.“
Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption , Frankfurt, Germany, ISC High Performance (ISC18), Best Poster Award, June 2018.
Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” ISC High Performance (ISC'18), Best Poster, Frankfurt, Germany, June 2018.“
Using MAGMA with PGI Fortran,” PGI Insider, November 2010.“
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy,” ACM Transactions on Mathematical Software, vol. 34, no. 4, pp. 17-22, 00-2008.“
Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,” Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.“
Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), vol. No. 5, Austin, TX, ACM, November 2015.“