Export 265 results:
Filters: Author is Stanimire Tomov [Clear All Filters]
MagmaDNN: Accelerated Deep Learning Using MAGMA,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.“
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs , Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019. DOI: 10.1007/978-3-030-34356-9_37“
MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.“
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines , Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines , Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.
Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.“
Matrix Algebra on GPU and Multicore Architectures , Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.
Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,” Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020. DOI: 10.1016/j.jpdc.2020.07.001“
MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR) , Washington, DC, NSF PI Meeting, Poster, April 2018. DOI: 10.6084/m9.figshare.6174143.v3
Mixed-precision Block Gram Schmidt Orthogonalization,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.“
Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,” SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015. DOI: DOI:10.1137/14M0973773“
Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,” Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020. DOI: 10.1098/rspa.2020.0110“
Mixed-precision orthogonalization process Performance on multicore CPUs with GPUs,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.“
Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs,” VECPAR 2014 (Best Paper), Eugene, OR, June 2014.“
Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.“
Mixed-Tool Performance Analysis on Hybrid Multicore Architectures,” First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2010), San Diego, CA, September 2010.“
Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,” Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014. DOI: http://dx.doi.org/10.14529/jsfi1401“
A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms,” ICL Technical Report, no. ICL-UT-21-04: University of Tennessee, August 2021.“
Non-GPU-resident Dense Symmetric Indefinite Factorization,” Concurrency and Computation: Practice and Experience, November 2016. DOI: 10.1002/cpe.4012“
A Note on Auto-tuning GEMM for GPUs,” 9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009. DOI: 10.1007/978-3-642-01970-8_89“
Novel HPC Techniques to Batch Execution of Many Variable Size BLAS Computations on GPUs,” International Conference on Supercomputing (ICS '17), Chicago, Illinois, ACM, June 2017. DOI: 10.1145/3079079.3079103“
A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014. DOI: 10.1177/1094342013502097“
A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.“
Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects , Portland, OR, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,” Journal of Physics: Conference Series, vol. 180, 00 2009.“
Numerical Linear Algebra on Hybrid Architectures: Recent Developments in the MAGMA Project , Portland, Oregon, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
One-Sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators,” The International Conference on Computational Science (ICCS), June 2012.“
OpenDIEL: A Parallel Workflow Engine and DataAnalytics Framework,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.“
Optimization for Performance and Energy for Batched Matrix Computations on GPUs,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8), San Francisco, CA, ACM, February 2015. DOI: 10.1145/2716282.2716288“
Optimizing Batch HGEMM on Small Sizes Using Tensor Cores , San Jose, CA, GPU Technology Conference (GTC), March 2019.
Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization,” IEEE High Performance Extreme Computing Conference (HPEC’18), Waltham, MA, IEEE, September 2018.“
Optimizing Krylov Subspace Solvers on Graphics Processing Units,” Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.“
Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,” ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.“
Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices,” International Conference on Computational Science (ICCS 2017), Zurich, Switzerland, Procedia Computer Science, June 2017. DOI: 10.1016/j.procs.2017.05.237“
Out of Memory SVD Solver for Big Data,” 2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Waltham, MA, IEEE, September 2017.“
Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,” International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan, ACM, September 2011. DOI: 10.1109/ICPP.2011.71“
Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015. DOI: 10.14529/jsfi1504“
Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” 2016 IEEE High Performance Extreme Computing Conference (HPEC ‘16), Waltham, MA, IEEE, September 2016.“
Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.“
Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.“
On the performance and energy efficiency of sparse linear algebra on GPUs,” International Journal of High Performance Computing Applications, October 2016. DOI: 10.1177/1094342016672081“
Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014. DOI: 10.1109/ScalA.2014.8“
Performance, Design, and Autotuning of Batched GEMM for GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-739: University of Tennessee, February 2016.“
Performance, Design, and Autotuning of Batched GEMM for GPUs,” High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, no. 9697: Springer International Publishing, pp. 21–38, 2016. DOI: 10.1007/978-3-319-41321-1_2“
Performance, Design, and Autotuning of Batched GEMM for GPUs,” The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.“
Performance Evaluation for Petascale Quantum Simulation Tools,” Proceedings of the Cray Users' Group Meeting, Atlanta, GA, May 2010.“
Performance evaluation for petascale quantum simulation tools,” Proceedings of CUG09, Atlanta, GA, May 2009.“
Performance evaluation of eigensolvers in nano-structure computations,” IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.“
Performance evaluation of LU factorization through hardware counter measurements,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-700, October 2012.“