Scalable Dense Linear Algebra on Heterogeneous Hardware,” HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.“
Accelerating NWChem Coupled Cluster through dataflow-based Execution,” 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Springer International Publishing, September 2015.“
Counter Inspection Toolkit: Making Sense out of Hardware Performance Event,” 11th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Cham, Switzerland: Springer, September 2019.“
PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,” 2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.“
Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014. DOI: 10.1109/CLUSTER.2014.6968672“
PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.“
Search Space Generation and Pruning System for Autotuners,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.“
Software-Defined Events through PAPI,” 24th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), Rio de Janeiro, Brazil, IEEE, May 2019.“
Utilizing Dataflow-based Execution for Coupled Cluster Methods,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.“
What it Takes to keep PAPI Instrumental for the HPC Community,” 1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019.“
BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.“
DAGuE: A Generic Distributed DAG Engine for High Performance Computing,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 00 2011.“
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.“
Interoperable Convergence of Storage, Networking, and Computation,” FICC 2019, San Francisco, CA, Springer, March 14 15, 2019.“
MPI-aware Compiler Optimizations for Improving Communication-Computation Overlap,” Proceedings of the 23rd annual International Conference on Supercomputing (ICS '09), Yorktown Heights, NY, USA, ACM, pp. 316-325, June 2009.“
Power Management and Event Verification in PAPI,” Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Dresden, Germany, Springer International Publishing, pp. pp. 41-51, 2016. DOI: 10.1007/978-3-319-39589-0_4“
Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,” Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.“
Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,” The International Journal of High Performance Computing Applications, pp. 1–13, January 2017. DOI: 10.1177/1094342016672543“
Accelerating NWChem Coupled Cluster through dataflow-based Execution,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540--551, July 2018. DOI: 10.1177/1094342016672543“
BlackjackBench: Portable Hardware Characterization with Automated Results Analysis,” The Computer Journal, March 2013. DOI: 10.1093/comjnl/bxt057“
DAGuE: A generic distributed DAG Engine for High Performance Computing.,” Parallel Computing, vol. 38, no. 1-2: Elsevier, pp. 27-51, 00 2012.“
Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,” Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.“
An Efficient Distributed Randomized Algorithm for Solving Large Dense Symmetric Indefinite Linear Systems,” Parallel Computing, vol. 40, issue 7, pp. 213-223, July 2014. DOI: 10.1016/j.parco.2013.12.003“
Evaluation of Dataflow Programming Models for Electronic Structure Theory,” Concurrency and Computation: Practice and Experience: Special Issue on Parallel and Distributed Algorithms, vol. 2018, issue e4490, pp. 1–20, May 2018. DOI: 10.1002/cpe.4490“
From Serial Loops to Parallel Execution on Distributed Systems,” PPoPP 2012 (submitted), New Orleans, LA, February 2012.“
Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW,” 18th EuroMPI, Santorini, Greece, Springer, pp. 247-254, September 2011.“
OMPIO: A Modular Software Architecture for MPI I/O,” 18th EuroMPI, Santorini, Greece, Springer, pp. 81-89, September 2011.“
PAPI Software-Defined Events for in-Depth Performance Analysis,” The International Journal of High Performance Computing Applications, 2019.“
PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013. DOI: 10.1109/MCSE.2013.98“
Does your tool support PAPI SDEs yet? , Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.
PAPI: Counting outside the Box , Barcelona, Spain, 8th JLESC Meeting, April 2018.
PAPI's New Software-Defined Events for In-Depth Performance Analysis , Lyon, France, CCDSC 2018: Workshop on Clusters, Clouds, and Data for Scientific Computing, September 2018.
PAPI's new Software-Defined Events for in-depth Performance Analysis , Dresden, Germany, 13th Parallel Tools Workshop, September 2019.
Software-Defined Events through PAPI for In-Depth Analysis of Application Performance , Basel, Switzerland, 5th Platform for Advanced Scientific Computing Conference (PASC18), July 2018.
Understanding Native Event Semantics , Knoxville, TN, 9th JLESC Workshop, April 2019.
What it Takes to keep PAPI Instrumental for the HPC Community , Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
Is your scheduling good? How would you know? , Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.
DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.“
Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,” University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.“
Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 00 2010.“
An efficient distributed randomized solver with application to large dense linear systems,” ICL Technical Report, no. ICL-UT-12-02, July 2012.“
Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale,” SLATE Working Notes, no. 1, ICL-UT-17-02: Innovative Computing Laboratory, University of Tennessee, June 2017.“
Software-Defined Events (SDEs) in MAGMA-Sparse,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-12: University of Tennessee, December 2018.“