“Software-Defined Events (SDEs) in MAGMA-Sparse,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-12: University of Tennessee, December 2018.
“Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale,” SLATE Working Notes, no. 01, ICL-UT-17-02: Innovative Computing Laboratory, University of Tennessee, June 2017.
“Roadmap for Refactoring Classic PAPI to PAPI++: Part II: Formulation of Roadmap Based on Survey Results,” PAPI++ Working Notes, no. 2, ICL-UT-20-09: Innovative Computing Laboratory, University of Tennessee, July 2020.
“Formulation of Requirements for New PAPI++ Software Package: Part I: Survey Results,” PAPI++ Working Notes, no. 1, ICL-UT-20-02: Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020.
“An efficient distributed randomized solver with application to large dense linear systems,” ICL Technical Report, no. ICL-UT-12-02, July 2012.
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 2010.
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,” University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
“DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.
Is your scheduling good? How would you know?, Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.
What it Takes to keep PAPI Instrumental for the HPC Community, Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
Understanding Native Event Semantics, Knoxville, TN, 9th JLESC Workshop, April 2019.
Software-Defined Events through PAPI for In-Depth Analysis of Application Performance, Basel, Switzerland, 5th Platform for Advanced Scientific Computing Conference (PASC18), July 2018.
PAPI's new Software-Defined Events for in-depth Performance Analysis, Dresden, Germany, 13th Parallel Tools Workshop, September 2019.
PAPI's New Software-Defined Events for In-Depth Performance Analysis, Lyon, France, CCDSC 2018: Workshop on Clusters, Clouds, and Data for Scientific Computing, September 2018.
PAPI: Counting outside the Box, Barcelona, Spain, 8th JLESC Meeting, April 2018.
Does your tool support PAPI SDEs yet?, Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.
PULSE: PAPI Unifying Layer for Software-Defined Events (Poster), Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
Performance Application Programming Interface for Extreme-Scale Environments (PAPI-EX) (Poster), Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
Exa-PAPI: The Exascale Performance API with Modern C++, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013. DOI: 10.1109/MCSE.2013.98
“PAPI Software-Defined Events for in-Depth Performance Analysis,” The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019.
“OMPIO: A Modular Software Architecture for MPI I/O,” 18th EuroMPI, Santorini, Greece, Springer, pp. 81-89, September 2011.
“Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW,” 18th EuroMPI, Santorini, Greece, Springer, pp. 247-254, September 2011.
“Evaluation of Dataflow Programming Models for Electronic Structure Theory,” Concurrency and Computation: Practice and Experience: Special Issue on Parallel and Distributed Algorithms, vol. 2018, issue e4490, pp. 1–20, May 2018. DOI: 10.1002/cpe.4490
“An Efficient Distributed Randomized Algorithm for Solving Large Dense Symmetric Indefinite Linear Systems,” Parallel Computing, vol. 40, issue 7, pp. 213-223, July 2014. DOI: 10.1016/j.parco.2013.12.003
“Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,” Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.
“DAGuE: A generic distributed DAG Engine for High Performance Computing,” Parallel Computing, vol. 38, no. 1-2: Elsevier, pp. 27-51, 2012.
“BlackjackBench: Portable Hardware Characterization with Automated Results Analysis,” The Computer Journal, March 2013. DOI: 10.1093/comjnl/bxt057
“Accelerating NWChem Coupled Cluster through dataflow-based Execution,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540-551, July 2018. DOI: 10.1177/1094342016672543
“Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,” The International Journal of High Performance Computing Applications, pp. 1–13, January 2017. DOI: 10.1177/1094342016672543
“Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,” Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.
“Power Management and Event Verification in PAPI,” Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Springer International Publishing, pp. 41-51, 2016. DOI: 10.1007/978-3-319-39589-0_4
“MPI-aware Compiler Optimizations for Improving Communication-Computation Overlap,” Proceedings of the 23rd annual International Conference on Supercomputing (ICS '09), Yorktown Heights, NY, USA, ACM, pp. 316-325, June 2009.
“Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.
“DAGuE: A Generic Distributed DAG Engine for High Performance Computing,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, May 2011.
“BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“What it Takes to keep PAPI Instrumental for the HPC Community,” 1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019.
“Utilizing Dataflow-based Execution for Coupled Cluster Methods,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
“Software-Defined Events through PAPI,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPSW.2019.00069
“Search Space Generation and Pruning System for Autotuners,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.
“PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.
“Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014. DOI: 10.1109/CLUSTER.2014.6968672
“PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,” 2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.
“From Serial Loops to Parallel Execution on Distributed Systems,” International European Conference on Parallel and Distributed Computing (Euro-Par '12), Rhodes, Greece, August 2012.
“Effortless Monitoring of Arithmetic Intensity with PAPI's Counter Analysis Toolkit,” 13th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Springer International Publishing, September 2020.
“Counter Inspection Toolkit: Making Sense out of Hardware Performance Events,” 11th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Cham, Switzerland: Springer, February 2019. DOI: 10.1007/978-3-030-11987-4_2
“Characterization of Power Usage and Performance in Data-Intensive Applications using MapReduce over MPI,” 2019 International Conference on Parallel Computing (ParCo2019), Prague, Czech Republic, September 2019.
“Accelerating NWChem Coupled Cluster through dataflow-based Execution,” 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Springer International Publishing, September 2015.
“Scalable Dense Linear Algebra on Heterogeneous Hardware,” HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.
“An Introduction to High Performance Computing and Its Intersection with Advances in Modeling Rare Earth Elements and Actinides,” Rare Earth Elements and Actinides: Progress in Computational Science Applications, vol. 1388, Washington, DC, American Chemical Society, pp. 3-53, October 2021. DOI: 10.1021/bk-2021-1388.ch001