“The Design and Implementation of the Parallel Out of Core ScaLAPACK LU, QR, and Cholesky Factorization Routines,” Concurrency: Practice and Experience, vol. 12, no. 15, pp. 1481-1493, January 2000.
“Reliability and Performance Modeling and Analysis for Grid Computing,” in Handbook of Research on Scalable Computing Technologies (to appear): IGI Global, pp. 219-245, 2009.
“Scheduling in the Grid Application Development Software Project,” Resource Management in the Grid: Kluwer Publishers, March 2003.
“PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.
“BlackjackBench: Portable Hardware Characterization with Automated Results Analysis,” The Computer Journal, March 2013. DOI: 10.1093/comjnl/bxt057
Is your scheduling good? How would you know?, Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.
“BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“MPI-aware Compiler Optimizations for Improving Communication-Computation Overlap,” Proceedings of the 23rd annual International Conference on Supercomputing (ICS '09), Yorktown Heights, NY, USA, ACM, pp. 316-325, June 2009.
PAPI: Counting outside the Box, Barcelona, Spain, 8th JLESC Meeting, April 2018.
“Counter Inspection Toolkit: Making Sense out of Hardware Performance Events,” 11th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Cham, Switzerland: Springer, February 2019. DOI: 10.1007/978-3-030-11987-4_2
Software-Defined Events through PAPI for In-Depth Analysis of Application Performance, Basel, Switzerland, 5th Platform for Advanced Scientific Computing Conference (PASC18), July 2018.
“Software-Defined Events through PAPI,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPSW.2019.00069
Does your tool support PAPI SDEs yet?, Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.
“From Serial Loops to Parallel Execution on Distributed Systems,” PPoPP 2012 (submitted), New Orleans, LA, February 2012.
PAPI's new Software-Defined Events for in-depth Performance Analysis, Dresden, Germany, 13th Parallel Tools Workshop, September 2019.
“PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,” 2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.
Understanding Native Event Semantics, Knoxville, TN, 9th JLESC Workshop, April 2019.
“Characterization of Power Usage and Performance in Data-Intensive Applications using MapReduce over MPI,” 2019 International Conference on Parallel Computing (ParCo2019), Prague, Czech Republic, September 2019.
“Prospectus for the Next LAPACK and ScaLAPACK Libraries,” PARA 2006, Umea, Sweden, June 2006.
LAPACK 2005 Prospectus: Reliable and Scalable Software for Linear Algebra Computations on High End Computers: LAPACK Working Note 164, January 2005.
“Self Adapting Linear Algebra Algorithms and Software,” IEEE Proceedings (to appear), 2004.
“Accelerating Time-To-Solution for Computational Science and Engineering,” SciDAC Review, 2009.
“Towards An Efficient, Scalable Replication Mechanism for the I2-DSI Project,” University of North Carolina School of Library and Information Science Technical Report, no. TR-1999-01, January 1999.
“FT-MPI, Fault-Tolerant Metacomputing and Generic Name Services: A Case Study,” Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-14: Springer Berlin / Heidelberg, pp. 133-140, 2006.
“Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” University of Tennessee Computer Science Technical Report, no. ut-cs-13-713, July 2013.
“On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.
“Performance evaluation of LU factorization through hardware counter measurements,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-700, October 2012.
“Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, May 2014.
“A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,” Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015. DOI: 10.1002/cpe.3306
“Acceleration of the BLAST Hydro Code on GPU,” Supercomputing '12 (poster), Salt Lake City, Utah, SC12, November 2012.
“Accelerating the SVD Bi-Diagonalization of a Batch of Small Matrices using GPUs,” Journal of Computational Science, vol. 26, pp. 237–245, May 2018. DOI: 10.1016/j.jocs.2018.01.007
“Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices,” International Conference on Computational Science (ICCS 2017), Zurich, Switzerland, Procedia Computer Science, June 2017.
“LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU,” 16th IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.
“A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
“A Fast Batched Cholesky Factorization on a GPU,” International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.
“Hydrodynamic Computation with Hybrid Programming on CPU-GPU Clusters,” University of Tennessee Computer Science Technical Report, no. ut-cs-13-714, July 2013.
“Measuring Computer Performance: A Practitioner's Guide,” SIAM Review (book review), vol. 43, no. 2, pp. 383-384, 2001.
“Numerical Linear Algebra Algorithms and Software,” Journal of Computational and Applied Mathematics, vol. 123, no. 1-2, pp. 489-514, October 1999.
“Fault Tolerance Techniques for High-performance Computing,” University of Tennessee Computer Science Technical Report (also LAWN 289), no. UT-EECS-15-734: University of Tennessee, May 2015.
“Twenty-Plus Years of Netlib and NA-Net,” University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, 2006.
“An Iterative Solver Benchmark,” Scientific Programming (to appear), 2002.
“DARPA's HPCS Program: History, Models, Tools, Languages,” in Advances in Computers, vol. 72: Elsevier, January 2008.
“High Performance Conjugate Gradient Benchmark: A new Metric for Ranking High Performance Computing Systems,” International Journal of High Performance Computing Applications, vol. 30, issue 1, pp. 3-10, February 2016. DOI: 10.1177/1094342015593158
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, vol. –89-95, January 2006.
“Recent Advances in Parallel Virtual Machine and Message Passing Interface,” Lecture Notes in Computer Science, vol. 2840: Springer-Verlag, Berlin, January 2003.
“Recursive approach in sparse matrix LU factorization,” Proceedings of 1st SGI Users Conference, Cracow, Poland (ACC Cyfronet UMM, 2000), pp. 409-418, January 2000.
“The 30th Anniversary of the Supercomputing Conference: Bringing the Future Closer—Supercomputing History and the Immortality of Now,” Computer, vol. 51, issue 10, pp. 74–85, November 2018. DOI: 10.1109/MC.2018.3971352
“Self Adapting Numerical Algorithm for Next Generation Applications,” International Journal of High Performance Computing Applications, vol. 17, no. 2, pp. 125-132, January 2003.
“Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems,” Parallel Computing, vol. 39, issue 4-5, pp. 212-232, May 2013.
“Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture,” The 2nd International Conference on Cloud and Green Computing (submitted), Xiangtan, Hunan, China, November 2012.