Publications

Export 937 results:
2011
You, H., B. Rekapalli, Q. Liu, and S. Moore, Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,” TeraGrid'11, Salt Lake City, Utah, July 2011.  (275.34 KB)
Kurzak, J., S. Tomov, and J. Dongarra, Autotuning GEMMs for Fermi,” University of Tennessee Computer Science Technical Report, UT-CS-11-671, (also Lawn 245), April 2011.  (397.45 KB)
Danalis, A., P. Luszczek, G. Marin, J. Vetter, and J. Dongarra, BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
Anzt, H., S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems , no. UT-CS-11-689, December 2011.  (608.95 KB)
Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, A Block-Asynchronous Relaxation Method for Graphics Processing Units,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.  (1.08 MB)
Luszczek, P., J. Kurzak, and J. Dongarra, Changes in Dense Linear Algebra Kernels - Decades Long Perspective,” in Solving the Schrodinger Equation: Has everything been tried? (to appear): Imperial College Press, 00-2011.
Horton, M., S. Tomov, and J. Dongarra, A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures,” Symposium for Application Accelerators in High Performance Computing (SAAHPC'11), Knoxville, TN, July 2011.  (329.68 KB)
Bouteiller, A., T. Herault, G. Bosilca, and J. Dongarra, Correlated Set Coordination in Fault Tolerant Message Logging Protocols,” Proceedings of 17th International Conference, Euro-Par 2011, Part II, vol. 6853, Bordeaux, France, Springer, pp. 51-64, August 2011.  (486.68 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemariner, and J. Dongarra, DAGuE: A Generic Distributed DAG Engine for High Performance Computing,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 00-2011.  (830.85 KB)
You, H., Q. Liu, Z. Li, and S. Moore, The Design of an Auto-tuning I/O Framework on Cray XT5 System,” Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.  (459.57 KB)
Song, F., S. Tomov, and J. Dongarra, Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-668, (also Lawn 250), June 2011.  (5.93 MB)
Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, and K. Cameron, Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems,” International Journal of High Performance Computing Applications, vol. 25, no. 3, pp. 342-350, 00-2011.  (467.18 KB)
Luszczek, P., E. Meek, S. Moore, D. Terpstra, V. M. Weaver, and J. Dongarra, Evaluation of the HPC Challenge Benchmarks in Virtualized Environments,” 6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, August 2011.  (114.73 KB)
Dongarra, J., M. Faverge, H. Ltaeif, and P. Luszczek, Exploiting Fine-Grain Parallelism in Recursive LU Factorization,” Proceedings of PARCO'11, no. ICL-UT-11-04, Gent, Belgium, April 2011.
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemariner, H. Ltaeif, et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.  (1.26 MB)
Anzt, H., P. Luszczek, J. Dongarra, and V. Heuveline, GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,” University of Tennessee Computer Science Technical Report UT-CS-11-690 (also Lawn 260), December 2011.  (662.98 KB)
Dongarra, J., M. Faverge, T. Herault, J. Langou, and Y. Robert, Hierarchical QR factorization algorithms for multi-core cluster systems,” University of Tennessee Computer Science Technical Report (also Lawn 257), no. UT-CS-11-684, October 2011.  (405.71 KB)
Ltaeif, H., P. Luszczek, and J. Dongarra, High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-673, (also Lawn 247), May 2011.  (424.93 KB)
Du, P., P. Luszczek, and J. Dongarra, High Performance Dense Linear System Solver with Soft Error Resilience,” IEEE Cluster 2011, Austin, TX, September 2011.  (1.27 MB)
Dongarra, J., M. Faverge, H. Ltaeif, and P. Luszczek, High Performance Matrix Inversion Based on LU Factorization for Multicore Architectures,” Proceedings of MTAGS11, Seattle, WA, November 2011.  (879.49 KB)
Agullo, E., C. Augonnet, J. Dongarra, H. Ltaeif, R. Namyst, S. Thibault, and S. Tomov, A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs,” in GPU Computing Gems, Jade Edition, vol. 2: Elsevier, pp. 473-484, 00-2011.
Ma, T., A. Bouteiller, G. Bosilca, and J. Dongarra, Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW,” 18th EuroMPI, Santorini, Greece, Springer, pp. 247-254, September 2011.
Dongarra, J., P. Beckman, and et al., The International Exascale Software Project Roadmap,” International Journal of High Performance Computing, vol. 25, no. 1, pp. 3-60, 00-2011.  (719.74 KB)
Ma, T., G. Bosilca, A. Bouteiller, B. Goglin, J.. Squyres, and J. Dongarra, Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs,” Int'l Conference on Parallel Processing (ICPP '11), Taipei, Taiwan, September 2011.
Agullo, E., C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaeif, and S. Tomov, LU Factorization for Accelerator-based Systems,” IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.  (234.86 KB)
Chaarawi, M., E. Gabriel, R. Keller, R. L. Graham, G. Bosilca, and J. Dongarra, OMPIO: A Modular Software Architecture for MPI I/O,” 18th EuroMPI, Santorini, Greece, Springer, pp. 81-89, September 2011.
Coulomb, K., A. Degomme, M. Faverge, and F. Trahay, An open-source tool-chain for performance analysis,” Parallel Tools Workshop, Dresden, Germany, September 2011.  (622.1 KB)
Nath, R., S. Tomov, T. Dong, and J. Dongarra, Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,” ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.  (630.63 KB)
White, J. B., and J. Dongarra, Overlapping Computation and Communication for Advection on a Hybrid Parallel Computer,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
Agullo, E., L. Giraud, A. Guermouche, A. Haidar, and J. Roman, Parallel algebraic domain decomposition solver for the solution of augmented systems.,” Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, Corsica, France, 12-15 April, 00-2011.
Maloney, A., S. Biersdorff, S. Shende, H. Jagode, S. Tomov, G. Juckeland, R. Dietrich, D. Poole, and C. Lamb, Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,” International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan, September 2011.  (1.41 MB)
Haidar, A., H. Ltaeif, and J. Dongarra, Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, WA, November 2011.  (636.01 KB)
Haidar, A., H. Ltaeif, and J. Dongarra, Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” University of Tennessee Computer Science Technical Report, UT-CS-11-677, (also Lawn254), August 2011.  (636.01 KB)
Baboulin, M., D. Becker, and J. Dongarra, A parallel tiled solver for dense symmetric indefinite systems on multicore architectures,” University of Tennessee Computer Science Technical Report, no. ICL-UT-11-07, October 2011.  (544.2 KB)
Dongarra, J., Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, no. CS-89-85, 00-2011.  (6.42 MB)
Bosilca, G., A. Bouteiller, T. Herault, P. Lemariner, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,” IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.  (290.98 KB)
Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, C-Y. Su, and K. Cameron, Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.  (479.49 KB)
Ma, T., T. Herault, G. Bosilca, and J. Dongarra, Process Distance-aware Adaptive MPI Collective Communications,” IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, Texas, 00-2011.
Ltaeif, H., P. Luszczek, and J. Dongarra, Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.  (1.27 MB)
Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A.. Rezmerita, F. Cappello, and J. Dongarra, QCG-OMPI: MPI Applications on Grids.,” Future Generation Computer Systems, vol. 27, no. 4, pp. 435-369, January 2011.  (1.48 MB)
YarKhan, A., J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels,” University of Tennessee Innovative Computing Laboratory Technical Report, no. ICL-UT-11-02, 00-2011.  (247.12 KB)
Becker, D., M. Baboulin, and J. Dongarra, Reducing the Amount of Pivoting in Symmetric Indefinite Systems,” University of Tennessee Innovative Computing Laboratory Technical Report, no. ICL-UT-11-06, Knoxville, TN, Submitted to PPAM 2011, May 2011.  (145.76 KB)
Bosilca, G., T. Herault, A.. Rezmerita, and J. Dongarra, On Scalability for MPI Runtime Systems,” International Conference on Cluster Computing (CLUSTER), Austin, TX, USA, IEEEE, pp. 187-195, September 2011.  (898.76 KB)
Bosilca, G., T. Herault, A.. Rezmerita, and J. Dongarra, On Scalability for MPI Runtime Systems,” University of Tennessee Computer Science Technical Report, no. ICL-UT-11-05, Knoxville, TN, May 2011.  (898.76 KB)
Bosilca, G., T. Herault, P. Lemariner, J. Dongarra, and A.. Rezmerita, Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,” Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.  (115.75 KB)
Du, P., P. Luszczek, S. Tomov, and J. Dongarra, Soft Error Resilient QR Factorization for Hybrid System,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-675, Knoxville, TN, July 2011.  (1.39 MB)
Du, P., P. Luszczek, S. Tomov, and J. Dongarra, Soft Error Resilient QR Factorization for Hybrid System,” UT-CS-11-675 (also LAPACK Working Note #252), no. ICL-CS-11-675, July 2011.  (1.39 MB)
Du, P., P. Luszczek, S. Tomov, and J. Dongarra, Soft Error Resilient QR Factorization for Hybrid System with GPGPU,” Journal of Computational Science, Seattle, WA, Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems at SC11, November 2011.  (965.88 KB)
Sourbier, F., A. Haidar, L. Giraud, H. Ben-Hadj-Ali, S. Operto, and J. Virieux, Three-dimensional parallel frequency-domain visco-acoustic wave modelling based on a hybrid direct/iterative solver.,” To appear in Geophysical Prospecting journal., 00-2011.  (1.04 MB)
Haidar, A., H. Ltaeif, and J. Dongarra, Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices.,” Submitted to SIAM Journal on Scientific Computing (SISC), 00-2011.

Pages