%0 Journal Article %J Journal of Supercomputing %D 2013 %T Enabling Workflows in GridSolve: Request Sequencing and Service Trading %A Yinan Li %A Asim YarKhan %A Jack Dongarra %A Keith Seymour %A Aurélie Hurault %K grid computing %K gridpac %K netsolve %K service trading %K workflow applications %X GridSolve employs an RPC-based client-agent-server model for solving computational problems. There are two deficiencies associated with GridSolve when a computational problem essentially forms a workflow consisting of a sequence of tasks with data dependencies between them. First, intermediate results are always passed through the client, resulting in unnecessary data transport. Second, since the execution of each individual task is a separate RPC session, it is difficult to enable any potential parallelism among tasks. This paper presents a request sequencing technique that addresses these deficiencies and enables workflow executions. Building on the request sequencing work, one way to generate workflows is by taking higher level service requests and decomposing them into a sequence of simpler service requests using a technique called service trading. A service trading component is added to GridSolve to take advantage of the new dynamic request sequencing. The features described here include automatic DAG construction and data dependency analysis, direct interserver data transfer, parallel task execution capabilities, and a service trading component. 
%B Journal of Supercomputing %V 64 %P 1133-1152 %8 2013-06 %G eng %N 3 %& 1133 %R 10.1007/s11227-010-0549-1 %0 Journal Article %J Concurrency and Computation: Practice and Experience (to appear) %D 2010 %T SmartGridRPC: The new RPC model for high performance Grid Computing and Its Implementation in SmartGridSolve %A Thomas Brady %A Alexey Lastovetsky %A Keith Seymour %A Michele Guidolin %A Jack Dongarra %K netsolve %B Concurrency and Computation: Practice and Experience (to appear) %8 2010-01 %G eng %0 Journal Article %J Cluster Computing Journal: Special Issue on High Performance Distributed Computing %D 2009 %T Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software %A Lamia Youseff %A Keith Seymour %A Haihang You %A Dmitrii Zagorodnov %A Jack Dongarra %A Rich Wolski %B Cluster Computing Journal: Special Issue on High Performance Distributed Computing %I Springer Netherlands %V 12 %P 101-122 %8 2009-00 %G eng %0 Journal Article %J in Cloud Computing and Software Services: Theory and Techniques (to appear) %D 2009 %T Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC %A Keith Seymour %A Asim YarKhan %A Jack Dongarra %E Syed Ahson %E Mohammad Ilyas %K netsolve %B in Cloud Computing and Software Services: Theory and Techniques (to appear) %I CRC Press %8 2009-00 %G eng %0 Conference Proceedings %B The 3rd international Workshop on Automatic Performance Tuning %D 2008 %T A Comparison of Search Heuristics for Empirical Code Optimization %A Keith Seymour %A Haihang You %A Jack Dongarra %K gco %B The 3rd international Workshop on Automatic Performance Tuning %C Tsukuba, Japan %8 2008-10 %G eng %0 Journal Article %J Recent developments in Grid Technology and Applications %D 2008 %T High Performance GridRPC Middleware %A Yves Caniou %A Eddy Caron %A Frederic Desprez %A Hidemoto Nakada %A Yoshio Tanaka %A Keith Seymour %E George A. Gravvanis %E John P. Morrison %E Hamid R. Arabnia %E D. A. 
Power %K netsolve %B Recent developments in Grid Technology and Applications %I Nova Science Publishers %8 2008-00 %G eng %0 Conference Proceedings %B ACM/IEEE International Symposium on High Performance Distributed Computing %D 2008 %T The Impact of Paravirtualized Memory Hierarchy on Linear Algebra Computational Kernels and Software %A Lamia Youseff %A Keith Seymour %A Haihang You %A Jack Dongarra %A Rich Wolski %K gco %K netsolve %B ACM/IEEE International Symposium on High Performance Distributed Computing %C Boston, MA %8 2008-06 %G eng %0 Journal Article %J Computing and Informatics %D 2008 %T Interactive Grid-Access Using Gridsolve and Giggle %A Marcus Hardt %A Keith Seymour %A Jack Dongarra %A Michael Zapf %A Nicole Ruiter %K netsolve %B Computing and Informatics %V 27 %P 233-248 %@ 1335-9150 %8 2008-00 %G eng %0 Journal Article %J Proc. SciDAC 2008 %D 2008 %T PERI Auto-tuning %A David Bailey %A Jacqueline Chame %A Chun Chen %A Jack Dongarra %A Mary Hall %A Jeffrey K. Hollingsworth %A Paul D. Hovland %A Shirley Moore %A Keith Seymour %A Jaewook Shin %A Ananta Tiwari %A Sam Williams %A Haihang You %K gco %B Proc. 
SciDAC 2008 %I Journal of Physics %C Seattle, Washington %V 125 %8 2008-01 %G eng %0 Conference Proceedings %B International Conference on Grid and Cooperative Computing (GCC 2008) (submitted) %D 2008 %T Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve %A Yinan Li %A Jack Dongarra %A Keith Seymour %A Asim YarKhan %B International Conference on Grid and Cooperative Computing (GCC 2008) (submitted) %C Shenzhen, China %8 2008-10 %G eng %0 Generic %D 2007 %T Automated Empirical Tuning of a Multiresolution Analysis Kernel %A Haihang You %A Keith Seymour %A Jack Dongarra %A Shirley Moore %K gco %B ICL Technical Report %P 10 %8 2007-01 %G eng %0 Generic %D 2007 %T Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator %A Haihang You %A Keith Seymour %A Jack Dongarra %A Shirley Moore %K gco %B ICL Technical Report %8 2007-01 %G eng %0 Conference Proceedings %B Grid-Based Problem Solving Environments: IFIP TC2/WG 2.5 Working Conference on Grid-Based Problem Solving Environments (Prescott, AZ, July 2006) %D 2007 %T GridSolve: The Evolution of Network Enabled Solver %A Asim YarKhan %A Jack Dongarra %A Keith Seymour %E Patrick Gaffney %K netsolve %B Grid-Based Problem Solving Environments: IFIP TC2/WG 2.5 Working Conference on Grid-Based Problem Solving Environments (Prescott, AZ, July 2006) %I Springer %P 215-226 %8 2007-00 %G eng %0 Journal Article %J Parallel Processing Letters %D 2007 %T Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Servers Middleware %A Emmanuel Jeannot %A Keith Seymour %A Asim YarKhan %A Jack Dongarra %B Parallel Processing Letters %V 17 %P 47-59 %8 2007-03 %G eng %0 Generic %D 2006 %T ATLAS on the BlueGene/L – Preliminary Results %A Keith Seymour %A Haihang You %A Jack Dongarra %K gco %B ICL Technical Report %8 2006-01 %G eng %0 Journal Article %J Parallel Processing Letters %D 2006 %T Improved Runtime and Transfer Time Prediction Mechanisms in a Network 
Enabled Server %A Emmanuel Jeannot %A Keith Seymour %A Asim YarKhan %A Jack Dongarra %K netsolve %B Parallel Processing Letters %V 17 %P 47-59 %8 2006-03 %G eng %0 Journal Article %J International Journal of High Performance Computing Applications (Special Issue: Scheduling for Large-Scale Heterogeneous Platforms) %D 2006 %T Recent Developments in GridSolve %A Asim YarKhan %A Keith Seymour %A Kiran Sagi %A Zhiao Shi %A Jack Dongarra %E Yves Robert %K netsolve %B International Journal of High Performance Computing Applications (Special Issue: Scheduling for Large-Scale Heterogeneous Platforms) %I Sage Science Press %V 20 %8 2006-00 %G eng %0 Journal Article %J IBM Journal of Research and Development %D 2006 %T Self Adapting Numerical Software SANS Effort %A George Bosilca %A Zizhong Chen %A Jack Dongarra %A Victor Eijkhout %A Graham Fagg %A Erika Fuentes %A Julien Langou %A Piotr Luszczek %A Jelena Pjesivac–Grbovic %A Keith Seymour %A Haihang You %A Sathish Vadhiyar %K gco %B IBM Journal of Research and Development %V 50 %P 223-238 %8 2006-01 %G eng %0 Generic %D 2005 %T An Effective Empirical Search Method for Automatic Software Tuning %A Haihang You %A Keith Seymour %A Jack Dongarra %K gco %B ICL Technical Report %8 2005-01 %G eng %0 Journal Article %J Grid Computing and New Frontiers of High Performance Processing %D 2005 %T NetSolve: Grid Enabling Scientific Computing Environments %A Keith Seymour %A Asim YarKhan %A Sudesh Agrawal %A Jack Dongarra %E Lucio Grandinetti %K netsolve %B Grid Computing and New Frontiers of High Performance Processing %I Elsevier %8 2005-00 %G eng %0 Conference Paper %B International Conference on Computational Science (ICCS 2004) %D 2004 %T Accurate Cache and TLB Characterization Using Hardware Counters %A Jack Dongarra %A Shirley Moore %A Phil Mucci %A Keith Seymour %A Haihang You %K gco %K lacsi %K papi %X We have developed a set of microbenchmarks for accurately determining the structural characteristics of data cache memories and 
TLBs. These characteristics include cache size, cache line size, cache associativity, memory page size, number of data TLB entries, and data TLB associativity. Unlike previous microbenchmarks that used time-based measurements, our microbenchmarks use hardware event counts to more accurately and quickly determine these characteristics while requiring fewer limiting assumptions. %B International Conference on Computational Science (ICCS 2004) %I Springer %C Krakow, Poland %8 2004-06 %G eng %R https://doi.org/10.1007/978-3-540-24688-6_57 %0 Conference Paper %B 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004) %D 2004 %T Automatic Blocking of QR and LU Factorizations for Locality %A Qing Yi %A Ken Kennedy %A Haihang You %A Keith Seymour %A Jack Dongarra %K gco %K papi %K sans %X QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, automatically generating blocked versions of the computations offers further benefits, such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. 
We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, both using reference BLAS, ATLAS BLAS and native BLAS specially tuned for the underlying machine architectures. %B 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004) %I ACM %C Washington, DC %8 2004-06 %G eng %R 10.1145/1065895.1065898 %0 Journal Article %J Concurrency and Computation: Practice and Experience %D 2003 %T Automatic Translation of Fortran to JVM Bytecode %A Keith Seymour %A Jack Dongarra %K f2j %B Concurrency and Computation: Practice and Experience %V 15 %P 202-207 %8 2003-00 %G eng %0 Journal Article %J Making the Global Infrastructure a Reality %D 2003 %T NetSolve: Past, Present, and Future - A Look at a Grid Enabled Server %A Sudesh Agrawal %A Jack Dongarra %A Keith Seymour %A Sathish Vadhiyar %E Francine Berman %E Geoffrey Fox %E Anthony Hey %K netsolve %B Making the Global Infrastructure a Reality %I Wiley Publishing %8 2003-00 %G eng %0 Generic %D 2002 %T GridRPC: A Remote Procedure Call API for Grid Computing %A Keith Seymour %A Hidemoto Nakada %A Satoshi Matsuoka %A Jack Dongarra %A Craig Lee %A Henri Casanova %B ICL Technical Report %8 2002-11 %G eng %0 Journal Article %J Scientific Programming %D 2002 %T JLAPACK - Compiling LAPACK Fortran to Java %A David Doolin %A Jack Dongarra %A Keith Seymour %K f2j %B Scientific Programming %V 7 %P 111-138 %8 2002-10 %G eng %0 Conference Proceedings %B Proceedings of the Third International Workshop on Grid Computing %D 2002 %T Overview of GridRPC: A Remote Procedure Call API for Grid Computing %A Keith Seymour %A Hidemoto Nakada %A Satoshi Matsuoka %A Jack Dongarra %A Craig Lee %A Henri Casanova %E Manish Parashar %B Proceedings of the Third International Workshop on Grid Computing %P 274-278 %8 2002-01 %G eng %0 Generic %D 2002 %T Users' Guide to NetSolve v1.4.1 %A Sudesh Agrawal %A Dorian Arnold %A Susan Blackford %A 
Jack Dongarra %A Michelle Miller %A Kiran Sagi %A Zhiao Shi %A Keith Seymour %A Sathish Vadhiyar %K netsolve %B ICL Technical Report %8 2002-06 %G eng %0 Conference Proceedings %B Joint ACM Java Grande - ISCOPE 2001 Conference (submitted) %D 2001 %T Automatic Translation of Fortran to JVM Bytecode %A Keith Seymour %A Jack Dongarra %K f2j %B Joint ACM Java Grande - ISCOPE 2001 Conference (submitted) %C Stanford University, California %8 2001-06 %G eng %0 Conference Paper %B International Conference on Parallel and Distributed Computing Systems %D 2001 %T End-user Tools for Application Performance Analysis, Using Hardware Counters %A Kevin London %A Jack Dongarra %A Shirley Moore %A Phil Mucci %A Keith Seymour %A T. Spencer %K papi %X One purpose of the end-user tools described in this paper is to give users a graphical representation of performance information that has been gathered by instrumenting an application with the PAPI library. PAPI is a project that specifies a standard API for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count "events", which are occurrences of specific signals and states related to a processor’s function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. The perfometer tool developed by the PAPI project provides a graphical view of this information, allowing users to quickly see where performance bottlenecks are in their application. Only one function call has to be added by the user to their program to take advantage of perfometer. This makes it quick and simple to add and remove instrumentation from a program. Also, perfometer allows users to change the "event" they are monitoring. 
Add the ability to monitor parallel applications, set alarms and a Java front-end that can run anywhere, and this gives the user a powerful tool for quickly discovering where and why a bottleneck exists. A number of third-party tools for analyzing performance of message-passing and/or threaded programs have also incorporated support for PAPI so as to be able to display and analyze hardware counter data from their interfaces. %B International Conference on Parallel and Distributed Computing Systems %C Dallas, TX %8 2001-08 %G eng %0 Conference Paper %B Department of Defense Users' Group Conference Proceedings %D 2001 %T The PAPI Cross-Platform Interface to Hardware Performance Counters %A Kevin London %A Shirley Moore %A Phil Mucci %A Keith Seymour %A Richard Luczak %K papi %X The purpose of the PAPI project is to specify a standard API for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count "events," which are occurrences of specific signals and states related to the processor's function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. This correlation has a variety of uses in performance analysis and tuning. The PAPI project has developed a standard set of hardware events and a standard cross-platform library interface to the underlying counter hardware. The PAPI library has been implemented for a number of Shared Resource Center platforms. The PAPI project is developing end-user tools for dynamically selecting and displaying hardware counter performance data. PAPI support is also being incorporated into a number of third-party tools. %B Department of Defense Users' Group Conference Proceedings %C Biloxi, Mississippi %8 2001-06 %G eng