Dynaprof Users Guide
Release 0.8

November, 2002
Philip J. Mucci

perfapi-devel@cs.utk.edu

http://www.cs.utk.edu/~mucci/dynaprof





Table of Contents

Dynaprof Overview

Installing Dynaprof

Environment Variables

Running Dynaprof

Command Line Options

Dynaprof Commands

Loading a New Application

Attaching to a Running Application

Unloading an Application

Detaching from an Application

Exploring the Application

Using Probes

The PAPI Probe

The Wallclock Probe

Instrumenting the Application

Controlling the Application

Starting the Application

Interrupting the Application

Resuming the Application

Examining the Probe Output

Wallclock Probe Output

PAPI Probe Output

References

Appendices

Appendix A1. AIX/DPCL Installation Notes

Appendix A2. Linux/Dyninst Installation Notes

Appendix A3. Bugs in the v0.8 Release

Dynaprof Overview

Dynaprof is a performance analysis tool designed to insert performance measurement instrumentation directly into a running application's address space at run time. The instrumentation included with this release of Dynaprof can measure real time as well as any hardware performance metric available through PAPI. Run-time instrumentation of the object code has numerous advantages over traditional source-based performance profiling systems. The most significant is that calls to the instrumentation no longer interfere with the compiler's optimization passes. For aggressively scheduled processors, significant code reorganization and subroutine inlining are often required for maximal utilization of the processor's functional units. When additional subroutine calls are added to the source, the performance of an application can change, especially in compute-intensive regions. An additional benefit is the removal of the instrumentation's dependency on the compilation process: the type and format of the instrumentation can be changed without recompiling the application.

Installing Dynaprof

Dynaprof comes as a compressed tar file of precompiled binaries. The directory structure is as follows:

./INSTALL
    Machine-specific installation notes and dependencies
./usr/bin
    Contains the Dynaprof binary and the reporting scripts for the included probes
./usr/lib
    Contains the Dynaprof probes that are inserted into the application to be profiled
./usr/doc
    Contains this document as well as any other machine-specific information

To install Dynaprof, untar/unzip this distribution into an installation area, set the DYNAPROF_PROBEDIR environment variable as described below, and follow the remaining instructions in the INSTALL file. Usually, this consists of making sure you have installed the shared libraries on which the probe modules depend.

Most administrators/users will want to set the DYNAPROF_PROBEDIR environment variable in their login scripts. The other option is to create a wrapper script that sets this variable automatically. If you do not set this variable ahead of time, you will either have to set it at run-time using the set command or explicitly name the full path to the probe in the use command.
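
The wrapper-script option mentioned above could look roughly like the following Python sketch. The installation prefix and the helper names are assumptions made for illustration; only the DYNAPROF_PROBEDIR variable and the usr/bin and usr/lib layout come from this guide.

```python
import os
import posixpath

# Hypothetical installation prefix; adjust to wherever the tar file was unpacked.
DYNAPROF_ROOT = "/usr/local/dynaprof"


def dynaprof_environment(root, base_env):
    """Return a copy of base_env with DYNAPROF_PROBEDIR set, unless already set."""
    env = dict(base_env)
    # Respect a value the user has already chosen, for example in a login script.
    env.setdefault("DYNAPROF_PROBEDIR", posixpath.join(root, "usr", "lib"))
    return env


def exec_dynaprof(arguments, root=DYNAPROF_ROOT):
    """Replace the current process with dynaprof, passing arguments through."""
    binary = posixpath.join(root, "usr", "bin", "dynaprof")
    os.execve(binary, [binary] + list(arguments),
              dynaprof_environment(root, os.environ))
```

A wrapper script would end by calling exec_dynaprof(sys.argv[1:]), after importing sys, so that users can invoke it exactly as they would invoke dynaprof itself.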

Environment Variables

All of the following variables are optional. Most have intelligent defaults set by the program at startup or by your system administrator during the installation process. Most of these variables can also be set at run time using the set and unset commands.

DYNAPROF_MAKE

This variable sets the name of the make command. It is used mainly for shortcuts during the performance tuning process.

Example:

[mucci@nebula]$ setenv DYNAPROF_MAKE "gmake -f Makefile.aix-power"

DYNAPROF_DEBUG

This variable enables debugging output in Dynaprof. See the -d / --debug option in the section on Command Line Options. Any non-empty value enables this option. Enabling it is not recommended.

Example:

[mucci@nebula]$ setenv DYNAPROF_DEBUG 1

DYNAPROF_DEBUGGER

This variable sets the name of the command to start the debugger. It is used mainly for shortcuts during the performance tuning process.

Example:

[mucci@nebula]$ setenv DYNAPROF_DEBUGGER "gdb -q"

DYNAPROF_POEBIN (AIX systems only)

This variable sets the full path and name of the POE binary for starting parallel programs under AIX.

Example:

[mucci@nebula]$ setenv DYNAPROF_POEBIN /usr/local/bin/poe

DYNAPROF_PROBEDIR

This variable sets the full path to the directory containing Dynaprof probes.

Example:

[mucci@nebula]$ setenv DYNAPROF_PROBEDIR /usr/local/dynaprof/usr/lib

Running Dynaprof

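Dynaprof is just a regular executable like most other tools on your system. To start Dynaprof, you have two options: run it with no arguments and then load or attach to an application from the (dynaprof) prompt, or name the executable and its arguments on the command line as shown in the usage message below.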

Command Line Options

Dynaprof has a number of command line options, most of which are reasonably self-explanatory. The less obvious options are explained below.

[mucci@nebula]$ dynaprof -h
DynaProf 0.8

Philip J. Mucci, mucci@cs.utk.edu, 2000-2002
Provided courtesy of UTK's Innovative Computing Laboratory. See
http://icl.cs.utk.edu for more information.

This is Open Source Software!

  ./dynaprof [options] [[--] executable-file [executable-args]]

Options:

  -b        | --batch           Exit after processing options.
  -c <file> | --commmand=<file> Execute Dynaprof commands from <file>.
  -d        | --debug           Enable debugging statements in Dynaprof.
  -h        | --help            Print this message.
  -q        | --quiet           Do not print version number on startup.
  -t <tty>  | --tty=<tty>       Use <tty> for input/output by the program being profiled.
  -g        | --gui             Gui mode, only buffering one line.
  -v        | --version         Print version information and then exit.

Batch Mode

Dynaprof can run in batch mode, executing commands from a file and then optionally exiting after the application completes. This is normally used when the user has already identified the bottlenecks of an application and is now in the tune/compile/evaluate cycle. In this case, Dynaprof commands can be placed in a file, one per line.
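
As a sketch of that workflow, the following Python fragment writes a command file with one command per line. The helper name and the list name are hypothetical; the commands themselves are ones that appear in the examples of this guide.

```python
def write_command_file(path, commands):
    """Write Dynaprof commands to path, one per line, for use with -c."""
    with open(path, "w") as handle:
        for command in commands:
            # Each command must fit on a single line of the file.
            if "\n" in command:
                raise ValueError("commands must not contain newlines")
            handle.write(command + "\n")


# Commands taken from the examples in this guide.
TUNING_COMMANDS = [
    "load tests/simple 1 2 3",
    "use wallclock",
    "instr module simple.c",
    "run",
]
```

After write_command_file("simple_drv", TUNING_COMMANDS), where simple_drv is an arbitrary file name, the file can be passed to Dynaprof with the -b and -c options.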

GUI Mode

This mode affects Dynaprof's output to make parsing easier.

1. The output of the tool is completely unbuffered.

2. All output from any commands after startup is prefaced by an integer representing the number of lines to follow.

3. An extra newline is appended at the end of the above output.

4. All input to the tool is echoed to the screen.
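
The framing described in items 2 and 3 could be consumed by a front end along the following lines. This is only a sketch: it assumes that the line count appears on a line of its own and that the count does not include the extra trailing newline, neither of which is spelled out above. The function name is hypothetical.

```python
def read_gui_block(lines):
    """Consume one framed block from an iterator of lines and return its payload."""
    # Assumption: the first line of a block holds only the line count.
    count = int(next(lines).strip())
    payload = [next(lines).rstrip("\n") for _ in range(count)]
    # Assumption: the extra newline shows up as one blank line after the payload.
    trailer = next(lines, "")
    if trailer.strip():
        raise ValueError("expected a blank line after the block")
    return payload
```

Because the function takes any iterator of lines, it can be fed from a pipe connected to Dynaprof or from a list of strings while experimenting.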

TTY Mode

Occasionally, an application will generate copious amounts of data to the terminal. At these times, it is often beneficial to open a new xterm and have the application interact with the new terminal. Dynaprof itself will still talk to the original terminal. This feature is also very useful in conjunction with GUI Mode. Currently this feature is not implemented.

Dynaprof Commands

Loading a Serial Application

In order to instrument an application with Dynaprof, the user must either load the application into the tool or attach to an already running application. The load command takes one or more arguments. The first argument must be the name of the executable, possibly including a path component. The remaining arguments are simply those that you would pass to the executable as arguments on the command line. Note that glob-style shell expansion is not supported. Upon return from this command, the application will have been created and placed in a stopped state at the first instruction.

Usage: load <executable> [command line arguments]

Example:

(dynaprof) load tests/simple 1 2 3
(dynaprof)
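
Because the load command does not expand glob patterns, a driver script has to expand them before building the command. The following Python sketch shows one way to do that; the function name is hypothetical, and arguments that contain spaces are not handled because this guide does not describe any quoting rules for load.

```python
import glob


def build_load_command(executable, args):
    """Return a load command line with glob patterns in args expanded."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Like most shells, keep the argument unchanged when nothing matches.
        expanded.extend(matches if matches else [arg])
    return " ".join(["load", executable] + expanded)
```

The resulting string can be written to a command file or typed at the (dynaprof) prompt.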

Loading a Threaded Application

To instrument a threaded application with Dynaprof, the user also uses the load command. Only bound threads, that is, threads that are associated with a kernel thread, are supported at the moment. Some run-time environments provide environment variables to control this policy.

On AIX systems, set the following environment variable.

[mucci@nebula]$ setenv AIXTHREAD_SCOPE S

Loading an MPI Application

On AIX systems with DPCL, the user instruments the entire MPI application with the help of the POE runtime environment. The application is loaded using the poeload command. Instrumentation is performed on every process in a POE application. Currently the selection of specific processes is not supported.

Example:

(dynaprof) poeload tests/mpicount -procs 4

On other systems (Dyninst), parallel applications can be instrumented in two ways.

The first method is for doing interactive performance analysis of only one process of the application. It requires that the parallel runtime allows the user to start the processes manually. MPICH with the p4 device serves as a good example. By providing the -t option to mpirun, the user can find out the exact commands that need to be run to start the application. The user is then free to start one or more of those processes under an instance of Dynaprof using the mpiload command. The mpiload command is exactly like the load command except that it waits for the process to return from MPI_Init() before allowing the user to perform instrumentation.

Example:

First, have mpirun tell us what it would normally do.

[mucci@nebula]$ mpirun -t -np 2 tests/mpicount
Procgroup file:
localhost.localdomain 0 /home/mucci/work/dynaprof/tests/mpicount
localhost.localdomain 1 /home/mucci/work/dynaprof/tests/mpicount
/home/mucci/work/dynaprof/tests/mpicount -p4pg /home/mucci/work/dynaprof/tests/PI9172 -p4wd /home/mucci/work/dynaprof/tests
ssh localhost.localdomain /home/mucci/work/dynaprof/tests/mpicount -p4pg /home/mucci/work/dynaprof/tests/PI9172 -p4wd /home/mucci/work/dynaprof/tests

Next, start Dynaprof and load in the first process using the arguments from MPI.

(dynaprof) mpiload tests/mpicount /home/mucci/work/dynaprof/tests/mpicount -p4pg /home/mucci/work/dynaprof/tests/PI9172 -p4wd /home/mucci/work/dynaprof/tests
(dynaprof)^Z
[1] Suspended
[mucci@nebula]$

Now start the remote application as mpirun would.

[mucci@nebula]$ ssh localhost.localdomain /home/mucci/work/dynaprof/tests/mpicount -p4pg /home/mucci/work/dynaprof/tests/PI9172 -p4wd /home/mucci/work/dynaprof/tests &
[2] 9283
[mucci@nebula]$ bg
[1] ./dynaprof &
[mucci@nebula]$ fg
(dynaprof)

The second method is for doing batch-mode performance analysis of all the processes of the application. It assumes that the user has made a script file containing the Dynaprof commands to be executed on every process. In this case, our script will simply consist of the mpiload command. We will use the -b argument to Dynaprof to tell it to exit after processing the commands in the script file.

Example:

[mucci@nebula]$ cat > cpi_drv
mpiload /home/mucci/work/dynaprof/tests/cpi
^D
[mucci@nebula]$ mpirun -np 2 dynaprof -b -c cpi_drv
[mucci@nebula]$

Attaching to a Running Serial Application

The attach command takes exactly two arguments. The first argument is the name of the executable, possibly including a path component. The second is the PID or process identifier of the application. This application should already be running. Upon return from this command, the application will be placed in a stopped state at whatever instruction the application happened to be executing at the time.

Usage: attach <executable> <process identifier>

Example:

[mucci@nebula]$ tests/count > /dev/null &
[3] 6327
[mucci@nebula]$ dynaprof
(dynaprof) attach tests/count 6327
(dynaprof)

Unloading an Application

The unload command takes no arguments. If the application is still running, the application will be terminated.

Usage: unload

Detaching from an Application

The detach command takes no arguments. If the application is still running, Dynaprof will leave any instrumentation in place and let the application continue running.

Usage: detach

Exploring the Application


The list command allows the user to explore the internal structure of an application without needing the source code. This is the mechanism by which the user chooses which points he would like to instrument. There are multiple incarnations of the list command, each of which has a different purpose. The basic list command by itself prints out the modules as found in the program's text segment. A module is defined as an object file, library archive or shared library.

Usage: list [module [function]]

Example:

(dynaprof) load tests/swim

(dynaprof) list

DEFAULT_MODULE

swim.F

libm.so.6

libc.so.6

(dynaprof)

DEFAULT_MODULE is something unique to g77. It contains all the Fortran run-time routines. If your application is not compiled with -g, more than likely you'll find your code in the DEFAULT_MODULE. Now let's list the functions found in the module swim.F.

(dynaprof) list swim.F
MAIN__
inital_
calc1_
calc2_
calc3z_
calc3_
(dynaprof)

Now let's list the function calls found in the module swim.F in the MAIN__ routine. Note there is no exit point as main doesn't really ever return. What you see is the entry point followed by numerous calls to Fortran I/O functions. Nestled in between are calls to the user's code.

(dynaprof) list swim.F MAIN__
Entry
Call s_wsle
Call do_lio
Call e_wsle
Call s_wsle
Call do_lio
Call e_wsle
Call inital_
Call s_wsfe
Call do_fio
Call do_fio
Call do_fio
Call do_fio
Call do_fio
Call do_fio
Call do_fio
Call e_wsfe
Call calc1_
Call calc2_
Call s_wsfe
Call do_fio
Call do_fio
Call e_wsfe
Call s_wsfe
Call do_fio
Call do_fio
Call do_fio
Call e_wsfe
Call s_stop
Call calc3z_
Call calc3_
(dynaprof)

Using Probes

Before the user can instrument an application, he must decide what that instrumentation will consist of. There are currently two probes shipped with Dynaprof, the PAPI Probe and the Wallclock Probe. Each probe performs its measurement per-thread. This means that each thread will be counted separately from the others.

The PAPI Probe

The PAPI probe gathers measurements using PAPI, the Performance Application Programming Interface. A full description of the interface is beyond the scope of this document but it can be found on the PAPI Home Page. Simply put, PAPI uses the processor's hardware performance counters to measure specific hardware events like cache misses, branch mispredictions and floating point instructions. If no argument is specified, the PAPI probe counts PAPI_FP_INS, floating point instructions. Note that this is very different from counting floating point operations, which is very subjective. Counting hardware events always comes with a caveat: you must know a little about the architecture on which you are running. What is counted as a floating point operation on one architecture may not be counted on another. For example, the fpmv or floating point register move instruction in the IBM Power Architecture is counted as a floating point instruction. If you have any doubts about what the PAPI presets are counting, please see the papi_avail program in the Dynaprof installation directory. It will tell you exactly what PAPI events are available and exactly what they are counting. It is up to you to dig out the processor reference manual to decode the register definitions and understand what you're counting.

Currently, Dynaprof uses PAPI in the user domain. This means that only events that occur in user context will be counted. Other activity on the system will not appreciably affect most counts; the exceptions are events tied to resources that must be flushed and reloaded upon context switches, like caches and TLBs. Note that the PAPI probe also supports multiplexing of counters. That is, if you pass more events than your processor can count at any one time, PAPI will timeshare the counting hardware to give the illusion that there are far more counters available than actually exist on the hardware. This approach has been shown to work well.
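
The general idea behind timesharing counters can be illustrated with a small Python sketch: an event that was only counted during part of the run is extrapolated to the whole run. This is a simplified illustration of the principle, not a description of how PAPI implements multiplexing, and the function and parameter names are hypothetical.

```python
def estimate_multiplexed_count(raw_count, time_counted, time_total):
    """Scale a count observed during time_counted up to the full time_total."""
    if time_counted <= 0 or time_counted > time_total:
        raise ValueError("time_counted must be positive and at most time_total")
    # Extrapolation assumes the event rate was roughly uniform over the run.
    return raw_count * time_total / time_counted
```

For instance, an event that accumulated 1000 counts while it occupied a counter for a quarter of the run would be estimated at 4000 counts. The estimate becomes less reliable for short runs or for events that occur in bursts.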

Usage: use papiprobe [arg1,arg2,...]

argN can take one of two forms.

1. A PAPI preset event name. It is the user's responsibility to make sure this preset exists on the host architecture. If this event does not exist, the PAPI probe will exit and so will the application.

2. A native event name of the form 0x<hex>@<reg> where <hex> is a hexadecimal event code of the native event and <reg> is the number of the hardware performance register to program.

Example:

Let's use the PAPI probe to count the total number of cycles and the total number of instructions as defined by the PAPI presets.

(dynaprof) use papiprobe PAPI_TOT_CYC, PAPI_TOT_INS

Or let's use the native interface to measure FMAs on FPU 0 and FPU 1 on the Power 3. These correspond to event 11 (0xb) on counter 4 and event 20 (0x14) on counter 5.

(dynaprof) use papiprobe 0xb@4, 0x14@5
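
The 0x<hex>@<reg> notation used above is easy to get wrong when event codes are documented in decimal. The following Python helpers, whose names are hypothetical, convert between the two representations.

```python
def native_event(event_code, register):
    """Format a native event as 0x<hex>@<reg>."""
    if event_code < 0 or register < 0:
        raise ValueError("event code and register must be non-negative")
    return "0x%x@%d" % (event_code, register)


def parse_native_event(spec):
    """Split a 0x<hex>@<reg> string into (event_code, register)."""
    code_text, separator, register_text = spec.partition("@")
    if not separator or not code_text.lower().startswith("0x"):
        raise ValueError("expected 0x<hex>@<reg>, got %r" % spec)
    return int(code_text, 16), int(register_text)
```

With these helpers, native_event(11, 4) and native_event(20, 5) produce the two arguments used in the Power 3 example.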

The Wallclock Probe

The Wallclock probe takes no arguments. It very simply measures elapsed real-time which is sometimes referred to as wallclock time. It does this using the highest resolution and lowest latency real time clock available on the host architecture. The output units are in microseconds.

Usage: use wallclock
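
As an analogy for what the probe measures, the following Python sketch times a call with a high resolution clock and reports the elapsed real time in microseconds. It illustrates the concept only; the probe itself is inserted into the application's object code and does not work this way internally. The function name is hypothetical.

```python
import time


def elapsed_microseconds(func, *args):
    """Run func(*args) and return (result, elapsed wallclock microseconds)."""
    start = time.perf_counter_ns()
    result = func(*args)
    elapsed_ns = time.perf_counter_ns() - start
    return result, elapsed_ns / 1000.0
```

Elapsed real time includes time the process spends sleeping or waiting, which is why the sleep calls dominate the Wallclock report shown later in this guide.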

Instrumenting the Application

Dynaprof inserts instrumentation directly into the application's address space. This is accomplished through a run-time code generation and patching mechanism based upon either Dyninst or DPCL, IBM's derivative effort. Whenever a function is instrumented, all its children are instrumented as well. This is to enable the probe to generate both inclusive and exclusive metrics.

Usage: instr
Usage: instr module <module_pattern>
Usage: instr function <module> <function_pattern>

The instr command has three forms. The first form, the command by itself, simply prints out the previously instrumented points. The second form instruments all functions inside any modules that match the glob-style pattern. The third form instruments only the functions inside a specific module that match the glob-style pattern.

Example:

First let's load the fspx application and enable the use of the Wallclock probe.

(dynaprof) load tests/fspx
(dynaprof) use wallclock
Module wallclock.so was loaded.

Now let's see what's inside.

(dynaprof) list
DEFAULT_MODULE
eos.F
phase.F
setup.F
update.F
supmain.F
io.F
properties.F
solveT.F
libm.so.6
libc.so.6

Ok, let's examine the interesting ones.

(dynaprof) list solveT.F
tinmush_
tinsol_
tinvoid_
(dynaprof) list update.F
akw.1
proflux_
flux_
pde_

Ok, let's instrument the entire solver module first.

(dynaprof) instr module solveT.F
solveT.F, inserted 3 instrumentation points

Now let's just instrument all the flux computation routines.

(dynaprof) instr function update.F *flux_
update.F, inserted 2 instrumentation points

Finally let's look at all the instrumented functions.

(dynaprof) instr
tinmush_
tinsol_
tinvoid_
proflux_
flux_

Looks good. We're ready to continue with execution.
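
To preview which functions a glob-style pattern would select before issuing instr, the matching can be approximated in Python with the standard fnmatch module. This is a sketch under the assumption that Dynaprof's patterns behave like ordinary shell globs; its matcher may differ in details. The function name is hypothetical.

```python
import fnmatch


def matching_functions(functions, pattern):
    """Return the function names that match a glob-style pattern, in order."""
    # fnmatchcase is used so that matching is case-sensitive on every platform.
    return [name for name in functions if fnmatch.fnmatchcase(name, pattern)]
```

Applied to the functions listed for update.F above, the pattern *flux_ selects proflux_ and flux_, which agrees with the two instrumentation points reported by instr.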

Controlling the Application

Starting the Application

To begin an application, the user issues the run command. For attached applications, the run command is functionally equivalent to the continue command as described below.

Usage: run

Interrupting the Application

When the application is running, the application's input and output are directed to Dynaprof's controlling terminal unless otherwise specified with the -t command line option. You interact with the application just as if you were running it from the command line. If you would like to interrupt execution, simply send Dynaprof a SIGINT, easily done with a Control-C. This is particularly useful if you'd like to do your instrumentation at a later phase or would like to insert additional instrumentation midway through a run.

Usage: ^C

Resuming the Application

To resume execution, one simply issues the continue command.

Usage: continue

Example (the user types the commands after each (dynaprof) prompt, the matrix order 1000, the ^C, and the final 0):

(dynaprof) load tests/input
(dynaprof) run
  input the order of the matrix, 0 to exit
1000
^C
Program received signal SIGINT, Interrupt.
Program stopped.
(dynaprof) continue
     norm. resid      resid           machep         x(1)          x(n)
  8.77321770E+00  3.89573462E-12  2.22044605E-16  1.00000000E+00  1.00000000E+00
    times are reported for matrices of order  1000
      factor     solve      total     mflops
 times for array with leading dimension of1001
   1.579E+05  3.000E+02  1.582E+05  4.227E-03
  input the order of the matrix, 0 to exit
0
Program exited normally.

Examining the Probe Output

Dynaprof does not enforce the manner in which each probe is to generate its output. By not placing these restrictions on the probe modules, the probe designer is free to determine whatever output format is most appropriate, be that a real time binary data feed to a visualization engine or a static data file dumped to disk at the end of the run.

The probes included with Dynaprof write the collected data to disk either when the application finishes or when the user explicitly sends the application a SIGHUP signal. This signal causes the probe module to flush the data to disk. Note that this data will be overwritten at the end of the run, so it is recommended that the user copy this data to a new file as soon as the flush has been performed.

Currently, both the PAPI probe and the Wallclock probe produce a compact file consisting of encoded ASCII data. The data files are created in the directory where the application exists. Each probe prints a message to this effect when the probe is first initialized. The files are named <executable.pid>, where pid is the process identifier. For multithreaded applications, each thread generates a data file of the form <executable.pid.tid>, where tid is the thread identifier.

Example:

(dynaprof) use wallclock
Module wallclock.so was loaded.
(dynaprof) instr module simple.c
simple.c, inserted 3 instrumentation points
(dynaprof) run
output goes to /home/mucci/dynaprof/tests/simple.8874
In main()
In quickstuff()
In quickstuff()
In slowstuff()
Program exited normally.

Example of Multithreaded Operation:
(dynaprof) load tests/pthread_count
(dynaprof) use wallclock
Module wallclocksmp.so was loaded.
(dynaprof) list
DEFAULT_MODULE
pcount.c
libc.so.6
(dynaprof) instr module pcount.c
pcount.c, inserted 4 instrumentation points
(dynaprof) run
output goes to /home/mucci/dynaprof/tests/pthread_count.8885.1024
output goes to /home/mucci/dynaprof/tests/pthread_count.8887.1026
output goes to /home/mucci/dynaprof/tests/pthread_count.8888.2051
Program exited normally.

This data is then interpreted, formatted and displayed by the reporting scripts included with the Dynaprof distribution. There are two scripts, one for each probe. Each script takes one argument, the name of the data file to process.

Usage: wallclockrpt <Wallclock data file>
Usage: papiproberpt <PAPI probe data file>
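
When many data files accumulate, it helps to recover the executable name, process identifier and thread identifier from each file name before passing it to a reporting script. The following Python sketch does this for the <executable.pid> and <executable.pid.tid> forms described above. The function name is hypothetical, and an executable whose own name ends in a dot followed by digits cannot be told apart from the multithreaded form.

```python
import posixpath


def parse_data_file_name(name):
    """Return (executable, pid, tid); tid is None for the <executable.pid> form."""
    parts = posixpath.basename(name).split(".")
    # Try the multithreaded form first: two trailing numeric components.
    if len(parts) >= 3 and parts[-1].isdigit() and parts[-2].isdigit():
        return ".".join(parts[:-2]), int(parts[-2]), int(parts[-1])
    if len(parts) >= 2 and parts[-1].isdigit():
        return ".".join(parts[:-1]), int(parts[-1]), None
    raise ValueError("not a probe data file name: %r" % name)
```

Grouping the results by executable and pid gives one entry per process with a list of its per-thread files.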

Wallclock Probe Output

Let's take the output from the above run of the simple test program and see what it looks like.

Example:

[mucci@nebula]$ wallclockrpt tests/simple.8874

Exclusive Profile.

Name            Percent         Total           Calls   
-------------   -------         -----           --------
TOTAL           100             1.442e+10       1       
unknown         100             1.442e+10       1       
main            0.0001598       2.305e+04       1       
quickstuff      0.0001001       1.444e+04       2       
slowstuff       9.211e-05       1.328e+04       1       

Inclusive Profile.

Name            Percent         Total           SubCalls
-------------   -------         -----           --------
TOTAL           100             1.442e+10       0       
main            99.98           1.442e+10       5       
slowstuff       83.22           1.2e+10         2       
quickstuff      16.76           2.417e+09       4       

1-Level Inclusive Call Tree.

Parent/-Child   Percent         Total           Calls   
-------------   -------         -----           --------
TOTAL           100             1.442e+10       1       
quickstuff      100             2.417e+09       2       
-    unknown    0.001276        3.084e+04       2       
-      sleep    100             2.417e+09       2       
slowstuff       100             1.2e+10         1       
-    unknown    0.0002016       2.419e+04       1       
-      sleep    100             1.2e+10         1       
main            100             1.442e+10       1       
-    unknown    0.0001381       1.992e+04       1       
- quickstuff    8.366           1.206e+09       1       
- quickstuff    8.398           1.211e+09       1       
-  slowstuff    83.24           1.2e+10         1       
-       exit    0               0               1
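
The relationship between the inclusive and exclusive profiles above follows the usual definition: a function's exclusive total is its inclusive total minus the inclusive totals of the calls it makes. The Python sketch below states this with made-up numbers, since the values printed in the report are rounded. The function name is hypothetical and the report script's own arithmetic is not documented here.

```python
def exclusive_total(inclusive_total, child_totals):
    """Return the part of inclusive_total not spent in any child call."""
    exclusive = inclusive_total - sum(child_totals)
    if exclusive < 0:
        raise ValueError("children cannot account for more than the parent")
    return exclusive
```

For example, a function with an inclusive total of 1000 microseconds that makes three calls costing 300, 300 and 350 microseconds has an exclusive total of 50 microseconds. This is why main accounts for nearly all of the inclusive time in the report but almost none of the exclusive time.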

PAPI Probe Output

The PAPI Probe reporting script prints out a header containing machine information and then possibly multiple profiles resembling the output from the Wallclock Probe. Let's instrument the swim application, a popular shallow water benchmark, to measure Level 1 Instruction and Level 1 Data Cache Misses.

Example:

(dynaprof) load tests/swim
(dynaprof) use probes/papiprobe PAPI_L1_DCM, PAPI_L1_ICM 
(dynaprof) instr function swim.F calc* 
swim.F, inserted 8 instrumentation points 
(dynaprof) run
papiprobe: output goes to /home/mucci/work/dynaprof/tests/swim.7366
 SPEC benchmark 102.swim
  
 NUMBER OF POINTS IN THE X DIRECTION     512
 NUMBER OF POINTS IN THE Y DIRECTION     512
 GRID SPACING IN THE X DIRECTION      25000.
 GRID SPACING IN THE Y DIRECTION      25000.
 TIME STEP                               20.
 TIME FILTER PARAMETER                 0.001
 NUMBER OF ITERATIONS                    120

 CYCLE NUMBER   60 MODEL TIME IN  HOURS  0.33

 Pcheck =   0.1314E+11
 Ucheck =   0.5215E+05
 Vcheck =   0.5215E+05


 CYCLE NUMBER  120 MODEL TIME IN  HOURS  0.67

 Pcheck =   0.1314E+11
 Ucheck =   0.5215E+05
 Vcheck =   0.5215E+05

Program exited normally.

Now let's visualize the data.

[mucci@nebula]$ probes/papiproberpt /home/mucci/work/dynaprof/tests/swim.7366 > out
Output file             : /home/mucci/work/dynaprof/tests/swim.7366
Option string           : PAPI_L1_DCM,PAPI_L1_ICM
Processor               : 1198 Mhz GenuineIntel Intel Pentium III rev 0x1 (1-way)
Total metrics measured  : 2
Metric 1:               : PAPI_L1_DCM, Level 1 data cache misses (Native 0x45,0x45)
Metric 2:               : PAPI_L1_ICM, Level 1 instruction cache misses (Native 0xf28,0xf28)
Total functions         : 4

Exclusive Profile of Metric PAPI_L1_DCM.

Name            Percent         Total           Calls   
-------------   -------         -----           --------
TOTAL           100             5.155e+08       1       
calc3_          52.73           2.718e+08       118     
calc2_          38.52           1.986e+08       120     
calc1_          8.086           4.168e+07       120     
unknown         0.3937          2.03e+06        1       
calc3z_         0.2722          1.403e+06       1       

Inclusive Profile of Metric PAPI_L1_DCM.

Name            Percent         Total           SubCalls
-------------   -------         -----           --------
TOTAL           100             5.155e+08       0       
calc3_          52.73           2.718e+08       0       
calc2_          38.52           1.986e+08       0       
calc1_          8.086           4.168e+07       0       
calc3z_         0.2722          1.403e+06       0       

1-Level Inclusive Call Tree of Metric PAPI_L1_DCM.

Parent/-Child   Percent         Total           Calls   
-------------   -------         -----           --------
TOTAL           100             5.155e+08       1       
calc1_          100             4.168e+07       120     
calc2_          100             1.986e+08       120     
calc3z_         100             1.403e+06       1       
calc3_          100             2.718e+08       118     

Exclusive Profile of Metric PAPI_L1_ICM.

Name            Percent         Total           Calls   
-------------   -------         -----           --------
TOTAL           100             9.916e+04       1       
unknown         29.52           2.927e+04       1       
calc2_          24.01           2.381e+04       120     
calc1_          23.5            2.331e+04       120     
calc3_          22.87           2.268e+04       118     
calc3z_         0.09378         93              1       

Inclusive Profile of Metric PAPI_L1_ICM.

Name            Percent         Total           SubCalls
-------------   -------         -----           --------
TOTAL           100             9.916e+04       0       
calc2_          24.01           2.381e+04       0       
calc1_          23.5            2.331e+04       0       
calc3_          22.87           2.268e+04       0       
calc3z_         0.09378         93              0       

1-Level Inclusive Call Tree of Metric PAPI_L1_ICM.

Parent/-Child   Percent         Total           Calls   
-------------   -------         -----           --------
TOTAL           100             9.916e+04       1       
calc1_          100             2.331e+04       120     
calc2_          100             2.381e+04       120     
calc3z_         100             93              1       
calc3_          100             2.268e+04       118
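
Reports such as the one above can be post-processed with a few lines of Python. The sketch below extracts the four-column rows of the exclusive and inclusive profiles. The column layout is inferred from the sample output in this guide and may differ in other releases; the function name is hypothetical. Call-tree child rows, which begin with a dash, have five fields and are skipped.

```python
def parse_profile_rows(text):
    """Return (name, percent, total, calls) for each four-column data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        name, percent, total, calls = fields
        try:
            rows.append((name, float(percent), float(total), int(calls)))
        except ValueError:
            # Header and separator lines do not parse as numbers; skip them.
            continue
    return rows
```

The last column holds Calls in the exclusive profile and SubCalls in the inclusive profile, so the caller has to keep track of which table a row came from.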

References

Dynaprof Home Page

http://www.cs.utk.edu/~mucci/dynaprof

PAPI Home Page

http://icl.cs.utk.edu/projects/papi

PMToolkit Home Page

http://www.alphaworks.ibm.com/tech/pmapi

Perfctr Download Page

http://www.csd.uu.se/~mikpe/linux/perfctr

DPCL Project Web Site

http://oss.software.ibm.com/developerworks/opensource/dpcl

DYNINST API Home Page

http://www.dyninst.org

Appendices

Appendix A1. AIX/DPCL Installation Notes

This release is for AIX 4.3.x on the Power 3 or below only. If you are interested in an AIX 5.x and/or a Power 4 version, please contact the author. In addition, the following software must be installed on your system. Unfortunately, this must be done by root.

- Pmtoolkit 1.3.x. This toolkit contains the AIX kernel support necessary to support access to the hardware performance counters. Without it, PAPI will not work and neither will the PAPI probe.

Appendix A2. Linux/Dyninst Installation Notes

This release is for Linux 2.4.x on the Pentium Pro, II or III. If you are interested in a Pentium 4 version with very limited event counting capabilities, please contact the author. In addition, the following software must be installed on your system. Yes, it requires changes to the kernel, so it must be done by root.

- Perfctr 2.3.4 or above. This release contains the kernel patch and the shared library necessary to support access to the hardware performance counters. Without it, PAPI will not work and neither will the PAPI probe.

In addition, please make sure the following environment variables are set appropriately.

LD_LIBRARY_PATH must be set to a colon separated list of directories where the GNU run-time linker can find the following shared libraries. These libraries are included in the usr/lib directory of the Dynaprof distribution.

libpapi.so

libperfctr.so

libdyninstAPI.so

libdyninstAPI_RT.so

DYNINSTAPI_RT_LIB must be set to the fully qualified filename of the libdyninstAPI_RT.so shared library. As mentioned above, this library is included with the Dynaprof distribution.
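
A quick way to check the LD_LIBRARY_PATH setting before starting Dynaprof is to look for each of the libraries listed above in the directories it names. The Python sketch below does only that; the names used are hypothetical, and because the run-time linker also searches other locations, a library reported here as missing may still be found at run time.

```python
import os

# The shared libraries listed in this appendix.
REQUIRED_LIBRARIES = [
    "libpapi.so",
    "libperfctr.so",
    "libdyninstAPI.so",
    "libdyninstAPI_RT.so",
]


def missing_libraries(ld_library_path, required=REQUIRED_LIBRARIES):
    """Return the required libraries found in no directory of ld_library_path."""
    directories = [d for d in ld_library_path.split(":") if d]
    return [
        library for library in required
        if not any(os.path.isfile(os.path.join(d, library)) for d in directories)
    ]
```

Calling missing_libraries(os.environ.get("LD_LIBRARY_PATH", "")) returns an empty list when all four libraries are reachable through the variable.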

Appendix A3. Bugs in the v0.8 Release

11/14/02 mucci@cs.utk.edu
-------------------------

- xdynaprof does not detect load/attach failures
- xdynaprof windows jump around
 
7/31/02 mucci@cs.utk.edu
------------------------

- The probe reporting scripts should make sure they only take one argument.
- The SIGHUP flush feature of papiprobe has been broken by the new header
  written to disk.

7/29/02 mucci@cs.utk.edu
------------------------

- instr of the same functions does not return error
- The -tty argument does not work. The stdin/stdout of dynaprof itself
  is redirected instead of just the application.
- Multiple instr/interrupt/instr cycles do not reinitialize the probes
  properly.

7/16/02 mucci@cs.utk.edu
------------------------

- This release only works with GNU Make