
# Open MPI common monitoring module

Copyright (c) 2013-2015 The University of Tennessee and The University
                        of Tennessee Research Foundation.  All rights
                        reserved.
Copyright (c) 2013-2015 Inria.  All rights reserved.

Low level communication monitoring interface in Open MPI

## Introduction

This interface traces and monitors all messages sent by MPI before
they go down to the communication channels. At that level, all
communications are point-to-point: collectives have already been
decomposed into send and receive calls.

The monitoring data is stored internally by each process and output
on stderr at the end of the application (during `MPI_Finalize()`).

## Enabling the monitoring

To enable the monitoring, add `--mca pml_monitoring_enable x` to the
`mpirun` command line:

* If x = 1, it monitors internal and external tags indifferently and
  aggregates everything.
* If x = 2, it monitors internal tags and external tags separately.
* If x = 0, the monitoring is disabled.
* Other values of x are not supported.

Internal tags are tags < 0. They are used to tag the sends and
receives coming from collective operations or from protocol
communications.

External tags are tags >= 0. They are used by the application in
point-to-point communications.

Therefore, distinguishing external and internal tags helps to
distinguish between point-to-point and other communications (mainly
collectives).

## Output format

The output of the monitoring looks like this (with `--mca
pml_monitoring_enable 2`):

```
I 0 1 108 bytes 27 msgs sent
E 0 1 1012 bytes 30 msgs sent
E 0 2 23052 bytes 61 msgs sent
I 1 2 104 bytes 26 msgs sent
I 1 3 208 bytes 52 msgs sent
E 1 0 860 bytes 24 msgs sent
E 1 3 2552 bytes 56 msgs sent
I 2 3 104 bytes 26 msgs sent
E 2 0 22804 bytes 49 msgs sent
E 2 3 860 bytes 24 msgs sent
I 3 0 104 bytes 26 msgs sent
I 3 1 204 bytes 51 msgs sent
E 3 1 2304 bytes 44 msgs sent
E 3 2 860 bytes 24 msgs sent
```

Where:

1. the first column distinguishes internal (I) and external (E) tags.
1. the second column is the sender rank.
1. the third column is the receiver rank.
1. the fourth column is the number of bytes sent.
1. the last column is the number of messages.

In this example, process 0 has sent 27 messages (108 bytes in total)
to process 1 using point-to-point calls, and 30 messages (1012 bytes
in total) to the same process via collectives and protocol-related
communications.
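
If you want to post-process this output without the provided Perl
scripts (see below), the lines are straightforward to parse. A minimal
C sketch, assuming the line layout shown above:

```
#include <stdio.h>

/* Parse one monitoring line, e.g. "E 0 1 1012 bytes 30 msgs sent".
   Returns 1 on success, 0 otherwise. */
static int parse_line(const char *line, char *kind, int *from,
                      int *to, long *bytes, long *msgs)
{
    return 5 == sscanf(line, " %c %d %d %ld bytes %ld msgs sent",
                       kind, from, to, bytes, msgs);
}
```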

If the monitoring was called with `--mca pml_monitoring_enable 1`,
everything is aggregated under the internal tags. With the above
example, you get the following output, where each entry is the sum of
the internal and external entries of the previous run (e.g., `I 0 1`:
108 + 1012 = 1120 bytes and 27 + 30 = 57 messages):
```
I 0 1 1120 bytes 57 msgs sent
I 0 2 23052 bytes 61 msgs sent
I 1 0 860 bytes 24 msgs sent
I 1 2 104 bytes 26 msgs sent
I 1 3 2760 bytes 108 msgs sent
I 2 0 22804 bytes 49 msgs sent
I 2 3 964 bytes 50 msgs sent
I 3 0 104 bytes 26 msgs sent
I 3 1 2508 bytes 95 msgs sent
I 3 2 860 bytes 24 msgs sent
```

## Monitoring phases

If one wants to monitor phases of the application, it is possible to
flush the monitoring at the application level. In this case, all the
monitoring data gathered since the last flush is stored by every
process in a file.

An example of how to flush such monitoring is given in
`test/monitoring/monitoring_test.c`.

Moreover, all the different flushed phases are aggregated at runtime
and output at the end of the application, as described above.
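
As a rough sketch of that pattern (hypothetical fragment:
`flush_monitoring()` and `do_phase()` stand in for the actual flush
routine and application work demonstrated in
`test/monitoring/monitoring_test.c`; the file names follow the
`name_<phase_id>_<process_id>` convention expected by the helper
scripts below):

```
/* Hypothetical sketch of per-phase flushing. */
char filename[256];
int phase, rank;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
for (phase = 1; phase <= 2; phase++) {
    do_phase(phase);  /* hypothetical application work */

    /* One file per process and per phase. */
    snprintf(filename, sizeof(filename),
             "./prof/phase_%d_%d", phase, rank);
    flush_monitoring(filename);  /* hypothetical flush call */
}
```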

## Example

A working example is given in `test/monitoring/monitoring_test.c`. It
features `MPI_COMM_WORLD` monitoring, sub-communicator monitoring,
collective and point-to-point communication monitoring, and phase
monitoring.

To compile:

```
shell$ make monitoring_test
```

## Helper scripts

Two Perl scripts are provided in `test/monitoring`:

1. `aggregate_profile.pl` aggregates the monitoring phases of the
   different processes; i.e., it aggregates the profiles generated by
   the `flush_monitoring` function.

   The files need to be in the given format:
   `name_<phase_id>_<process_id>`. They are then aggregated by phase.
   If one needs the profile of all the phases, one can concatenate the
   different files, or use the output of the monitoring system emitted
   at `MPI_Finalize`. In the example, it should be called as:

   ```
   ./aggregate_profile.pl prof/phase
   ```

   to generate `prof/phase_1.prof` and `prof/phase_2.prof`.

1. `profile2mat.pl` transforms the monitoring output into a
   communication matrix. It takes a profile file and aggregates all
   the recorded communications into matrices. It generates one matrix
   for the number of messages (msg), one for the total number of bytes
   transmitted (size), and one for the average number of bytes per
   message (avg).

   The output matrices are symmetric.

For instance, the provided example stores the phase output in
`./prof`:

```
shell$ mpirun -np 4 --mca pml_monitoring_enable 2 ./monitoring_test
```

This should produce the following output:

```
Proc 3 flushing monitoring to: ./prof/phase_1_3.prof
Proc 0 flushing monitoring to: ./prof/phase_1_0.prof
Proc 2 flushing monitoring to: ./prof/phase_1_2.prof
Proc 1 flushing monitoring to: ./prof/phase_1_1.prof
Proc 1 flushing monitoring to: ./prof/phase_2_1.prof
Proc 3 flushing monitoring to: ./prof/phase_2_3.prof
Proc 0 flushing monitoring to: ./prof/phase_2_0.prof
Proc 2 flushing monitoring to: ./prof/phase_2_2.prof
I 2 3 104 bytes 26 msgs sent
E 2 0 22804 bytes 49 msgs sent
E 2 3 860 bytes 24 msgs sent
I 3 0 104 bytes 26 msgs sent
I 3 1 204 bytes 51 msgs sent
E 3 1 2304 bytes 44 msgs sent
E 3 2 860 bytes 24 msgs sent
I 0 1 108 bytes 27 msgs sent
E 0 1 1012 bytes 30 msgs sent
E 0 2 23052 bytes 61 msgs sent
I 1 2 104 bytes 26 msgs sent
I 1 3 208 bytes 52 msgs sent
E 1 0 860 bytes 24 msgs sent
E 1 3 2552 bytes 56 msgs sent
```

You can then parse the phases with:

```
shell$ ./aggregate_profile.pl prof/phase
Building prof/phase_1.prof
Building prof/phase_2.prof
```

And you can build the different communication matrices of phase 1
with:

```
shell$ ./profile2mat.pl prof/phase_1.prof
prof/phase_1.prof -> all
prof/phase_1_size_all.mat
prof/phase_1_msg_all.mat
prof/phase_1_avg_all.mat

prof/phase_1.prof -> external
prof/phase_1_size_external.mat
prof/phase_1_msg_external.mat
prof/phase_1_avg_external.mat

prof/phase_1.prof -> internal
prof/phase_1_size_internal.mat
prof/phase_1_msg_internal.mat
prof/phase_1_avg_internal.mat
```

## Authors

Designed by George Bosilca <bosilca@icl.utk.edu> and
Emmanuel Jeannot <emmanuel.jeannot@inria.fr>.