Commit Graph

21 Commits

Author SHA1 Message Date
Ralph Castain
28abe78f8c Add new platform files. Modify scaling.pl to support ppn option
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-29 15:55:49 -08:00
Ralph Castain
6509f60929 Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node.
Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given.

Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example:

$ mpirun -npernode 2 ./mpi_memprobe
Sampling memory usage after MPI_Init
Data for node rhc001
	Daemon: 12.483398
	Client: 6.514648

Data for node rhc002
	Daemon: 11.865234
	Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
	Daemon: 12.520508
	Client: 6.576660

Data for node rhc002
	Daemon: 11.879883
	Client: 4.703125

Note that the client value on node rhc001 is larger - this is where rank=0 is housed, and apparently it gets a larger footprint for some reason.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-05 10:32:17 -08:00
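
The two-phase output shown in commit 6509f60929 lends itself to simple post-processing. The following is a minimal sketch, not part of the repository, that parses text in the format printed above and computes how much each footprint grew between the MPI_Init and MPI_Barrier samples. The function names `parse_memprobe` and `growth` are hypothetical, the parser assumes the exact line layout of the example, and the commit message does not state the units of the reported values.

```python
import re
from collections import defaultdict

# Sample text copied from the commit message above.
SAMPLE_OUTPUT = """\
Sampling memory usage after MPI_Init
Data for node rhc001
    Daemon: 12.483398
    Client: 6.514648

Data for node rhc002
    Daemon: 11.865234
    Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
    Daemon: 12.520508
    Client: 6.576660

Data for node rhc002
    Daemon: 11.879883
    Client: 4.703125
"""

PHASE_PREFIX = "Sampling memory usage after "
NODE_PREFIX = "Data for node "


def parse_memprobe(text):
    """Return {phase: {node: {role: value}}} (hypothetical helper).

    Assumes the layout shown in the commit message: a phase header,
    then per-node blocks with "Daemon:" and "Client:" lines.
    """
    results = defaultdict(lambda: defaultdict(dict))
    phase = node = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(PHASE_PREFIX):
            phase = line[len(PHASE_PREFIX):]
            node = None
        elif line.startswith(NODE_PREFIX):
            node = line[len(NODE_PREFIX):]
        else:
            match = re.fullmatch(r"(Daemon|Client):\s*([0-9.]+)", line)
            if match and phase and node:
                results[phase][node][match.group(1)] = float(match.group(2))
    return results


def growth(results, before="MPI_Init", after="MPI_Barrier"):
    """Per-node, per-role difference between two sampling phases."""
    return {
        node: {role: results[after][node][role] - value
               for role, value in roles.items()}
        for node, roles in results[before].items()
    }


if __name__ == "__main__":
    for node, roles in growth(parse_memprobe(SAMPLE_OUTPUT)).items():
        for role, delta in roles.items():
            print(f"{node} {role}: {delta:+.6f}")
```

Run against real output by piping the text printed by `mpirun -npernode 2 ./mpi_memprobe` into `parse_memprobe` instead of using the embedded sample.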
Ralph Castain
f355fb926d Continue cleanup of notifications. Resolve a race condition that can result in an attempt to send a message on a closed socket
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:33 -08:00
Ralph Castain
9eab9a1ed3 Remove stale global variables
Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers.

Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation).

Begin first cut at memory profiler

Some minor cleanups of memprobe

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-02 14:04:24 -08:00
Jeff Squyres
1187212f5d scaling.pl: minor change to perl quoting
Makes emacs syntax highlighting work better.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-08 09:25:08 -08:00
Ralph Castain
d5a428b646 Scaling test should only launch one proc/node
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-08 09:24:22 -08:00
Ralph Castain
af9a55ccf1 Fix the session directory cleanup - only remove the jobfam session dir level if we are the local daemon and are cleaning up our own session directory.
Update the scaling test to run more trials and report the options being tested each time

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-03 09:59:18 -08:00
Ralph Castain
b11c9574d4 Remove debug and update copyright 2016-10-11 23:28:16 -07:00
Ralph Castain
a2326e3ba0 Update the scaling test to properly use orterun for orte-dvm tests, and extend by adding params for async mpi init/finalize 2016-10-11 23:24:52 -07:00
Ralph Castain
84eb21d6bf Update the script to properly run on the Cray. Add rawout option to retain the raw timing output in case the formats don't match 2015-11-12 12:11:17 -08:00
Ralph Castain
1607daeb10 Update the scaling script to output data into a CSV file for easy import into Excel 2015-11-11 13:29:37 -08:00
Ralph Castain
efbea40a8b Minor typo for slurm scaling test support, add aprun for use on Cray 2015-11-11 13:29:37 -08:00
Ralph Castain
187fa9b131 Extend the scaling test script to support multiple starters, including mpirun, orterun (if mpirun not present), orte-dvm, and srun. Auto-detect which are present and allow the user to run all of them. Auto-detect the number of nodes in the allocation.
2015-11-08 11:34:06 -08:00
Ralph Castain
73c8c30c5d Update the scaling.pl test script to support orte-dvm and srun 2015-11-07 13:13:36 -08:00
Ralph Castain
18c5cb48ff Update the scaling test script 2015-11-06 21:51:40 -08:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Ralph Castain
8ebf235a56 Use preconnect as a better test of startup scaling than barrier
This commit was SVN r26530.
2012-06-01 02:35:15 +00:00
Ralph Castain
978897ade2 Little more cleanup, working now
This commit was SVN r26522.
2012-05-29 21:33:06 +00:00
Ralph Castain
e4d80001dc Little cleanup to handle the Mac
This commit was SVN r26513.
2012-05-29 18:21:47 +00:00
Ralph Castain
197f923ce3 Update scaling script
This commit was SVN r26510.
2012-05-29 17:41:38 +00:00
Ralph Castain
3068438022 Add scaling tests and script
This commit was SVN r26509.
2012-05-29 15:21:44 +00:00