openmpi

Author	SHA1	Message	Date
Ralph Castain	6ec2ad5288	Fix the pmix_query API when it asks for something that returns an array of pmix_info_t. Protect the PMIX_INFO_FREE macro from NULL arrays. Update the mpi_memprobe scaling test Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-22 20:11:36 -07:00
Ralph Castain	6d6bc9bd07	Update alps module to new APIs Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-12 09:43:07 -07:00
Ralph Castain	48fc339718	Create an alternative mapping method that pushes responsibility onto the backend daemons. By default, let mpirun only pack the app_context info and send that to the backend daemons where the mapping will be done. This significantly reduces the computational time on mpirun as it isn't running up/down the topology tree computing thousands of binding locations, and it reduces the launch message to a very small number of bytes. When running -novm, fall back to the old way of doing things where mpirun computes the entire map and binding, and then sends the full info to the backend daemon. Add a new cmd line option/mca param --fwd-mpirun-port that allows mpirun to dynamically select a port, but then passes that back to all the other daemons so they will use that port as a static port for their own wireup. In this mode, we no longer "phone home" directly to mpirun, but instead use the static port to wireup at daemon start. We then use the routing tree to rollup the initial launch report, and limit the number of open sockets on mpirun's node. Update ras simulator to track the new nidmap code Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found. Update gadget platform file Initialize the range count when starting a new range Fix the no-np case in managed allocation Ensure DVM node usage gets cleaned up after each job Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-07 20:43:12 -08:00
Ralph Castain	28abe78f8c	Add new platform files. Modify scaling.pl to support ppn option Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-29 15:55:49 -08:00
Ralph Castain	6509f60929	Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node. Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given. Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example: $ mpirun -npernode 2 ./mpi_memprobe Sampling memory usage after MPI_Init Data for node rhc001 Daemon: 12.483398 Client: 6.514648 Data for node rhc002 Daemon: 11.865234 Client: 4.643555 Sampling memory usage after MPI_Barrier Data for node rhc001 Daemon: 12.520508 Client: 6.576660 Data for node rhc002 Daemon: 11.879883 Client: 4.703125 Note that the client value on node rhc001 is larger - this is where rank=0 is housed, and apparently it gets a larger footprint for some reason. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-05 10:32:17 -08:00
Ralph Castain	f355fb926d	Continue cleanup of notifications. Resolve a race condition that can result in attempt to send a message on a closed socket Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-04 09:16:33 -08:00
Ralph Castain	9eab9a1ed3	Remove stale global variables Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers. Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation). Begin first cut at memory profiler Some minor cleanups of memprobe Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-02 14:04:24 -08:00
Jeff Squyres	1187212f5d	scaling.pl: minor change to perl quoting Makes emacs syntax highlighting work better. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-12-08 09:25:08 -08:00
Ralph Castain	d5a428b646	Scaling test should only launch one proc/node Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-08 09:24:22 -08:00
Ralph Castain	af9a55ccf1	Fix the session directory cleanup - only remove the jobfam session dir level if we are the local daemon and are cleaning up our own session directory. Update the scaling test to run more trials and report the options being tested each time Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-03 09:59:18 -08:00
Ralph Castain	b11c9574d4	Remove debug and update copyright	2016-10-11 23:28:16 -07:00
Ralph Castain	a2326e3ba0	Update the scaling test to properly use orterun for orte-dvm tests, and extend by adding params for async mpi init/finalize	2016-10-11 23:24:52 -07:00
Ralph Castain	84eb21d6bf	Update the script to properly run on the Cray. Add rawout option to retain the raw timing output in case the formats don't match	2015-11-12 12:11:17 -08:00
Ralph Castain	1607daeb10	Update the scaling script to output data into a CSV file for easy import into Excel	2015-11-11 13:29:37 -08:00
Ralph Castain	efbea40a8b	Minor typo for slurm scaling test support, add aprun for use on Cray	2015-11-11 13:29:37 -08:00
Ralph Castain	187fa9b131	Extend the scaling test script to support multiple starters, including mpirun, orterun (if mpirun not present), orte-dvm, and srun. Auto-detect which are present and allow the user to run all of them. Auto-detect the number of nodes in the allocation.	2015-11-08 11:34:06 -08:00
Ralph Castain	73c8c30c5d	Update the scaling.pl test script to support orte-dvm and srun	2015-11-07 13:13:36 -08:00
Ralph Castain	18c5cb48ff	Update the scaling test script	2015-11-06 21:51:40 -08:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	8ebf235a56	Use preconnect as a better test of startup scaling than barrier This commit was SVN r26530.	2012-06-01 02:35:15 +00:00
Ralph Castain	978897ade2	Little more cleanup, working now This commit was SVN r26522.	2012-05-29 21:33:06 +00:00
Ralph Castain	e4d80001dc	Little cleanup to handle the Mac This commit was SVN r26513.	2012-05-29 18:21:47 +00:00
Ralph Castain	197f923ce3	Update scaling script This commit was SVN r26510.	2012-05-29 17:41:38 +00:00
Ralph Castain	3068438022	Add scaling tests and script This commit was SVN r26509.	2012-05-29 15:21:44 +00:00

24 Commits