
1088 Commits

Author SHA1 Message Date
Ralph Castain
6509f60929 Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node.
Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given.

Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example:

$ mpirun -npernode 2 ./mpi_memprobe
Sampling memory usage after MPI_Init
Data for node rhc001
	Daemon: 12.483398
	Client: 6.514648

Data for node rhc002
	Daemon: 11.865234
	Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
	Daemon: 12.520508
	Client: 6.576660

Data for node rhc002
	Daemon: 11.879883
	Client: 4.703125

Note that the client value on node rhc001 is larger - this is where rank=0 is housed, and apparently it gets a larger footprint for some reason.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-05 10:32:17 -08:00
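The probe output quoted in this commit message can be post-processed to see how much each footprint grows between the two sampling points. The following is a minimal sketch, not part of the commit: it assumes the text layout shown in the example above, and the helper names `parse_memprobe` and `growth` are hypothetical. The commit message does not state the units of the reported values, so the sketch treats them as plain numbers.

```python
# Sample text copied from the commit message above.
SAMPLE = """\
Sampling memory usage after MPI_Init
Data for node rhc001
	Daemon: 12.483398
	Client: 6.514648

Data for node rhc002
	Daemon: 11.865234
	Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
	Daemon: 12.520508
	Client: 6.576660

Data for node rhc002
	Daemon: 11.879883
	Client: 4.703125
"""

PHASE_PREFIX = "Sampling memory usage after "
NODE_PREFIX = "Data for node "


def parse_memprobe(text):
    """Return {phase: {node: {label: value}}} from memprobe-style output."""
    results = {}
    phase = node = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(PHASE_PREFIX):
            phase = line[len(PHASE_PREFIX):]
            node = None
            results[phase] = {}
        elif line.startswith(NODE_PREFIX) and phase is not None:
            node = line[len(NODE_PREFIX):]
            results[phase][node] = {}
        elif ":" in line and phase is not None and node is not None:
            # Lines such as "Daemon: 12.483398"
            label, value = line.split(":", 1)
            results[phase][node][label.strip()] = float(value)
    return results


def growth(results, before="MPI_Init", after="MPI_Barrier"):
    """Per-node difference between the two sampling points."""
    return {
        node: {label: results[after][node][label] - value
               for label, value in values.items()}
        for node, values in results[before].items()
    }


if __name__ == "__main__":
    for node, deltas in growth(parse_memprobe(SAMPLE)).items():
        print(node, {k: round(v, 6) for k, v in deltas.items()})
```

Run against the sample, this reports the small increase in both daemon and client footprint between MPI_Init and MPI_Barrier on each node.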
Ralph Castain
f355fb926d Continue cleanup of notifications. Resolve a race condition that can result in an attempt to send a message on a closed socket
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:33 -08:00
Ralph Castain
9eab9a1ed3 Remove stale global variables
Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers.

Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation).

Begin first cut at memory profiler

Some minor cleanups of memprobe

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-02 14:04:24 -08:00
Jeff Squyres
6002a8bca5 buildrpm.sh: don't use $HOME
This is news to me: I didn't know that some distros do not set $HOME.
So use "~" instead, and only try to grep ~/.rpmmacros if it exists.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-21 07:42:32 -08:00
Jeff Squyres
5ecd271934 buildrpm.sh: minor fixes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-20 10:54:37 -08:00
Jeff Squyres
bd1828c54d Merge pull request #2451 from martinkontsek/master
master: Add arguments to rpmbuild script and update README.
2016-12-17 12:28:59 -05:00
Martin Kontsek
30d076a2f7 Add arguments to rpmbuild script and update README, implement pull request suggestions.
Signed-off-by: Martin Kontsek <mkontsek@cisco.com>
2016-12-15 11:18:41 -08:00
Jeff Squyres
a28ae984ee make-authors: we no longer require organizations
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-14 10:20:56 -08:00
Jeff Squyres
1187212f5d scaling.pl: minor change to perl quoting
Makes Emacs syntax highlighting work better.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-08 09:25:08 -08:00
Ralph Castain
d5a428b646 Scaling test should only launch one proc/node
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-08 09:24:22 -08:00
Ralph Castain
144a9d267b Update the purge-tab-indents.pl script to avoid resetting permissions
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-05 09:49:58 -08:00
Ralph Castain
af9a55ccf1 Fix the session directory cleanup - only remove the jobfam session dir level if we are the local daemon and are cleaning up our own session directory.
Update the scaling test to run more trials and report the options being tested each time

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-03 09:59:18 -08:00
Ralph Castain
1e2019ce2a Revert "Update to sync with OMPI master and cleanup to build"
This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.
2016-11-22 15:03:20 -08:00
Ralph Castain
cb55c88a8b Update to sync with OMPI master and cleanup to build
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-22 14:24:54 -08:00
Ralph Castain
0c8359b0b9 Avoid adding blank lines when purging tabs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-22 09:38:37 -08:00
Ralph Castain
8ecb240955 Use quiet print for debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-19 11:47:27 -08:00
Ralph Castain
14b4698890 Fix executable mode 2016-11-19 11:44:19 -08:00
Ralph Castain
fb644abd1e Add a couple of helper tools to prepare git commits by removing all trailing blank lines, and replacing tabs with indents. These tools default to looking only at modified files, but can also be used to scan the entire directory tree via the --all option.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-19 11:44:19 -08:00
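The two cleanups described in this commit message can be illustrated as follows. This is a rough sketch only: the actual helper tools are scripts in the repository and are not reproduced here. The function names, the tab width of 8, and the choice to expand only leading tabs are assumptions made for illustration.

```python
def strip_trailing_blank_lines(text):
    """Drop blank lines at the end of a file, keeping one final newline."""
    lines = text.split("\n")
    while lines and lines[-1].strip() == "":
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


def replace_tab_indents(text, width=8):
    """Expand tabs in the leading indentation of each line to spaces.

    Tabs that appear after the first non-whitespace character are left
    untouched (an assumption; the real tool may behave differently).
    """
    out = []
    for line in text.split("\n"):
        body = line.lstrip(" \t")
        indent = line[:len(line) - len(body)]
        out.append(indent.expandtabs(width) + body)
    return "\n".join(out)


if __name__ == "__main__":
    source = "int main(void)\n{\n\treturn 0;\n}\n\n\n"
    print(repr(replace_tab_indents(strip_trailing_blank_lines(source))))
```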
Jeff Squyres
06e75d65c3 nightly-tarball: update Coverity configure params
* Point to local libfabric v1.4 install
* Add MPI C++ bindings
* Remove PSM support (if someone can install PSM/PSM2 libraries on the
  build server, let's re-enable this)

Also change from -j8 to -j4 (the new AWS build instance only has 1
core / 2 hyperthreads).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-11-03 12:27:34 -04:00
Jeff Squyres
7ccf253063 Remove old/now-useless SVN integration scripts
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-11-03 12:18:14 -04:00
Jeff Squyres
a47ad865d3 create_tarball.sh: make sure to just get the git hash
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-11-02 08:57:32 -07:00
Jeff Squyres
78d1e4ebff create_tarball.sh: update snapshot filename
Nightly snapshots will now be named:

openmpi-${BRANCHNAME}-${YYYYMMDDHHMM}-${SHORTHASH}.tar.${COMPRESSION}.

Fixes #2337

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-11-01 17:09:17 -07:00
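The naming pattern given in this commit message can be spelled out as a small helper. This is a sketch under stated assumptions: `snapshot_name` is a hypothetical function (the real logic lives in create_tarball.sh), and the branch, hash, and compression values in the usage line are made up for illustration.

```python
from datetime import datetime


def snapshot_name(branch, when, short_hash, compression):
    """Build openmpi-BRANCHNAME-YYYYMMDDHHMM-SHORTHASH.tar.COMPRESSION."""
    stamp = when.strftime("%Y%m%d%H%M")  # YYYYMMDDHHMM
    return f"openmpi-{branch}-{stamp}-{short_hash}.tar.{compression}"


if __name__ == "__main__":
    # Illustrative values only.
    print(snapshot_name("master", datetime(2016, 11, 1, 17, 9), "abc1234", "bz2"))
```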
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
Ralph Castain
2f966bf3bf Cleanup external PMIx v3 component for copy/paste errors - component and module require unique names 2016-10-20 09:11:46 -07:00
Ralph Castain
50bb0ded70 Update the PMIx nightly scripts to generalize locations 2016-10-14 08:40:05 -07:00
Ralph Castain
b11c9574d4 Remove debug and update copyright 2016-10-11 23:28:16 -07:00
Ralph Castain
a2326e3ba0 Update the scaling test to properly use orterun for orte-dvm tests, and extend by adding params for async mpi init/finalize 2016-10-11 23:24:52 -07:00
Ralph Castain
a2919174d0 Bring the RML modifications across. This is the first step in a revamp of the ORTE messaging subsystem to support fabric-based communications during launch and wireup phases. When completed, the grpcomm and plm frameworks will each have their own "conduit" for communication - each conduit corresponds to a particular RML messaging transport. This can be the active OOB-based component, or a provider from within the RML/OFI component. Messages sent down the conduit will flow across the associated transport.
Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted.

While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress
2016-10-11 16:01:02 -07:00
Ralph Castain
5b1484a836 Implement the backend support for process-generated event notification 2016-10-08 09:24:28 -07:00
Joshua Hursey
fc3cf994db build: Custom libmpi_FOO name fix for wrapper compilers
* In open-mpi/ompi@f6f24a4f67 I missed updating the library references
  for the wrapper compilers.
* Fixes the CXX wrapper compiler; the CXX library is renamed as needed.
* Fixes the Java wrapper compiler; the Java library is renamed as needed.
2016-09-30 16:40:56 -05:00
Ralph Castain
3acbc92efd If everyone is going to start using this script, then let's at least line up the entries 2016-09-27 09:05:05 -07:00
mpiteam
4a4b83b466 Update the nightly tarball script to point to the OMPI master repo 2016-09-20 19:38:01 -07:00
Ralph Castain
9c3ae64297 Merge branch 'master' of https://github.com/open-mpi/ompi 2016-09-16 15:49:34 -05:00
Ralph Castain
408199ce20 Fix a typo in the remove-old script that caused it to ignore all non-directory files, including the tarballs it was meant to delete 2016-09-16 15:48:24 -05:00
Ralph Castain
037020e448 Add the new v2.0.x branch to nightly tarballs 2016-09-14 16:16:26 -07:00
Jeff Squyres
17ca44b25e Merge pull request #1984 from jsquyres/pr/auto-generate-AUTHORS
Be able to auto-generate AUTHORS and preserve org affiliations
2016-08-22 15:37:22 -04:00
Ralph Castain
700ad84243 Send the pmix build results to me 2016-08-20 07:32:06 -07:00
Ralph Castain
c9dc286f25 Update the hwloc coverity submission script 2016-08-19 09:20:48 -07:00
Jeff Squyres
1ba1e9e0b7 make-authors.pl: Auto-generate the entire AUTHORS file
Update the script to auto-generate the entire AUTHORS file from two
sources:

1. The existing AUTHORS file
2. The output from "git log --format=tformat:'%aN <%aE>'"

Merge these two together (which will preserve organization
affiliations) and warn in two cases:

1. If a person has no organization affiliation
2. If the same email address appears for more than one person

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-18 07:29:18 -05:00
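The merge-and-warn behavior described in this commit message might look roughly like the sketch below. It is not the make-authors.pl implementation: the data shapes (a name-to-organization mapping standing in for the existing AUTHORS file, and a list of "Name <email>" lines standing in for the git log output) and all function names are assumptions for illustration.

```python
import re

AUTHOR_RE = re.compile(r"\s*(.+?)\s*<([^<>]+)>\s*")


def parse_author(line):
    """Split a 'Name <email>' line into (name, email)."""
    match = AUTHOR_RE.fullmatch(line)
    if match is None:
        raise ValueError(f"unrecognized author line: {line!r}")
    return match.group(1), match.group(2)


def merge_authors(known_orgs, git_log_lines):
    """Merge known affiliations with authors seen in the git log.

    known_orgs: {name: organization} from the existing AUTHORS data.
    Returns (merged, warnings), where merged maps name -> organization
    or None, and warnings lists the two cases named in the commit message.
    """
    merged = dict(known_orgs)  # preserve existing affiliations
    names_by_email = {}
    for line in git_log_lines:
        name, email = parse_author(line)
        merged.setdefault(name, None)
        names_by_email.setdefault(email, set()).add(name)

    warnings = []
    for name in sorted(merged):
        if not merged[name]:
            warnings.append(f"no organization for {name}")
    for email in sorted(names_by_email):
        names = sorted(names_by_email[email])
        if len(names) > 1:
            warnings.append(f"{email} used by {', '.join(names)}")
    return merged, warnings


if __name__ == "__main__":
    known = {"Alice Example": "Example Corp"}
    log = [
        "Alice Example <alice@example.com>",
        "Bob Sample <bob@example.com>",
        "Robert Sample <bob@example.com>",
    ]
    merged, warnings = merge_authors(known, log)
    for warning in warnings:
        print("WARNING:", warning)
```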
Ralph Castain
be8424b691 Provide backward compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start.
Add a contrib script to cleanup permissions incorrectly modified due to things like smb mounts

2016-08-13 12:13:04 -07:00
Ralph Castain
23886754f0 Trim the coverity build line to packages available on this machine 2016-08-10 13:55:55 -07:00
Ralph Castain
55551a4fb7 Complete debug of the nightly coverity submittal 2016-08-10 12:05:21 -07:00
Ralph Castain
375f04b277 Update the nightly builds to submit to coverity 2016-08-10 08:45:18 -07:00
Ralph Castain
3e17e2fb29 Grrr...silly lists moved to new address! 2016-08-03 21:17:01 -07:00
Ralph Castain
b61eb3a133 Update addresses for nightly build results 2016-08-03 14:25:59 -07:00
Ralph Castain
16cfbf9828 Only post tarballs if something changed 2016-08-02 21:59:18 -07:00
Ralph Castain
9cbe6af9bb Move upload cmds from outside an "if master" test so all versions get uploaded 2016-08-02 20:18:15 -07:00
Ralph Castain
e2d0cfeb3f Shift address to myself pending mailing list update 2016-08-02 20:07:23 -07:00
Jeff Squyres
f619c61366 coverity: update tool download URL
Coverity changes its tool download URL every once in a while; update
our scripts for the newest URL.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-02 18:54:19 -07:00
Ralph Castain
c4c04f6a2c Cleanup minor typo 2016-07-29 16:20:10 -07:00