Commit graph

1138 commits

Author SHA1 Message Date
Ralph Castain
6509f60929 Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node.
Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given.

Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example:

$ mpirun -npernode 2 ./mpi_memprobe
Sampling memory usage after MPI_Init
Data for node rhc001
	Daemon: 12.483398
	Client: 6.514648

Data for node rhc002
	Daemon: 11.865234
	Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
	Daemon: 12.520508
	Client: 6.576660

Data for node rhc002
	Daemon: 11.879883
	Client: 4.703125

Note that the client value on node rhc001 is larger - this is where rank=0 is housed, and apparently it gets a larger footprint for some reason.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-05 10:32:17 -08:00
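To compare the two sampling points in the mpi_memprobe output above, the report can be parsed and the per-node growth between MPI_Init and MPI_Barrier computed. The following is a minimal sketch, not part of the tool: the function names `parse_memprobe` and `growth` are hypothetical, the text layout is assumed to match the example in the commit message exactly, and the units are not stated there, so the values are treated as plain numbers.

```python
import re
from collections import defaultdict

# Sample text copied from the commit message above.
SAMPLE_OUTPUT = (
    "Sampling memory usage after MPI_Init\n"
    "Data for node rhc001\n"
    "\tDaemon: 12.483398\n"
    "\tClient: 6.514648\n"
    "\n"
    "Data for node rhc002\n"
    "\tDaemon: 11.865234\n"
    "\tClient: 4.643555\n"
    "\n"
    "Sampling memory usage after MPI_Barrier\n"
    "Data for node rhc001\n"
    "\tDaemon: 12.520508\n"
    "\tClient: 6.576660\n"
    "\n"
    "Data for node rhc002\n"
    "\tDaemon: 11.879883\n"
    "\tClient: 4.703125\n"
)


def parse_memprobe(text):
    """Return {phase: {node: {"Daemon": float, "Client": float}}}."""
    results = defaultdict(dict)
    phase = node = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := re.match(r"Sampling memory usage after (\S+)", line):
            phase = m.group(1)
        elif m := re.match(r"Data for node (\S+)", line):
            node = m.group(1)
            results[phase][node] = {}
        elif m := re.match(r"(Daemon|Client):\s+([\d.]+)", line):
            results[phase][node][m.group(1)] = float(m.group(2))
    return dict(results)


def growth(results, before="MPI_Init", after="MPI_Barrier"):
    """Per-node difference (after - before) for each reported footprint."""
    return {
        node: {
            kind: results[after][node][kind] - value
            for kind, value in kinds.items()
        }
        for node, kinds in results[before].items()
    }


if __name__ == "__main__":
    parsed = parse_memprobe(SAMPLE_OUTPUT)
    for node, delta in growth(parsed).items():
        print(node, {kind: round(v, 6) for kind, v in delta.items()})
```

In this sample every footprint grows only slightly between the two sampling points, and the client on rhc001 stays the largest, as the note in the commit message points out.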
Ralph Castain
f355fb926d Continue cleanup of notifications. Resolve a race condition that can result in an attempt to send a message on a closed socket
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:33 -08:00
Ralph Castain
9eab9a1ed3 Remove stale global variables
Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers.

Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation).

Begin first cut at memory profiler

Some minor cleanups of memprobe

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-02 14:04:24 -08:00
Jeff Squyres
6002a8bca5 buildrpm.sh: don't use $HOME
This is news to me: I didn't know that some distros do not set $HOME.
So use "~" instead, and only try to grep ~/.rpmmacros if it exists.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-21 07:42:32 -08:00
Jeff Squyres
5ecd271934 buildrpm.sh: minor fixes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-20 10:54:37 -08:00
Jeff Squyres
bd1828c54d Merge pull request #2451 from martinkontsek/master
master: Add arguments to rpmbuild script and update README.
2016-12-17 12:28:59 -05:00
Martin Kontsek
30d076a2f7 Add arguments to rpmbuild script and update README, implement pull request suggestions.
Signed-off-by: Martin Kontsek <mkontsek@cisco.com>
2016-12-15 11:18:41 -08:00
Jeff Squyres
a28ae984ee make-authors: we no longer require organizations
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-14 10:20:56 -08:00
Jeff Squyres
1187212f5d scaling.pl: minor change to perl quoting
Makes emacs syntax highlighting work better.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-08 09:25:08 -08:00
Ralph Castain
d5a428b646 Scaling test should only launch one proc/node
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-08 09:24:22 -08:00
Ralph Castain
144a9d267b Update the purge-tab-indents.pl script to avoid resetting permissions
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-05 09:49:58 -08:00
Ralph Castain
af9a55ccf1 Fix the session directory cleanup - only remove the jobfam session dir level if we are the local daemon and are cleaning up our own session directory.
Update the scaling test to run more trials and report the options being tested each time

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-03 09:59:18 -08:00
Ralph Castain
1e2019ce2a Revert "Update to sync with OMPI master and cleanup to build"
This reverts commit cb55c88a8b.
2016-11-22 15:03:20 -08:00
Ralph Castain
cb55c88a8b Update to sync with OMPI master and cleanup to build
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-22 14:24:54 -08:00
Ralph Castain
0c8359b0b9 Avoid adding blank lines when purging tabs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-22 09:38:37 -08:00
Ralph Castain
8ecb240955 Use quiet print for debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-19 11:47:27 -08:00
Ralph Castain
14b4698890 Fix executable mode 2016-11-19 11:44:19 -08:00
Ralph Castain
fb644abd1e Add a couple of helper tools to prepare git commits by removing all trailing blank lines, and replacing tabs with indents. These tools default to looking only at modified files, but can also be used to scan the entire directory tree via the --all option.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-19 11:44:19 -08:00
Jeff Squyres
06e75d65c3 nightly-tarball: update Coverity configure params
* Point to local libfabric v1.4 install
* Add MPI C++ bindings
* Remove PSM support (if someone can install PSM/PSM2 libraries on the
  build server, let's re-enable this)

Also change from -j8 to -j4 (the new AWS build instance only has 1
core / 2 hyperthreads).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-11-03 12:27:34 -04:00
Jeff Squyres
7ccf253063 Remove old/now-useless SVN integration scripts
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-11-03 12:18:14 -04:00
Jeff Squyres
a47ad865d3 create_tarball.sh: make sure to just get the git hash
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-11-02 08:57:32 -07:00
Jeff Squyres
78d1e4ebff create_tarball.sh: update snapshot filename
Nightly snapshots will now be named:

openmpi-${BRANCHNAME}-${YYYYMMDDHHMM}-${SHORTHASH}.tar.${COMPRESSION}.

Fixes #2337

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-11-01 17:09:17 -07:00
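The naming pattern in the commit above can be illustrated with a short sketch. This is not the logic in create_tarball.sh; the function `snapshot_filename` and the example values (branch, timestamp, short hash, compression suffix) are hypothetical and only show how the four fields combine.

```python
from datetime import datetime


def snapshot_filename(branch, timestamp, short_hash, compression):
    """Build openmpi-${BRANCHNAME}-${YYYYMMDDHHMM}-${SHORTHASH}.tar.${COMPRESSION}."""
    stamp = timestamp.strftime("%Y%m%d%H%M")  # YYYYMMDDHHMM
    return f"openmpi-{branch}-{stamp}-{short_hash}.tar.{compression}"


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    name = snapshot_filename("master", datetime(2016, 11, 1, 17, 9), "78d1e4e", "bz2")
    print(name)  # openmpi-master-201611011709-78d1e4e.tar.bz2
```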
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
Ralph Castain
2f966bf3bf Cleanup external PMIx v3 component for copy/paste errors - component and module require unique names 2016-10-20 09:11:46 -07:00
Ralph Castain
50bb0ded70 Update the PMIx nightly scripts to generalize locations 2016-10-14 08:40:05 -07:00
Ralph Castain
b11c9574d4 Remove debug and update copyright 2016-10-11 23:28:16 -07:00
Ralph Castain
a2326e3ba0 Update the scaling test to properly use orterun for orte-dvm tests, and extend by adding params for async mpi init/finalize 2016-10-11 23:24:52 -07:00
Ralph Castain
a2919174d0 Bring the RML modifications across. This is the first step in a revamp of the ORTE messaging subsystem to support fabric-based communications during launch and wireup phases. When completed, the grpcomm and plm frameworks will each have their own "conduit" for communication - each conduit corresponds to a particular RML messaging transport. This can be the active OOB-based component, or a provider from within the RML/OFI component. Messages sent down the conduit will flow across the associated transport.
Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted.

While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress
2016-10-11 16:01:02 -07:00
Ralph Castain
5b1484a836 Implement the backend support for process-generated event notification 2016-10-08 09:24:28 -07:00
Joshua Hursey
fc3cf994db build: Custom libmpi_FOO name fix for wrapper compilers
* In open-mpi/ompi@f6f24a4f67 I missed
   updating the library references for the wrapper compilers.
 * Fixes the CXX wrapper compiler and CXX library is renamed as needed.
 * Fixes the Java wrapper compiler and the Java library is renamed as needed.
2016-09-30 16:40:56 -05:00
Ralph Castain
3acbc92efd If everyone is going to start using this script, then let's at least line up the entries 2016-09-27 09:05:05 -07:00
mpiteam
4a4b83b466 Update the nightly tarball script to point to the OMPI master repo 2016-09-20 19:38:01 -07:00
Ralph Castain
9c3ae64297 Merge branch 'master' of https://github.com/open-mpi/ompi 2016-09-16 15:49:34 -05:00
Ralph Castain
408199ce20 Fix a typo in the remove-old script that caused it to ignore all non-directory files, including the tarballs it was meant to delete 2016-09-16 15:48:24 -05:00
Ralph Castain
037020e448 Add the new v2.0.x branch to nightly tarballs 2016-09-14 16:16:26 -07:00
Jeff Squyres
17ca44b25e Merge pull request #1984 from jsquyres/pr/auto-generate-AUTHORS
Be able to auto-generate AUTHORS and preserve org affiliations
2016-08-22 15:37:22 -04:00
Ralph Castain
700ad84243 Send the pmix build results to me 2016-08-20 07:32:06 -07:00
Ralph Castain
c9dc286f25 Update the hwloc coverity submission script 2016-08-19 09:20:48 -07:00
Jeff Squyres
1ba1e9e0b7 make-authors.pl: Auto-generate the entire AUTHORS file
Update the script to auto-generate the entire AUTHORS file from two
sources:

1. The existing AUTHORS file
2. The output from "git log --format=tformat:'%aN <%aE>'"

Merge these two together (which will preserve organization
affiliations) and warn in two cases:

1. If a person has no organization affiliation
2. If the same email address appears for more than one person

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-18 07:29:18 -05:00
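The merge described in the commit above (existing AUTHORS entries plus `Name <email>` lines from git log, with two kinds of warnings) could look roughly like the sketch below. It is not the Perl code in make-authors.pl: the in-memory layout of the existing entries, the function name `merge_authors`, and the people in the example are all hypothetical.

```python
import re
from collections import defaultdict

# Matches one line of "%aN <%aE>" output, e.g. "Some Name <addr@example.org>".
AUTHOR_RE = re.compile(r"^(?P<name>.+?)\s+<(?P<email>[^>]+)>$")


def merge_authors(existing, git_lines):
    """Merge existing author records with git log lines.

    existing:  {name: {"emails": set of str, "org": str or None}}
    git_lines: iterable of "Name <email>" strings
    Returns (merged, warnings); existing organization affiliations are kept.
    """
    merged = {
        name: {"emails": set(info["emails"]), "org": info.get("org")}
        for name, info in existing.items()
    }
    for line in git_lines:
        m = AUTHOR_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in "Name <email>" form
        entry = merged.setdefault(m["name"], {"emails": set(), "org": None})
        entry["emails"].add(m["email"].lower())

    warnings = []
    owners = defaultdict(set)  # email -> names that use it
    for name, info in merged.items():
        if not info["org"]:
            warnings.append(f"no organization for {name}")
        for email in info["emails"]:
            owners[email].add(name)
    for email, names in sorted(owners.items()):
        if len(names) > 1:
            warnings.append(f"{email} used by {', '.join(sorted(names))}")
    return merged, warnings


if __name__ == "__main__":
    existing = {"Alice Example": {"emails": {"alice@example.org"}, "org": "Example Org"}}
    log = [
        "Alice Example <alice@example.org>",
        "Bob Sample <bob@example.org>",
        "Robert Sample <bob@example.org>",
    ]
    merged, warnings = merge_authors(existing, log)
    for w in warnings:
        print("WARNING:", w)
```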
Ralph Castain
be8424b691 Provide backward compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start.
Add a contrib script to cleanup permissions incorrectly modified due to things like smb mounts

dd
2016-08-13 12:13:04 -07:00
Ralph Castain
23886754f0 Trim the coverity build line to packages available on this machine 2016-08-10 13:55:55 -07:00
Ralph Castain
55551a4fb7 Complete debug of the nightly coverity submittal 2016-08-10 12:05:21 -07:00
Ralph Castain
375f04b277 Update the nightly builds to submit to coverity 2016-08-10 08:45:18 -07:00
Ralph Castain
3e17e2fb29 Grrr...silly lists moved to new address! 2016-08-03 21:17:01 -07:00
Ralph Castain
b61eb3a133 Update addresses for nightly build results 2016-08-03 14:25:59 -07:00
Ralph Castain
16cfbf9828 Only post tarballs if something changed 2016-08-02 21:59:18 -07:00
Ralph Castain
9cbe6af9bb Move upload cmds from outside an "if master" test so all versions get uploaded 2016-08-02 20:18:15 -07:00
Ralph Castain
e2d0cfeb3f Shift address to myself pending mailing list update 2016-08-02 20:07:23 -07:00
Jeff Squyres
f619c61366 coverity: update tool download URL
Coverity changes its tool download URL every once in a while; update
our scripts for the newest URL.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-02 18:54:19 -07:00
Ralph Castain
c4c04f6a2c Cleanup minor typo 2016-07-29 16:20:10 -07:00
Ralph Castain
2c19971048 Send hwloc nightly tarball messages to their mailing list 2016-07-29 15:53:47 -07:00
Ralph Castain
11d4002954 Update the nightly tarball scripts to support the new web site 2016-07-29 15:48:28 -07:00
Jeff Squyres
835657e700 Save Open MPI's gitdub config
README information about gitdub coming soon.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-29 10:15:53 -04:00
Jeff Squyres
48938f542c openmpi.spec: don't export FFLAGS
The Open MPI configure script has long-since only paid attention to
FCFLAGS.  Indeed, it will warn if you set FFLAGS or F77FLAGS.  So
remove them from the spec file.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-12 14:22:20 -07:00
Jeff Squyres
95ecae8688 coverity: add --enable-debug to nightly Coverity builds
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-08 08:01:47 -07:00
Alina Sklarevich
a2be17ec14 Revert "mellanox/optimized: set enable_openib_rdmacm_ibaddr=yes in the mellanox/optimized file."
This reverts commit 6cd7282631.
2016-06-06 11:26:11 +03:00
Jeff Squyres
cf27ec36b3 mpirun.zsh: add options to zsh shell completion
Add the following to zsh shell completion:

* --get-stack-traces
* --report-state-upon-timeout
* --timeout

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-31 16:33:46 -07:00
Nathan Hjelm
1e6b4f2f55 Merge pull request #1495 from hjelmn/new_hooks
Add new patcher memory hooks
2016-04-13 18:19:23 -06:00
Nathan Hjelm
27f8a4e806 opal: add code patcher framework
This commit adds a framework to abstract runtime code patching.
Components in the new framework can provide functions for either
patching a named function or a function pointer. The latter
functionality is not being used but may provide a way to allow memory
hooks when dlopen functionality is disabled.

This commit adds two different flavors of code patching. The first is
provided by the overwrite component. This component overwrites the
first several instructions of the target function with code to jump to
the provided hook function. The hook is expected to provide the full
functionality of the hooked function.

The linux patcher component is based on the memory hooks in ucx. It
only works on linux and operates by overwriting function pointers in
the symbol table. In this case the hook is free to call the original
function using the function pointer returned by dlsym.

Both components restore the original functions when the patcher
framework closes.

Changes had to be made to support Power/PowerPC with the Linux
dynamic loader patcher. Some of the changes:

 - Move code necessary for powerpc/power support to the patcher
   base. The code is needed by both the overwrite and linux
   components.

 - Move patch structure down to base and move the patch list to
   mca_patcher_base_module_t. The structure has been modified to
   include a function pointer to the function that will unapply the
   patch. This allows the mixing of multiple different types of
   patches in the patch_list.

 - Update linux patching code to keep track of the matching between
   got entry and original (unpatched) address. This allows us to
   completely clean up the patch on finalize.

All patchers keep track of the changes they made so that they can be
reversed when the patcher framework is closed.

At this time there are bugs in the Linux dynamic loader patcher so
its priority is lower than the overwrite patcher.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:13 -06:00
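The bookkeeping idea in the commit above (each patch carries its own "unapply" function, so different patch types can share one patch list and be reversed when the framework closes) can be shown with a loose Python analogy. This is not the OPAL C code and does not overwrite instructions or GOT entries; the class `PatcherBase` and its methods are hypothetical, with attribute replacement standing in for the overwrite flavor and a dictionary of callables standing in for the symbol table.

```python
import types


class PatcherBase:
    """Keeps one list of applied patches, each with its own undo callable."""

    def __init__(self):
        self.patch_list = []  # entries are (description, undo callable)

    def patch_attribute(self, owner, name, hook):
        """Replace owner.<name> with hook (stand-in for the overwrite flavor)."""
        original = getattr(owner, name)
        setattr(owner, name, hook)
        self.patch_list.append(
            (f"attr:{name}", lambda: setattr(owner, name, original))
        )
        return original

    def patch_table_entry(self, table, key, hook):
        """Replace table[key] with hook (stand-in for a function-pointer table).

        The original callable is returned so the hook may still call it.
        """
        original = table[key]
        table[key] = hook
        self.patch_list.append(
            (f"table:{key}", lambda: table.__setitem__(key, original))
        )
        return original

    def close(self):
        """Undo every patch, most recent first, regardless of its type."""
        while self.patch_list:
            _description, undo = self.patch_list.pop()
            undo()


if __name__ == "__main__":
    module = types.SimpleNamespace(free=lambda: "original free")
    table = {"munmap": lambda: "original munmap"}

    patcher = PatcherBase()
    patcher.patch_attribute(module, "free", lambda: "hooked free")
    original_munmap = patcher.patch_table_entry(table, "munmap", lambda: "hooked munmap")

    print(module.free(), "|", table["munmap"](), "|", original_munmap())
    patcher.close()
    print(module.free(), "|", table["munmap"]())
```

Storing the undo callable with each entry is what lets `close` stay ignorant of how a given patch was applied.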
Nathan Hjelm
b1670f844d contrib/platform: don't disable dlopen
The --enable-static gives us what we want: statically linked components.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:12 -06:00
Alina Sklarevich
6cd7282631 mellanox/optimized: set enable_openib_rdmacm_ibaddr=yes in the mellanox/optimized file. 2016-04-11 18:01:16 +03:00
George Bosilca
f69eba1bc4 Update the copyright and cleanup the code.
Per @jsquyres suggestion remove all trailing spaces.
Credit to `sed -i.bak 's/ *$//' */[ch]`.
2016-03-28 14:41:01 -04:00
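For readers without sed at hand, the substitution credited in the commit above (`s/ *$//`, which deletes trailing spaces on each line) has a direct regular-expression equivalent. A minimal sketch follows; the function name `strip_trailing_spaces` is hypothetical, and, like the sed expression, it removes only space characters, not tabs.

```python
import re


def strip_trailing_spaces(text):
    """Remove runs of spaces at the end of every line (tabs are left alone)."""
    return re.sub(r" +$", "", text, flags=re.MULTILINE)


if __name__ == "__main__":
    source = "int x;   \n\treturn;\t \n"
    print(repr(strip_trailing_spaces(source)))
```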
George Bosilca
32277db6ab Add support for async progress in the BTL TCP.
All BTL-only operations (basically all data movements
with the exception of the matching operation) can now
be handled for the TCP BTL by a progress thread.
2016-03-28 14:40:50 -04:00
Nathan Hjelm
018e3ebeb6 Merge pull request #1471 from hjelmn/lanl_platform
contrib/lanl: update platform files for TOSS2
2016-03-18 15:04:34 -06:00
Ralph Castain
972026b9c1 Add the option to not make the greek tarball, only making the non-greek one 2016-03-18 12:25:20 -07:00
Nathan Hjelm
147e780fa5 contrib/lanl: update platform files for TOSS2
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-17 14:30:50 -06:00
Ralph Castain
079ea14dab Update symbol-hiding script 2015-12-21 20:49:14 -08:00
Igor Ivanov
36d3a7aa6c contrib: Add bash script to measure performance
This script is useful for measuring the time from launching an ompi
application to different internal points. A user can easily add
their own tests based on the existing tests.
See the readme information inside the script.
2015-12-14 17:42:19 +02:00
Ralph Castain
10db7ebfab Update the symbol-hiding script to capture a broader range of symbols 2015-12-04 21:05:57 -08:00
Jeff Squyres
a549db8ce2 create_tarball.sh: use "autogen.pl --force" 2015-11-20 22:42:01 -05:00
Jeff Squyres
acf94eef7e create_tarball: cleanup whitespace errors
No code / logic changes.
2015-11-20 22:40:23 -05:00
Gilles Gouaillardet
c61ef30980 autogen.pl: aborts if autogen.pl is invoked from an Open MPI tarball and without the --force option
Thanks Jeff for the wording and review
2015-11-20 13:00:55 +09:00
Ralph Castain
84eb21d6bf Update the script to properly run on the Cray. Add rawout option to retain the raw timing output in case the formats don't match 2015-11-12 12:11:17 -08:00
Gilles Gouaillardet
6ab3289582 rpm: fix openmpi.spec not to include the /usr directory
/usr cannot be included on RHEL7-like distros
2015-11-12 16:21:40 +09:00
Ralph Castain
1607daeb10 Update the scaling script to output data into a CSV file for easy import into Excel 2015-11-11 13:29:37 -08:00
Ralph Castain
efbea40a8b Minor typo for slurm scaling test support, add aprun for use on Cray 2015-11-11 13:29:37 -08:00
Ralph Castain
187fa9b131 Extend the scaling test script to support multiple starters, including mpirun, orterun (if mpirun not present), orte-dvm, and srun. Auto-detect which are present and allow the user to run all of them. Auto-detect the number of nodes in the allocation.
2015-11-08 11:34:06 -08:00
Ralph Castain
f2805fb0f9 Provide a mechanism for renaming symbols in the opposite direction - i.e., #define prefix_foo[suffix] foo. 2015-11-07 18:07:09 -08:00
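The two rename directions mentioned in the commit above can be sketched as a small generator of `#define` lines. This is not the contrib script itself: the function `rename_lines` and the example prefix and suffix are hypothetical, and the forward form (`#define foo prefix_foo[suffix]`) is inferred from the commit's description of the opposite direction.

```python
def rename_lines(symbols, prefix, suffix="", reverse=False):
    """Emit one #define per symbol.

    forward: #define foo prefix_foo[suffix]
    reverse: #define prefix_foo[suffix] foo
    """
    lines = []
    for sym in symbols:
        renamed = f"{prefix}{sym}{suffix}"
        if reverse:
            lines.append(f"#define {renamed} {sym}")
        else:
            lines.append(f"#define {sym} {renamed}")
    return lines


if __name__ == "__main__":
    # "prefix_" and "_v1" are illustrative values only.
    print("\n".join(rename_lines(["foo", "bar"], "prefix_", "_v1")))
    print("\n".join(rename_lines(["foo", "bar"], "prefix_", "_v1", reverse=True)))
```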
Ralph Castain
73c8c30c5d Update the scaling.pl test script to support orte-dvm and srun 2015-11-07 13:13:36 -08:00
Ralph Castain
1f44fef4d6 Add ability to provide a suffix to the symbol renames 2015-11-07 12:37:14 -08:00
Ralph Castain
6864a9b68a Add a new script for creating symbol hiding "rename" files 2015-11-07 12:11:54 -08:00
Ralph Castain
18c5cb48ff Update the scaling test script 2015-11-06 21:51:40 -08:00
Mike Dubman
cdffe4f92d BUILD: update mellanox platform file
add support for UCX
2015-10-21 11:39:30 +03:00
Howard Pritchard
89b9be3732 lanl/platform: fixes to pick up lustre
Fixes to lanl platform files to pick up lustre header
files, etc. for romio and ompi i/o.

Fixes #1033

Thanks to Jerome Vienne for spotting this.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-10-15 14:32:21 -05:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Jeff Squyres
12367d8444 whitespace-purge: switch from sed to perl
The perl form is more portable.
2015-09-08 09:36:36 -07:00
Nathan Hjelm
b4fcd3897e Simplify whitespace purge script a little.
Do not need to test with -f. If directories ever show up in git
ls-files (unlikely) their mime type is application/x-directory which
will fail the test ${file::4} == text check.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:30:46 -06:00
Nathan Hjelm
92ec9ca2bb update whitespace purge script to skip all non-text files
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:24:37 -06:00
Jeff Squyres
7217f37092 nightly tarball: only save 7 days worth of nightly build trees
This should help alleviate running out of disk space on the IU build
server.
2015-08-22 03:19:44 -07:00
Ralph Castain
d1ac247e0d Update whitespace-purge script 2015-08-19 12:09:42 -07:00
Jeff Squyres
98b5551126 openmpi-nightly-tarball: update libfabric install location 2015-08-05 17:57:07 -04:00
Howard Pritchard
d4eb5addb0 contrib/lanl: update Makefile
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-07-31 14:38:48 -07:00
Howard Pritchard
13bfb2baab Merge pull request #744 from hppritcha/topic/help_lanl_admins
lanl: help out lanl admins
2015-07-24 11:48:17 -06:00
Howard Pritchard
5eccba17af lanl: help out lanl admins
LANL admins want platform files and *.conf
files so oblige them.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-07-24 08:03:52 -07:00
Jeff Squyres
ac547126ca openmpi-nightly-tarball.sh: more verbosity, fix coverity
Add --with-libfabric CLI argument to the coverity build so that usnic
support can be built.
2015-07-22 21:44:10 -04:00
Jeff Squyres
162da0e98d authors-to-cvsimport.pl: remove stale file
This script isn't needed any more.
2015-07-22 21:43:25 -04:00
Ralph Castain
67769d4a59 Update search_compare script 2015-07-17 17:36:37 -07:00
Jeff Squyres
cce57da0c4 openmpi-update-hg-svn.h: remove stale file
This file accidentally got left over when we switched from SVN to git.
2015-07-02 12:17:02 -04:00
MPI Team
34f7e30158 openmpi-nightly-tarball: fix typo
The branch name is "v2.x", not "2.x".
2015-06-29 17:04:06 -04:00
Ralph Castain
75ceec663a Now that it has been officially released, update the embedded HWLOC to 1.11.0 2015-06-28 14:07:45 -07:00