Ralph Castain
666386fc19
Merge pull request #3294 from rhc54/topic/modx
...
Enable SLURM on Cray with constraints and fix bug in nidmap
2017-04-06 09:55:07 -07:00
Ralph Castain
a29ca2bb0d
Enable slurm operations on Cray with constraints
...
Cleanup some errors in the nidmap code that caused us to send unnecessary topologies
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-06 08:58:06 -07:00
Nadia Derbey
f918d88c3e
Fix yalla PML: Update previous commit after Yossofe's review
...
Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>
2017-04-06 07:58:26 +02:00
Brian Barrett
d7f283cbce
README: Update supported platform list
...
Per discussion at last developer's forum, platforms
not actively being tested (either in Jenkins or
at least weekly in MTT) are not eligible to be listed
as supported platforms. Move a number of systems out
of the supported list.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-04-05 17:25:01 -07:00
Nathaniel Graham
36d660e07a
Add parsable option to help arguments
...
This commit adds a "parsable" option to the help
arguments, which prints out a machine-readable
list of all the mpirun options.
Fixes #3279
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-05 17:01:43 -06:00
Ralph Castain
bf668ad1e9
Merge pull request #3287 from rhc54/topic/ht
...
Provide further (hopefully) helpful messages about the hotel size
2017-04-05 07:26:03 -07:00
Ralph Castain
db8943cedd
Provide further (hopefully) helpful messages about the hotel size
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-05 04:27:32 -07:00
Ralph Castain
840d6c9a1d
Merge pull request #3284 from rhc54/topic/hotel
...
Resolve the direct modex race condition.
2017-04-05 03:09:27 -07:00
Gilles Gouaillardet
8d1369db49
Merge pull request #3283 from ggouaillardet/topic/nvml
...
build nvml support only with CUDA support
2017-04-05 16:14:23 +09:00
Gilles Gouaillardet
f3581c8259
coll/base: have alltoallv send/recv zero-bytes messages
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 13:44:17 +09:00
Gilles Gouaillardet
5492edd71e
coll/base: have ompi_coll_base_sendrecv() send/recv zero-bytes messages
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 13:44:05 +09:00
Ralph Castain
b7e9711f45
Resolve the direct modex race condition. The request hotel was running out of rooms, thereby returning an error upon checkin, and we had missed error-logging in a couple of those places. Hence there was no error message and things just hung.
...
Output a (hopefully) helpful message when we time out an operation
Thanks to Nathan for tracking it down.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 21:32:44 -07:00
Ralph Castain
9a69b20d09
Merge pull request #3282 from rhc54/topic/direct
...
Set the PARENT vpid for direct routed module
2017-04-04 20:55:12 -07:00
Gilles Gouaillardet
10ea991d0a
hwloc: add CUDA include dir to CPPFLAGS
...
so hwloc configury can find nvml.h when CUDA support is built
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 11:46:22 +09:00
Gilles Gouaillardet
8d7541f766
hwloc: disable nvml if CUDA support is not built in Open MPI
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-05 11:07:34 +09:00
Ralph Castain
9132bb26fe
Merge pull request #3281 from rhc54/topic/dmx
...
Adjust the timeout for direct modex requests to reflect the size of t…
2017-04-04 19:04:33 -07:00
Ralph Castain
40ca43e157
Set the PARENT vpid for direct routed module
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 19:03:28 -07:00
Ralph Castain
734b90aa6b
Adjust the timeout for direct modex requests to reflect the size of the job. It can take several seconds to start all the procs, and we don't want to time out due to differences in start times of the various procs.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 18:20:51 -07:00
Ralph Castain
9cb18b8348
Merge pull request #3280 from rhc54/topic/dvm
...
Fix the DVM by ensuring that all nodes, even those that didn't partic…
2017-04-04 18:15:33 -07:00
Ralph Castain
74863a0ea4
Fix the DVM by ensuring that all nodes, even those that didn't participate (i.e., didn't have any local children) in a job, clean up all resources associated with that job upon its completion. With the advent of backend distributed mapping, nodes that weren't part of the job would still allocate resources on other nodes - and then start from that point when mapping the next job. This change ensures that all daemons start from the same point each time.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 17:31:38 -07:00
Aurelien Bouteiller
308d33dca7
Merge pull request #3266 from abouteiller/bugfix/f90mpiext
...
Fix the top_ompi_srcdir in Fortran mpiext building system
2017-04-04 16:25:30 -04:00
Nathaniel Graham
7063f3021f
Merge pull request #3231 from nrgraham23/revamp_mpirun_help
...
mpirun --help output revamp
2017-04-04 12:32:20 -06:00
Nathaniel Graham
19e5d15491
mpirun --help output revamp
...
This commit modifies the output from the mpirun --help
command. The options have been split into groups, to
make the output smaller and more readable. The groups
are: general, debug, output, input, mapping, ranking,
binding, devel, compatibility, launch, dvm, and
unsupported. There is also a special "full" command
that can be used to get the old behaviour of printing
out all of the options. Unsupported options may only
be seen with this full output.
This commit also adds a special case for the help
argument. It makes it possible for the user to
enter 0 or 1 arguments instead of having to always
enter an argument. This defaults to printing out
the "general" help options so the user can then
see what help arguments there are.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-04 10:59:32 -06:00
Ralph Castain
a605bd4265
Merge pull request #3278 from rhc54/topic/tm
...
Remove stale code line
2017-04-04 09:04:52 -07:00
Ralph Castain
393c4536eb
Remove stale code line
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 08:13:15 -07:00
Nathan Hjelm
1322e5dee8
Merge pull request #3274 from hjelmn/osc_rdma_fix
...
osc/rdma: fix typo in atomic code
2017-04-04 00:20:42 -06:00
Gilles Gouaillardet
5dfd4ab6ca
coll/tuned: remove set-but-not-used variables
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-04 13:18:11 +09:00
Ralph Castain
fe64144892
Merge pull request #3259 from rhc54/topic/launch
...
Update how we pass the node regex so we pass _all_ nodes, even those without daemons.
2017-04-03 20:40:48 -07:00
Ralph Castain
92c996487c
Update how we pass the node regex so we pass _all_ nodes, even those without daemons. This allows the backend daemons to form a complete picture of the allocation. Include info on which nodes have daemons on them, and populate that info on the backend as well.
...
Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap
Get the DVM running again
Fix direct modex by eliminating race condition caused by releasing data while sending it
Up the size limit before compressing
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-03 19:25:15 -07:00
Nathan Hjelm
fad0803920
osc/rdma: fix typo in atomic code
...
Fixes #3267
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-03 15:54:28 -06:00
Ralph Castain
9850832dbd
Merge pull request #3273 from rhc54/topic/pmix2.0
...
Update to PMIx v2.0alpha
2017-04-03 11:07:08 -07:00
Aurélien Bouteiller
6ef6a3fb18
Fix the Fortran mpiext building system
...
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
2017-04-03 13:46:32 -04:00
Ralph Castain
2cc5fea8be
Update to PMIx v2.0alpha
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-03 10:02:29 -07:00
Nathan Hjelm
533a8e6dae
cma: restore --with-cma=no configure option
...
This support broke when we enabled CMA by default. Addresses the issue
raised by #3270.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-03 10:42:34 -06:00
Yossi
b3736701c4
Merge pull request #3239 from xinzhao3/topic/ucx-num-eps
...
Passing estimated_num_procs to UCX init in PML and SPML.
2017-04-03 18:31:27 +03:00
Ralph Castain
5f6ba81f11
Merge pull request #3263 from ggouaillardet/topic/hwloc1116
...
hwloc: update hwloc to 1.11.6
2017-03-31 08:03:04 -07:00
Gilles Gouaillardet
81062b7cd2
hwloc: update hwloc to 1.11.6
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-31 13:35:16 +09:00
Nadia Derbey
b6de94e449
Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow
...
Signed-off-by: Nadia Derbey <Nadia.Derbey@atos.net>
2017-03-30 15:18:31 +02:00
Jeff Squyres
7e57075f0d
Merge pull request #3248 from jsquyres/pr/remove-macosx-pkg-support
...
dist: remove OS X package script
2017-03-29 18:46:14 -04:00
Jeff Squyres
f0a8a0af51
dist: remove OS X package script
...
We stopped supporting this long ago.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-29 10:13:01 -04:00
Jeff Squyres
f980cba6b3
Merge pull request #3249 from rhc54/topic/abort
...
Use the correct callback data - the callback function was expecting a…
2017-03-28 20:54:39 -04:00
Jeff Squyres
c7c13253ea
Merge pull request #3250 from jsquyres/pr/modulefile-in-srpm
...
openmpi.spec: also put the modulefile in /opt if install_in_opt==1
2017-03-28 20:46:26 -04:00
Kevin Buckley
9e23c5e3f6
openmpi.spec: also put the modulefile in /opt if install_in_opt==1
...
Thanks to Kevin Buckley for noticing the issue and supplying the
patch.
[skip ci]
bot:notest
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-28 20:45:09 -04:00
Ralph Castain
7dd34d0c9a
Use the correct callback data - the callback function was expecting a bool*, not a pmix_ptl_sr_t*.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-28 17:21:47 -07:00
Nathan Hjelm
676cfe2a35
mca/base: accept y and n for bool and auto bool enumerator
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-03-28 09:20:14 -06:00
Xin Zhao
ee952fcccd
Passing estimated_num_procs to UCX init in PML and SPML.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Ralph Castain
d782542c5c
Merge pull request #3241 from rhc54/topic/cov
...
Silence coverity dead-code warning
2017-03-27 08:33:33 -07:00
Ralph Castain
71c9bc1f7e
Merge pull request #3242 from jsquyres/pr/orterun-run-as-root-minor-tweaks
...
orte: minor tweaks to run-as-root message
2017-03-27 08:29:43 -07:00
Jeff Squyres
a333cf691a
orte: minor tweaks to run-as-root message
...
Two updates:
1. Remove the "run as root" error message from orterun.c, because that
functionality is now in orted_submit.c (although it is still
required in orte-dvm.c -- so sync the message in orted_submit.c and
orte-dvm.c to be identical).
2. Slightly tweak the text of the "run as root" error message to
explicitly state that we (strongly) suggest running as a non-root
user (and add a little whitespace).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-03-27 04:50:21 -07:00
Ralph Castain
583dbe954c
Silence coverity dead-code warning
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-26 20:36:43 -07:00