Since we intend to provide cross version compatibility
between versions with the same major and minor, use
MAJOR.MINOR.0 instead of orte_version_string
(e.g. MAJOR.MINOR.RELEASEGREEK).
Open MPI 4.0.0 has already been released, so in order to make
it compatible with future 4.0.x releases, we have to use 4.0.0
as the version string, that is why we use MAJOR.MINOR.0 instead
of MAJOR.MINOR
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Example:
For the list of hosts `a01,b00,a00` a regex is generated:
`a[2:1.0],b[2:0]`, where `a`-hosts prefixes moved to the begining,
it breaks the hosts ordering.
This commit fixes regex for that case to `a[2:1],b[2:0],a[2:0]`
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 46e38b9193)
Example:
For the nodelist `jjss,jjss0000001,jjss0000003,jjss0000002` a regular
expression was `jjss[0:0],jjss[7:1,3,2]` that led to incorrect unpacking
the first host as `jjs0`. This commit fixes an adding empty range for
not numeric hostnames. Here is the fixed regex for this exapmle:
`jjss,jjss[7:1,3,2]`
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 1967e41a71)
Needed to apply commit from PR #5778 to get this commit
from PR #6238 to apply cleanly.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit b19e5edf76)
Correctly transfer job-level mapping directives for dynamically spawned
jobs to the mapping system.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 45f23ca5c9)
This is a cherry-pick of master (2820aef). The propagation is intended to resolve issue #6130
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
Causes the MCA param to be ignored, while the cmd line option still
works.
Thanks to @iassiour for the report!
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Follow on to 430c659908: clarify the help message and fix one typo.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit e9bf318dcb)
Update the show_help message for when there are not enough slots to
run an application.
Also, remove a bunch of copies of this message in various show_help
text files that aren't used/referred to anywhere in the code.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 430c659908)
If we detect that we are being debugged by an MPIR-based debugger, then
print a warning that OMPI's MPIR support has been deprecated and will be
removed in a subsequent release.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 2cb271716b)
Thanks to @hjelmn for debugging it and providing the patch
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit efa8bcc17078c89f1c9d6aabed35c90973a469bf)
(cherry picked from commit 647a760b7e)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
When a job terminates normally but with a non zero exit code,
display the error message to stderr.
Thanks Emre Brookes for the bug report.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit open-mpi/ompi@893270caee)
This commit removes some code that protected the odls/alps component
from closing alps file descriptors. For some unknown reason leaving
these file descriptors open causes can cause an orted to hang when
launching apps.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 98172163e6)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Since version hwloc 2.0.0 has a new organization of NUMA nodes on the
topology tree. This commit adds the detection of local NUMA object for
hwloc => 2.0.0, which fixes the procs bindings policy for rmaps mindist
component.
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit e5291ccc34)
In some scenarios, we can have a daemon sharing the node with mpirun. In
those cases, we need to avoid race conditions in cleanup
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 8d1be27a1e)
Per suggestion by @bangerth, allow mpirun to execute as root if two
envars are set to specific values
Per conversation with @jsquyres, name the envars OMPI_ALLOW_RUN_AS_ROOT
and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM
Fixes#4451
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7f1444d5f9)
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.
Thanks to @tonyreina for the error report
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit bcdb1f45ac)
Flag that we provided a notification and ignore it if it attempts to come back up.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit ea0d70bc9396def61545e2ce492a55c4c3aa7772)
Do not have child jobs inherit launch directives unless requested to do so. This affects the map-by, rank-by, bind-to, npernode, pernode, npersocket, persocket, and cpus-per-rank directives. Values provided in the spawn call always take precedence - if a particular value isn't specified, then the ORTE defaults will be used if inheritance is not requested, and the values specified by MCA param will be used if inheritance is set.
Always inherit oversubscribe for now as otherwise MTT will break
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Since the new binding option is tied to the --cpu-list orterun CLI
option, make the --bind-to option reflect the same name (vs. the
--cpu-set CLI option, which is entirely different). For example:
mpirun --bind-to cpu-list:ordered ...
Note that "--bind-to cpulist:ordered" is accepted as a synonym,
because people will be lazy.
Also add some minor updates to the orterun.1in man page for
clarification.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Allow users to request that procs be bound to a cpu in a given cpu-list based on their corresponding local rank
Signed-off-by: Ralph Castain <rhc@open-mpi.org>