This is a cherry-pick of master (2820aef). The propagation is intended to resolve issue #6130
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
Causes the MCA param to be ignored, while the cmd line option still
works.
Thanks to @iassiour for the report!
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Follow on to 430c659908: clarify the help message and fix one typo.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit e9bf318dcb)
Update the show_help message for when there are not enough slots to
run an application.
Also, remove a bunch of copies of this message in various show_help
text files that aren't used/referred to anywhere in the code.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 430c659908)
If we detect that we are being debugged by an MPIR-based debugger, then
print a warning that OMPI's MPIR support has been deprecated and will be
removed in a subsequent release.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 2cb271716b)
Thanks to @hjelmn for debugging it and providing the patch
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit efa8bcc17078c89f1c9d6aabed35c90973a469bf)
(cherry picked from commit 647a760b7e)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
When a job terminates normally but with a non-zero exit code,
display the error message to stderr.
Thanks Emre Brookes for the bug report.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit open-mpi/ompi@893270caee)
This commit removes some code that protected the odls/alps component
from closing alps file descriptors. For some unknown reason, leaving
these file descriptors open can cause an orted to hang when
launching apps.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 98172163e6)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hwloc 2.0.0 introduced a new organization of NUMA nodes in the
topology tree. This commit adds detection of the local NUMA object for
hwloc >= 2.0.0, which fixes the process binding policy for the rmaps
mindist component.
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit e5291ccc34)
In some scenarios, we can have a daemon sharing the node with mpirun. In
those cases, we need to avoid race conditions in cleanup
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 8d1be27a1e)
Per suggestion by @bangerth, allow mpirun to execute as root if two
envars are set to specific values
Per conversation with @jsquyres, name the envars OMPI_ALLOW_RUN_AS_ROOT
and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM
Fixes #4451
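The check described above can be sketched as follows. This is a minimal illustration, not the mpirun implementation: the function name `root_run_allowed` is hypothetical, and the required value "1" is an assumption, since the message only says "specific values".

```python
# Assumption: both variables must equal "1". The commit message only says
# "specific values", so treat this constant as illustrative.
REQUIRED_VALUE = "1"


def root_run_allowed(env, uid):
    """Return True if a launch may proceed for this uid and environment.

    `env` is a mapping of environment variables (for example os.environ)
    and `uid` is the effective user id of the launcher.
    """
    if uid != 0:
        # Non-root users are never blocked by this check.
        return True
    # Root must opt in twice: once to allow, once to confirm.
    return (env.get("OMPI_ALLOW_RUN_AS_ROOT") == REQUIRED_VALUE
            and env.get("OMPI_ALLOW_RUN_AS_ROOT_CONFIRM") == REQUIRED_VALUE)
```

Requiring two variables rather than one makes it unlikely that a root launch is enabled by accident.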
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7f1444d5f9)
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.
Thanks to @tonyreina for the error report
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit bcdb1f45ac)
Flag that we provided a notification and ignore it if it attempts to come back up.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit ea0d70bc9396def61545e2ce492a55c4c3aa7772)
Do not have child jobs inherit launch directives unless requested to do so. This affects the map-by, rank-by, bind-to, npernode, pernode, npersocket, persocket, and cpus-per-rank directives. Values provided in the spawn call always take precedence - if a particular value isn't specified, then the ORTE defaults will be used if inheritance is not requested, and the values specified by MCA param will be used if inheritance is set.
Always inherit oversubscribe for now as otherwise MTT will break
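The precedence rules above can be sketched roughly as below. The function `resolve_directive` and the dictionaries it takes are hypothetical stand-ins for the ORTE data structures. Falling back to the defaults when inheritance is set but no MCA value exists is an assumption.

```python
# Directives inherited regardless of the inheritance request (per the
# note about oversubscribe above).
ALWAYS_INHERITED = {"oversubscribe"}


def resolve_directive(name, spawn_info, mca_params, orte_defaults, inherit):
    """Pick the value of one launch directive for a child job."""
    # 1. A value given in the spawn call always wins.
    if name in spawn_info:
        return spawn_info[name]
    # 2. With inheritance set (or for always-inherited directives),
    #    use the value specified by MCA param, if there is one.
    if (inherit or name in ALWAYS_INHERITED) and name in mca_params:
        return mca_params[name]
    # 3. Otherwise fall back to the ORTE default.
    return orte_defaults.get(name)
```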
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Since the new binding option is tied to the --cpu-list orterun CLI
option, make the --bind-to option reflect the same name (vs. the
--cpu-set CLI option, which is entirely different). For example:
mpirun --bind-to cpu-list:ordered ...
Note that "--bind-to cpulist:ordered" is accepted as a synonym,
because people will be lazy.
Also add some minor updates to the orterun.1in man page for
clarification.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Allow users to request that procs be bound to a cpu in a given cpu-list based on their corresponding local rank
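As a rough illustration of binding by local rank, the sketch below expands a cpu-list string and picks the entry at the position of the local rank. The helper names are hypothetical, and raising an error when the local rank exceeds the list length is an assumption. The actual behavior in that case is not stated here.

```python
def parse_cpu_list(spec):
    """Expand a list such as "0,2,4-6" into [0, 2, 4, 5, 6]."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_for_local_rank(spec, local_rank):
    """Return the cpu that the proc with this local rank would be bound to."""
    cpus = parse_cpu_list(spec)
    if local_rank >= len(cpus):
        # Assumption: more local procs than listed cpus is treated as an error.
        raise ValueError("local rank %d has no entry in cpu-list" % local_rank)
    return cpus[local_rank]
```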
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
A race condition exists based on whether or not the userdata object attached to a hwloc_obj_t has been initialized. These objects are set up whenever we scan for resources under that location. You therefore must not set a variable to the pointer to the userdata object and then call a function that will initialize the data in it. Instead, set the variable after the function call, and protect against a NULL pointer.
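The ordering problem can be illustrated with a small sketch. The names `TopoObject`, `scan_resources`, and `count_available` are hypothetical stand-ins, with `None` playing the role of a NULL userdata pointer.

```python
class TopoObject:
    """Stand-in for a hwloc_obj_t whose userdata starts out unset."""

    def __init__(self):
        self.userdata = None


def scan_resources(obj):
    """Hypothetical scan that lazily attaches userdata to the object."""
    if obj.userdata is None:
        obj.userdata = {"available": 4}  # illustrative payload


def count_available(obj):
    # Call the function that may initialize userdata first ...
    scan_resources(obj)
    # ... and only then read the pointer, guarding against NULL.
    data = obj.userdata
    if data is None:
        return 0
    return data["available"]
```

Reading `obj.userdata` before calling `scan_resources` would capture the unset value, which is the failure mode described above.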
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
The PMIx support for "instant on" remains experimental, so disable it by default. Provide an MCA param and corresponding command line option to enable it at runtime.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
This is a minor abstraction break in naming, but hopefully acceptable for now. I will update the contents of the program a little later. This resolves the immediate issue of naming conflict with the PRRTE binary.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
This still leaves two unresolved warnings:
base/rmaps_base_binding.c:577:22: warning: variable ‘clvm’ set but not used [-Wunused-but-set-variable]
unsigned clvl=0, clvm=0;
^~~~
base/rmaps_base_binding.c:576:27: warning: variable ‘hwm’ set but not used [-Wunused-but-set-variable]
hwloc_obj_type_t hwb, hwm;
^~~
The problem is that these values are used in the OPAL_HWLOC_MAKE_OBJ_CACHE macro to form a variable name. Thus, the compiler doesn't recognize the values as being "used". I'm not entirely sure how to resolve it cleanly.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Don't bother doing a lookup upwards or downwards for the target object type.
Just use the target depth, iterate over the level until we find the min_bound
object that intersects the locale cpuset.
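A minimal sketch of that single-level search follows. It models cpusets as Python sets and each object as a dict, and it assumes that "min_bound" means the object with the fewest procs already bound to it. All names are illustrative and none of them are hwloc or ORTE calls.

```python
def find_min_bound(level_objects, locale_cpuset):
    """Scan one topology level for the least-loaded intersecting object.

    Each object is a dict with a "cpuset" (set of cpu ids) and a
    "num_bound" count. Returns None if nothing intersects the locale.
    """
    best = None
    for obj in level_objects:
        if not (obj["cpuset"] & locale_cpuset):
            continue  # object lies outside the locale
        if best is None or obj["num_bound"] < best["num_bound"]:
            best = obj
    return best
```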
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
This fixes a problem reported by @bgoglin where rank-by was incorrectly generating values when ranking by a type of object (e.g., socket). It also corrects the handling of the pernode, npernode, and npersocket options - these should only set the #procs and the default mapping pattern. They specifically should not prohibit the user from requesting a different mapping.
Thus, the following should be valid:
mpirun -npernode 2 --map-by socket ...
should put 2 procs on each node, mapping them by-socket on each node.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
When there is no alias for a given node, do not set the
ORTE_NODE_ALIAS attribute to an empty string any more.
Thanks Erico for reporting this issue.
Thanks Ralph for the guidance.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
An incorrectly named variable caused all pml variables to disappear
from ompi_info. This commit fixes the typo. We may add some logic into
the MCA base to catch these sorts of things in the future.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Shorten the loops as much as possible - if someone wants to further optimize, they are welcome to do so.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
While it may be faster to reverse the order of the assignment loops, it also results in the wrong answer
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Reset the `ORTE_NODE_FLAG_MAPPED` flag after host filtering. This
flag is used subsequently and can affect the node mapping logic.
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
* The `MPIR_PROCDESC` structure needs to be visible even in optimized
builds so that debuggers can attach to `mpirun` and properly read the
`MPIR_proctable`.
* In the v2.0.x and v2.x series this structure resided in the `orterun`
directory and carried the `CFLAGS` fix included here. This code
moved in the v3.x series but the `CFLAGS` did not, causing this
issue.
- Instead of applying the debug `CFLAGS` globally to libopen-rte,
only apply them to the `orted_submit.c` compile which contains the
MPIR symbols.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
The current code path for PMIx_Resolve_peers and PMIx_Resolve_nodes executes a threadshift in the preg components themselves. This is done to ensure thread safety when called from the user level. However, it causes thread-stall when someone attempts to call the regex functions from _inside_ the PMIx code base should the call occur from within an event.
Accordingly, move the threadshift to the client-level functions and make the preg components just execute their algorithms. Create a new pnet/test component to verify that the preg code can be safely accessed - set that component to be selected only when the user directly specifies it. The new component will be used to validate various logical extensions during development, and can then be discarded.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 456ac7f7af3d9ba09888e3c899eb001daaa24aef)
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":
* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch
* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.
* Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later.
* Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allow envars to be excluded.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Since output-filename has been moved to a per-job attribute,
remove the orte_output_filename global variable, and stop passing
this option to orted.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Sync command line output for Slurm with RSH launcher.
Currently, the Slurm launch command line is only visible in debug mode, while
for RSH it is always shown.
The command line is useful for troubleshooting and should be enabled.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
Warn that relative path will be converted to absolute path, meaning that the file system on remote nodes must be the same as on the node where mpirun is executed.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Since we now support the dynamic addition of hosts to the orte_node_pool, there is no longer any reason to require advance specification of all possible nodes. Instead, use a precedence method to initially allocate only those hosts that were specified in the cmd line:
* rankfile, if given, as that will specify the nodes
* -host, aggregated across all app_contexts
* -hostfile, aggregated across all app_contexts
* default hostfile
* assign local node
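The precedence order in the list above can be sketched as a first-match search. The function `initial_allocation` and its parameters are hypothetical, and each argument is assumed to already hold the node names aggregated across all app_contexts.

```python
def initial_allocation(rankfile=None, hosts=None, hostfile=None,
                       default_hostfile=None, local_node="localhost"):
    """Return (source, nodes) for the first source that names any node."""
    sources = (
        ("rankfile", rankfile),
        ("-host", hosts),
        ("-hostfile", hostfile),
        ("default hostfile", default_hostfile),
    )
    for name, nodes in sources:
        if nodes:
            return name, list(nodes)
    # Nothing was specified anywhere: assign the local node.
    return "local node", [local_node]
```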
Fix slots_inuse accounting so that the nodes are correctly reset upon error termination - e.g., when oversubscribed without permission.
Ensure we accurately track the user's specified desires for oversubscribe and no-use-local when dynamically spawning jobs.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit c9b3e68ce596a68a2ed2fbf73f211b3334b0a6a8)
Fixed the desync of job nodelists between mpirun and the orted
daemons. The issue was observed when launching via RSH, because the user
can provide the nodes in an arbitrary order relative to the HNP placement.
The mpirun process propagates the daemons' nodelist order to the nodes.
The problem was that the HNP itself assembles the nodelist based on
the user-provided order. As a result, rank assignment was calculated
differently on orted and mpirun.
Consider the following example:
* User launches mpirun on node cn2.
* Hostlist is cn1,cn2,cn3,cn4; ppn=1
* mpirun passes hostlist cn[2:2,1,3-4]@0(4) to the orteds
As a result, mpirun will assign rank 0 to cn1 while orted will assign
rank 0 to cn2 (because orted sees cn2 as the first element in the node
list).
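The mismatch in this example can be reproduced with a toy model. The sketch assumes ranks are handed out sequentially in nodelist order, which is a simplification of the real mapper, and `assign_ranks` is a hypothetical helper.

```python
def assign_ranks(nodelist, ppn=1):
    """Map rank -> node by walking the nodelist in order, ppn procs per node."""
    ranks = {}
    rank = 0
    for node in nodelist:
        for _ in range(ppn):
            ranks[rank] = node
            rank += 1
    return ranks


# Order as given by the user (what the HNP assembled before the fix).
user_order = ["cn1", "cn2", "cn3", "cn4"]
# Order as propagated to the orteds: cn[2:2,1,3-4].
daemon_order = ["cn2", "cn1", "cn3", "cn4"]

mpirun_view = assign_ranks(user_order)
orted_view = assign_ranks(daemon_order)
```

Here `mpirun_view[0]` is `cn1` while `orted_view[0]` is `cn2`, so the two sides disagree on where rank 0 lives unless both use the same ordering.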
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
When too much data is available on stdin, it might not be
forwarded immediately to the task (write() might fail with -EAGAIN),
so when stdin is terminated, there might be some remaining data
to be pushed to the task. In this case, delay the release of the sink
so no data is discarded.
Refs open-mpi/ompi#4744
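The delayed release can be sketched as below. This is not the iof code: the `Sink` class is hypothetical, and the `write` callable is assumed to return the number of bytes it accepted, with 0 standing in for a write() that fails with EAGAIN.

```python
class Sink:
    """Buffers stdin data and is released only once everything is written."""

    def __init__(self, write):
        self._write = write          # callable: bytes -> number of bytes accepted
        self.pending = bytearray()   # data not yet delivered to the task
        self.closing = False         # stdin has been terminated
        self.released = False        # sink has been torn down

    def push(self, data):
        self.pending += data
        self.flush()

    def close(self):
        # stdin ended: request release, but keep any unwritten data.
        self.closing = True
        self.flush()

    def flush(self):
        while self.pending:
            accepted = self._write(bytes(self.pending))
            if accepted == 0:
                break                # would block; retry on the next flush
            del self.pending[:accepted]
        if self.closing and not self.pending:
            self.released = True     # safe to release: nothing is discarded
```

A caller would invoke `flush()` again whenever the task's stdin becomes writable, so the release happens only after the remaining data has been pushed.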
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This option was only used by the iof/mr_hnp (aka Map/Reduce)
component, which is no longer part of the master or v3 branches.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Fix a problem in packing/unpacking job updates. There remains a race condition that causes messages to attempt to be sent to the second new daemon before it is completely ready. Not entirely sure where it is coming from.
Refs #4665
Rebase to master. Reset orte_nidmap_communicated if hosts are added. Check for duplicate hostnames in an add_host command. Turn off tree_spawn for dynamic launch of additional daemons.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>