Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.
Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.
Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.
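As a rough illustration of "stop trying to send a message after a few times", the following is a minimal sketch of a bounded retry loop. It is not the OOB code: `MAX_SEND_ATTEMPTS`, `send_fn_t`, `send_with_retry_limit` and the `fail_until_zero` stub are hypothetical names, and the limit of 3 is an assumption.

```c
#include <stdbool.h>

/* Assumed retry limit, chosen only for illustration. */
#define MAX_SEND_ATTEMPTS 3

/* Hypothetical callback: returns true if the send succeeded. */
typedef bool (*send_fn_t)(void *ctx);

/*
 * Call try_send until it succeeds or the attempt limit is reached.
 * Returns the 1-based attempt number that succeeded, or -1 if every
 * attempt failed and the caller should report an error instead of
 * retrying again.
 */
static int send_with_retry_limit(send_fn_t try_send, void *ctx)
{
    for (int attempt = 1; attempt <= MAX_SEND_ATTEMPTS; ++attempt) {
        if (try_send(ctx)) {
            return attempt;
        }
    }
    return -1;
}

/*
 * Stub sender for experiments: ctx points to an int holding the number
 * of failures still to simulate; once it reaches zero, sends succeed.
 */
static bool fail_until_zero(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        --*remaining;
        return false;
    }
    return true;
}
```

With two simulated failures the third attempt succeeds; with five, the loop gives up and returns -1 so the caller can emit a specific error message.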
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Given that we only set OOB contact info from inside of events, or before we begin threaded operations (e.g., in the ess), allow set_contact_info to directly update the oob/base framework globals.
Correct the nidmap regex decompression routine.
Ensure that rank=1 daemon always sends back its topology as this is the most common use-case.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Since the oob and connections systems do not work the same way they
did in older versions of Open MPI, these operations are no longer
necessary. At best they do nothing and at worst they hurt performance
by making us enter the event library more often in opal_progress().
Fixes #2839
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug in the registration of the num_threads MCA
variable. The variable is of type int and was being registered as
a boolean.
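To show why the registered type matters, here is a toy sketch of a typed variable parser. It is not the MCA variable system; `var_type_t` and `toy_var_parse` are hypothetical, and the coercion shown is an assumption about how a boolean-typed variable would typically treat a numeric string.

```c
#include <stdlib.h>

/* Hypothetical type tags for a toy variable registry. */
typedef enum { VAR_TYPE_INT, VAR_TYPE_BOOL } var_type_t;

/*
 * Parse a user-supplied string according to the registered type.
 * A boolean variable collapses any non-zero number to 1, while an
 * integer variable keeps the full value.
 */
static long toy_var_parse(var_type_t type, const char *value)
{
    long parsed = strtol(value, NULL, 10);
    if (VAR_TYPE_BOOL == type) {
        return (0 != parsed) ? 1 : 0;
    }
    return parsed;
}
```

In this toy model, a thread count of "8" parsed under the boolean type comes back as 1, whereas the integer type preserves 8.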
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Silence a warning in orted_submit
Protect against a free'd value in an error path when forming oob tcp connections
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
The problem was observed for direct modex used with the recursive doubling
algorithm (used for collective ID calculation prior to d52a2d081e9598a9ac9a50fb4b013a6d2a72375b),
which is pairwise in nature and therefore makes counter-connections highly likely.
The following scenario exposes the issue:
* ranks `x` and `y` want to communicate with each other, `x` < `y`;
* rank `x` initiates the connection and sends the ack;
* rank `y` starts to `connect()` and gets the ack from `x`;
* `y` sees that it has already started connecting and that `y` > `x`, so it rejects the incoming connection;
* `x` sees that its connection was rejected in `mca_oob_tcp_peer_recv_connect_ack()` when trying to
read the message header using `tcp_peer_recv_blocking()`, which calls `mca_oob_tcp_peer_close()`,
which in turn flushes all the messages in the peer->send_queue;
* `y` sends the ack to `x` and the connection is established; however, all the messages queued for the peer
at `x` have vanished (except the front one in peer->send_msg).
This commit introduces a "nack" function that is used on the `y` side to tell `x` that `y` has
priority and that `x`'s connection should be closed. This avoids "guessing" about an unexpectedly
closed connection.
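The tie-break described in the scenario can be sketched as below. This is a minimal illustration only, not the oob/tcp implementation: `conn_decision_t`, `resolve_simultaneous_connect`, `toy_peer_t` and `handle_nack` are hypothetical names, and the peer structure is reduced to a socket descriptor and a queue length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of handling an incoming connection request. */
typedef enum {
    CONN_ACCEPT, /* keep the incoming connection */
    CONN_NACK    /* tell the peer to drop its attempt; ours has priority */
} conn_decision_t;

/*
 * Decide what to do with an incoming connection from peer_rank.
 * connecting is true if the local side has already started its own
 * connect() to that peer. Following the scenario, the higher rank wins.
 */
static conn_decision_t resolve_simultaneous_connect(uint32_t my_rank,
                                                    uint32_t peer_rank,
                                                    bool connecting)
{
    if (connecting && my_rank > peer_rank) {
        return CONN_NACK;
    }
    return CONN_ACCEPT;
}

/* Toy peer state on the losing side (rank x in the scenario). */
typedef struct {
    int sd;          /* socket descriptor, -1 when not connected */
    int queued_msgs; /* number of messages waiting in the send queue */
} toy_peer_t;

/*
 * React to an explicit nack: forget the losing socket but keep the
 * queued messages, since the peer's own connection is about to arrive.
 * Real code would also close() the descriptor.
 */
static void handle_nack(toy_peer_t *peer)
{
    peer->sd = -1;
}
```

The point of the explicit nack is visible in `handle_nack`: because `x` knows why its attempt ended, it can drop the socket without treating the event as a failure that flushes the queue.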
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
PR open-mpi/ompi#2432 introduced a regression where configure
and build with --disable-dlopen caused a build failure owing
to unresolved alps lli symbols in the libopen-pal shared library.
This commit fixes this problem.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
there is no need for a configure option as well - so remove the
--enable-orte-static-ports configure option. When decoding the daemon
nidmap, mark new daemons as ALIVE by default - we will discover dead
ones as we go.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
display the hop node used to send a message
(if the message is sent directly, then the hop is the destination)
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted.
While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress
so usock_peer_create_socket knows it must re-create the socket
/* assuming it is ever supposed to occur */
also fix a typo (peer->sd >= 0) in usock_peer_create_socket
* qos framework is moving to the scon layer and is no longer required in ORTE
* remove the rml/ftrm component as we now have multiple active components, and so the wrapper needs to be rethought
* no need for separating the "base" from "API" module definition. The two are identical
* move the "stub" functions into their own file for cleanliness
* general cleanup to meet coding standards
* cleanup some logic in the stubs
If ibv_fork_init() has been invoked the pages are marked MADV_DONTFORK.
If we only partially use a page, any data allocated on the remainder of
the page will be inaccessible to the child process.
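One way to avoid sharing a page with unrelated data is to allocate whole, page-aligned pages. The following is a minimal sketch under that assumption; it is not necessarily how this commit addresses the problem, and `round_up_to_page` and `alloc_whole_pages` are hypothetical helpers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round size up to a whole number of pages of page_size bytes. */
static size_t round_up_to_page(size_t size, size_t page_size)
{
    return ((size + page_size - 1) / page_size) * page_size;
}

/*
 * Allocate a page-aligned buffer that occupies whole pages, so that no
 * other allocation lands on pages that may later be marked
 * MADV_DONTFORK. On success, returns the buffer and stores the padded
 * size in *alloc_size (if non-NULL). Returns NULL on failure.
 * Release the buffer with free().
 */
static void *alloc_whole_pages(size_t size, size_t *alloc_size)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    size_t padded;

    if (page_size <= 0) {
        return NULL;
    }
    padded = round_up_to_page(size, (size_t) page_size);
    if (0 != posix_memalign(&buf, (size_t) page_size, padded)) {
        return NULL;
    }
    if (NULL != alloc_size) {
        *alloc_size = padded;
    }
    return buf;
}
```

A request for 100 bytes then consumes one full page, trading some memory for the guarantee that nothing else lives on the same page.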
Fixes open-mpi/ompi#1363