1
1

435 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
ac4fcd3f97 Ensure that oob/base level data is always accessed in the oob/base event thread. Make debruijn the default routed component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-22 10:33:32 -08:00
Ralph Castain
466cbd4d29 Rework the threading in oob/tcp so that daemons (including mpirun) use multiple progress threads to get messages out to their children, and so that the oob/base uses a separate one to setup sends. This allows the daemon cmd processor to execute in parallel with relay of messages, which significantly reduces launch times at scale
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-21 13:26:19 -08:00
Ralph Castain
668421b6ec Compress the xcast message if bigger than a defined size to further improve launch performance at scale
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 22:08:02 -08:00
Ralph Castain
e5f687f896 Speed-up the OOB/TCP communications by using writev instead of writing the header, and then separately write the body
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 13:03:44 -08:00
Gilles Gouaillardet
24c61b0625 oob/tcp: plug a memory leak in mca_oob_tcp_component_lost_connection()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Ralph Castain
dd491db21f Fix IOF when outputing to files - the remote orteds were failing to output stdout/err from their procs.
Silence a warning in orted_submit

Protect against a free'd value in an error path when forming oob tcp connections

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-01 14:12:47 -08:00
Artem Polyakov
58300afff2 orte/oob/tcp: Plug the memory leak.
Plug coverity defect CID 1396541.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2016-12-01 06:48:25 +07:00
Ralph Castain
30ff8be9c9 Silence minor warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-29 08:33:22 -08:00
Artem Polyakov
ada93e0c02 orte/oob/tcp: Fix message dropping in case of concurrent connection.
The problem was observed for direct modex used with recursive doubling
algorithm (used for collective ID calculation prior to d52a2d081e9598a9ac9a50fb4b013a6d2a72375b)
that has pairwise nature and counter-connections are highly likely.

The following scenario was uncovering the issue:
* ranks `x` and `y` want to communicate with each other, `x` < `y`;
* rank `x` initiates the connection and sends the ack;
* rank `y` starts to `connect()` and gets the ack from `x`;
* `y` identifies that it already started connecting and `y` > `x` so it rejects incoming connection.
* `x` sees that his connection was rejected in `mca_oob_tcp_peer_recv_connect_ack()` when trying to
read the message header using `tcp_peer_recv_blocking()` which calls `mca_oob_tcp_peer_close()`
that effectively flushes all the messages in the peer->send_queue.
* `y` send the ack to `x` and the connection is established, however all the messages for the peer
at `x` are vanished (except the front one in peer->send_msg).

This commit introduces a "nack" function that will be used at `y` side to tell `x` that `y` has the
priority and `x`'s connection should be closed. This allows to avoid "guessing" on the unexpectedly
closed connection.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2016-11-27 04:58:34 +07:00
Ralph Castain
188880be3f Since static ports are only used by ORTE if the runtime option is given,
there is no need for a configure option as well - so remove the
--enable-orte-static-ports configure option. When decoding the daemon
nidmap, mark new daemons as ALIVE by default - we will discover dead
ones as we go.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-04 05:01:42 -07:00
Gilles Gouaillardet
da0c873e14 oob/tcp: enhance debugging output
display the hop node used to send a message
(if the message is sent directly, then the hop is the destination)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-04 14:16:06 +09:00
Gilles Gouaillardet
30298cc83c oob/tcp: remove debug that should have never been commited
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-31 16:41:14 +09:00
Gilles Gouaillardet
75e96004a4 oob/tcp: fix a typo in mca_oob_tcp_component_no_route()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-31 16:30:24 +09:00
Gilles Gouaillardet
3d4285b04d oob/tcp: silence valgrind warning
fully initialize allocated memory to keep valgrind happy

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-27 17:12:46 +09:00
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
Ralph Castain
a2919174d0 Bring the RML modifications across. This is the first step in a revamp of the ORTE messaging subsystem to support fabric-based communications during launch and wireup phases. When completed, the grpcomm and plm frameworks will each have their own "conduit" for communication - each conduit corresponds to a particular RML messaging transport. This can be the active OOB-based component, or a provider from within the RML/OFI component. Messages sent down the conduit will flow across the associated transport.
Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted.

While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress
2016-10-11 16:01:02 -07:00
Gilles Gouaillardet
c92e9a5406 use the new OPAL_HASH_TABLE_FOREACH convenience macro 2016-10-08 16:58:20 +09:00
Ralph Castain
de7b1494d9 Clean out old cruft from the ORCM project 2016-09-21 00:13:30 -07:00
Gilles Gouaillardet
e84b35217f oob/tcp: plug a memory leak
as reported by Coverity with CID 1196711
2016-09-08 18:50:18 +09:00
Ralph Castain
f85dcaee2a Fixes CID 1369067 and CID 1196684
Fixes CID 1369648

    Fixes CID 1372409
2016-09-06 08:43:15 -07:00
Ralph Castain
a4c8e8c28a Cleanup the proposed change:
* qos framework is moving to the scon layer and is no longer required in ORTE

* remove the rml/ftrm component as we now have multiple active components, and so the wrapper needs to be rethought

* no need for separating the "base" from "API" module definition. The two are identical

* move the "stub" functions into their own file for cleanliness

* general cleanup to meet coding standards

* cleanup some logic in the stubs
2016-03-10 13:14:17 -08:00
Ralph Castain
351070659e Correct ordering when checking for privileged ports 2016-02-14 09:43:01 -08:00
Ralph Castain
233bd085ca Protect against a non-privileged port connecting to us when we are running as root
Don't close the listener socket upon error unless we are giving up

Cleanup the incoming socket
2016-02-13 08:07:27 -08:00
Ralph Castain
0a6b8d2c14 Correctly handle connection terminations during finalize so mpirun doesn't hang. Cleanup some corner cases in the error notification system 2015-12-30 07:16:43 -08:00
Ralph Castain
1cdc1c121c Revert "Standardize the handling of shutdown in the OOB TCP component"
This reverts commit open-mpi/ompi@12dccaa911.
2015-12-30 07:05:40 -08:00
Ralph Castain
12dccaa911 Standardize the handling of shutdown in the OOB TCP component 2015-12-29 07:57:22 -08:00
Federico Reghenzani
6536a6a9f5 oob_tcp: fix peer->state wrong check 2015-10-29 16:43:58 +01:00
John Westlund
044fea8df7 re-order != comparison, OBJ_RELEASE mca_oob_tcp_addr_t on failure 2015-10-02 15:59:48 -07:00
John Westlund
6bfaa925ec simplify use of sockaddr* structs to work around buffer overflow warning 2015-10-02 14:26:52 -07:00
Ralph Castain
1b7930ad52 Silence some warnings and address Coverity issues 2015-09-16 07:58:22 -07:00
Ralph Castain
0d5814b5ca Cleanup Coverity issues 2015-08-29 21:19:27 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Ralph Castain
89c80b2294 Only start a listener for processes that will actually receive connection requests. Tools such as orte-submit always initiate connections and thus do not need to start a listener. 2015-08-27 16:41:00 -07:00
Nathan Hjelm
156ce6af21 periodic whitespace purge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:32:33 -06:00
Ralph Castain
023936e84b Silence coverity warnings 2015-07-29 07:28:08 -07:00
Gilles Gouaillardet
429bdf1af7 oob/tcp: fix a race condition when finalizing the oob/tcp component 2015-07-28 09:16:13 +09:00
Ralph Castain
4352123c26 Protect the oob/tcp component from port scanners 2015-06-26 01:40:57 -07:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Ralph Castain
bc7a7f3de5 Fix abnormal shutdown when a node dies 2015-05-22 17:29:06 -07:00
Jeff Squyres
3069daa015 oob_tcp_listener: slightly refactor EAGAIN/EWOULDBLOCK
Have only a single level of "if" conditionals.  Also, slightly change
the logic such that we only die/break out of the loop if we get EMFILE
-- all other errors are ok to go on to the next fd.

Finally, use a real show_help() message to warn when other errors occur.
2015-05-20 21:10:11 -04:00
Jeff Squyres
e43c8dc291 oob tcp: label a few #endif's
Only bother labeling the ones that are a little far away from their
corresponding #if statements.
2015-05-20 21:10:11 -04:00
Jeff Squyres
4b2f0d4827 oob tcp: reset MCA params from level 9
Set various MCA param levels
2015-05-20 21:10:11 -04:00
Jeff Squyres
1a4c9960e1 oob tcp: set KEEPALIVE timeout 60s, retry interval 5s
The timeout is frequency at which to send keepalive pings; the retry
interval is how often to send successive pings once a keepalive has
not replied.

Also update comments and MCA param help strings.

60 seconds -- squashme
2015-05-20 21:08:37 -04:00
Jeff Squyres
c95215dfc2 oob_tcp: do not set KEEPALIVE on listening sockets 2015-05-20 17:28:45 -04:00
Jeff Squyres
32d81af35f oob tcp: re-enable keepalive option for Mac
Plus very minor #if/#endif reduction.
2015-05-20 17:28:45 -04:00
Ralph Castain
d3d3e73099 Per request from George, use defined(__APPLE__) instead of OPAL_HAVE_MAC. Don't try to close a negative socket 2015-05-15 07:13:42 -06:00
Ralph Castain
0a345d34e6 Plug the memory leak identified by George 2015-05-14 21:33:48 -06:00
Ralph Castain
8e30579e6e The Mac appears to have problems with the keepalive support - once keepalive starts, the memory footprint soars. So disable keepalive on the Mac 2015-05-14 18:09:13 -06:00
Ralph Castain
3cee4152fc Fix the intercommunictor issue reported by Gilles. Instead of directly checking the reachability bitmap, ask the component if the proc is reachable when doing a send as the component is the final arbiter in such cases. Recirculate any messages that a daemon is trying to send to void race conditions. Cleanup listener sockets so we don't leak them 2015-05-11 09:16:25 -07:00
Ralph Castain
b5382c9bf9 Rework the OOB selection logic to allow a component (e.g., usock) to direct that it be the sole active component. Remove prior disqualifying code in the oob/tcp component as it was too restrictive - if usock wasn't able to run, it left apps with no way to communicate to their daemon. Have the local daemon check the global modex for the RML URI info of the local procs so it can route messages between them when tcp is the primary channel.
A few other minor cleanups included.
2015-05-08 11:15:21 -07:00