Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.
Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Silence a warning in orted_submit
Protect against a free'd value in an error path when forming oob tcp connections
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
The problem was observed with the direct modex used with the recursive doubling
algorithm (used for collective ID calculation prior to d52a2d081e9598a9ac9a50fb4b013a6d2a72375b),
which is pairwise in nature, so counter-connections are highly likely.
The following scenario was uncovering the issue:
* ranks `x` and `y` want to communicate with each other, `x` < `y`;
* rank `x` initiates the connection and sends the ack;
* rank `y` starts to `connect()` and gets the ack from `x`;
* `y` identifies that it already started connecting, and since `y` > `x` it rejects the incoming connection.
* `x` sees that its connection was rejected in `mca_oob_tcp_peer_recv_connect_ack()` when trying to
read the message header using `tcp_peer_recv_blocking()`, which calls `mca_oob_tcp_peer_close()`,
effectively flushing all the messages in peer->send_queue.
* `y` sends the ack to `x` and the connection is established; however, all the messages for the peer
at `x` have vanished (except the front one in peer->send_msg).
This commit introduces a "nack" function that is used on `y`'s side to tell `x` that `y` has
priority and `x`'s connection should be closed. This avoids having to guess at the reason for an
unexpectedly closed connection.
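A minimal sketch of that tie-break with an explicit nack (the `peer_t`, `send_ack()`, and `send_nack()` names below are illustrative stand-ins, not the actual OOB/TCP symbols):

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int  rank;        /* peer's rank */
    bool connecting;  /* we already initiated our own connect() to this peer */
    int  sd;          /* socket of the peer's incoming connection attempt */
} peer_t;

/* stand-ins for the real handshake sends */
static void send_ack(int sd)  { printf("ack on sd %d\n", sd); }
static void send_nack(int sd) { printf("nack on sd %d\n", sd); }

/* Called at rank `my_rank` when a connection attempt arrives from `peer`.
 * If both sides are connecting simultaneously, the higher rank keeps its
 * own outgoing connection and explicitly nacks the incoming one, so the
 * lower rank can close that attempt cleanly instead of misreading a
 * silent rejection as an error and flushing its send_queue. */
static void handle_incoming(int my_rank, peer_t *peer)
{
    if (peer->connecting && my_rank > peer->rank) {
        send_nack(peer->sd);  /* tell the peer: close your attempt, mine wins */
        return;
    }
    send_ack(peer->sd);       /* accept (abandoning our own attempt, if any) */
}

int main(void)
{
    peer_t peer = { .rank = 1, .connecting = true, .sd = 3 };
    handle_incoming(2, &peer);  /* rank 2 > rank 1: prints "nack on sd 3" */
    return 0;
}
```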
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
Still not completely done, as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops its connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
Bring Slurm PMI-1 component online
Bring the s2 component online
Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. This is required because the different PMI environments all pass the jobid in different ways.
Bring the OMPI pubsub/pmi component online
Get comm_spawn working again
Ensure we always provide a cpuset, even if it is NULL
pmix/cray: adjust cray pmix component for pmix
Make changes so cray pmix can work within the integrated
ompi/pmix framework.
Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet
Cleanup comm_spawn - procs now starting, error in connect_accept
Complete integration
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
Mimic the btl/tcp protocol to solve the race condition that happens
when two peers try to connect to each other at the same time
cmr=v1.8.4:reviewer=rhc
This commit was SVN r32799.
"NULL" doesn't meany anything to the user, and is somewhat confusing
to see in an error message. "<unknown>" at least indicates that
there's an error, and we know who the peer is.
This commit was SVN r32747.
Providing a netmask of 0 to opal_net_samenetwork results in everything
looking like it is on the same network. Hence, we were not retaining any
of the alternative addresses, so we had no other way to check them.
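To see why, here is a minimal, self-contained illustration (the `same_network()` helper is hypothetical, not the OPAL implementation): a prefix length of 0 produces an all-zero mask, so any two addresses compare equal.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical helper: compare two IPv4 addresses under a prefix length */
static bool same_network(uint32_t addr1, uint32_t addr2, uint32_t prefixlen)
{
    /* prefixlen == 0 yields mask == 0, so the test is vacuously true */
    uint32_t mask = (0 == prefixlen) ? 0 : 0xFFFFFFFFu << (32 - prefixlen);
    return (addr1 & mask) == (addr2 & mask);
}

int main(void)
{
    uint32_t a = (10u << 24) | 1;                  /* 10.0.0.1    */
    uint32_t b = (192u << 24) | (168u << 16) | 1;  /* 192.168.0.1 */
    printf("/24: %d\n", same_network(a, b, 24));   /* 0: different networks   */
    printf("/0:  %d\n", same_network(a, b, 0));    /* 1: everything "matches" */
    return 0;
}
```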
Refs trac:4870
This commit was SVN r32556.
The following Trac tickets were found above:
Ticket 4870 --> https://svn.open-mpi.org/trac/ompi/ticket/4870
This should eliminate the connectivity issues that have been reported, and will make maintenance of this component much easier.
cmr=v1.8.2:reviewer=jsquyres:subject=simplify the OOB/TCP component
This commit was SVN r31956.
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php
Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM
This commit was SVN r31557.
Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection
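A hedged sketch of the version half of that handshake (the constant and function names are illustrative, not the actual ORTE code): each side sends its version string in the connect header; on mismatch the receiver reports both versions and refuses the connection.

```c
#include <stdio.h>
#include <string.h>

/* illustrative: in practice the version would come from the build system */
#define LOCAL_VERSION "1.7.5"

/* returns 0 to proceed with the handshake, -1 to refuse the connection */
static int check_peer_version(const char *peer_version)
{
    if (0 != strcmp(LOCAL_VERSION, peer_version)) {
        fprintf(stderr,
                "version mismatch: local %s vs peer %s - refusing connection\n",
                LOCAL_VERSION, peer_version);
        return -1;  /* caller closes the socket */
    }
    return 0;
}

int main(void)
{
    return (0 == check_peer_version("1.7.4")) ? 0 : 1;  /* mismatch: refused */
}
```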
Fixes trac:4171
cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections
This commit was SVN r30551.
The following Trac tickets were found above:
Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171
No review will be required as this is just debug code for those helping us debug the 1.7.4 release candidates
cmr=v1.7.4:reviewer=ompi-gk1.7
This commit was SVN r30043.
Thanks to Dave Love and Ashley Pittman for pointing out the problem.
cmr=v1.7.4:reviewer=jsquyres:subject=Fix tool communications with mpirun
This commit was SVN r29959.
The following Trac tickets were found above:
Ticket 3963 --> https://svn.open-mpi.org/trac/ompi/ticket/3963
When two processes attempt to connect to each other at the same time, the result is a conflict that can cause messages to be lost. Add detection of this condition, and have both processes cancel their connect operations. The process with the higher rank will then
reconnect, while the lower rank process will simply wait for the connection to be created.
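The resolution rule itself is small; a sketch under hypothetical names (`cancel_connect()` and `start_connect()` are illustrative stand-ins):

```c
#include <stdio.h>

static void cancel_connect(int peer) { printf("cancel connect to %d\n", peer); }
static void start_connect(int peer)  { printf("reconnect to %d\n", peer); }

/* both sides detect the simultaneous connect and cancel; only the higher
 * rank re-initiates, while the lower rank waits for the incoming connection */
static void resolve_simultaneous_connect(int my_rank, int peer_rank)
{
    cancel_connect(peer_rank);
    if (my_rank > peer_rank) {
        start_connect(peer_rank);
    }
}

int main(void)
{
    resolve_simultaneous_connect(2, 1);  /* higher rank: cancels, then reconnects */
    return 0;
}
```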
Refs trac:3696
This commit was SVN r29139.
The following Trac tickets were found above:
Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
oob_tcp_connection.c: In function 'mca_oob_tcp_peer_accept':
oob_tcp_connection.c:725:9: warning: variable 'cmpval' set but not used [-Wunused-but-set-variable]
Refs trac:3696
This commit was SVN r29091.
The following Trac tickets were found above:
Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***
Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.
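A minimal sketch of such a macro, with a `progress()` stub standing in for opal_progress (the real definition lives in ompi/mca/rte/rte.h and may differ in detail):

```c
#include <stdbool.h>
#include <stdio.h>

static volatile bool active = true;

/* stand-in for opal_progress(): one pass through the progress engine;
 * here it simply pretends the awaited event fired */
static void progress(void)
{
    active = false;
}

/* cycle the progress engine until an event callback clears the flag */
#define OMPI_WAIT_FOR_COMPLETION(flag)  \
    do {                                \
        while ((flag)) {                \
            progress();                 \
        }                               \
    } while (0)

int main(void)
{
    /* post a non-blocking RTE operation whose completion callback would
     * set `active` to false, then block here cycling the progress engine */
    OMPI_WAIT_FOR_COMPLETION(active);
    printf("operation complete\n");
    return 0;
}
```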
***************************************************************************************
I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.
The code is in https://bitbucket.org/rhc/ompi-oob2
WHAT: Rewrite of ORTE OOB
WHY: Support asynchronous progress and a host of other features
WHEN: Wed, August 21
SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:
* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)
* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients
* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort
* only one transport (i.e., component) can be "active"
The revised OOB resolves these problems:
* async progress is used for all application processes, with the progress thread blocking in the event library
* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")
* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.
* a message that arrives on one TCP NIC is automatically shifted to whatever NIC is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.
* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object
* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions
* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel
* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport
* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active
* all blocking send/recv APIs have been removed. Everything operates asynchronously.
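As a rough illustration of the resulting usage pattern (all names below are illustrative stand-ins, loosely modeled on the RML's non-blocking send; the real API and signature may differ): the caller posts a send together with a completion callback and returns immediately.

```c
#include <stdio.h>

typedef void (*send_cbfunc_t)(int status, void *cbdata);

/* illustrative stand-in for a non-blocking send: queue the message, return
 * immediately, and invoke the callback when transmission completes */
static void send_nb(int peer, const void *buf, int len,
                    send_cbfunc_t cbfunc, void *cbdata)
{
    (void)buf; (void)len;
    printf("queued message for peer %d\n", peer);
    /* ...the progress thread sends it later, then calls... */
    cbfunc(0, cbdata);
}

static void send_complete(int status, void *cbdata)
{
    printf("send to peer %d completed with status %d\n",
           *(int *)cbdata, status);
}

int main(void)
{
    int peer = 1;
    send_nb(peer, "hello", 5, send_complete, &peer);  /* returns immediately */
    /* no blocking send exists; completion is only ever reported via callback */
    return 0;
}
```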
KNOWN LIMITATIONS:
* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline
* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker
* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways
* obviously, not every error path has been tested nor necessarily covered
* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways
* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC
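A minimal sketch of that per-peer sequencing (hypothetical structures; it assumes a sender never runs more than WINDOW messages ahead and that no duplicates arrive): early arrivals are parked until the gap fills, so messages that traveled different transports are still delivered in order.

```c
#include <stdint.h>
#include <stdio.h>

#define WINDOW 64   /* assumption: sender never gets more than this far ahead */

typedef struct {
    uint32_t next_seq_in;   /* next sequence number to deliver */
    void    *held[WINDOW];  /* early (out-of-order) messages parked here */
} peer_seq_t;

/* hand the message up to the RML user */
static void deliver(void *msg)
{
    printf("delivered: %s\n", (const char *)msg);
}

static void on_recv(peer_seq_t *p, uint32_t seq, void *msg)
{
    if (seq != p->next_seq_in) {
        p->held[seq % WINDOW] = msg;   /* arrived early: park it */
        return;
    }
    deliver(msg);
    p->next_seq_in++;
    /* drain any consecutive messages that were parked earlier */
    while (NULL != p->held[p->next_seq_in % WINDOW]) {
        void *m = p->held[p->next_seq_in % WINDOW];
        p->held[p->next_seq_in % WINDOW] = NULL;
        deliver(m);
        p->next_seq_in++;
    }
}

int main(void)
{
    peer_seq_t p = { 0, { 0 } };
    on_recv(&p, 1, "second");  /* early: parked                     */
    on_recv(&p, 0, "first");   /* in order: delivers both, in order */
    return 0;
}
```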
This commit was SVN r29058.