openmpi

Author	SHA1	Message	Date
Ralph Castain	4d186e6402	Properly protect the MCA parameters being registered by the OOB/TCP component when IPv6 is enabled cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32662.	2014-09-02 14:53:00 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. 
This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Ralph Castain	5db717f090	Some small leak cleanups cmr=v1.8.3:reviewer=artpol This commit was SVN r32358.	2014-07-30 15:46:02 +00:00
Adrian Reber	4aca7095dc	fix a syntax error in the FT code This commit was SVN r32087.	2014-06-25 20:35:50 +00:00
Ralph Castain	e21bfeadcd	Now that the BTLs are moving down to OPAL and becoming available to ORTE, there no longer is a need/desire to push performance in the OOB/TCP component. So we don't need multiple modules driving NICs in parallel, and can drop all the complicated distribution logic. Fall back to the simplified single module model, but retain the ability to run that module in its own progress thread if so directed. This should eliminate the connectivity issues that have been reported, and will make maintenance of this component much easier. cmr=v1.8.2:reviewer=jsquyres:subject=simplify the OOB/TCP component This commit was SVN r31956.	2014-06-06 02:24:17 +00:00
Ralph Castain	7df500ecf5	Break the loop caused by retrying to send a message to a hop that is unknown by the TCP oob component. We attempt to provide a way for other components to try, but need to mark that the TCP component is not able to reach that process so the OOB base will know to give up. This commit was SVN r31928.	2014-06-02 15:00:33 +00:00
Nathan Hjelm	59d09ad9de	orte: fix several small memory leaks grpcomm: fix memory leaks We were leaking the caddy object used to pass data to the callback function. This commit fixes these leaks. oob,rml: fix memory leaks This commit fixes several leaks: - Both the oob/base and oob/tcp were leaking objects on their peer hash tables. Iterate on the hash tables and free any objects. - Leaked sent messages because of missing OBJ_RELEASE. I placed the release in ORTE_RML_SEND_COMPLETE to catch all the possible paths. ess/base: close the state framework cmr=v1.8.2:reviewer=rhc This commit was SVN r31776.	2014-05-15 15:06:27 +00:00
Ralph Castain	445b552d3a	Try again to get an error message printed when a daemon fails to successfully report back to mpirun. In this case, there is no guaranteed way for the daemon to output the error report itself - we don't have a connection back to the HNP, and we have tied stderr off to /dev/null (for good reasons). So the HNP has to detect the failure itself and report it. The HNP can't know the precise reason, of course - all it knows is that the daemon failed. So output a generic error message that provides guidance on probable causes. Refs trac:4571 This commit was SVN r31589. The following Trac tickets were found above: Ticket 4571 --> https://svn.open-mpi.org/trac/ompi/ticket/4571	2014-05-01 19:48:21 +00:00
Ralph Castain	3723b39f30	Ensure we don't silently fail when unable to make a connection - bark pleasantly first. Refs trac:4571 This commit was SVN r31537. The following Trac tickets were found above: Ticket 4571 --> https://svn.open-mpi.org/trac/ompi/ticket/4571	2014-04-28 19:16:32 +00:00
Ralph Castain	d642babff6	Derived from patch provided by Artem, cleanup the "abnormal" code path for selecting TCP OOB modules to connect to a remote process. If we can't find a direct interface-to-address match, then assign all the provided addresses to the first available TCP module and let the normal failure process determine if the remote proc is truly reachable. cmr=v1.8.2:reviewer=artpol:subject=fix abnormal code connection path in tcp oob This commit was SVN r31536.	2014-04-28 19:05:14 +00:00
Ralph Castain	bbdbc5f8a8	Per suggestion from George, use a pipe for terminating the thread. Refs trac:4510 This commit was SVN r31381. The following Trac tickets were found above: Ticket 4510 --> https://svn.open-mpi.org/trac/ompi/ticket/4510	2014-04-14 01:02:46 +00:00
Ralph Castain	2d8dff837c	Ensure we properly terminate the listening thread prior to exiting, but do so in a way that doesn't make us wait for select to timeout. Refs trac:4510 This commit was SVN r31376. The following Trac tickets were found above: Ticket 4510 --> https://svn.open-mpi.org/trac/ompi/ticket/4510	2014-04-12 15:01:24 +00:00
Ralph Castain	9b30b2b783	Shave some time off of mpirun's operation by not waiting for the listener thread to terminate before exiting cmr=v1.8.1:reviewer=rhc This commit was SVN r31368.	2014-04-11 04:16:28 +00:00
Dave Goodell	5f3b81e291	oob: delete events when destroying a peer Without this patch running ring_c with the usnic BTL under valgrind will cause the orteds to segfault. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> Reviewed-by: Ralph Castain <rhc@open-mpi.org> cmr=v1.7.5:reviewer=ompi-rm1.7 This commit was SVN r31161.	2014-03-19 22:15:49 +00:00
Ralph Castain	2abed09d7c	Continue to resolve priority issues. Cleanup the case of forced termination in mpirun during launch processing by ensuring we can respond to socket closures, and ensuring that the remote daemons correctly close their sockets when terminating. Jeff: please test a variety of conditions to ensure we get this right cmr=v1.7.5:reviewer=jsquyres This commit was SVN r31058.	2014-03-13 04:02:24 +00:00
Ralph Castain	a254d2db34	Silence warning when CR is not enabled This commit was SVN r31025.	2014-03-12 13:47:03 +00:00
Adrian Reber	4512b3375e	OOB/TCP: wire up the existing ft_event() function This commit was SVN r31022.	2014-03-12 12:47:20 +00:00
Ralph Castain	da4cb39683	If we can't find a route to communicate, emit an error message rather than just exiting with a non-zero status cmr=v1.7.5:reviewer=jsquyres:subject=print error if cannot communicate This commit was SVN r30922.	2014-03-04 04:57:53 +00:00
Ralph Castain	14bb7a117c	Fix bugs in the oob base - ensure we get the components in high-to-low priority, and that we correctly track reachability via all components. Adjust the priority of the tcp component to leave headroom for others Refs trac:267 This commit was SVN r30740. The following Trac tickets were found above: Ticket 267 --> https://svn.open-mpi.org/trac/ompi/ticket/267	2014-02-16 03:19:08 +00:00
Ralph Castain	3f9db36e0d	Make Jeff smile - pretty-up the indentation Refs trac:4267 This commit was SVN r30733. The following Trac tickets were found above: Ticket 4267 --> https://svn.open-mpi.org/trac/ompi/ticket/4267	2014-02-14 23:25:48 +00:00
Ralph Castain	4e1c07cbf2	If we are given a TCP oob address that doesn't match any active module, it is still possible that we could route to the address if a router is in the system. No harm in trying, so arbitrarily pick the first connection in the active module list and assign the peer to it. If that module can't reach it, we'll follow the usual failover mechanism until finally concluding that nobody can get there. cmr=v1.7.5:reviewer=jsquyres:subject=handle non-matching addresses This commit was SVN r30719.	2014-02-13 23:37:22 +00:00
Ralph Castain	fc6101b508	Handle "localhost" better Refs trac:4263 This commit was SVN r30702. The following Trac tickets were found above: Ticket 4263 --> https://svn.open-mpi.org/trac/ompi/ticket/4263	2014-02-12 20:30:39 +00:00
Ralph Castain	a8a9801a0b	Ensure an orted exits with non-zero status if it is unable to send a message. Add more diagnostic messages to the OOB set_addr code cmr=v1.7.5:reviewer=jsquyres This commit was SVN r30701.	2014-02-12 19:44:01 +00:00
Ralph Castain	fa7b686ccc	Provide better messages when we don't find any included interfaces, and/or don't find any interfaces for use by OOB. cmr=v1.7.5:reviewer=jsquyres This commit was SVN r30675.	2014-02-11 19:29:03 +00:00
Ralph Castain	5980b7e042	Add a security framework for authenticating connections - we will add LDAP, Kerberos, and Keystone support in the next month. For now, just put a placeholder "basic" module that does the minimum. Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection Fixes trac:4171 cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections This commit was SVN r30551. The following Trac tickets were found above: Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171	2014-02-04 01:38:45 +00:00
Rolf vandeVaart	f7055de78e	Stop listening thread and wait for it to terminate. This commit was SVN r30507.	2014-01-30 20:37:15 +00:00
Ralph Castain	956aab03a7	Track the origin of a message so it can be passed across transports Refs trac:4184 This commit was SVN r30433. The following Trac tickets were found above: Ticket 4184 --> https://svn.open-mpi.org/trac/ompi/ticket/4184	2014-01-26 21:09:26 +00:00
Ralph Castain	657796f9e0	Revert r30327 - turns out it isn't quite right just yet. :-( Closes trac:4138 This commit was SVN r30328. The following SVN revision numbers were found above: r30327 --> open-mpi/ompi@87d5f86025 The following Trac tickets were found above: Ticket 4138 --> https://svn.open-mpi.org/trac/ompi/ticket/4138	2014-01-18 23:38:39 +00:00
Ralph Castain	87d5f86025	Enable use of unix domain sockets for local OOB communications, thereby removing the requirement for an active network interface when running strictly on a single node. Update the overall OOB system to support cross-transport movement of messages so that the OOB can move a received message to another transport for transmission. cmr=v1.7.5:reviewer=jsquyres:subject=Enable use of unix domain sockets for local OOB communications This commit was SVN r30327.	2014-01-18 21:36:49 +00:00
Ralph Castain	85f2429819	Ensure the ipv6 lists get initialized and finalized cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30081.	2013-12-24 17:24:39 +00:00
Jeff Squyres	ce02002a5e	Free minor memory leak / squash valgrind still-reachable warning. cmr=v1.7.5:reviewer=rhc This commit was SVN r30071.	2013-12-24 11:04:38 +00:00
Ralph Castain	6239e64f36	Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working. Refs trac:3992 This commit was SVN r29974. The following Trac tickets were found above: Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992	2013-12-19 03:28:05 +00:00
Ralph Castain	fb59b6b875	Silence compiler warning when --disable-orte-static-ports This commit was SVN r29783.	2013-12-03 01:53:31 +00:00
Ralph Castain	8c5c7d0db4	Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface. Refs trac:3696 This commit was SVN r29522. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-10-26 00:47:14 +00:00
Ralph Castain	2a116ecdfc	Fix a race condition created when two processes attempt to send to each other at the same time. This causes both processes to start connection procedures, resulting in a conflict that can cause messages to be lost. Add detection of this condition, and have both processes cancel their connect operations. The process with the higher rank will reconnect, while the lower rank process will simply wait for the connection to be created. Refs trac:3696 This commit was SVN r29139. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-09-06 05:15:25 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
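
The subnet-based reachability rule described in r29058 above (a peer is considered reachable from a NIC when its address falls inside the NIC's address/mask range) can be illustrated with a short sketch. This is not the OOB/TCP code, which is written in C; it is a minimal Python illustration using the standard `ipaddress` module, and the names `reachable_via` and `pick_module` as well as the interface names and addresses are hypothetical. The fallback added later in r30719 (assign an unmatched peer to the first active module and let failover decide) is not modeled here.

```python
import ipaddress


def reachable_via(nic_addr_with_mask, peer_addr):
    """Return True if peer_addr lies inside the subnet of the given NIC.

    nic_addr_with_mask is an address/prefix string such as "192.168.1.10/24"
    (hypothetical input format chosen for this sketch).
    """
    nic = ipaddress.ip_interface(nic_addr_with_mask)
    peer = ipaddress.ip_address(peer_addr)
    # An IPv4 peer address can never match an IPv6 interface, and vice versa.
    if nic.version != peer.version:
        return False
    return peer in nic.network


def pick_module(nics, peer_addrs):
    """Return the name of the first NIC sharing a subnet with any peer address.

    nics maps a NIC name to its address/prefix string; peer_addrs is the list
    of addresses taken from the peer's contact info. Returns None when no NIC
    shares a subnet with the peer.
    """
    for name, iface in nics.items():
        if any(reachable_via(iface, addr) for addr in peer_addrs):
            return name
    return None


if __name__ == "__main__":
    nics = {"eth0": "10.0.0.1/8", "eth1": "192.168.1.10/24"}
    print(pick_module(nics, ["192.168.1.77"]))
    print(pick_module(nics, ["172.16.0.1"]))
```

As the known-limitations list in the same commit notes, a pure subnet comparison like this cannot discover peers that are only reachable through a gateway, which is why the message calls for a more sophisticated algorithm.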

36 commits