Dave Goodell
5f3b81e291
oob: delete events when destroying a peer
...
Without this patch running ring_c with the usnic BTL under valgrind will
cause the orteds to segfault.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Ralph Castain <rhc@open-mpi.org>
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r31161.
2014-03-19 22:15:49 +00:00
Ralph Castain
0257d32eeb
There is no OOB component object - it is a simple struct with an opal_list_item_t element at the beginning
...
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31087.
2014-03-17 21:23:59 +00:00
Ralph Castain
fbc5e3b773
Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before daemons callback to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.
...
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31068.
2014-03-14 15:32:30 +00:00
Ralph Castain
2abed09d7c
Continue to resolve priority issues. Cleanup the case of forced termination in mpirun during launch processing by ensuring we can respond to socket closures, and ensuring that the remote daemons correctly close their sockets when terminating.
...
Jeff: please test a variety of conditions to ensure we get this right
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31058.
2014-03-13 04:02:24 +00:00
Ralph Castain
a254d2db34
Silence warning when CR is not enabled
...
This commit was SVN r31025.
2014-03-12 13:47:03 +00:00
Adrian Reber
4512b3375e
OOB/TCP: wire up the existing ft_event() function
...
This commit was SVN r31022.
2014-03-12 12:47:20 +00:00
Adrian Reber
8d40cd53ae
use the existing pretty-print function for information about the job state
...
This commit was SVN r31020.
2014-03-12 12:34:25 +00:00
Adrian Reber
49173ccd61
add debug output for the ft_event handler
...
This commit was SVN r30990.
2014-03-11 15:39:16 +00:00
Adrian Reber
7304b700e1
Fix the newly added FT event state when compiling --with-ft
...
This commit was SVN r30988.
2014-03-11 13:20:08 +00:00
Ralph Castain
2cd1cfc7fe
Remove this ignore for now
...
This commit was SVN r30985.
2014-03-11 03:02:13 +00:00
Ralph Castain
7a44af375c
Add an FT event state and set the state machine to callback to the OOB base ft event when activated
...
This commit was SVN r30950.
2014-03-06 02:44:29 +00:00
Ralph Castain
9793909988
Correct the constant we check for an error. Thanks to George for noticing it.
...
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30949.
2014-03-06 02:21:27 +00:00
Ralph Castain
da4cb39683
If we can't find a route to communicate, emit an error message rather than just exiting with a non-zero status
...
cmr=v1.7.5:reviewer=jsquyres:subject=print error if cannot communicate
This commit was SVN r30922.
2014-03-04 04:57:53 +00:00
Ralph Castain
0319d5fb19
Seeing some errors coming out of MTT on this component, so turn it off for now and will debug later
...
This commit was SVN r30789.
2014-02-21 16:31:52 +00:00
Ralph Castain
5520d6971b
We do have to track the origin of messages sent over usock as the daemon does route them back down, and we need to get the "sender" info correct. Also do a better job of dealing with simultaneous connections to avoid binding to a used socket.
...
Refs trac:4280
This commit was SVN r30781.
The following Trac tickets were found above:
Ticket 4280 --> https://svn.open-mpi.org/trac/ompi/ticket/4280
2014-02-20 17:27:05 +00:00
Adrian Reber
6b45d475e9
Fix compiler warnings when compiling with --with-ft
...
With enabled fault tolerance code different functions
are selected during compilation. Most of the ft
code is #ifdef'd out. This #ifdef's more code out
so that compiler warnings like
warning: unused variable 'item' [-Wunused-variable]
opal_list_item_t *item;
are removed.
This commit was SVN r30747.
2014-02-17 10:53:44 +00:00
Ralph Castain
ea0217c337
Remove unused file and minimize the usock uri contribution (add explanation as to why)
...
Refs trac:4280
This commit was SVN r30744.
The following Trac tickets were found above:
Ticket 4280 --> https://svn.open-mpi.org/trac/ompi/ticket/4280
2014-02-16 22:37:30 +00:00
Ralph Castain
d42f4be8a4
Add unix socket component to OOB - no longer require active network for local operations. Demonstrate inter-transport crossover.
...
VERY tentatively schedule this for 1.7.5 - only to be applied if we see no troubles AND the branch is ready in advance.
cmr=v1.7.5:reviewer=rhc:subject=Add unix socket component to OOB
This commit was SVN r30742.
2014-02-16 20:54:12 +00:00
Ralph Castain
14bb7a117c
Fix bugs in the oob base - ensure we get the components in high-to-low priority, and that we correctly track reachability via all components. Adjust the priority of the tcp component to leave headroom for others
...
Refs trac:267
This commit was SVN r30740.
The following Trac tickets were found above:
Ticket 267 --> https://svn.open-mpi.org/trac/ompi/ticket/267
2014-02-16 03:19:08 +00:00
Ralph Castain
3f9db36e0d
Make Jeff smile - pretty-up the indentation
...
Refs trac:4267
This commit was SVN r30733.
The following Trac tickets were found above:
Ticket 4267 --> https://svn.open-mpi.org/trac/ompi/ticket/4267
2014-02-14 23:25:48 +00:00
Ralph Castain
4e1c07cbf2
If we are given a TCP oob address that doesn't match any active module, it is still possible that we could route to the address if a router is in the system. No harm in trying, so arbitrarily pick the first connection in the active module list and assign the peer to it. If that module can't reach it, we'll follow the usual failover mechanism until finally concluding that nobody can get there.
...
cmr=v1.7.5:reviewer=jsquyres:subject=handle non-matching addresses
This commit was SVN r30719.
2014-02-13 23:37:22 +00:00
Ralph Castain
fc6101b508
Handle "localhost" better
...
Refs trac:4263
This commit was SVN r30702.
The following Trac tickets were found above:
Ticket 4263 --> https://svn.open-mpi.org/trac/ompi/ticket/4263
2014-02-12 20:30:39 +00:00
Ralph Castain
a8a9801a0b
Ensure an orted exits with non-zero status if it is unable to send a message. Add more diagnostic messages to the OOB set_addr code
...
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30701.
2014-02-12 19:44:01 +00:00
Ralph Castain
fa7b686ccc
Provide better messages when we don't find any included interfaces, and/or don't find any interfaces for use by OOB.
...
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30675.
2014-02-11 19:29:03 +00:00
Ralph Castain
230336b6a8
Upgrade the security framework to avoid multiple hits against the global security server. Add support for future case where mpirun assings a global security credential for a given run, though we need to work out how to handle connect-accept from other mpirun's in that case. Remove a bunch of duplicate code in the OOB by consolidating the connection handshake code.
...
Refs trac:4221
This commit was SVN r30554.
The following Trac tickets were found above:
Ticket 4221 --> https://svn.open-mpi.org/trac/ompi/ticket/4221
2014-02-04 14:47:04 +00:00
Ralph Castain
5980b7e042
Add a security framework for authenticating connections - we will add LDAP, Kerberos, and Keystone support in the next month. For now, just put a placeholder "basic" module that does the minimum.
...
Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection
Fixes trac:4171
cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections
This commit was SVN r30551.
The following Trac tickets were found above:
Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171
2014-02-04 01:38:45 +00:00
Ralph Castain
993198cfba
Fix lost message problem - if multiple messages are queued before the connection is formed, we lost all but the first one. Ensure that all messages get properly queued prior to completing the connection
...
cmr=v1.7.4:reviewer=jsquyres:subject=Fix lost message problem
This commit was SVN r30516.
2014-01-31 05:30:51 +00:00
Rolf vandeVaart
f7055de78e
Stop listening thread and wait for it to terminate.
...
This commit was SVN r30507.
2014-01-30 20:37:15 +00:00
Ralph Castain
db92ac3ce1
Cleanup role of aggregator relative to daemons
...
Refs trac:4176
This commit was SVN r30495.
The following Trac tickets were found above:
Ticket 4176 --> https://svn.open-mpi.org/trac/ompi/ticket/4176
2014-01-30 00:53:30 +00:00
Ralph Castain
956aab03a7
Track the origin of a message so it can be passed across transports
...
Refs trac:4184
This commit was SVN r30433.
The following Trac tickets were found above:
Ticket 4184 --> https://svn.open-mpi.org/trac/ompi/ticket/4184
2014-01-26 21:09:26 +00:00
Ralph Castain
657796f9e0
Revert r30327 - turns out it isn't quite right just yet. :-(
...
Closes trac:4138
This commit was SVN r30328.
The following SVN revision numbers were found above:
r30327 --> open-mpi/ompi@87d5f86025
The following Trac tickets were found above:
Ticket 4138 --> https://svn.open-mpi.org/trac/ompi/ticket/4138
2014-01-18 23:38:39 +00:00
Ralph Castain
87d5f86025
Enable use of unix domain sockets for local OOB communications, thereby removing the requirement for an active network interface when running strictly on a single node. Update the overall OOB system to support cross-transport movement of messages so that the OOB can move a received message to another transport for transmission.
...
cmr=v1.7.5:reviewer=jsquyres:subject=Enable use of unix domain sockets for local OOB communications
This commit was SVN r30327.
2014-01-18 21:36:49 +00:00
Brian Barrett
8b778903d8
Fix longstanding issue with our multi-project support. Rather than using
...
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Ralph Castain
85f2429819
Ensure the ipv6 lists get initialized and finalized
...
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30081.
2013-12-24 17:24:39 +00:00
Ralph Castain
2e08219cac
Silence the valgrind report from the OOB
...
Refs trac:4033
This commit was SVN r30080.
The following Trac tickets were found above:
Ticket 4033 --> https://svn.open-mpi.org/trac/ompi/ticket/4033
2013-12-24 17:06:45 +00:00
Ralph Castain
01ee5f380b
Remove debug - problem has been identified
...
Refs trac:4026
This commit was SVN r30075.
The following Trac tickets were found above:
Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-24 15:22:18 +00:00
Jeff Squyres
ce02002a5e
Free minor memory leak / squash valgrind still-reachable warning.
...
cmr=v1.7.5:reviewer=rhc
This commit was SVN r30071.
2013-12-24 11:04:38 +00:00
Ralph Castain
38f46641ce
Ensure the recv handler has been initialized
...
Refs trac:4026
This commit was SVN r30068.
The following Trac tickets were found above:
Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-24 06:09:45 +00:00
Ralph Castain
65228d3571
Don't use "size_t" for the nbytes field in the header - use uint32_t to ensure that ntohl/htonl correctly match it
...
Refs trac:4026
This commit was SVN r30062.
The following Trac tickets were found above:
Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-23 21:39:49 +00:00
Ralph Castain
7d8c0459a4
Attempt to debug hang that is hitting some environments. Posting to 1.7.4 as a placeholder for the eventual solution
...
cmr=v1.7.4:reviewer=rhc
This commit was SVN r30060.
2013-12-23 19:57:05 +00:00
George Bosilca
24879f9def
Code cleanup while chasing valgrind complaints.
...
This commit was SVN r30048.
2013-12-21 23:28:14 +00:00
Ralph Castain
264150872b
Add a bunch of debug output to the OOB connection completion code so we can track down a handshake problem. Available in optimized builds as well as debug ones by setting -mca oob_base_verbose 10
...
No review will be required as this is just debug code for those helping us debug the 1.7.4 release candidates
cmr-=v1.7.4:reviewer=ompi-gk1.7
This commit was SVN r30043.
2013-12-21 16:09:26 +00:00
Ralph Castain
6239e64f36
Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working.
...
Refs trac:3992
This commit was SVN r29974.
The following Trac tickets were found above:
Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 03:28:05 +00:00
Ralph Castain
39957df08e
Fixes trac:3963. Fix the tool ess procedure so it opens and selects the OOB framework, and have the OOB TCP module update the route to new connections (the routed modules know what to do).
...
Thanks to Dave Love and Ashley Pittman for pointing out the problem.
cmr=v1.7.4:reviewer=jsquyres:subject=Fix tool communications with mpirun
This commit was SVN r29959.
The following Trac tickets were found above:
Ticket 3963 --> https://svn.open-mpi.org/trac/ompi/ticket/3963
2013-12-18 23:13:46 +00:00
Ralph Castain
77553f72be
Per this email thread:
...
http://www.open-mpi.org/community/lists/devel/2013/12/13412.php
fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch
This commit was SVN r29955.
2013-12-18 17:57:37 +00:00
Jeff Squyres
2e7653e4c2
Add missing argv.h includes.
...
Noticed these as part of #3694 : external libevent's don't cause argv.h
to automatically get included.
Refs trac:3694
This commit was SVN r29897.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:17:36 +00:00
Ralph Castain
fb59b6b875
Silence compiler warning when --disable-orte-static-ports
...
This commit was SVN r29783.
2013-12-03 01:53:31 +00:00
Ralph Castain
5b38259264
Ouch - remove an extraneous line.
...
Thanks to Tetsuya Mishima for reporting it
cmr=v1.7.4:reviewer=rhc:subject=Remove extraneous line from OOB
This commit was SVN r29677.
2013-11-13 04:02:05 +00:00
Ralph Castain
8c5c7d0db4
Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface.
...
Refs trac:3696
This commit was SVN r29522.
The following Trac tickets were found above:
Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-10-26 00:47:14 +00:00
Ralph Castain
7c86a843c8
Silence compiler warning
...
This commit was SVN r29477.
2013-10-23 04:13:36 +00:00