Jeff Squyres
d8715f1e3a
Close 3 more fd's that were leaking into child processes.
Child processes now look clean; I can't find any more fd's that are
leaking from the parent to children.
Refs trac:4550
This commit was SVN r31515.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-24 15:36:24 +00:00
Jeff Squyres
e1655ae68d
opal/util/fd.c: add new convenience function for setting FD_CLOEXEC
Paul Hargrove pointed out that Stevens tells us that we should
F_GETFD before F_SETFD. And so we shall.
Make a new convenience function to do this (opal_fd_set_cloexec()),
just so that we don't have to litter this 2-step process throughout
the code.
Refs trac:4550
This commit was SVN r31513.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-24 13:04:49 +00:00
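The following is a minimal sketch of the two-step pattern described in the commit above. FD_CLOEXEC is a file-descriptor flag, so POSIX reads it with `fcntl(fd, F_GETFD)` and writes it with `fcntl(fd, F_SETFD, ...)`. Fetching the current flags first means that any other descriptor flags survive the update. The name `set_cloexec()` is hypothetical, and this is not the actual body of `opal_fd_set_cloexec()`.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper in the spirit of opal_fd_set_cloexec(); a sketch,
 * not the opal/util/fd.c source. Step 1 reads the descriptor flags,
 * step 2 writes them back with FD_CLOEXEC added, preserving the rest. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;              /* errno set by fcntl(), e.g. EBADF */
    }
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        return -1;
    }
    return 0;
}
```

The helper returns -1 and leaves errno as fcntl() set it. The caller can then decide whether a failure is fatal.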
Jeff Squyres
410f5bfb91
oob_tcp_listener.c: set both ends of this pipe to be close-on-exec
This pipe is used to communicate between threads in this process.
Mark both fd's as close-on-exec so that children don't inherit this
pipe.
Refs trac:4550
This commit was SVN r31512.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-23 21:46:41 +00:00
Ralph Castain
bbdbc5f8a8
Per suggestion from George, use a pipe for terminating the thread.
Refs trac:4510
This commit was SVN r31381.
The following Trac tickets were found above:
Ticket 4510 --> https://svn.open-mpi.org/trac/ompi/ticket/4510
2014-04-14 01:02:46 +00:00
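Using a pipe to terminate the thread is an instance of the well-known self-pipe technique. The sketch below assumes a listener thread that blocks in `select()` on a listening socket and on the read end of a pipe. Writing one byte to the pipe wakes `select()` immediately, so shutdown never waits for a timeout. Both pipe ends are also marked close-on-exec. The `struct listener` type and the `listener_*` functions are hypothetical names chosen for illustration. They are not the code in `oob_tcp_listener.c`.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical listener state; not ORTE's actual structures. */
struct listener {
    int listen_fd;      /* listening socket watched for new connections */
    int stop_pipe[2];   /* [0]: read end watched by select(); [1]: write end */
    pthread_t thread;
};

static void *listener_main(void *arg)
{
    struct listener *l = arg;
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(l->listen_fd, &rfds);
        FD_SET(l->stop_pipe[0], &rfds);
        int maxfd = l->listen_fd > l->stop_pipe[0] ? l->listen_fd : l->stop_pipe[0];
        /* NULL timeout: block until something is readable. No polling
         * interval is needed because shutdown arrives as a pipe byte. */
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR) {
                continue;
            }
            break;
        }
        if (FD_ISSET(l->stop_pipe[0], &rfds)) {
            break;                          /* shutdown requested */
        }
        if (FD_ISSET(l->listen_fd, &rfds)) {
            /* Production code would make listen_fd non-blocking so that a
             * connection aborted after select() cannot block accept(). */
            int conn = accept(l->listen_fd, NULL, NULL);
            if (conn >= 0) {
                close(conn);                /* a real listener hands conn off */
            }
        }
    }
    return NULL;
}

static int listener_start(struct listener *l, int listen_fd)
{
    l->listen_fd = listen_fd;
    if (pipe(l->stop_pipe) < 0) {
        return -1;
    }
    /* A fresh pipe has no other descriptor flags, so FD_CLOEXEC can be
     * set directly on both ends to keep them out of exec'd children. */
    fcntl(l->stop_pipe[0], F_SETFD, FD_CLOEXEC);
    fcntl(l->stop_pipe[1], F_SETFD, FD_CLOEXEC);
    if (pthread_create(&l->thread, NULL, listener_main, l) != 0) {
        close(l->stop_pipe[0]);
        close(l->stop_pipe[1]);
        return -1;
    }
    return 0;
}

static void listener_stop(struct listener *l)
{
    char byte = 0;
    /* One byte makes the read end readable and wakes select() at once. */
    while (write(l->stop_pipe[1], &byte, 1) < 0 && errno == EINTR) {
    }
    pthread_join(l->thread, NULL);
    close(l->stop_pipe[0]);
    close(l->stop_pipe[1]);
}
```

Because `select()` is called with a NULL timeout, the thread uses no CPU while idle. `listener_stop()` returns as soon as the thread sees the byte and exits.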
Ralph Castain
2d8dff837c
Ensure we properly terminate the listening thread prior to exiting, but do so in a way that doesn't make us wait for select to timeout.
Refs trac:4510
This commit was SVN r31376.
The following Trac tickets were found above:
Ticket 4510 --> https://svn.open-mpi.org/trac/ompi/ticket/4510
2014-04-12 15:01:24 +00:00
Ralph Castain
9b30b2b783
Shave some time off of mpirun's operation by not waiting for the listener thread to terminate before exiting
cmr=v1.8.1:reviewer=rhc
This commit was SVN r31368.
2014-04-11 04:16:28 +00:00
Ralph Castain
92ca647d3d
Fix copy error in file name
This commit was SVN r31344.
2014-04-08 15:31:55 +00:00
Dave Goodell
5f3b81e291
oob: delete events when destroying a peer
Without this patch, running ring_c with the usnic BTL under valgrind will
cause the orteds to segfault.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Ralph Castain <rhc@open-mpi.org>
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r31161.
2014-03-19 22:15:49 +00:00
Ralph Castain
0257d32eeb
There is no OOB component object - it is a simple struct with an opal_list_item_t element at the beginning
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31087.
2014-03-17 21:23:59 +00:00
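The sketch below shows the layout this commit describes. It uses a minimal intrusive singly linked list in place of OPAL's real `opal_list_item_t`/`opal_list_t`, and all the types below are hypothetical. Because the list item is the first member, C guarantees that a pointer to the struct, suitably converted, points to that member, and vice versa. Generic list code can therefore traverse items, and callers can cast back to the containing struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a generic intrusive list item. */
typedef struct list_item {
    struct list_item *next;
} list_item_t;

typedef struct {
    list_item_t *head;
} list_t;

/* Component descriptor: the list item MUST be the first member so that
 * (list_item_t *)&comp and &comp.super refer to the same address. */
typedef struct {
    list_item_t super;
    const char *name;
    int priority;
} oob_component_t;

static void list_prepend(list_t *l, list_item_t *item)
{
    item->next = l->head;
    l->head = item;
}

static oob_component_t *find_component(list_t *l, const char *name)
{
    for (list_item_t *it = l->head; it != NULL; it = it->next) {
        oob_component_t *c = (oob_component_t *)it;  /* valid: first member */
        if (strcmp(c->name, name) == 0) {
            return c;
        }
    }
    return NULL;
}
```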
Ralph Castain
fbc5e3b773
Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before the daemons call back to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31068.
2014-03-14 15:32:30 +00:00
Ralph Castain
2abed09d7c
Continue to resolve priority issues. Cleanup the case of forced termination in mpirun during launch processing by ensuring we can respond to socket closures, and ensuring that the remote daemons correctly close their sockets when terminating.
Jeff: please test a variety of conditions to ensure we get this right
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31058.
2014-03-13 04:02:24 +00:00
Ralph Castain
a254d2db34
Silence warning when CR is not enabled
This commit was SVN r31025.
2014-03-12 13:47:03 +00:00
Adrian Reber
4512b3375e
OOB/TCP: wire up the existing ft_event() function
This commit was SVN r31022.
2014-03-12 12:47:20 +00:00
Adrian Reber
8d40cd53ae
use the existing pretty-print function for information about the job state
This commit was SVN r31020.
2014-03-12 12:34:25 +00:00
Adrian Reber
49173ccd61
add debug output for the ft_event handler
This commit was SVN r30990.
2014-03-11 15:39:16 +00:00
Adrian Reber
7304b700e1
Fix the newly added FT event state when compiling --with-ft
This commit was SVN r30988.
2014-03-11 13:20:08 +00:00
Ralph Castain
2cd1cfc7fe
Remove this ignore for now
This commit was SVN r30985.
2014-03-11 03:02:13 +00:00
Ralph Castain
7a44af375c
Add an FT event state and set the state machine to call back to the OOB base ft event when activated
This commit was SVN r30950.
2014-03-06 02:44:29 +00:00
Ralph Castain
9793909988
Correct the constant we check for an error. Thanks to George for noticing it.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30949.
2014-03-06 02:21:27 +00:00
Ralph Castain
da4cb39683
If we can't find a route to communicate, emit an error message rather than just exiting with a non-zero status
cmr=v1.7.5:reviewer=jsquyres:subject=print error if cannot communicate
This commit was SVN r30922.
2014-03-04 04:57:53 +00:00
Ralph Castain
0319d5fb19
Seeing some errors coming out of MTT on this component, so turn it off for now and will debug later
This commit was SVN r30789.
2014-02-21 16:31:52 +00:00
Ralph Castain
5520d6971b
We do have to track the origin of messages sent over usock as the daemon does route them back down, and we need to get the "sender" info correct. Also do a better job of dealing with simultaneous connections to avoid binding to a used socket.
Refs trac:4280
This commit was SVN r30781.
The following Trac tickets were found above:
Ticket 4280 --> https://svn.open-mpi.org/trac/ompi/ticket/4280
2014-02-20 17:27:05 +00:00
Adrian Reber
6b45d475e9
Fix compiler warnings when compiling with --with-ft
With enabled fault tolerance code different functions
are selected during compilation. Most of the ft
code is #ifdef'd out. This #ifdef's more code out
so that compiler warnings like
warning: unused variable 'item' [-Wunused-variable]
opal_list_item_t *item;
are removed.
This commit was SVN r30747.
2014-02-17 10:53:44 +00:00
Ralph Castain
ea0217c337
Remove unused file and minimize the usock uri contribution (add explanation as to why)
Refs trac:4280
This commit was SVN r30744.
The following Trac tickets were found above:
Ticket 4280 --> https://svn.open-mpi.org/trac/ompi/ticket/4280
2014-02-16 22:37:30 +00:00
Ralph Castain
d42f4be8a4
Add unix socket component to OOB - no longer require active network for local operations. Demonstrate inter-transport crossover.
VERY tentatively schedule this for 1.7.5 - only to be applied if we see no troubles AND the branch is ready in advance.
cmr=v1.7.5:reviewer=rhc:subject=Add unix socket component to OOB
This commit was SVN r30742.
2014-02-16 20:54:12 +00:00
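The following minimal sketch shows why a unix domain socket removes the need for an active network interface. The listener and the connecting process meet at a filesystem path instead of an IP address and port. The helper names are assumptions made for illustration, as is the idea of passing in the path. They are not the usock component's actual code or naming scheme.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for path; returns -1 if the path is too long. */
static int make_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof addr->sun_path) {
        return -1;
    }
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Hypothetical helper: listen on a filesystem path, no network needed. */
static int usock_listen(const char *path)
{
    struct sockaddr_un addr;
    if (make_addr(&addr, path) < 0) {
        return -1;
    }
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    unlink(path);  /* remove a stale socket file left by an earlier run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: connect to a peer listening on path. */
static int usock_connect(const char *path)
{
    struct sockaddr_un addr;
    if (make_addr(&addr, path) < 0) {
        return -1;
    }
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

`usock_listen()` unlinks any existing file at the path before calling `bind()`, so it should only be given a path that the caller owns.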
Ralph Castain
14bb7a117c
Fix bugs in the oob base - ensure we get the components in high-to-low priority, and that we correctly track reachability via all components. Adjust the priority of the tcp component to leave headroom for others
Refs trac:267
This commit was SVN r30740.
The following Trac tickets were found above:
Ticket 267 --> https://svn.open-mpi.org/trac/ompi/ticket/267
2014-02-16 03:19:08 +00:00
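One way to get the high-to-low priority ordering mentioned in the commit above is to sort the component records with `qsort()` and a descending comparator. The `struct component` type and its fields are hypothetical. The commit does not say how the OOB base actually orders its list.

```c
#include <stdlib.h>

/* Hypothetical component record; fields are illustrative. */
struct component {
    const char *name;
    int priority;
};

/* qsort() comparator giving high-to-low priority order. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct component *ca = a;
    const struct component *cb = b;
    if (ca->priority > cb->priority) {
        return -1;
    }
    if (ca->priority < cb->priority) {
        return 1;
    }
    return 0;
}

static void sort_components(struct component *comps, size_t n)
{
    qsort(comps, n, sizeof comps[0], by_priority_desc);
}
```

The comparator returns -1, 0 or 1 rather than `cb->priority - ca->priority`, because the subtraction could overflow for extreme values.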
Ralph Castain
3f9db36e0d
Make Jeff smile - pretty-up the indentation
Refs trac:4267
This commit was SVN r30733.
The following Trac tickets were found above:
Ticket 4267 --> https://svn.open-mpi.org/trac/ompi/ticket/4267
2014-02-14 23:25:48 +00:00
Ralph Castain
4e1c07cbf2
If we are given a TCP oob address that doesn't match any active module, it is still possible that we could route to the address if a router is in the system. No harm in trying, so arbitrarily pick the first connection in the active module list and assign the peer to it. If that module can't reach it, we'll follow the usual failover mechanism until finally concluding that nobody can get there.
cmr=v1.7.5:reviewer=jsquyres:subject=handle non-matching addresses
This commit was SVN r30719.
2014-02-13 23:37:22 +00:00
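The failover described in the commit above amounts to walking the active modules in order until one of them can deliver. The sketch assumes that each module provides a send attempt that returns 0 on success. `struct module`, `send_with_failover()` and the two example callbacks are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical transport module: try_send returns 0 if it delivered,
 * nonzero if this module cannot reach the peer. */
struct module {
    const char *name;
    int (*try_send)(const char *peer_addr);
};

/* Start with the first module and fail over to the next until one
 * succeeds. Returns the index of the module that delivered, or -1 if
 * nobody can get there. */
static int send_with_failover(const struct module *mods, size_t n,
                              const char *peer_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (mods[i].try_send(peer_addr) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Example callbacks for illustration only. */
static int unreachable(const char *peer_addr) { (void)peer_addr; return -1; }
static int reachable(const char *peer_addr) { (void)peer_addr; return 0; }
```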
Ralph Castain
fc6101b508
Handle "localhost" better
Refs trac:4263
This commit was SVN r30702.
The following Trac tickets were found above:
Ticket 4263 --> https://svn.open-mpi.org/trac/ompi/ticket/4263
2014-02-12 20:30:39 +00:00
Ralph Castain
a8a9801a0b
Ensure an orted exits with non-zero status if it is unable to send a message. Add more diagnostic messages to the OOB set_addr code
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30701.
2014-02-12 19:44:01 +00:00
Ralph Castain
fa7b686ccc
Provide better messages when we don't find any included interfaces, and/or don't find any interfaces for use by OOB.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30675.
2014-02-11 19:29:03 +00:00
Ralph Castain
230336b6a8
Upgrade the security framework to avoid multiple hits against the global security server. Add support for future case where mpirun assigns a global security credential for a given run, though we need to work out how to handle connect-accept from other mpiruns in that case. Remove a bunch of duplicate code in the OOB by consolidating the connection handshake code.
Refs trac:4221
This commit was SVN r30554.
The following Trac tickets were found above:
Ticket 4221 --> https://svn.open-mpi.org/trac/ompi/ticket/4221
2014-02-04 14:47:04 +00:00
Ralph Castain
5980b7e042
Add a security framework for authenticating connections - we will add LDAP, Kerberos, and Keystone support in the next month. For now, just put a placeholder "basic" module that does the minimum.
Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection.
Fixes trac:4171
cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections
This commit was SVN r30551.
The following Trac tickets were found above:
Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171
2014-02-04 01:38:45 +00:00
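The sketch below shows the acceptance order the commit describes: a peer with a different version is rejected before a pluggable authentication hook is consulted. `struct handshake`, `auth_fn` and `basic_auth()` are hypothetical. `basic_auth()` only mirrors the idea of a placeholder "basic" module that does the minimum, here by accepting every credential.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handshake payload; field sizes are illustrative. */
struct handshake {
    char version[16];
    char credential[32];
};

/* Pluggable authentication hook: returns 0 to accept the credential. */
typedef int (*auth_fn)(const char *credential);

/* Placeholder authenticator that accepts any credential. */
static int basic_auth(const char *credential)
{
    (void)credential;
    return 0;
}

/* Returns 0 to accept the connection, -1 to refuse it. */
static int accept_handshake(const struct handshake *hs,
                            const char *local_version, auth_fn auth)
{
    if (strncmp(hs->version, local_version, sizeof hs->version) != 0) {
        fprintf(stderr, "version mismatch (local %s, peer %.*s): refusing connection\n",
                local_version, (int)sizeof hs->version, hs->version);
        return -1;
    }
    if (auth(hs->credential) != 0) {
        fprintf(stderr, "authentication failed: refusing connection\n");
        return -1;
    }
    return 0;
}
```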
Ralph Castain
993198cfba
Fix lost message problem - if multiple messages are queued before the connection is formed, we lost all but the first one. Ensure that all messages get properly queued prior to completing the connection
cmr=v1.7.4:reviewer=jsquyres:subject=Fix lost message problem
This commit was SVN r30516.
2014-01-31 05:30:51 +00:00
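The fix in the commit above implies a per-peer FIFO. Every send issued while the connection is still being formed is appended to a queue, and the whole queue is drained in order once the handshake completes. The types and functions below are hypothetical stand-ins. `transmit()` represents the actual socket write.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical queued message and peer state, for illustration only. */
struct msg {
    struct msg *next;
    char text[64];
};

struct peer {
    int connected;
    struct msg *head, *tail;  /* FIFO of sends issued before connect */
    int sent;                 /* count of messages actually transmitted */
    char last[64];            /* text of the most recent transmission */
};

/* Stand-in for writing a message to the socket. */
static void transmit(struct peer *p, const char *text)
{
    p->sent++;
    strncpy(p->last, text, sizeof p->last - 1);
    p->last[sizeof p->last - 1] = '\0';
}

static int peer_send(struct peer *p, const char *text)
{
    if (p->connected) {
        transmit(p, text);
        return 0;
    }
    struct msg *m = calloc(1, sizeof *m);     /* zeroed: text stays terminated */
    if (m == NULL) {
        return -1;
    }
    strncpy(m->text, text, sizeof m->text - 1);
    /* Append at the tail: a design that kept only one pending message
     * would lose every send after the first. */
    if (p->tail != NULL) {
        p->tail->next = m;
    } else {
        p->head = m;
    }
    p->tail = m;
    return 0;
}

/* Called when the handshake completes: flush every queued message in order. */
static void peer_connected(struct peer *p)
{
    p->connected = 1;
    while (p->head != NULL) {
        struct msg *m = p->head;
        p->head = m->next;
        transmit(p, m->text);
        free(m);
    }
    p->tail = NULL;
}
```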
Rolf vandeVaart
f7055de78e
Stop listening thread and wait for it to terminate.
This commit was SVN r30507.
2014-01-30 20:37:15 +00:00
Ralph Castain
db92ac3ce1
Cleanup role of aggregator relative to daemons
Refs trac:4176
This commit was SVN r30495.
The following Trac tickets were found above:
Ticket 4176 --> https://svn.open-mpi.org/trac/ompi/ticket/4176
2014-01-30 00:53:30 +00:00
Ralph Castain
956aab03a7
Track the origin of a message so it can be passed across transports
Refs trac:4184
This commit was SVN r30433.
The following Trac tickets were found above:
Ticket 4184 --> https://svn.open-mpi.org/trac/ompi/ticket/4184
2014-01-26 21:09:26 +00:00
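The sketch below shows what tracking the origin buys when a message crosses transports. A relaying process rewrites only the hop-by-hop sender, so the final recipient still knows the true source. The header layout and the `relay()` helper are assumptions made for illustration. They are not the ORTE OOB header.

```c
#include <stdint.h>

/* Hypothetical process name and relay header. */
struct proc_name {
    uint32_t jobid;
    uint32_t vpid;
};

struct relay_hdr {
    struct proc_name origin;  /* process that created the message: never rewritten */
    struct proc_name sender;  /* last hop that transmitted it */
    struct proc_name dst;     /* final destination */
    uint32_t tag;
};

/* Forward a message through this process, possibly onto another transport.
 * Only the hop-by-hop sender changes, so a reply can reach the true origin. */
static struct relay_hdr relay(const struct relay_hdr *in, struct proc_name self)
{
    struct relay_hdr out = *in;
    out.sender = self;
    return out;
}
```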
Ralph Castain
657796f9e0
Revert r30327 - turns out it isn't quite right just yet. :-(
Closes trac:4138
This commit was SVN r30328.
The following SVN revision numbers were found above:
r30327 --> open-mpi/ompi@87d5f86025
The following Trac tickets were found above:
Ticket 4138 --> https://svn.open-mpi.org/trac/ompi/ticket/4138
2014-01-18 23:38:39 +00:00
Ralph Castain
87d5f86025
Enable use of unix domain sockets for local OOB communications, thereby removing the requirement for an active network interface when running strictly on a single node. Update the overall OOB system to support cross-transport movement of messages so that the OOB can move a received message to another transport for transmission.
cmr=v1.7.5:reviewer=jsquyres:subject=Enable use of unix domain sockets for local OOB communications
This commit was SVN r30327.
2014-01-18 21:36:49 +00:00
Brian Barrett
8b778903d8
Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Ralph Castain
85f2429819
Ensure the ipv6 lists get initialized and finalized
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30081.
2013-12-24 17:24:39 +00:00
Ralph Castain
2e08219cac
Silence the valgrind report from the OOB
Refs trac:4033
This commit was SVN r30080.
The following Trac tickets were found above:
Ticket 4033 --> https://svn.open-mpi.org/trac/ompi/ticket/4033
2013-12-24 17:06:45 +00:00
Ralph Castain
01ee5f380b
Remove debug - problem has been identified
Refs trac:4026
This commit was SVN r30075.
The following Trac tickets were found above:
Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-24 15:22:18 +00:00
Jeff Squyres
ce02002a5e
Free minor memory leak / squash valgrind still-reachable warning.
cmr=v1.7.5:reviewer=rhc
This commit was SVN r30071.
2013-12-24 11:04:38 +00:00
Ralph Castain
38f46641ce
Ensure the recv handler has been initialized
Refs trac:4026
This commit was SVN r30068.
The following Trac tickets were found above:
Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-24 06:09:45 +00:00
Ralph Castain
65228d3571
Don't use "size_t" for the nbytes field in the header - use uint32_t to ensure that ntohl/htonl correctly match it
Refs trac:4026
This commit was SVN r30062.
The following Trac tickets were found above:
Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-23 21:39:49 +00:00
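`htonl()` and `ntohl()` operate on `uint32_t`, but `size_t` is 64 bits on common LP64 platforms. A `size_t` length field is therefore truncated by the conversion and has a different size on 32-bit and 64-bit peers. The following is a minimal sketch that uses a fixed-width field instead. It assumes a hypothetical two-field header that is serialized by hand.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header; the real OOB header carries more fields. */
struct msg_header {
    uint32_t tag;
    uint32_t nbytes;   /* fixed 32 bits, matching htonl()/ntohl() exactly */
};

/* Serialize to 8 bytes in network byte order. */
static void header_pack(const struct msg_header *h, unsigned char out[8])
{
    uint32_t tag = htonl(h->tag);
    uint32_t nbytes = htonl(h->nbytes);
    memcpy(out, &tag, 4);
    memcpy(out + 4, &nbytes, 4);
}

/* Deserialize 8 network-order bytes back into host order. */
static void header_unpack(const unsigned char in[8], struct msg_header *h)
{
    uint32_t tag, nbytes;
    memcpy(&tag, in, 4);
    memcpy(&nbytes, in + 4, 4);
    h->tag = ntohl(tag);
    h->nbytes = ntohl(nbytes);
}
```

The fields are copied through `memcpy()` into a byte buffer, so the wire format does not depend on struct padding or alignment.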
Ralph Castain
7d8c0459a4
Attempt to debug hang that is hitting some environments. Posting to 1.7.4 as a placeholder for the eventual solution
cmr=v1.7.4:reviewer=rhc
This commit was SVN r30060.
2013-12-23 19:57:05 +00:00
George Bosilca
24879f9def
Code cleanup while chasing valgrind complaints.
This commit was SVN r30048.
2013-12-21 23:28:14 +00:00
Ralph Castain
264150872b
Add a bunch of debug output to the OOB connection completion code so we can track down a handshake problem. Available in optimized builds as well as debug ones by setting -mca oob_base_verbose 10
No review will be required as this is just debug code for those helping us debug the 1.7.4 release candidates
cmr-=v1.7.4:reviewer=ompi-gk1.7
This commit was SVN r30043.
2013-12-21 16:09:26 +00:00
Ralph Castain
6239e64f36
Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working.
Refs trac:3992
This commit was SVN r29974.
The following Trac tickets were found above:
Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 03:28:05 +00:00