openmpi/orte/util
Ralph Castain ea35e47228 Fat SMPs (i.e., systems with nodes containing large numbers of CPUs) were failing to start due to connection failures of the opal/pmix support. Root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts, and also did the handshake while in that event. So the server would back up beyond the connection backlog limit, and we would fail.
Changing the client to leave its socket as blocking during the connect doesn't solve the problem by itself - you also have to introduce a sleep delay once the backlog is hit to avoid simply machine-gunning your way through retries. This gets somewhat difficult to tune, as you don't want to unnecessarily prolong startup time.

We've solved this before by adding a listening thread that simply reaps accepts and shoves them into the event library for subsequent processing. This would resolve the problem, but meant yet another daemon-level thread. So I centralized the listening thread support and let multiple elements register listeners on it. Thus, each daemon now has a single listening thread that reaps accepts from multiple sources - for now, the orte/pmix server and the oob/usock support are using it. I'll add in the oob/tcp component later.

This still didn't fully resolve the SMP problem, especially on coprocessor cards (e.g., KNC). Removing the shared memory dstore support helped further improve the behavior - it looks like there is some kind of memory paging issue there that needs further understanding. Given that the shared memory support was about to be lost when I bring over the PMIx integration (until it is restored in that library), it seemed like a reasonable thing to just remove it at this point.
2015-05-29 14:37:14 -07:00
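The listening-thread approach described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in listener.c: the names `listener_t`, `listener_thread`, and `listener_start` are hypothetical, and a plain pipe stands in for the hand-off into the event library. The point it shows is that the thread does nothing except `accept()`, so the kernel backlog drains quickly while the handshake is deferred to whoever reads the hand-off pipe.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical types and names; this is not the ORTE listener API. */
typedef struct {
    int listen_fd;   /* socket the thread blocks on in accept() */
    int handoff_wr;  /* write end of the pipe feeding the event loop */
} listener_t;

/* Thread body: only reap accepts, so the connection backlog stays short. */
static void *listener_thread(void *arg)
{
    listener_t *l = (listener_t *)arg;

    for (;;) {
        int fd = accept(l->listen_fd, NULL, NULL);
        if (fd < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal: retry */
            }
            break;                  /* fatal error: stop listening */
        }
        /* Hand the connection off; the handshake is done by the reader. */
        if (write(l->handoff_wr, &fd, sizeof(fd)) != (ssize_t)sizeof(fd)) {
            close(fd);
            break;
        }
    }
    return NULL;
}

/* Bind to an ephemeral loopback port, start the thread, and return the
 * port number (or -1 on error). *handoff_rd receives the read end of the
 * pipe from which accepted descriptors can be collected. */
static int listener_start(listener_t *l, int *handoff_rd, pthread_t *tid)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int pipefd[2];

    l->listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (l->listen_fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a free port */

    if (bind(l->listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(l->listen_fd, 128) < 0 ||
        getsockname(l->listen_fd, (struct sockaddr *)&addr, &len) < 0) {
        close(l->listen_fd);
        return -1;
    }
    if (pipe(pipefd) < 0) {
        close(l->listen_fd);
        return -1;
    }
    l->handoff_wr = pipefd[1];
    *handoff_rd = pipefd[0];

    if (pthread_create(tid, NULL, listener_thread, l) != 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        close(l->listen_fd);
        return -1;
    }
    return (int)ntohs(addr.sin_port);
}
```

A caller would read `int`-sized descriptors from the hand-off pipe inside its event loop and run the handshake there. Supporting several registered listeners, as the commit describes, would presumably mean waiting on multiple listening sockets in the one thread; that is left out of this sketch.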
..
comm Per RFC: 2014-06-01 16:14:10 +00:00
dash_host Add the ability to specify the number of desired slots in the --host option. Just giving a host name => one slot (multiple copies of the name yield one slot per copy). Giving "foo:3" indicates you want three slots - a shorthand notation for saying "foo" three times. Giving "foo:*" indicates you want the topology to set the number of slots based on the orte_set_slots param. 2015-04-30 20:35:23 -07:00
hostfile initialize common symbols from orte 2015-05-08 10:11:58 +09:00
attr.c Fix bad test - opal_buffer and opal_ptr can support NULL locations 2015-02-17 21:46:23 -08:00
attr.h Consolidate all the QOS changes into one clean commit 2015-05-06 19:48:42 -07:00
context_fns.c First part of memory leak cleanups from Gilles 2014-11-24 16:53:33 -08:00
context_fns.h ... Delayed due to notifier commits earlier this day ... 2009-04-29 01:32:14 +00:00
error_strings.c Rework the OOB selection logic to allow a component (e.g., usock) to direct that it be the sole active component. Remove prior disqualifying code in the oob/tcp component as it was too restrictive - if usock wasn't able to run, it left apps with no way to communicate to their daemon. Have the local daemon check the global modex for the RML URI info of the local procs so it can route messages between them when tcp is the primary channel. 2015-05-08 11:15:21 -07:00
error_strings.h When we comm_spawn, we really want to respect the original -host directives and not expand the daemon virtual machine unless directed to do so in the comm_spawn command. Otherwise, we will automatically launch daemons on every node in the allocation. 2014-04-30 22:26:18 +00:00
help-regex.txt check-help-strings cleanup 2014-08-11 03:25:22 +00:00
hnp_contact.c orte/util: fix memory leaks 2015-03-05 14:06:18 +09:00
hnp_contact.h As requested by Aurelien at the July design meeting - long time coming, but finally got around to it. 2008-12-10 17:10:39 +00:00
listener.c Fat SMPs (i.e., systems with nodes containing large numbers of CPUs) were failing to start due to connection failures of the opal/pmix support. Root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts, and also did the handshake while in that event. So the server would back up beyond the connection backlog limit, and we would fail. 2015-05-29 14:37:14 -07:00
listener.h Fat SMPs (i.e., systems with nodes containing large numbers of CPUs) were failing to start due to connection failures of the opal/pmix support. Root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts, and also did the handshake while in that event. So the server would back up beyond the connection backlog limit, and we would fail. 2015-05-29 14:37:14 -07:00
Makefile.am Fat SMPs (i.e., systems with nodes containing large numbers of CPUs) were failing to start due to connection failures of the opal/pmix support. Root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts, and also did the handshake while in that event. So the server would back up beyond the connection backlog limit, and we would fail. 2015-05-29 14:37:14 -07:00
name_fns.c orte/util: fix misc memory leaks 2015-02-17 12:27:23 +09:00
name_fns.h Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. 2014-11-11 17:00:42 -08:00
nidmap.c Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. 2014-11-11 17:00:42 -08:00
nidmap.h Per the PMIx RFC: 2014-08-21 18:56:47 +00:00
parse_options.c orte/util: fix misc memory leaks 2015-02-17 12:27:23 +09:00
parse_options.h Use ports as multicast channels instead of networks so we avoid stepping into reserved spaces. 2011-04-29 18:46:40 +00:00
pre_condition_transports.c orte/util: fix misc memory leaks 2015-02-17 12:27:23 +09:00
pre_condition_transports.h In the case of direct-launched processes running under slurm, psm requires that the pre_condition_transports MCA param be set. This is normally computed by mpirun and inserted into each proc's environ, but that doesn't work here. 2011-04-28 13:54:33 +00:00
proc_info.c Include the FQDN version and non-stripped version of the hostname in our list of aliases as these (plus localhost) are the most common aliases we see. 2015-03-17 06:26:26 -07:00
proc_info.h Attempt to reduce the RARP traffic during definition of allocations 2015-03-16 16:26:40 -07:00
regex.c Update the regex to resolve a bug 2014-08-29 22:24:20 +00:00
regex.h Extend the regular expression parsing support 2014-03-17 21:25:05 +00:00
session_dir.c For scalability reasons, and to make life easier for the poor Cray-ites, don't bang on the system for the username - we'll just use the uid. 2015-03-19 21:24:13 -07:00
session_dir.h - Revert r20739 2009-03-05 21:56:03 +00:00
show_help.c Continue refinement of the DVM operations. Send the spawn request to the right place (it helps) as it isn't a comm_spawn request and has to be treated a little differently. Ensure IO gets forwarded back to the tool. Ensure the tool outputs show_help locally as there is no place to send it. 2015-02-04 06:21:54 -08:00
show_help.h Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes 2013-02-28 01:35:55 +00:00
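
The `--host` slot notation described in the dash_host entry above ("foo" for one slot, "foo:3" for three, "foo:*" to let the topology decide) can be illustrated with a small parser. This is a sketch under stated assumptions, not the code in dash_host: the function `parse_host_entry` and the marker `SLOTS_FROM_TOPOLOGY` are hypothetical names, and it handles a single entry only. Summing repeated names into one slot per copy would be up to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker meaning "foo:*": slot count comes from the topology. */
#define SLOTS_FROM_TOPOLOGY (-1)

/* Parse one --host entry of the form "name", "name:N", or "name:*".
 * Copies the host name into name[] and stores the slot count in *slots.
 * Returns 0 on success, -1 on a malformed entry. */
static int parse_host_entry(const char *entry, char *name, size_t name_len,
                            int *slots)
{
    const char *colon = strchr(entry, ':');
    size_t n = colon ? (size_t)(colon - entry) : strlen(entry);
    char *end;
    long v;

    if (n == 0 || n >= name_len) {
        return -1;                  /* empty name or buffer too small */
    }
    memcpy(name, entry, n);
    name[n] = '\0';

    if (colon == NULL) {
        *slots = 1;                 /* bare host name => one slot */
        return 0;
    }
    if (strcmp(colon + 1, "*") == 0) {
        *slots = SLOTS_FROM_TOPOLOGY;
        return 0;
    }
    v = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || v <= 0 || v > 1000000) {
        return -1;                  /* not a positive integer */
    }
    *slots = (int)v;
    return 0;
}
```

The upper bound of 1000000 is an arbitrary sanity limit chosen for the sketch so the value fits in an `int`.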