ea35e47228
Changing the client to leave its socket blocking during the connect doesn't solve the problem by itself: you also have to introduce a sleep delay once the backlog is hit, to avoid simply machine-gunning your way through retries. That delay is hard to tune, because you don't want to prolong startup time unnecessarily. We've solved this before by adding a listening thread that simply reaps accepts and shoves them into the event library for subsequent processing. That would resolve the problem, but it meant yet another daemon-level thread.

So I centralized the listening thread support and let multiple elements register listeners on it. Each daemon now has a single listening thread that reaps accepts from multiple sources; for now, the orte/pmix server and the oob/usock support use it. I'll add the oob/tcp component later.

This still didn't fully resolve the SMP problem, especially on coprocessor cards (e.g., KNC). Removing the shared memory dstore support improved the behavior further; there appears to be a memory paging issue there that needs further investigation. Since the shared memory support was going to be lost anyway when I bring over the PMIx integration (until it is restored in that library), it seemed reasonable to just remove it now.
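A minimal sketch of the centralized listener design described above, assuming POSIX sockets, `poll()` and pthreads: one thread polls every registered listening socket and drains pending accepts as soon as they arrive, so the kernel backlog does not fill up and clients never need a retry delay. All names here (`register_listener`, `start_listening`, `stop_listening`, `count_and_close`, `open_loopback_listener`) are hypothetical and are not the actual API in listener.c/listener.h. Where the real code shoves each accepted socket into the event library, this sketch simply invokes a per-listener callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical API: illustrates the design, it is not ORTE's listener.h. */
#define MAX_LISTENERS 8
typedef void (*accept_handler_fn)(int connfd, void *cbdata);
typedef struct {
    int sd;                    /* bound, listening, non-blocking socket */
    accept_handler_fn handler; /* invoked once per accepted connection */
    void *cbdata;
} listener_t;
static listener_t listeners[MAX_LISTENERS];
static int num_listeners = 0;
static int stop_pipe[2] = {-1, -1}; /* self-pipe that wakes and stops the thread */
static pthread_t listen_thread;

/* Register before start_listening(); the table is not locked. */
int register_listener(int sd, accept_handler_fn handler, void *cbdata) {
    if (num_listeners >= MAX_LISTENERS) return -1;
    listeners[num_listeners++] = (listener_t){sd, handler, cbdata};
    return 0;
}

static void *listen_thread_fn(void *arg) {
    struct pollfd fds[MAX_LISTENERS + 1];
    (void)arg;
    for (;;) {
        for (int i = 0; i < num_listeners; i++)
            fds[i] = (struct pollfd){.fd = listeners[i].sd, .events = POLLIN};
        fds[num_listeners] = (struct pollfd){.fd = stop_pipe[0], .events = POLLIN};
        if (poll(fds, (nfds_t)num_listeners + 1, -1) < 0) {
            if (errno == EINTR) continue;
            break;
        }
        if (fds[num_listeners].revents) break; /* stop requested */
        for (int i = 0; i < num_listeners; i++) {
            if (!(fds[i].revents & POLLIN)) continue;
            /* Drain the queue (until accept() fails, e.g. EAGAIN) so the backlog never fills. */
            int connfd;
            while ((connfd = accept(listeners[i].sd, NULL, NULL)) >= 0)
                listeners[i].handler(connfd, listeners[i].cbdata);
        }
    }
    return NULL;
}

int start_listening(void) {
    if (pipe(stop_pipe) < 0) return -1;
    if (pthread_create(&listen_thread, NULL, listen_thread_fn, NULL) == 0) return 0;
    close(stop_pipe[0]);
    close(stop_pipe[1]);
    return -1;
}

void stop_listening(void) {
    char c = 0;
    ssize_t n = write(stop_pipe[1], &c, 1); /* wake poll() in the thread */
    (void)n;
    pthread_join(listen_thread, NULL);
    close(stop_pipe[0]);
    close(stop_pipe[1]);
}

/* Demo handler: counts and closes; a daemon would queue connfd on its event loop. */
static void count_and_close(int connfd, void *cbdata) {
    atomic_fetch_add((atomic_int *)cbdata, 1);
    close(connfd);
}

/* Demo helper: non-blocking listener on an ephemeral 127.0.0.1 port. */
static int open_loopback_listener(unsigned short *port_out) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(0x7F000001); /* 127.0.0.1; port 0 = any free port */
    if (bind(sd, (struct sockaddr *)&addr, len) < 0 || listen(sd, 16) < 0 ||
        getsockname(sd, (struct sockaddr *)&addr, &len) < 0 ||
        fcntl(sd, F_SETFL, fcntl(sd, F_GETFL) | O_NONBLOCK) < 0) {
        close(sd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return sd;
}
```

The self-pipe lets `poll()` block indefinitely yet still be woken for shutdown, and keeping the listening sockets non-blocking means a connection that is reset between `poll()` and `accept()` cannot stall the one thread every registered component depends on.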
| Name |
|---|
| comm |
| dash_host |
| hostfile |
| attr.c |
| attr.h |
| context_fns.c |
| context_fns.h |
| error_strings.c |
| error_strings.h |
| help-regex.txt |
| hnp_contact.c |
| hnp_contact.h |
| listener.c |
| listener.h |
| Makefile.am |
| name_fns.c |
| name_fns.h |
| nidmap.c |
| nidmap.h |
| parse_options.c |
| parse_options.h |
| pre_condition_transports.c |
| pre_condition_transports.h |
| proc_info.c |
| proc_info.h |
| regex.c |
| regex.h |
| session_dir.c |
| session_dir.h |
| show_help.c |
| show_help.h |