32 Commits

Author SHA1 Message Date
Ralph Castain
ea35e47228 Fat SMPs (i.e., systems with nodes containing large numbers of cpus) were failing to start due to connection failures of the opal/pmix support. Root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts, and also did the handshake while in that event. So the server would back up beyond the connection backlog limit, and we would fail.
Changing the client to leave its socket as blocking during the connect doesn't solve the problem by itself - you also have to introduce a sleep delay once the backlog is hit to avoid simply machine-gunning your way thru retries. This gets somewhat difficult to adjust as you don't want to unnecessarily prolong startup time.

We've solved this before by adding a listening thread that simply reaps accepts and shoves them into the event library for subsequent processing. This would resolve the problem, but meant yet another daemon-level thread. So I centralized the listening thread support and let multiple elements register listeners on it. Thus, each daemon now has a single listening thread that reaps accepts from multiple sources - for now, the orte/pmix server and the oob/usock support are using it. I'll add in the oob/tcp component later.

This still didn't fully resolve the SMP problem, especially on coprocessor cards (e.g., KNC). Removing the shared memory dstore support helped further improve the behavior - it looks like there is some kind of memory paging issue there that needs further understanding. Given that the shared memory support was about to be lost when I bring over the PMIx integration (until it is restored in that library), it seemed like a reasonable thing to just remove it at this point.
2015-05-29 14:37:14 -07:00
Nathan Hjelm
1d27b1f944 pmix/native: fix coverity issue
CID 1269730 Dereference after null check (FORWARD_NULL)

The code checked for cb == NULL before checking for a callback
function but did not have the same protection around the
OBJ_RELEASE(cb).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-29 08:48:15 -06:00
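The pattern behind the Coverity fix in commit 1d27b1f944 can be sketched as follows. The types and helpers here (`callback_t`, `release_cb`, `complete`) are hypothetical stand-ins, with `release_cb` playing the role of `OBJ_RELEASE`; the point is only that the release sits under the same NULL guard as the callback invocation.

```c
#include <stdlib.h>

/* Hypothetical callback object standing in for the real one. */
typedef struct {
    void (*fn)(int status, void *data);
    void *data;
} callback_t;

static int released = 0;

/* Stand-in for OBJ_RELEASE: frees the object and counts the release. */
static void release_cb(callback_t *cb)
{
    free(cb);
    released++;
}

/* Both the callback invocation and the release are guarded by the same
 * NULL check, so a NULL cb is never dereferenced or released. */
void complete(callback_t *cb, int status)
{
    if (NULL != cb) {
        if (NULL != cb->fn) {
            cb->fn(status, cb->data);
        }
        release_cb(cb);
    }
}
```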
Gilles Gouaillardet
c809aace47 initialize common symbols from opal
A few uninitialized common symbols remain:

common symbols generated by flex :
 * opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng
 * opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext
 * opal/util/show_help_lex.l: opal_show_help_yyleng
 * opal/util/show_help_lex.l: opal_show_help_yytext

common symbol generated by "external" hwloc library:
 * opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map
2015-05-08 09:48:51 +09:00
Nathan Hjelm
33181b2543 opal: use C99 subobject naming for component initialization
This commit helps future-proof opal components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
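For readers unfamiliar with the C99 style that commit 33181b2543 refers to, here is a small illustration. The struct `example_component_t` and its members are invented for this sketch and are not the actual OPAL component layout.

```c
#include <stddef.h>

/* Hypothetical component descriptor, for illustration only. */
typedef struct {
    const char *name;
    int major_version;
    int minor_version;
    int (*open)(void);
    int (*close)(void);
} example_component_t;

static int example_open(void)  { return 0; }
static int example_close(void) { return 0; }

/* Each member is initialized by name, so adding or reordering members in
 * the struct definition does not silently shift values into the wrong
 * slots. Members that are not named (minor_version here) are zeroed. */
const example_component_t example_component = {
    .name          = "example",
    .major_version = 1,
    .open          = example_open,
    .close         = example_close,
};
```

With positional initialization, inserting a new member in the middle of the struct would require touching every component; with named members, existing initializers keep compiling and keep their meaning.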
Ralph Castain
acc2c7937c Thanks Nathan - decrement the counter to ensure singletons start up correctly 2015-04-08 11:23:35 -07:00
Ralph Castain
d07dc362d5 Ensure we can authenticate when crossing security domains by including all available credentials, and letting the receiver use the highest priority one they have in common. 2015-03-28 20:34:26 -07:00
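The selection rule described in commit d07dc362d5 (the sender includes all of its credentials, the receiver picks the highest-priority one both sides share) might look like the sketch below. The function name `select_credential` and the method names used with it are placeholders, not identifiers from the code base.

```c
#include <stddef.h>
#include <string.h>

/* Returns the first entry of `preferred` (ordered highest priority first)
 * that also appears in `offered`, or NULL if the two sides share none. */
const char *select_credential(const char *const *preferred, size_t npreferred,
                              const char *const *offered, size_t noffered)
{
    for (size_t i = 0; i < npreferred; i++) {
        for (size_t j = 0; j < noffered; j++) {
            if (0 == strcmp(preferred[i], offered[j])) {
                return preferred[i];
            }
        }
    }
    return NULL;
}
```

A NULL result corresponds to the case where authentication cannot proceed because the two domains have no method in common.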
Ralph Castain
1b24536941 Allow for different security domains. Let the initiator of the connection determine the method to be used - if the receiver cannot support it, then that's an error that will cause the connection attempt to fail. 2015-03-25 13:22:01 -07:00
Ralph Castain
d7d8ae46ed We no longer pass the RML URI for procs launched via mpirun as the daemon has no need for that info. 2015-03-17 06:10:20 -07:00
Gilles Gouaillardet
0ce59f2d29 pmix: fix misc memory leaks
as reported by Coverity as CID 1269843, 1269854, 1269856, 1269857 and 1269858
2015-02-16 11:19:43 +09:00
Ralph Castain
3ae3b96c17 Fix master compilation - a buried header dependency must have been removed. 2015-02-10 07:22:10 -08:00
Ralph Castain
a3275aa867 Once again, fix the blasted singleton comm_spawn 2015-02-05 17:34:25 -08:00
Ralph Castain
294ebc907a Fix singleton operations so they can work inside a slurm environment 2015-01-27 09:29:42 -06:00
Ralph Castain
028b00154d Complete implementation of the schizo framework to support OMPI component 2015-01-27 09:29:42 -06:00
Gilles Gouaillardet
9e9261e90a pmix: correctly set locality flags in proc_flags
do not use opal_process_info.cpuset which is not
set at that time.
2014-12-26 15:37:08 +09:00
Ralph Castain
9658256a98 Restore the passing of the complete job map to the local proc on first get_attr so the info can be used by the MPI layer without continual calls back to the server. We'll find a more memory efficient method later. 2014-12-13 18:44:09 -08:00
Gilles Gouaillardet
578fe41788 fix hangs introduced by previous commit a6744b81777ab8247908350bd15cca49bedf5208 2014-11-25 17:50:44 +09:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
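The "two-field struct of uint32_t's" mentioned in commit 780c93ee57 can be sketched as below. The type name `example_process_name_t` and the member names are assumptions made for illustration; the commit message itself only states that the name has two `uint32_t` fields.

```c
#include <stdint.h>

/* Hypothetical two-field process name; member names are assumed. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_process_name_t;

/* Field-wise comparison: no packing into or unpacking from a single
 * 64-bit integer is needed, so byte order does not enter into it. */
int example_name_equal(const example_process_name_t *a,
                       const example_process_name_t *b)
{
    return a->jobid == b->jobid && a->vpid == b->vpid;
}
```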
Elena
03fc809bc9 This commit adds a new dstore component, sm, which is used for communication between the pmix server and clients on the same node via shared memory. 2014-11-06 16:01:19 +02:00
Gilles Gouaillardet
ca0b969991 pmix: fix a return status in native_get_attr 2014-10-30 15:26:23 +09:00
Gilles Gouaillardet
8c556bbc66 pmix: fix alignment issue 2014-10-29 13:19:23 +09:00
Gilles Gouaillardet
5c81658d58 pmix: fix big endian arch
use the appropriate 64 bits type otherwise data gets incorrectly
truncated on big endian arch
2014-10-15 17:17:09 +09:00
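The kind of truncation that commit 5c81658d58 describes can be illustrated in isolation. This is a generic sketch, not the code that was fixed; both helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Correct: copy a stored 64-bit value out using the matching 64-bit type. */
uint64_t load_u64(const void *src)
{
    uint64_t v;
    memcpy(&v, src, sizeof(v));
    return v;
}

/* For contrast: reads only the first 4 bytes in memory. On a little-endian
 * host these are the low-order bytes, so small values appear to survive;
 * on a big-endian host they are the high-order bytes, so a small value
 * reads back as 0. */
uint32_t load_first_4_bytes(const void *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof(v));
    return v;
}
```

This is why such bugs tend to go unnoticed until the code runs on a big-endian architecture.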
Gilles Gouaillardet
5c5453b8b1 pmix: fix test in native_get_attr 2014-10-03 11:54:08 +09:00
Ralph Castain
8d0b4f222a The pmix.get functions should not be returning "success" if the requested info isn't found. Fix the macros and the component functions so they correctly return "not found" in that situation, and set the data regions and size to NULL and 0, respectively.
This commit was SVN r32818.
2014-09-30 18:03:12 +00:00
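The return convention described in commit 8d0b4f222a (report "not found" rather than "success", and set the data pointer and size to NULL and 0) can be sketched as follows. The status codes, `example_kv_t`, and `example_get` are hypothetical and do not reproduce the actual pmix macros or component functions.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_SUCCESS        0
#define EXAMPLE_ERR_NOT_FOUND (-1)

/* Hypothetical key/value entry. */
typedef struct {
    const char *key;
    const void *data;
    size_t size;
} example_kv_t;

/* On a miss, the outputs are reset so the caller never sees stale values
 * paired with an error status. */
int example_get(const example_kv_t *table, size_t n, const char *key,
                const void **data, size_t *size)
{
    for (size_t i = 0; i < n; i++) {
        if (0 == strcmp(table[i].key, key)) {
            *data = table[i].data;
            *size = table[i].size;
            return EXAMPLE_SUCCESS;
        }
    }
    *data = NULL;
    *size = 0;
    return EXAMPLE_ERR_NOT_FOUND;
}
```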
Ralph Castain
6323b226c7 Bring over some updates from the PMIx branch - mostly just minor cleanups. Make the direct grpcomm component no longer be the default. For now, we seem to be having problems with non-blocking fence operations, so make them not be the default under any scenario (e.g., when sm is the only btl in operation).
This commit was SVN r32673.
2014-09-06 19:19:44 +00:00
Ralph Castain
5cdbc00136 Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others.
This commit was SVN r32650.
2014-08-30 19:33:46 +00:00
Ralph Castain
730e28349e Some minor uninitialized variable cleanups
This commit was SVN r32629.
2014-08-29 02:21:13 +00:00
Gilles Gouaillardet
d743da18bf pmix: fix process name parsing on 32 bits systems
opal_process_name_t is a uint64_t, which is not equivalent to
an unsigned long on 32-bit systems.
It is now parsed as an unsigned long long.

This commit was SVN r32592.
2014-08-25 03:08:02 +00:00
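The portability point in commit d743da18bf is that `unsigned long` may be only 32 bits wide, while `unsigned long long` is guaranteed to hold at least 64 bits. A minimal sketch of parsing a 64-bit value that way follows; `parse_u64` is a hypothetical helper, not the function changed in the commit.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal string into a uint64_t using strtoull, which works the
 * same way on 32-bit and 64-bit systems. Returns 0 on success and -1 if
 * the string is empty, has trailing characters, or is out of range. */
int parse_u64(const char *str, uint64_t *out)
{
    char *end = NULL;
    errno = 0;
    unsigned long long v = strtoull(str, &end, 10);
    if (0 != errno || end == str || '\0' != *end) {
        return -1;
    }
    *out = (uint64_t)v;
    return 0;
}
```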
Ralph Castain
f00af81c1d Little more cleanup under the abort cases cited by Gilles. All seem to be working now
This commit was SVN r32585.
2014-08-22 19:57:57 +00:00
Ralph Castain
b1a7375192 Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Clean up some of the pmix stuff when running corner cases of errors
This commit was SVN r32584.
2014-08-22 19:20:45 +00:00
Ralph Castain
6ff2a60829 Handle the non-blocking fence case correctly, and ensure we always at least pass back the hostname of the process whose info is being requested so that the ompi_proc_t can correctly initialize it when we are in a non-blocking fence with np < cutoff scenario
This commit was SVN r32578.
2014-08-22 14:26:24 +00:00
Ralph Castain
8f1b9b463e Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound.
This commit was SVN r32577.
2014-08-22 05:17:51 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “pmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
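The idea in commit aec5cd08bd of identifying a collective by the names of its participants, instead of by a global id, can be sketched as below. The commit message does not say how the signature is stored or compared, so everything here is an assumption: `example_signature_t` is hypothetical, names are taken to be `uint64_t` values, and participants are assumed to be listed in the same canonical order on every proc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: the ordered list of participating proc names. */
typedef struct {
    const uint64_t *procs;
    size_t nprocs;
} example_signature_t;

/* Two collectives are the same operation only if exactly the same
 * combination of procs takes part in them. */
int example_signature_equal(const example_signature_t *a,
                            const example_signature_t *b)
{
    if (a->nprocs != b->nprocs) {
        return 0;
    }
    if (0 == a->nprocs) {
        return 1;
    }
    return 0 == memcmp(a->procs, b->procs, a->nprocs * sizeof(uint64_t));
}
```

Under this scheme a proc that belongs to two different groups can take part in two collectives at once, because the two signatures differ even though they share a member.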