Bring Slurm PMI-1 component online
Bring the s2 component online
A little cleanup: let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. This is required because the different PMI environments all pass the jobid in different ways.
Bring the OMPI pubsub/pmi component online
Get comm_spawn working again
Ensure we always provide a cpuset, even if it is NULL
pmix/cray: adjust cray pmix component for pmix
Make changes so cray pmix can work within the integrated
ompi/pmix framework.
Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet
Cleanup comm_spawn - procs now starting, error in connect_accept
Complete integration
This commit adds implementations of opal_atomic_ll_32/64 and
opal_atomic_sc_32/64. These atomics can be used to implement more
efficient lifo/fifo operations on supported platforms. The only
supported platform with this commit is powerpc/power.
This commit also adds an implementation of opal_atomic_swap_32/64 for
powerpc.
Tested with Power8.
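As an illustration of how these primitives compose, here is a sketch of a fetch-and-add built from ll/sc; the function-style signatures below are assumptions for readability, not necessarily the committed forms:

    /* ll is assumed to load the value and set a reservation (lwarx on
     * powerpc); sc is assumed to store only if the reservation still
     * holds (stwcx.) and to return nonzero on success */
    static inline int32_t fetch_and_add_ll_sc_32 (volatile int32_t *addr, int32_t delta)
    {
        int32_t oldval;
        do {
            oldval = opal_atomic_ll_32 (addr);
        } while (!opal_atomic_sc_32 (addr, oldval + delta));
        return oldval;
    }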
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit removes alpha asm support. No current processor
manufacturer makes chips compatible with DEC alpha and no
participating organization has alpha processors. This makes it
difficult to support alpha via assembly.
This doesn't mean Open MPI will no longer build/work on alpha
processors. It should continue to work with gcc's builtin sync
atomics.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit does two things. It removes checks for C99-required
headers (stdlib.h, string.h, signal.h, etc.). Additionally, it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc.).
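In practice this turns configure-guarded includes into unconditional ones; a representative before/after (not a specific file from the commit):

    /* before: guards generated from configure header checks */
    #ifdef HAVE_STDLIB_H
    #include <stdlib.h>
    #endif

    /* after: C99 guarantees the header, so include it directly */
    #include <stdlib.h>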
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
operands
Also added support for the xchg instruction. The instruction is
supported by ia32 and may benefit vader.
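For reference, a minimal sketch of an xchg-based 32-bit atomic swap (illustrative inline asm, not necessarily the committed code):

    /* xchg with a memory operand is implicitly locked on ia32/x86_64,
     * so no lock prefix is needed for an atomic swap */
    static inline int32_t atomic_swap_32 (volatile int32_t *addr, int32_t newval)
    {
        int32_t oldval = newval;
        __asm__ __volatile__ ("xchgl %1, %0" :
                              "+m" (*addr), "+r" (oldval));
        return oldval;
    }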
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Use of this configuration option can cause crashes, hangs, and
(worse) incorrect results when btl/sm, btl/scif, or btl/vader are
in use. We discussed this at the January 2015 developers meeting
and it was decided to remove the option entirely. This commit does
just that. All usage of OPAL_WANT_SMP_LOCKS has been removed.
This was more complicated than I would like, but it's just an
unfortunate GCC/clang difference. I don't have access to all the C
compilers out there, so this may still have problems with other
compilers that implement some form of `#pragma GCC diagnostic` support
but don't actually behave the same as some versions of GCC.
Fixes #323
There currently is no standard support for 128-bit integer types. Any use
of the __int128 and int128_t types can lead to warnings from the compiler
when using -Wpedantic. Additionally, some compilers may support __int128
and other may support int128_t. This commit addresses both issues by
defining opal_int128_t if there is a supported 128-bit type. In the
case of GCC a pragma has been added to suppress warnings about __int128
not being a standard C type.
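The resulting pattern looks roughly like the following sketch (the exact guard macro in the tree may differ):

    /* suppress the pedantic warning that __int128 is not standard C,
     * but only on compilers known to honor the pragma */
    #if defined(__GNUC__)
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wpedantic"
    #endif
    typedef __int128 opal_int128_t;
    #if defined(__GNUC__)
    #pragma GCC diagnostic pop
    #endif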
A 128-bit compare-and-swap will enable a better atomic lifo implementation
that uses the pointer + counter method to avoid ABA issues. This commit
adds configury to check for the instruction (cmpxchg16b) and adds an
implementation that uses the __int128 extension available in GCC-compatible compilers.
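To make the ABA reasoning concrete, here is a sketch (hypothetical names) of the pointer + counter head that a 16-byte CAS enables:

    #include <stdbool.h>
    #include <stdint.h>

    /* the head packs a pointer and a counter into one 128-bit word;
     * because every update bumps the counter, a recycled pointer value
     * can no longer match a stale snapshot (the ABA problem) */
    typedef union {
        __int128 atom;
        struct {
            void    *ptr;
            uint64_t counter;
        } data;
    } lifo_head_t;

    static inline bool lifo_head_cas (volatile __int128 *addr,
                                      __int128 oldval, __int128 newval)
    {
        /* gcc's builtin emits cmpxchg16b when built with -mcx16 */
        return __sync_bool_compare_and_swap (addr, oldval, newval);
    }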
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
WHAT: Merge the PMIx branch into the devel repo, creating a new
OPAL “pmix” framework to abstract PMI support for all RTEs.
Replace the ORTE daemon-level collectives with a new PMIx
server and update the ORTE grpcomm framework to support
server-to-server collectives
WHY: We’ve had problems dealing with variations in PMI implementations,
and need to extend the existing PMI definitions to meet exascale
requirements.
WHEN: Mon, Aug 25
WHERE: https://github.com/rhc54/ompi-svn-mirror.git
Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.
All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.
Accordingly, we have:
* Created a new OPAL “pmix” framework to abstract the PMI support, with separate components for the Cray, Slurm PMI-1, and Slurm PMI-2 implementations.
* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.
* Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a given combination of procs. Note that a proc can be involved in any number of simultaneous collectives; it is the specific combination of procs that is subject to the constraint.
* Removed the prior OMPI/OPAL modex code.
* Added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations; if so, and if the active PMIx component supports it, the non-blocking “fence” operation is used. Otherwise, the default is a full blocking modex exchange, as we currently perform. A usage sketch follows this list.
* Retained the current flag that directs us to use a blocking fence operation, but only to retrieve data on demand.
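As a usage sketch (the macro names follow the description above, but the argument lists and variable names here are illustrative assumptions, not the literal committed API), a BTL might publish and later retrieve an endpoint address like this:

    /* hypothetical sketch: argument order and names are assumptions */
    int rc;
    /* publish this proc's endpoint address for peers to look up */
    OPAL_MODEX_SEND(rc, &my_component_version, &my_addr, sizeof (my_addr));
    /* ... later, on demand, retrieve a peer's address */
    OPAL_MODEX_RECV(rc, &my_component_version, &peer_proc,
                    (void **) &peer_addr, &size);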
This commit was SVN r32570.
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down into OPAL
All the components required for inter-process communication are currently deeply integrated into the OMPI layer. Several groups/institutions have expressed interest in a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, accessible to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTLs directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes.

UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs that I have no way of compiling/testing). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.
Since Paul is the only one of the team with the required hardware to test it, and he has done so, consider this RM-approved.
cmr=v1.7.5:reviewer=ompi-gk1.7
This commit was SVN r30401.
Thanks to Paul Hargrove for tracking it down.
RM-approved
cmr=v1.7.4:reviewer=ompi-gk1.7
This commit was SVN r30397.