As more providers get added to libfabric, the default exclude list would need
to be updated.
Instead, we choose to include only the providers known to work by default.
New default:
- include: psm,psm2,gni
- exclude: none
Intel TrueScale and Intel OmniPath, and detect a link in ACTIVE state.
This fix addresses the scenario reported in the below OMPI users email,
including formerly named Qlogic IB, now Intel True scale. Given the
nature of the PSM/PSM2 mtls this fix applies to OmniPath:
https://www.open-mpi.org/community/lists/users/2016/04/29018.php
This commit removes the --with-mpi-thread-multiple option and forces
MPI_THREAD_MULTIPLE support. This cleans up an abstration violation
in opal where OMPI_ENABLE_THREAD_MULTIPLE determines whether the
opal_using_threads is meaningful. To reduce the performance hit on
MPI_THREAD_SINGLE programs an OPAL_UNLIKELY is used for the
check on opal_using_threads in OPAL_THREAD_* macros.
This commit does not clean up the arguments to the various functions
that take whether muti-threading support is enabled. That should be
done at a later time.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
When mtl-portals4 is configured for logical mapping, coll-portals4
must disqualify because it does not yet support logical mapping.
coll-portals4 looks for the endpoint pid to be zero which tells it
that mtl-portals4 is configured for logical mapping. This commit
initializes the endpoint nid/pid to zero for logical mapping.
accelerate message queue lookups if the lookups have
the proper tag&mask layout. OpenMPI should follow
PSM2's preferred tag&mask spec, so that PSM2 can provide
a performance benefit.
During component finalize, mtl-portals4 would blindly release
resources without testing if the handle was valid. This was OK,
but resource allocation is now delayed until add_procs(). If
mtl-portals4 is deselected, it will be finalized without
add_procs() ever being called. This commit ensures that invalid
handles are not released.
This commit changes the priority of mtl components to be relative to
pml/ob1 and updates the mtl interface to expose this priority. cm now
sets its own priority based on the priority of the selected mtl
component.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The Portals4 get_peer family incorrectly cast the ompi_proc_t to
ptl_process_t and returned that as the peer. The ptl_process_t is
actually found in the endpoint array. This commit fixes the
Portals4 get_peer family to return the dereferenced endpoint
pointer.
In the default mode of operation, the Portals4 components support
dynamic add_procs().
The Portals4 components have two alternate modes (flow control and
logical-to-physical) that require knowledge of all procs at startup.
In these modes, mtl-portals4 sets the MCA_MTL_BASE_FLAG_REQUIRE_WORLD
flag and btl-portals4 sets the MCA_BTL_FLAGS_SINGLE_ADD_PROCS flag
to tell the PML that we need all the procs in one add_procs() call.
This commit adds support to the pml, mtl, and btl frameworks for
components to indicate at runtime that they do not support the new
dynamic add_procs behavior. At the high end the lack of dynamic
add_procs support is signalled by the pml using the new pml_flags
member to the pml module structure. If the
MCA_PML_BASE_FLAG_REQUIRE_WORLD flag is set MPI_Init will generate the
ompi_proc_t array passed to add_proc from ompi_proc_world () instead
of ompi_proc_get_allocated ().
Both cm and ob1 have been updated to detect if the underlying mtl and
btl components support dynamic add_procs.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds two new functions:
- ompi_proc_get_allocated - Returns all procs in the current job that
have already been allocated. This is used in init/finalize to
determine which procs to pass to add_procs/del_procs.
- ompi_proc_world_size - returns the number of processes in
MPI_COMM_WORLD. This may be removed in favor of callers just
looking at ompi_process_info.
The behavior of ompi_proc_world has been restored to return
ompi_proc_t's for all processes in the current job. The use of this
function is discouraged.
Code that was using ompi_proc_world() has been updated to make use of
the new functions to avoid the memory overhead of ompi_comm_world ().
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessort calls add_procs
with the ompi_proc_t if the member is NULL.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessort calls add_procs
with the ompi_proc_t if the member is NULL. Tested on an infinipath
system with InfiniPath_QLE7340 HCAs.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Bring Slurm PMI-1 component online
Bring the s2 component online
Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.
Bring the OMPI pubsub/pmi component online
Get comm_spawn working again
Ensure we always provide a cpuset, even if it is NULL
pmix/cray: adjust cray pmix component for pmix
Make changes so cray pmix can work within the integrated
ompi/pmix framework.
Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet
Cleanup comm_spawn - procs now starting, error in connect_accept
Complete integration
mtl_ofi_provider_include (resp. mtl_ofi_provider_exclude) can be used
to specify which provider(s) the OFI MTL can select (resp. ignore).
e.g. --mca mtl_ofi_provider_include "psm,sockets"
By default, mtl_ofi_provider_exclude is set to "sockets,mxm".
This deprecates the old MCA var named "mtl_ofi_provider".
Some OFI providers such as "sockets" are used for debugging
purposes mostly. For these providers, other components usually
offer better performance -- e.g. for sockets, the BTL/TCP would
be a better choice.
Thus, we chose to ignore some providers unless explicitly asked
by the user on the command line:
e.g. --mca mtl_ofi_provider sockets