This commit adds the data necessary for supporting dynamic add_procs
to the rdma message (opal_process_name_t). The endpoint lookup
function has been updated to match the code in udcm.
Closes open-mpi/ompi#1468.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 1345825 (1 of 1): Dereference before null check (REVERSE_INULL):
ib_proc should not be NULL in this case. Removed the check and added a
check for NULL after OBJ_NEW.
CID 1269821 (1 of 1): Dereference null return value (NULL_RETURNS):
I labeled this one as a false positive (which it is) but the code in
question could stand to be cleaned up.
Fix CID 1356424 (1 of 1): Argument cannot be negative (NEGATIVE_RETURNS):
While trying to silence another Coverity issue another was
flagged. Protect the close of fd with if (fd >= 0).
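A minimal sketch of the guarded-close pattern, not the actual code touched
by this fix. The helper name read_first_byte is hypothetical and only
illustrates a shared cleanup path that can be reached with fd == -1:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: reads one byte from path into *out.
 * Returns 0 on success and -1 on any failure. */
static int read_first_byte(const char *path, char *out)
{
    int rc = -1;
    int fd = open(path, O_RDONLY);   /* open() returns -1 on failure */

    if (fd < 0) {
        goto cleanup;
    }
    if (read(fd, out, 1) != 1) {
        goto cleanup;
    }
    rc = 0;

cleanup:
    /* The cleanup path is also reached when open() failed, so only
     * close a descriptor that is known to be valid. */
    if (fd >= 0) {
        close(fd);
    }
    return rc;
}
```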
CID 70772 (1 of 1): Dereference null return value (NULL_RETURNS):
CID 70773 (1 of 1): Dereference null return value (NULL_RETURNS):
CID 70774 (1 of 1): Dereference null return value (NULL_RETURNS):
None of these are errors; the behavior is intentional. Now that we have
a list release function, use it to make these warnings go away. The
cleanup is similar to CID 1269821.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
+ Added an MCA parameter to allow connecting processes from different
subnets. It defaults to 'false' (connections are not allowed), which
preserves the current behavior.
+ rdmacm: when calling ibv_query_gid, use the gid index from
btl_openib_gid_index.
This commit rewrites both the mpool and rcache frameworks. Summary of
changes:
- Before this change a significant portion of the rcache
functionality lived in mpool components. This meant that it was
impossible to add a new memory pool to use with rdma networks
(ugni, openib, etc) without duplicating the functionality of an
existing mpool component. All the registration functionality has
been removed from the mpool and placed in the rcache framework.
- All registration cache mpool components (udreg, grdma, gpusm,
rgpusm) have been changed to rcache components. rcaches are
allocated and released in the same way mpool components were.
- It is now valid to pass NULL as the resources argument when
creating an rcache. At this time the gpusm and rgpusm components
support this. All other rcache components require non-NULL
resources.
- A new mpool component has been added: hugepage. This component
supports huge page allocations on Linux.
- Memory pools are now allocated using "hints". Each mpool component
is queried with the hints and returns a priority. The current hints
supported are NULL (uses posix_memalign/malloc), page_size=x (huge
page mpool), and mpool=x.
- The sm mpool has been moved to common/sm. This reflects that the sm
mpool is specialized and not meant for any general
allocations. This mpool may be moved back into the mpool framework
if there is any objection.
- The opal_free_list_init arguments have been updated. The unused0
argument is now used to pass in the registration cache module. The
mpool registration flags are now rcache registration flags.
- All components have been updated to make use of the new framework
interfaces.
As this commit makes significant changes to both the mpool and rcache
frameworks both versions have been bumped to 3.0.0.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes an inconsistency between btl_openib_receive_queues,
btl_openib_max_send_size and btl_openib_eager_limit. Before this
commit if the ini file specified a set of default receive queues that
happen to not contain one large enough for the default max_send_size
or eager_limit, users would see an error like:
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is smaller than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
that no queue is large enough to receive the largest possible incoming
message fragment. The OpenFabrics (openib) BTL will therefore be
deactivated for this run.
Local host: somehost
Largest buffer size: 65536
Maximum send fragment size: 131072
This commit adds code that detects the source of the max_send_size and
eager_limit values and sets either or both of them to the size
supported by the largest queue pair if both 1) the value is larger
than the largest queue pair size, and 2) the value was not set by the
user or an MCA configuration file.
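The rule above can be sketched roughly as follows. This is an
illustration only: value_source_t and clamp_to_largest_qp are
hypothetical names and do not reflect the actual MCA variable API or the
code in this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for where a parameter value came from. */
typedef enum {
    VALUE_SOURCE_DEFAULT,   /* built-in or ini-file default */
    VALUE_SOURCE_FILE,      /* MCA configuration file */
    VALUE_SOURCE_USER       /* set explicitly by the user */
} value_source_t;

/* Returns the value to use for max_send_size or eager_limit.
 * The value is lowered to largest_qp_size only when both conditions
 * hold: it exceeds the largest queue pair buffer, and it was not set
 * by the user or a configuration file. */
static size_t clamp_to_largest_qp(size_t value, value_source_t source,
                                  size_t largest_qp_size)
{
    assert(largest_qp_size > 0);

    if (value > largest_qp_size && VALUE_SOURCE_DEFAULT == source) {
        return largest_qp_size;
    }
    return value;
}
```

With the numbers from the warning above, a default max_send_size of
131072 and a largest buffer of 65536 would yield 65536, while the same
value set explicitly by the user would be left alone so the existing
warning still applies.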
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds two m4 macros: OPAL_SUMMARY_ADD, OPAL_SUMMARY_PRINT.
OPAL_SUMMARY_ADD adds an item to a section in the summary. For example
OPAL_SUMMARY_ADD([[Transports]],[[Foo]],...,[yes]) will add the
following to the summary:
Transports
-----------------------
Foo: yes
With this commit two sections are added: Transports, Resource Managers.
The OPAL_SUMMARY_PRINT macro is called after AC_OUTPUT and prints out
some information about the build (version, projects, etc) and then
the summary sections. It will additionally print a warning if
internal debugging is enabled.
Example output:
Open MPI configuration:
-----------------------
Version: 3.0.0 a1
Build Open Platform Abstration project: yes
Build Open Runtime project: yes
Build Open MPI project: yes
Build Open SHMEM project: no
MPI C++ bindings (deprecated): no
MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
Debug build: yes
Transports
-----------------------
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
KNEM Shared Memory: no
Linux CMA IPC: no
Mellanox MXM: no
Open UCX: no
OpenFabrics libfabric: no
OpenFabrics Verbs: no
portals4: no
QLogic Infinipath (PSM): no
tcp: yes
XPMEM Shared Memory: no
Resource Managers
-----------------------
Cray Alps: no
Grid Engine: no
LSF: no
Slurm: yes
Torque: yes
INTERNAL DEBUGGING IS ENABLED. DO NOT USE THIS BUILD FOR PERFORMANCE MEASUREMENTS!
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit fixes a bug that can occur when communicating via XRC to
peers on the same node. UDCM was not saving the SRQ numbers on the
loopback endpoint (which shares its ib_addr info with all local peers)
so any messages to local peers use an invalid SRQ number.
Fixes open-mpi/ompi#1383
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit fixes two issues with the ib_addr lock:
- The ib_addr lock must always be obtained regardless of
opal_using_threads() as the CPC is run in a separate thread.
- The ib_addr lock is held in mca_btl_openib_endpoint_connected when
calling back into the CPC start_connect on any pending
connections. This will attempt to obtain the ib_addr lock
again. Since this is not a performance-critical part of the code
the lock has been changed to be recursive.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug that occurs when attempting a get or put
operation on an endpoint that is not already connected. In this case
the remote_srqn may be set to an invalid value as the rem_srqs array
on the endpoint is not populated. This commit moves the usage of the
rem_srqs array to the internal put/get functions where it is
guaranteed this array is populated.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit ensures ib_addr->remote_xrc_rcv_qp_num value is set when
creating the loopback queue pair. This is needed when communicating
with any other local peer.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes two bugs in XRC support
- When dynamic add_procs support was added to master the remote
process name was added to the non-XRC request structure. The same
value was not added to the XRC xconnect structure. This error was
not caught because the send/recv code was incorrectly using the
wrong structure member. This commit adds the member and ensures the
xconnect code uses the correct structure.
- XRC loopback QP support has been fixed. It was 1) not setting the
correct fields on the endpoint structure, 2) calling
udcm_xrc_recv_qp_connect, and 3) was not initializing the endpoint
data.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
mca_btl_openib_put incorrectly checks the qp inline max before
allowing an inline put. This check will always fail for an endpoint
that has not been connected. The commit changes the check to use the
btl_put_local_registration_threshold instead.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Commit open-mpi/ompi@400af6c52d
introduced a regression in XRC support. The commit reversed the
ordering of shared receive queue (SRQ) and completion queue (CQ)
creation. CQ creation must always precede SRQ creation when using
XRC as the CQs are needed to create the SRQs. This commit fixes the
ordering so that CQs are always created before SRQs.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
These changes fix issue https://github.com/open-mpi/ompi/issues/1336
- improve abstractions: the opal/memory/linux component should be the single place that operates on
Memory Allocation Hooks.
- avoid collisions when components are dynamically opened/closed: it is safe because it is linked statically.
- do not change the original behaviour.
`cm_message_event_active == 1` but main thread has already stopped
processing messages and thus we will have the situation where one
message was left unhandled leading to a hang.
The problem was in mca_btl_openib_proc_create. This function may be called
from several places simultaneously:
* from the main thread when somebody wants to do `MPI_Send()` (for example) for
the first time;
* from udcm if the counterpart peer is trying to connect and `mca_btl_openib_get_ep()`
is called.
In this case one of the threads may add an uninitialized proc structure
to the `mca_btl_openib_component.ib_procs` and the other will read it and
treat it as initialized.
This commit turns ib_proc initialization into a single atomic operation.
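One common way to get this effect is to do the lookup, the
initialization and the list insertion under a single lock, so that no
other thread can observe a half-built entry. The sketch below shows that
pattern with POSIX threads. It is an assumption about the general
technique, not the code in this commit, and proc_t, proc_list and
proc_get_or_create are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-peer proc structure. */
typedef struct proc {
    int peer_id;
    int endpoint_count;
    struct proc *next;
} proc_t;

static proc_t *proc_list = NULL;
static pthread_mutex_t proc_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the proc for peer_id, creating it if needed. Lookup,
 * initialization and insertion all happen while holding the lock, so
 * two threads racing on the same peer get the same fully initialized
 * entry. Returns NULL only if allocation fails. */
static proc_t *proc_get_or_create(int peer_id)
{
    proc_t *proc;

    pthread_mutex_lock(&proc_list_lock);
    for (proc = proc_list; NULL != proc; proc = proc->next) {
        if (proc->peer_id == peer_id) {
            break;
        }
    }
    if (NULL == proc) {
        proc = calloc(1, sizeof(*proc));
        if (NULL != proc) {
            proc->peer_id = peer_id;
            proc->endpoint_count = 0;
            proc->next = proc_list;
            proc_list = proc;       /* publish only after full init */
        }
    }
    pthread_mutex_unlock(&proc_list_lock);
    return proc;
}
```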
It was decided some time ago that there is no benefit to using any
per-peer receive queues on infiniband. At the time we decided not to
change the default but that objection has been dropped. This commit
changes the 128 message queue to use SRQ instead of PP. This has no
impact on iWarp which sets the default in a different way.
Closes open-mpi/ompi#1156
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Mofed 2.2 does not have the IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG attribute
flag. Add a check to fix compilation for mofed 2.2. This commit only
fixes compilation with the older mofed. It will not allow an Open MPI
compiled with mofed 2.3 or newer to work on a machine with mofed 2.2.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This update adds an additional check (if supported) to see if 8-byte
atomics are supported by the hardware. If 8-byte atomics are not
supported the atomics support is disabled.
This commit also includes some cleanup.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds support for fetch-and-add and compare-and-swap when
using the mlx5 driver. The support is only enabled if the expanded
verbs interface is detected. This is required because mlx5 HCAs return
the atomic result in network byte order. This support may need to be
tweaked if Mellanox commits their changes into upstream verbs.
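As an illustration of what a result in network byte order implies for
the caller, the sketch below decodes an 8-byte big-endian result buffer
into a host-order integer. load_be64 is a hypothetical helper written
for this note. It does not use the expanded verbs interface and is not
the conversion code added by this commit.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the 8 bytes at buf as a big-endian
 * (network byte order) 64-bit integer and return it in host order.
 * Works on both little- and big-endian hosts because it assembles the
 * value byte by byte instead of relying on the host layout. */
static uint64_t load_be64(const void *buf)
{
    const unsigned char *bytes = (const unsigned char *) buf;
    uint64_t value = 0;

    for (int i = 0; i < 8; ++i) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```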
Closes open-mpi/ompi#1077
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes the following bugs:
- On send failure release newly allocated message.
- In the destructor for udcm_message_sent_t always remove the send
timeout event from the event base. Failure to do this can lead to
memory corruption since the destructor may be called from an event
callback.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This was fixed on my btl 3.0 branch but the changeset got lost in a
rebase. Fixes issues with lockups when using osc/rdma.
References open-mpi/ompi#1010
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit removes the service and async event threads from the
openib btl. Both threads are replaced by opal progress thread
support. The run_in_main function is now supported by allocating an
event and adding it to the sync event base. This ensures that the
requested function is called as part of opal_progress.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>