The vader btl kept a per-peer registration cache to keep track of
attachments. This is not really a problem with small numbers of local
ranks but can be a problem with large SMP machines. To reduce the
footprint there is now one registration cache for all xpmem
attachments. This will probably increase the lookup time for large
transfers but is a worthwhile trade-off.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
libnl and libnl-3 are known to conflict with each other, so detect
and abort if these two libs are both used directly (e.g. Open MPI
uses libnl-3) or indirectly (e.g. libibverbs.so might depend on libnl)
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
OMPI header cannot be included in OPAL source code, hence removed it.
Fixes: (740b636db) btl/openib: Disqualify rdmacm CPC if
MPI_THREAD_MULTIPLE.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
- replace MAXHOSTNAMELEN with hardcoded 1024.
unlike Linux, Solaris #define MAXHOSTNAMELEN in <netdb.h>,
so use a hard coded value to keep the test simpl
- stdout cannot be assigned on Solaris, so use freopen instead
(back-ported from upstream commit pmix/master@a63f6e53f4)
this is a convenience macro similar to the PMIX_LIST_FOREACH macro,
that can be used to iterate on all the key/value pairs of a pmix_hash_table_t
(back-ported from upstream commit pmix/master@349971c68c)
We only support running with libfabric v1.3 or greater. So it's safe
to remove the legacy/adaptive cq_readerr() behavior.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
There are critical usnic libfabric AV insert bugs before v1.3, so
don't allow any version prior to v1.3 at run time (still allow
*compiling* with earlier versions, though, since the ABI guarantees
allow us to compile with an earlier libfabric and run with a later
libfabric).
Switch to using fi_version() to check the version (instead of calling
fi_getinfo()) as a potentially lighter-weight / simpler solution.
This allows us to only call fi_getinfo() once.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The (effective) "+42" computation was, in fact, the incorrect answer
in this case (gasp!).
We should just take the max_msg_size from the command (which came from
the libfabric endpoint max_msg_size attribute in the client) and
subtract off the max header size: 68 (which is explained in the
comment). This will result in a "large" message size which is likely
slightly smaller than the MTU, but still right up near the MTU, and
therefore good enough.
Note: the old computation (i.e., -(68-42)) worked fine when we asked
for Libfabric API v1.1 because the usnic provider would return a
max_msg_size that was already less than the MTU due to FI_PREFIX
behavior shenanigans. Once we started asking for Libfabric API v1.4,
the usnic Libfabric provider started returning (MTU + prefix_size),
and the -(68-42) computation started giving a value that was over the
MTU. This caused sendto() on the connectivity checker UDP socket
to fail.
This commit also removes an old/misleading comment.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
* Add a configure time option to rename libmpi(_FOO).*
- `--with-libmpi-name=STRING`
* This commit only impacts the installed libraries.
Internal, temporary libraries have not been renamed to limit the
scope of the patch to only what is needed.
For example:
```shell
shell$ ./configure --with-libmpi-name=wookie
...
shell$ find . -name "libmpi*"
shell$ find . -name "libwookie*"
./lib/libwookie.so.0.0.0
./lib/libwookie.so.0
./lib/libwookie.so
./lib/libwookie.la
./lib/libwookie_mpifh.so.0.0.0
./lib/libwookie_mpifh.so.0
./lib/libwookie_mpifh.so
./lib/libwookie_mpifh.la
./lib/libwookie_usempi.so.0.0.0
./lib/libwookie_usempi.so.0
./lib/libwookie_usempi.so
./lib/libwookie_usempi.la
shell$
```
theses three pmix components use the same class name,
declare it as static so Open MPI can be built with --disable-dlopen
Thanks Limin Gu for the report
The rdmacm CPC in the openib BTL is not thread safe. The rdmacm CPC
should disqualify itself (instead of failing in random ways) if
MPI_THREAD_MULTIPLE is the thread level.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
This patch is based on the "RFC: Reenabling the TCP BTL over local
interfaces (when specifically requested)". It removes the hardcoded
exception for the local devices that has been enforced by the
TCP BTL. Instead, we exclude the local interface only via the
exclude MCA (both IPv4 and IPv6 local addresses are already in the
default if_exclude), which is also the behavior currently described in
our README file.
pmix_progress_thread_finalize() invokes libevent event_base_free,
so all libevent stuff cannot be used after.
Hence, pmix_client_globals.myserver must be PMIX_DESTRUCT'ed
before invoking pmix_progress_thread_finalize()
pmix1_value_unload() was added a "key" argument which is unused,
and pmix1_value_unload() was sometimes invoked with two arguments instead of three.
since the "key" argument is unused, simply remove it from the
subroutine prototype and calls.
the LT_* macros do overwrite the enable_dlopen variable,
so it must be tested and saved before invoking LT_INIT.
delay the invokation of the LT_* macros and use the
PMIX_ENABLE_DLOPEN_SUPPORT variable to figure out whether
--disable-dlopen was invoked
This commit fixes an abort during finalize because pending events were
removed from the list twice.
References #2030
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>