Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
OMPI header cannot be included in OPAL source code, hence removed it.
Fixes: (740b636db) btl/openib: Disqualify rdmacm CPC if
MPI_THREAD_MULTIPLE.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
if MPI_Init[_thread]/MPI_Finalize and MPI_T_init_thread/MPI_T_finalize
are balanced, opal_initialized is zero, and hence opal_cleanup destructor
never invokes opal_class_finalize.
if MPI_Init[_thread] nor MPI_T_init_thread have been called, classes is NULL,
so opal_class_finalize does nothing
- replace MAXHOSTNAMELEN with hardcoded 1024.
unlike Linux, Solaris #define MAXHOSTNAMELEN in <netdb.h>,
so use a hard coded value to keep the test simpl
- stdout cannot be assigned on Solaris, so use freopen instead
(back-ported from upstream commit pmix/master@a63f6e53f4)
this is a convenience macro similar to the PMIX_LIST_FOREACH macro,
that can be used to iterate on all the key/value pairs of a pmix_hash_table_t
(back-ported from upstream commit pmix/master@349971c68c)
We only support running with libfabric v1.3 or greater. So it's safe
to remove the legacy/adaptive cq_readerr() behavior.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
There are critical usnic libfabric AV insert bugs before v1.3, so
don't allow any version prior to v1.3 at run time (still allow
*compiling* with earlier versions, though, since the ABI guarantees
allow us to compile with an earlier libfabric and run with a later
libfabric).
Switch to using fi_version() to check the version (instead of calling
fi_getinfo()) as a potentially lighter-weight / simpler solution.
This allows us to only call fi_getinfo() once.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The (effective) "+42" computation was, in fact, the incorrect answer
in this case (gasp!).
We should just take the max_msg_size from the command (which came from
the libfabric endpoint max_msg_size attribute in the client) and
subtract off the max header size: 68 (which is explained in the
comment). This will result in a "large" message size which is likely
slightly smaller than the MTU, but still right up near the MTU, and
therefore good enough.
Note: the old computation (i.e., -(68-42)) worked fine when we asked
for Libfabric API v1.1 because the usnic provider would return a
max_msg_size that was already less than the MTU due to FI_PREFIX
behavior shenanigans. Once we started asking for Libfabric API v1.4,
the usnic Libfabric provider started returning (MTU + prefix_size),
and the -(68-42) computation started giving a value that was over the
MTU. This caused sendto() on the connectivity checker UDP socket
to fail.
This commit also removes an old/misleading comment.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
* Add a configure time option to rename libmpi(_FOO).*
- `--with-libmpi-name=STRING`
* This commit only impacts the installed libraries.
Internal, temporary libraries have not been renamed to limit the
scope of the patch to only what is needed.
For example:
```shell
shell$ ./configure --with-libmpi-name=wookie
...
shell$ find . -name "libmpi*"
shell$ find . -name "libwookie*"
./lib/libwookie.so.0.0.0
./lib/libwookie.so.0
./lib/libwookie.so
./lib/libwookie.la
./lib/libwookie_mpifh.so.0.0.0
./lib/libwookie_mpifh.so.0
./lib/libwookie_mpifh.so
./lib/libwookie_mpifh.la
./lib/libwookie_usempi.so.0.0.0
./lib/libwookie_usempi.so.0
./lib/libwookie_usempi.so
./lib/libwookie_usempi.la
shell$
```
theses three pmix components use the same class name,
declare it as static so Open MPI can be built with --disable-dlopen
Thanks Limin Gu for the report
The rdmacm CPC in the openib BTL is not thread safe. The rdmacm CPC
should disqualify itself (instead of failing in random ways) if
MPI_THREAD_MULTIPLE is the thread level.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
This patch is based on the "RFC: Reenabling the TCP BTL over local
interfaces (when specifically requested)". It removes the hardcoded
exception for the local devices that has been enforced by the
TCP BTL. Instead, we exclude the local interface only via the
exclude MCA (both IPv4 and IPv6 local addresses are already in the
default if_exclude), which is also the behavior currently described in
our README file.
pmix_progress_thread_finalize() invokes libevent event_base_free,
so all libevent stuff cannot be used after.
Hence, pmix_client_globals.myserver must be PMIX_DESTRUCT'ed
before invoking pmix_progress_thread_finalize()
Prevent a race condition between a thread checking count and then
going in cond_wait, and another thread setting the count to 0 and
signaling the condition.
Thanks to Pascal Deveze for catching up the bug and for
the initial patch.