1
1

5484 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
f63d6e2335
Merge pull request #7998 from wckzhang/v4.1.x
V4.1.x btl/ofi backport disabling EFA and ofi_rxm providers
2020-08-17 12:12:07 -07:00
Jeff Squyres
2a0757f997
Merge pull request #8006 from bosilca/fix/smcuda_alignment
Fix the cacheline usage in the CUDA BTL.
2020-08-17 15:08:42 -04:00
George Bosilca
a81c16b9d0 Fix the cacheline usage in the CUDA BTL.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-14 14:36:33 -04:00
William Zhang
27c252e211 btl/ofi: Disable ofi_rxm provider
The ofi_rxm provider is dependent upon the underlying hardware for its
implementation of FI_DELIVERY_COMPLETE. Since this can lead to early
completions, we disable the provider to avoid correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit 41acfee2bbfc5495aeeeae4b72f385ca8d1d8cee)
2020-08-12 14:03:08 -07:00
William Zhang
357ac2e2f5 btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric
versions. While FI_DELIVERY_COMPLETE would be advertised by the
provider, completions would return too early by not accounting for
bounce buffers on the receive side. This would cause the BTL
to receive early completions that lead to correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit a7dcfd98742d7f44d96d74a60a38529894d4099c)
2020-08-12 14:02:51 -07:00
Brian Barrett
22163d37b5
Merge pull request #7991 from wckzhang/v4.1.x
v4.1.x: btl/ofi: Use common provider include/exclude list
2020-08-11 11:21:07 -07:00
Mikhail Kurnosov
11620038f9 Fix a typo in parsing locality string: L0 changed to L1
(`prte_hwloc_base_get_locality_string` never returns locality string with L0).

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 4708458d6b12d1b7a8e11fa3c3f784c545780707)
2020-08-11 22:35:00 +07:00
William Zhang
abe3aaa4a7 btl/ofi: Use common provider include/exclude list
The btl/ofi does not currently utilize the common ofi include/exclude
list. Added verification code similar to the mtl/ofi that will check if
the info object is in the include or exclude list. If it isn't in the
include list or is in the exclude list, validate_info will return
OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint
when calling getinfo, instead filtering the provider during
validate_info.

This patch also moves the is_in_list MTL function into common code and
adds additional debugging output to the BTL to match the MTL standard.

Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit 9b8f463a768206a26e5b1dcea0612a403462b1d0)
2020-08-10 14:28:59 -07:00
Jeff Squyres
6b4e2f02f8
Merge pull request #7956 from tomhers/topic/fix_ofi_btl_include_file
v4.1.x: BTL/OFI: Fix missing include file.
2020-07-27 15:26:01 -04:00
tomhers
229724799e BTL/OFI: Fix missing include file.
The missing include file causes an error when using an external version of LibEvent.

Signed-off-by: tomhers <tom.herschberg@gmail.com>
(cherry picked from commit 88f9d2c90f1730f2e0b5bc4951893f60ff5a1332)
2020-07-15 17:14:55 -04:00
Nikola Dancejic
c018337d03 v4.1.x: common/ofi: added address format check to fix provider selection
bugfix: provider selection would not differentiate between ipv4
and ipv6 addresses which would cause some nodes to be unable
to communicate between each other. Adding a check for address
format to provider selection to ensure that all nodes use the
same address format.

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
(cherry picked from commit 7e463713014ae58f7e78d7a5b49e9e63d62e374e)
2020-07-15 00:32:37 +00:00
Jeff Squyres
13b7844513
Merge pull request #7881 from cniethammer/uct-supported-version-update
v4.1.x: Accept UCX 1.8 in configure of btl/uct
2020-07-06 07:34:17 -04:00
Jeff Squyres
b4106e94e1 pmix3x: Remove --enable-install-libpmix option
This option is problematic, and has never worked in an Open MPI v4.0.x
release tarball.  Given that PMIx is now available elsewhere, it isn't
worth fixing this option.

See https://github.com/open-mpi/ompi/issues/6228 for more detail.

NOTE: This is a v4.0.x-specific commit because this option no longer
exists on master because we deleted the entire pmix3x component.
Hence, it's not possible to cherry-pick anything from master back to
the v4.0.x branch.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 447b14061880e218371f9eb0cbe427b8358d45b8)
2020-06-27 10:01:23 -07:00
Christoph Niethammer
ad1d427d60 Accept UCX 1.8 in configure of btl/uct
The configure script for the btl uct component reports an error for
the new UCX 1.8.0 versions as it was fixed up to UCX 1.7.

This fixes #7612

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
(cherry picked from commit 9b10f46126b0a5aa796d9fec063c2d454a9a1bc9)
2020-06-26 21:20:49 +02:00
Jeff Squyres
7987a7f56e common_ofi: fix preprocessor macro typo
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f64c30e93c719b0c910e81f716b68d5889de8ce5)
2020-06-26 07:56:53 -07:00
Brian Barrett
173142bf32
Merge pull request #7824 from hoopoepg/topic/ucx-test-external-events-v4.1
COMMON/UCX: improved missing events test - v4.1
2020-06-25 08:10:54 -07:00
Brian Barrett
9ffe9535f3
Merge pull request #7837 from markalle/interception_early_toc_read_v41x
noinline to avoid compiler reading TOC before PATCHER_BEGIN
2020-06-24 10:55:35 -07:00
Howard Pritchard
8fcf9cee3b add a common ofi whitelist/blacklist
also add common verbose variable.

Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered and
and initialized.  The BTL's are registered/initialized prior to the MTL components even getting registered.

Here's the change in ofi mtl mca parameters.  Before commit:

            MCA mtl ofi: parameter "mtl_ofi_provider_include" (current value: "psm2", data source: environment, level: 1 user/basic, type: string)
                          Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
             MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string)
                          Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.

After commit:

             MCA btl ofi: parameter "btl_ofi_provider_include" (current value: "", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_include)
                          Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
             MCA btl ofi: parameter "btl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_exclude)
                          Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
             MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_exclude)
                          Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
             MCA mtl ofi: parameter "mtl_ofi_verbose" (current value: "0", data source: default, level: 3 user/all, type: int, synonym of: opal_common_ofi_verbose)

related to #7755

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 9f1081a07ac3c7b7277a27277ed970ed713207c9)
(cherry picked from commit 45b643d0cfa46f1abb9a5f43cf0ff304cf6a5fea)
2020-06-23 14:02:33 -06:00
Mark Allen
1f5adc65bc noinline to avoid compiler reading TOC before PATCHER_BEGIN
This bug was first seen in a different product that's using the same
interception code as OMPI.  But I think it's potentially in OMPI too.

In my vanilla build of OMPI master on RH8 if I "gdb libopen-pal.so" and
"disassemble intercept_brk", I'm seeing a suspicious extra instruction
in front of PATCHER_BEGIN:
   0x00000000000d6778 <+40>:    std     r2,24(r1) // something gcc put in front
   0x00000000000d677c <+44>:    std     r2,96(r1) // PATCHER_BEGIN's toc_save
   0x00000000000d6780 <+48>:    nop               // NOPs from PATCHER_BEGIN
   0x00000000000d6784 <+52>:    nop               // that get replaced
   0x00000000000d6788 <+56>:    nop               // by instructions that
   0x00000000000d678c <+60>:    nop               // change r2
   0x00000000000d6790 <+64>:    nop               //

Later there are loads from that location like
   0x000000000019e0e4 <+132>:   ld      r2,24(r1)
that make me nervous since that's the pre-updated value.

I believe this is the same thing Nathan is describing way back in a9bc692d
and his solution was to put a second call around each interception, where
the outer call is just
    intercept_brk():
        PATCHER_BEGIN
        _intercept_brk()
        PATCHER_END
and the inner call _intercept_brk() is where the bulk of the code goes.

What I'm seeing is that _intercept_brk() is being inlined and probably
negating Nathan's fix.  So I want to add __opal_attribute_noinline__ to
restore the fix.

With this commit in place, the disassembly of intercept_brk becomes tiny
because it's no longer inlining _intercept_brk() and the susipicious
early save of r2 is gone.  I made the same fix to all the intercept_*
functions, although intercept_brk was the only one that had a suspicious
save of r2.

As far as empirical failures though, we only have those from the non-OMPI
product that's using the same patcher code.  I'm not actually getting OMPI
to fail from the above suspicious data being saved in r1+24.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit ddd1f578ecfc443d05250e09bf5e5077c6d6f304)
2020-06-17 18:45:54 -04:00
Harumi Kuno
61b11e4101 mtl_btl_ofi_rcache_init() before creating domain
mtl_btl_ofi_rcache_init() initializes patcher which should only take
place things are single threaded.  OFI providers may start spawn threads,
so initialize the rcache before creating OFI objects to prevent races.

Authored-by: John L. Byrne <john.l.byrne@hpe.com>
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit f1b21cb77680106be870ad29e5a7534862fceed7)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Nikola Dancejic
c51917675c common/ofi: Fixing compilation issue with ofi versions that do not support fi_info.nic
Added the flag OPAL_OFI_PCI_DATA_AVAILABLE to remove accessing the nic
object in
fi_info when the ofi version does not support that structure.

Signed-off-by: Nikola Dancejic dancejic@amazon.com
(cherry picked from commit ae2a447b0eddaac057beecdba99e10903051a2a7)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Jeff Squyres
c8659a98c4 ofi: revamp OPAL_CHECK_OFI configury
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - Subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f5e1a672ccd5db127e85e1e8f6bcfeb8a8b04527)

NOTE: This patch was cherry-picked into the v4.0.x branch as 9ad871fc,
but the OFI BTL changes were skipped, because the OFI BTL was not in
the v4.0.x branch.  This version of the cherry pick brings in the
changes to the OFI BTL.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Nikola Dancejic
d9bfb1c4da common/ofi: Added multi-NIC support to provider selection
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs)

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
(cherry picked from commit 167d75b42ac3ca4770d59c796c011d72e0fffde3)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Harumi Kuno
2e137ae31c set ep to NULL to avoid double close
Per suggestion of @awlauria

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit ab4875ddc2b76ceced120fbfe09d8dcbde6e6ff3)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Harumi Kuno
faaaf5ca1c Add comments about order of close ops
Per suggestion of @awlauria, added some comments about
the need to free ep before resources it points to.

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit 1bc3dab118bb694d932ef365a7a922e514f82a20)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
harumi kuno
bb57080c84 Fix mca_btl_ofi_finalize clean-up logic
This fix is from John L. Byrne (john.l.byrne@hpe.com).

When OFI Libfabric binds objects to endpoints, before the object can
be successfully closed, the endpoint must first be freed.  For scalable
endpoints, objects can also be bound to transmit and receive contexts,
and for objects that are bound to contexts, we need to first free the
contexts before freeing the endpoint. We also need to clear the memory
registration cache.

If we don't clean up properly, then fi\_close may not be able to close
the domain because the dom will have a non-zero ref count.

Signed-off-by: harumi kuno <harumi.kuno@hpe.com>
(cherry picked from commit 3095fabf94de249716fc146a4c4a609bd33a1aff)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Jeff Squyres
ac66f46dc7 mtl/ofi: check for FI_LOCAL_COMM+FI_REMOTE_COMM
Make sure to get an RDM provider that can provide both local and
remote communication.  We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.

Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version.  Use this check in two
places:

1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
   FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
   >= v1.1, but the usnic component was checking for >= v1.3.  So
   update the btl/usnic configury to use the new macro and check for
   >= v1.3.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 21bc9042e1ff9c3075ababda9eaf49ccc90f64db)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
guserav
fc74a25b7c common/ofi: Set HPE as owner of component
Signed-off-by: guserav <erik.zeiske@web.de>
(cherry picked from commit 56c3d9a2380b073508a6760599dbe6178cb8fdf9)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
guserav
d22d413baa common/ofi: Fix open-mpi/ompi#2519
As discussed in open-mpi/ompi#2519 the common component does not depend
on libfabric yet. This commit introduces this dependency by just calling
fi_version().

Signed-off-by: guserav <erik.zeiske@hpe.com>
(cherry picked from commit 8a67a95c993dbfc2e3fa652777cab6ee20a4a735)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
guserav
19771ada0c common/ofi: Fix check for OFI in build files
The changes made in f5e1a672ccd5db127e85e1e8f6bcfeb8a8b04527
have been done after the common/ofi component was removed and thus the
component doesn't reflect the changes made their.

Namely f5e1a672ccd5db127e85e1e8f6bcfeb8a8b04527 changed:
- How to call OPAL_CHECK_OFI (It sets opal_ofi_happy to yes now)
- Dropped the common part in the build flags for ofi

Signed-off-by: guserav <erik.zeiske@web.de>
(cherry picked from commit 0e25c95eaeabca67cbe5ad6370171e1cff47e52a)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
guserav
a3752b681b Revert "Remove opal/mca/common/ofi."
This reverts commit dd20174532928e0c9cdbe7b206868e6e4bea9d0b.

Signed-off-by: guserav <erik.zeiske@web.de>
(cherry picked from commit 4ad78aaa156f118c907451b4e72d13dfebcbccf2)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Jeff Squyres
9ad1c152dd btl/ofi/Makefile.am: down with tabs!
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b556cabfe937f84f30e8870e0a8c128b839e5dfd)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Aravind Gopalakrishnan
0d2a0b1568 btl/ofi: Fix valgrind complaints on uninitialized pointer use
It doesn't seem like the BTL was using uninitialized pointer. But simply
setting the rcache pointer to NULL after destroying it makes the valgrind
errors go away.

Fixes Issue #6345

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 786e686d4347655b574e609c65626c8323bb49b2)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Thananon Patinyasakdikul
8ebd0d8f24 btl/ofi: fixed compiler warning on OSX.
This commit closes #6049

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
(cherry picked from commit d9bd54c628f73b9c4ded3c5baa5c6454d8f173f1)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
Thananon Patinyasakdikul
f9439c6d18 btl/ofi: Added 2 side communication support.
The 2 sided communication support is added for non-tagmatching provider
to take advantage of this BTL and PML OB1. The current state is
"functional" and not optimized for performance.

Two sided support is disabled by default and can be turned on by mca
parameter: "mca_btl_ofi_mode".

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
(cherry picked from commit 080115d44069e0c461a1af105cd41f28849cdffc)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
Brian Barrett
e975c9975c Revert "Remove the OFI/BTL component"
This reverts commit 192f0f6fff4b4ba0f207054626c935941ff0b5d8.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
Sergey Oblomov
d52b64c488 COMMON/UCX: improved missing events test
- there is new API to detect missing memmory events.
  Enabled using of new UCX API to detect missing events

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d6bff6ffbd70cfafacc3eefe592f900dc2e0be68)
2020-06-16 14:27:02 +03:00
Mark Allen
57c7d68233 adding op-codes for syscall ipc for shmat/shmdt
These op codes used to be in bits/ipc.h but were removed in glibc in 2015
with a comment saying they should be defined in internal headers:
https://sourceware.org/bugzilla/show_bug.cgi?id=18560
and when glibc uses that syscall it seems to do so from its own definitions:
https://github.com/bminor/glibc/search?q=IPCOP_shmat&unscoped_q=IPCOP_shmat

So I think using #ifndef and defining them if they're not already defined
using the values from glibc is the best option.

At IBM it was the testing on redhat 8 that found this as an issue
(the opcodes being undefined on the system made the #define HAS_SHMDT
evaluate to false so intercept_shmat / intercept_shmdt were
left undefined so shmat/shmdt memory events went unintercepted).

(cherry picked from commit e8fab058dac7300569cb54b08e5500115f8bab8f)
Signed-off-by: Mark Allen <markalle@us.ibm.com>
2020-06-04 14:24:17 -04:00
Gilles Gouaillardet
806654074c opal/util: fix opal_str_to_bool()
correctly use strlen(char *) instead of sizeof(char *)

Thanks Georg Geiser for reporting this issue.

Refs. open-mpi/ompi#7772

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit c450b2140540a1f8eae1a6e6f9a22d17cd40e7d8)
2020-06-01 10:16:25 +09:00
Boris Karasev
6e42a3c66e sys limits: fixed soft limit setting if it is less than hard limit
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit fb9eca55cfdfc4638521b431a4e4d545d9d22559)
2020-05-21 07:34:01 +03:00
Joshua Hursey
7dba35f785 A slightly stronger check for LSF's libevent
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 05e095a1eef737849750a5f4f649e72c393eeb74)
2020-05-18 15:10:17 -04:00
Joshua Hursey
886f41fe33 Move from legacy -levent to recommended -levent_core
* `libevent_core.so` contains the core functionality that we depend upon
   - `libevent.so` library has been identified as the legacy target.
   - `libevent_core.so` exists as far back as Libevent 2.0.5 (oldest supported by OMPI)
 * `libevent_pthreads.so` can work with either `-levent` or `-levent_core`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-05-09 14:55:39 -04:00
Joshua Hursey
fc4199e3ba Add checks for libevent.so conflict with LSF
* LSF ships a `libevent.so` that is no related to the `libevent.so`
   shipped with Libevent.
 * Add some checks to the configure logic to detect scenarios where this
   conflict can be detected, and provide the user with a descriptive
   warning message.
   - When detected by `event/external` this is just a warning since
     the internal component may be able to be used instead.
     - This happens when the user supplies the LSF path via the
       `LDFLAGS` envar instead of via `--with-lsf-libdir`.
   - When detected by a LSF component and LSF was explicitly requested
     then this becomes an error. Otherwise it will just print the warning
     and that component will fail to build.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-05-09 14:55:39 -04:00
Joshua Hursey
22d8fa197b event/external: Fix typo in LDFLAGS vs LIBS var before check
* This should have been `LDFLAGS` not `LIBS`. Either works, but
   `LDFLAGS` is more correct. We should also include `CPPFLAGS`
   just in case the header is important to the check.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-05-05 12:10:40 -04:00
Ralph Castain
e1543e37c6
Silence unnecessary error logs
Tracks https://github.com/openpmix/openpmix/pull/1669

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-09 07:13:02 -07:00
Joseph Schuchart
7b1beb0f6c Harmonize return values of progress callbacks
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 2c97187ee05e592346206a697ca3d9531d600fcc)
2020-03-30 18:58:57 +02:00
Ralph Castain
898b4f2210
Support hwloc retrieval using legacy key
Re-enable support for Slurm plugin using earlier PMIx_LOCAL_TOPO key

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-17 08:48:04 -07:00
Artem Polyakov
253502b1b1 timings: Fix timings when 'prefix' is used
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
(cherry picked from commit 7c17a38c96db6da1c28d6c46d9293124dee6a23c)
2020-03-11 21:05:30 -07:00
Austen Lauria
f7979fbc82 Fix segv in btl/vader.
Keep track of the connected procs in vader_add_procs().
Otherwise, the same rank will reconnect the same shmem
segment (rank 0+...) multiple times instead of the next
one as intended.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit f69c8d6819dcc14f471cea90b50dc8ca98de12d4)
2020-03-06 11:21:24 -05:00
Geoffrey Paulsen
11d79d1f6e Updating OMPI v4.0.x to PMIx v3.1.5 released
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-02-24 12:13:20 -05:00