29967 Commits

Author SHA1 Message Date
Jeff Squyres
9ad1c152dd btl/ofi/Makefile.am: down with tabs!
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b556cabfe937f84f30e8870e0a8c128b839e5dfd)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Jeff Squyres
2dd7aa0587 mtl/ofi/Makefile.am: down with tabs!
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit aba2571881951a5fc88574125bc374a30f4cdd98)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Gilles Gouaillardet
a7045bceef mtl/ofi: fix configury when VPATH is used
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 945f830f7a47acec52f02ec8d215c159f53550cd)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Aravind Gopalakrishnan
0d2a0b1568 btl/ofi: Fix valgrind complaints on uninitialized pointer use
It doesn't seem like the BTL was using an uninitialized pointer, but simply
setting the rcache pointer to NULL after destroying it makes the valgrind
errors go away.

Fixes Issue #6345

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 786e686d4347655b574e609c65626c8323bb49b2)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Aravind Gopalakrishnan
48df4efb56 mtl/ofi: Fix reference to help text object
When we exceed the threshold number of contexts created, print appropriate help
text

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 9cabcfdbba49f8b97f830f90e3d88be2f7685de4)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Brian Barrett
a2cf9a41e3 mtl/ofi: Provide av count hint during initialization
Provide the av_attr.count hint (number of addresses that will be
inserted into the address vector through the life of the process)
at initialization of the address vector.  It's ok to be a bit
wrong, but some endpoints (RxR) can benefit by not going through
the slow growth realloc churn.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 44be7f139ac8e3130ffeb0afd4f43abdde31dd83)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Brian Barrett
29e8544243 mtl/ofi: Print descriptive error message on modex failure
With MTLs, there's no "other transport" when the remote side
does not have an active NIC, so we should print a useful error
message when the modex failed (indicating lack of a NIC on
the remote side).

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit fe25097194145234c175ee0487b1fa9b13658d49)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Aravind Gopalakrishnan
6a27da6d7f mtl/ofi: Add MCA variables to enable SEP and to request number of OFI contexts
Moving to a model where we have users actively _enable_ SEP feature for use
rather than opening SEP by default if provider supports it. This allows us to
not regress (either functionally or for performance reasons) any apps that were
working correctly on regular endpoints.

Also, providing MCA to specify number of OFI contexts to create and default
this value to 1 (Given btl/ofi also creates one by default, this reduces the
incidence of a scenario where we allocate all available contexts by default and
if btl/ofi asks for one more, then provider breaks as it doesn't support it).

While at it, spruce up README on SEP content.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 37f9aff2a02e9d51628cc7075853ec4dba93d1e7)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Spruit, Neil R
f770b6cfa1 MTL_OFI: Generation of specialized functions at build time
-> Added new targets in Makefile.am to call a new build script
   generate-opt-funcs.pl to generate specialized functions for
   each *.pm file.

-> Added new perl module *.pm files for send,isend,irecv,iprobe,improbe
   which are loaded by generate-opt-funcs.pl to create new source files
   that correspond to the name of the .pm file to be used as part of
   MTL OFI.

-> Added mtl_ofi_opt.pm.template and updated README with details on the
   specialization features and how to add additional specialization
   support.

-> Added new opt_common/mtl_ofi_opt_common.pm containing common
   functions for generating the specialized functions used by
   all other *.pm modules.

-> Added new mtl_ofi.h which includes the definitions for the
   function symbol table for storing the specialized functions along
   with the definitions for the initialization functions for the
   corresponding function pointers.

-> Based off the OFI provider capabilities the specialized function
   pointers are assigned at mtl_ofi_component_init to the corresponding
   MTL OFI function.

-> mca_mtl_ofi_module_t has been updated with the symbol table
   struct which is assigned at component init.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit bef5f50a42f53f4c6c30610aba236861f652f30e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Aravind Gopalakrishnan
3858b51d11 Fix for SEP when num local procs is greater than available contexts
For cases when the number of local processes is greater than the number of
available contexts, the SEP initialization phase would calculate the number of
contexts to provision for each rank to be 0 and would eventually crash.

Fix the issue here by using regular endpoints in the event the number of local
processes is more than available contexts. This fixes issue #6182.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit e5e19dfcf7618185bdc89f1e91506a9bf15355b1)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
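
The failure mode described in commit 3858b51d11 can be illustrated with a minimal sketch. It assumes the per-rank context count is a plain integer division, which rounds down to 0 once local processes outnumber contexts. The type `sketch_ep_plan_t` and the function `sketch_plan_endpoints` are hypothetical names for illustration and are not taken from the Open MPI sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type; not part of Open MPI. */
typedef struct {
    bool use_sep;        /* true: scalable endpoint, false: regular endpoint */
    int  ctxts_per_rank; /* contexts provisioned for each local rank */
} sketch_ep_plan_t;

static sketch_ep_plan_t sketch_plan_endpoints(int available_ctxts,
                                              int num_local_procs)
{
    /* Default plan: a regular endpoint with a single context. */
    sketch_ep_plan_t plan = { false, 1 };

    assert(num_local_procs > 0);

    /* Integer division would yield 0 contexts per rank in this case,
     * so stay on the regular-endpoint plan instead of using SEP. */
    if (num_local_procs > available_ctxts) {
        return plan;
    }

    plan.use_sep = true;
    plan.ctxts_per_rank = available_ctxts / num_local_procs;
    return plan;
}
```

With 4 contexts and 8 local processes the sketch falls back to a regular endpoint; with 8 contexts and 4 processes it provisions 2 contexts per rank.
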
Brian Barrett
4b293d3823 mtl/ofi: Fix crash if no providers found
Commit 109d0569ffd introduced a crash when an error occurred
before ofi_ctxt was allocated, including when no providers
passed the selection logic.  Properly check that the pointer
is not NULL in the error cleanup code before dereferencing
the pointer.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 6e15128d960aaa40dd7e905ae3e9e53d9cdaac2a)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
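
The fix in commit 4b293d3823 follows a common pattern: an error-cleanup path must tolerate being reached before its resources were allocated. The sketch below shows that pattern only. `sketch_ctxt_t` and `sketch_error_cleanup` are hypothetical stand-ins, not the actual `ofi_ctxt` handling in the MTL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-context record. */
typedef struct {
    int handle;
} sketch_ctxt_t;

/* Releases `count` contexts and returns how many were released.
 * A NULL array means the error happened before allocation. */
static int sketch_error_cleanup(sketch_ctxt_t *ctxts, int count)
{
    int released = 0;

    /* Check the pointer before dereferencing it. */
    if (NULL != ctxts) {
        for (int i = 0; i < count; ++i) {
            ctxts[i].handle = -1; /* mark the context as closed */
            ++released;
        }
        free(ctxts);
    }
    return released;
}
```

Calling the function with a NULL array is a safe no-op, which is the case that arises when no provider passes the selection logic.
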
Aravind Gopalakrishnan
22d0857ee5 MTL/OFI: Add OFI Scalable Endpoint support
OFI MTL supports OFI Scalable Endpoints feature as means to improve
multi-threaded application throughput and message rate. Currently the feature
is designed to utilize multiple TX/RX contexts exposed by the OFI provider in
conjunction with a multi-communicator MPI application model. For more
information, refer to README under mtl/ofi.

Reviewed-by: Matias Cabral <matias.a.cabral@intel.com>
Reviewed-by: Neil Spruit <neil.r.spruit@intel.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 109d0569ffdc29f40518d02ad7a4d5bca3adc3d1)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
Thananon Patinyasakdikul
8ebd0d8f24 btl/ofi: fixed compiler warning on OSX.
This commit closes #6049

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
(cherry picked from commit d9bd54c628f73b9c4ded3c5baa5c6454d8f173f1)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
Aravind Gopalakrishnan
ee3f6ab841 MTL OFI: Ask for FI_THREAD_DOMAIN support when not using MPI_THREAD_MULTIPLE
When an application is not using multiple threads to call into MPI, we can
safely ask for FI_THREAD_DOMAIN setting from the provider as it should
translate to the least amount of locking in provider.

Conversely, for applications using THREAD_MULTIPLE, explicitly ask for
FI_THREAD_SAFE to prevent race conditions.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 5cbcae79d8619b99e4768578c8d11d3149e5c7c8)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
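
The selection rule from commit ee3f6ab841 reduces to a single decision, sketched below. The enum values are local stand-ins named after the levels in the commit message; they are not the libfabric definitions, and `sketch_pick_threading` is a hypothetical helper.

```c
#include <stdbool.h>

/* Stand-ins for the provider threading levels named in the commit. */
typedef enum {
    SKETCH_FI_THREAD_SAFE,   /* provider must handle concurrent callers */
    SKETCH_FI_THREAD_DOMAIN  /* caller serializes access; least locking */
} sketch_threading_t;

/* Request full thread safety only when the application asked for
 * MPI_THREAD_MULTIPLE; otherwise request the cheaper level. */
static sketch_threading_t sketch_pick_threading(bool uses_thread_multiple)
{
    return uses_thread_multiple ? SKETCH_FI_THREAD_SAFE
                                : SKETCH_FI_THREAD_DOMAIN;
}
```
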
Thananon Patinyasakdikul
f9439c6d18 btl/ofi: Added 2 side communication support.
Two-sided communication support is added so that non-tag-matching providers
can take advantage of this BTL and PML OB1. The current state is
"functional" and not optimized for performance.

Two-sided support is disabled by default and can be turned on with the MCA
parameter "mca_btl_ofi_mode".

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
(cherry picked from commit 080115d44069e0c461a1af105cd41f28849cdffc)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
Matias Cabral
02ac75434a MTL OFI: Add support for mem_tag_format
OFI providers may reserve some of the upper bits of the tag for
internal usage and expose it using mem_tag_format. Check for that
and adjust communicator bits as needed.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit d996f529c0377d794dea261c801e504dfbb33170)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
Brian Barrett
e975c9975c Revert "Remove the OFI/BTL component"
This reverts commit 192f0f6fff4b4ba0f207054626c935941ff0b5d8.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:18 +00:00
Jeff Squyres
dd4b4b13ed .mailmap: Add entry for Harumi Kuno
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-06-17 20:49:18 +00:00
Jeff Squyres
2779a6a96b tests/asm/run_tests: fix basename usage
Looks like this script was left over from quite a long time ago, and
was expecting CLI params from the "old"-style Automake test engine.
Update it to look for `--test-name` to get the test name, and update a
few other minor style things.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit e8277d9d0605ee8cd9a6e6b3b63b22526aef9c38)
2020-06-17 11:12:15 -07:00
Michael Heinz
72bdae409d Add minimum library version needed to use PSM2 in OMPI #7779
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
(cherry picked from commit fcabd349e4559ecca990d41e694b151ae3d9be16)
2020-06-16 10:38:25 -06:00
Michael Heinz
b680893917 Add check for PSM2 reference counting to PSM2 MTL #7721
As discussed, a feature is being added to libpsm2 to correctly handle
the case where the library is opened by multiple OMPI transports in the same
process. (For example, the OFI BTL and the PSM2 MTL).

* Improved error message to indicate required libpsm2 version.

* Adds a test at autogen/configure time for the existence of
  PSM2_LIB_REFCOUNT_CAP.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
(cherry picked from commit f10305a49facec8daef24ac81a52207a3f4fb73f)
2020-06-16 10:38:22 -06:00
Edgar Gabriel
eeee011ac0 common/ompio: use avg. file view size in the aggregator selection logic
This is a fix based on a bug report on github/mailing list from CGNS.
The core of the problem was that different processes entered different branches of
our aggregator selection logic, due to the fact that in some cases processes had
a matching file_view size and contiguous chunk size (thus assuming 1-D distribution),
and some processes did not (thus assuming 2-D distribution). The fix is to calculate
the avg. file view size across all processes and use this value, thus ensuring that
all processes enter the same branch.

Fixes issue #7809

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit 4a8a330bbaf9fe5ea07cd01146afb83b569f3138)
2020-06-16 10:21:59 -05:00
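
The reasoning in commit eeee011ac0 can be checked with a small simulation. It is a sketch under assumptions: the branch test is modeled as an equality check between a file view size and a contiguous chunk size, and the per-process values are held in one array instead of being combined by an MPI collective. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Average file view size over all processes (rounded down). */
static size_t sketch_avg_view_size(const size_t *view_sizes, int nprocs)
{
    size_t sum = 0;

    assert(nprocs > 0);
    for (int i = 0; i < nprocs; ++i) {
        sum += view_sizes[i];
    }
    return sum / (size_t) nprocs;
}

/* Returns true when every process would take the same branch.
 * use_average == false models the old per-process decision. */
static bool sketch_branches_agree(const size_t *view_sizes, int nprocs,
                                  size_t chunk_size, bool use_average)
{
    size_t avg = sketch_avg_view_size(view_sizes, nprocs);
    bool first = ((use_average ? avg : view_sizes[0]) == chunk_size);

    for (int i = 1; i < nprocs; ++i) {
        bool mine = ((use_average ? avg : view_sizes[i]) == chunk_size);
        if (mine != first) {
            return false;
        }
    }
    return true;
}
```

For view sizes of 1024, 1024 and 2048 bytes and a 1024-byte chunk, the per-process test disagrees across processes, while the averaged test gives every process the same answer.
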
Sergey Oblomov
d52b64c488 COMMON/UCX: improved missing events test
- there is a new API to detect missing memory events.
  Enabled use of the new UCX API to detect missing events

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d6bff6ffbd70cfafacc3eefe592f900dc2e0be68)
2020-06-16 14:27:02 +03:00
Jeff Squyres
f334a699b7
Merge pull request #7822 from jsquyres/pr/v4.1.x/fixup-mpih-stdc-version-usage
v4.1.x: fixup mpih stdc version usage
2020-06-16 05:40:50 -04:00
Jeff Squyres
5179f80165 mpi.h.in: Remove //-style comments
Keep all comments in the user-facing mpi.h.in as "old style" C
comments: /* */.  This gives us maximum portability, just on the off
chance that a user's C compiler does not support //-style comments.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit d522c270373264aff0a7a2066bc3163b09e9a94b)
2020-06-15 21:52:52 -04:00
Jeff Squyres
020e9e4627 mpi.h.in: fixups for static assert messages
1. __STDC_VERSION__ isn't necessarily defined (e.g., by C++
   compilers).  So check to make sure it is defined before we actually
   check the value.
2. If we're in C++11 (or later), use static_assert().
3. Split the static assert macro in two macros:
   * THIS_SYMBOL_WAS_REMOVED_IN_MPI30(...): Insert a valid expression
     (i.e., 0, because it's only used with MPI_Datatype values, and
     since MPI_Datatype is a pointer, 0 is a valid RHS expression)
     before invoking the static assert so that we don't get a syntax
     error instead of the actual static assert error.
   * THIS_FUNCTION_WAS_REMOVED_IN_MPI30(...): No need for the valid
     expression; just invoke the assert functionality.

Also remove an errant "\".

Thanks to Constantine Khrulev and Martin Audet for identifying the
issue and suggesting to use C11's static_assert().

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 835f8f1834b8798a23ee0db6ad94315e30cb9be3)
2020-06-15 21:52:49 -04:00
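
Points 1 and 2 of commit 020e9e4627 can be sketched as a guarded macro. This is an illustration, not the text of mpi.h.in: the `SKETCH_` macro names and the pre-C11 fallback (a negative array size) are assumptions added here.

```c
/* Pick a static-assert spelling without assuming __STDC_VERSION__ is
 * defined: C++ compilers typically leave it undefined. */
#if defined(__cplusplus) && (__cplusplus >= 201103L)
#  define SKETCH_STATIC_ASSERT(cond, msg) static_assert(cond, msg)
#  define SKETCH_HAVE_NATIVE_STATIC_ASSERT 1
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
#  define SKETCH_STATIC_ASSERT(cond, msg) _Static_assert(cond, msg)
#  define SKETCH_HAVE_NATIVE_STATIC_ASSERT 1
#else
/* Fallback for older compilers: an array of size -1 is a compile
 * error, so a false condition still stops the build (without msg). */
#  define SKETCH_STATIC_ASSERT(cond, msg) \
       extern char sketch_static_assert_fallback[(cond) ? 1 : -1]
#  define SKETCH_HAVE_NATIVE_STATIC_ASSERT 0
#endif

/* Example use at file scope. */
SKETCH_STATIC_ASSERT(sizeof(char) == 1, "char must be exactly one byte");
```

Testing `defined(__STDC_VERSION__)` before comparing its value avoids relying on the preprocessor treating an undefined macro as 0, which some compilers warn about.
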
raafatfeki
0864b62e12 fs/gpfs: Support of GPFS file system
Creation of gpfs module under fs component.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2020-06-12 12:57:18 -04:00
Joshua Hursey
3234079bbc
Add detection for JSM direct launch
* Adds the `schizo/jsm` component that detects if the process was
   direct launched with IBM's Job Step Manager (JSM). JSM is a PMIx
   enhanced runtime environment so flag it as such.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 4f1de51371048085b86ee64e05849ad929c9f35c)
2020-06-11 08:51:17 -05:00
Brian Barrett
441e88f2b4 dist: Start v4.1.x release series
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-10 12:58:58 -07:00
Howard Pritchard
3137a78bce RAS:ALPS add support for ANL Cobalt
This commit enables the ALPS RAS to get reservation information
from COBALT.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-06-09 19:38:21 +00:00
Geoff Paulsen
56470b4aba
Merge pull request #7785 from hppritcha/topic/new_for_4.0.4rc4
NEWS: update for 4.0.4rc3
2020-06-08 18:58:30 -05:00
Geoffrey Paulsen
f2dcf4b129 Adding Info about PR7778 to NEWS and README
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-06-08 17:50:24 -05:00
Geoff Paulsen
9fb905f05b
Merge pull request #7778 from markalle/IPCOP_shmat_v40x
v4.0.x: adding op-codes for syscall ipc for shmat/shmdt
2020-06-08 16:54:32 -05:00
Geoff Paulsen
395395813e
Merge pull request #7791 from bwbarrett/dist/v4.0.x-NEWS
Update 4.0.x news file with news from v3.x releases, and fix unicode chars.
2020-06-07 08:34:45 -05:00
Howard Pritchard
d1d9c29cfa
Merge pull request #7787 from gpaulsen/topic/v4.0.x/VERSION_rc3
Updating VERSION to rc3
2020-06-05 17:20:05 -06:00
Brian Barrett
3110473e67 dist: Update NEWS from release branches
We have been bad about updating the NEWS file in master with all
the changes that have gone into the release branches.  Patch up
NEWS with the changes from v3.0, v3.1, and v4.0 branches.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 50765ae5a26f718c8840e65cb0cb813f4a65004b)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-05 15:20:23 -07:00
Brian Barrett
8c7a51dda3 dist: Fix character encodings in NEWS
The NEWS file had a mix of ISO-8859-1 and UTF-8 encodings, which
was making a mess of decoding the non-ASCII characters in the
file.  This patch unifies the NEWS file as a UTF-8 encoded file
and changes many of the places where we had ASCII-ified a person's
name.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 2e23893f04ef30c598a23889a4242f8cf4a45238)

Cherry-pick was modified to fix one more ISO-8859-1 character that
was in the v4.0.x branch but not in the master branch.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-05 15:18:51 -07:00
Mark Allen
57c7d68233 adding op-codes for syscall ipc for shmat/shmdt
These op codes used to be in bits/ipc.h but were removed in glibc in 2015
with a comment saying they should be defined in internal headers:
https://sourceware.org/bugzilla/show_bug.cgi?id=18560
and when glibc uses that syscall it seems to do so from its own definitions:
https://github.com/bminor/glibc/search?q=IPCOP_shmat&unscoped_q=IPCOP_shmat

So I think using #ifndef and defining them if they're not already defined
using the values from glibc is the best option.

At IBM it was the testing on redhat 8 that found this as an issue
(the opcodes being undefined on the system made the #define HAS_SHMDT
evaluate to false so intercept_shmat / intercept_shmdt were
left undefined so shmat/shmdt memory events went unintercepted).

(cherry picked from commit e8fab058dac7300569cb54b08e5500115f8bab8f)
Signed-off-by: Mark Allen <markalle@us.ibm.com>
2020-06-04 14:24:17 -04:00
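
The approach described in commit 57c7d68233 amounts to guarded definitions like the sketch below. The numeric values are the Linux `ipc` call numbers for shmat and shmdt; `SKETCH_HAS_SHMDT` is a hypothetical simplification of the real `HAS_SHMDT` test, whose exact condition is not shown in this log.

```c
/* Define the op-codes only when the system headers no longer do. */
#ifndef IPCOP_shmat
#define IPCOP_shmat 21
#endif

#ifndef IPCOP_shmdt
#define IPCOP_shmdt 22
#endif

/* With the fallbacks above, a feature test that depends on the
 * op-codes being defined now evaluates to true. */
#if defined(IPCOP_shmat) && defined(IPCOP_shmdt)
#define SKETCH_HAS_SHMDT 1
#else
#define SKETCH_HAS_SHMDT 0
#endif
```
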
Geoffrey Paulsen
2454bc0571 Updating VERSION to rc3
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-06-04 12:06:46 -05:00
Howard Pritchard
e9c2af935f NEWS: update for 4.0.4rc3
[skip-ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-06-04 09:58:12 -06:00
Geoff Paulsen
6dae117cff
Merge pull request #7774 from ggouaillardet/topic/v4.0.x/opal_str_to_bool
v3.0: opal/util: fix opal_str_to_bool()
2020-06-01 14:08:54 -05:00
Gilles Gouaillardet
806654074c opal/util: fix opal_str_to_bool()
correctly use strlen(char *) instead of sizeof(char *)

Thanks Georg Geiser for reporting this issue.

Refs. open-mpi/ompi#7772

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit c450b2140540a1f8eae1a6e6f9a22d17cd40e7d8)
2020-06-01 10:16:25 +09:00
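
The bug class fixed in commit 806654074c is easy to reproduce: `sizeof` applied to a `char *` yields the pointer size (commonly 8), not the string length. The sketch below is a hypothetical parser written for illustration, not the Open MPI implementation of `opal_str_to_bool()`; its accepted spellings ("true", "yes", non-zero numbers) are assumptions.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive comparison of s[0..len) against a lowercase word. */
static bool sketch_ieq(const char *s, size_t len, const char *word)
{
    if (strlen(word) != len) {
        return false;
    }
    for (size_t i = 0; i < len; ++i) {
        if (tolower((unsigned char) s[i]) != word[i]) {
            return false;
        }
    }
    return true;
}

static bool sketch_str_to_bool(const char *str)
{
    size_t len;

    if (NULL == str) {
        return false;
    }

    /* strlen(str) is the string length; sizeof(str) would be
     * sizeof(char *) and would point the trim at the wrong byte. */
    len = strlen(str);

    /* Trim trailing, then leading, whitespace. */
    while (len > 0 && isspace((unsigned char) str[len - 1])) {
        --len;
    }
    while (len > 0 && isspace((unsigned char) *str)) {
        ++str;
        --len;
    }
    if (0 == len) {
        return false;
    }

    if (isdigit((unsigned char) *str)) {
        return atoi(str) != 0;
    }
    return sketch_ieq(str, len, "true") || sketch_ieq(str, len, "yes");
}
```
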
Geoff Paulsen
d5bc830026
Merge pull request #7756 from karasevb/fix_sys_limits
v4.0.x/sys limits: fixed soft limit setting if it is less than hard limit
2020-05-22 08:23:28 -05:00
Boris Karasev
6e42a3c66e sys limits: fixed soft limit setting if it is less than hard limit
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit fb9eca55cfdfc4638521b431a4e4d545d9d22559)
2020-05-21 07:34:01 +03:00
Geoff Paulsen
351b53fc1f
Merge pull request #7751 from hppritcha/topic/new_for_404rc2
update NEWS for 4.0.4rc2
2020-05-19 12:45:07 -05:00
Howard Pritchard
6a882dbba7 update NEWS for 4.0.4rc2
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-05-19 10:36:13 -06:00
Howard Pritchard
63cc3daaaa
Merge pull request #7698 from jjhursey/v4-fix-lsf-libevent
Add checks for libevent.so conflict with LSF
2020-05-19 09:14:40 -06:00
Geoff Paulsen
f562f847c5
Merge pull request #7750 from gpaulsen/topic/v4.0.x/VERSION_v4.0.4_rc2
VERSION -> v4.0.4rc2
2020-05-18 18:09:04 -05:00
Geoffrey Paulsen
cac77786bc VERSION -> v4.0.4rc2
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-05-18 16:52:32 -05:00
Joshua Hursey
76500e6cf8 Fix LSF configure check for libevent conflict
* Want to make sure that the result from `wc` is trimmed of spaces,
   so the `0` check returns properly
 * Add a few more comments, and fix wording in the warning message.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-05-18 15:10:46 -04:00