
7258 Commits

Author SHA1 Message Date
Josh Hursey
8c89e3c621
Merge pull request from jjhursey/prot-enum
Update hook component to use enum MCA parameter
2021-01-11 13:29:16 -06:00
Jeff Squyres
8115bd29b7
Merge pull request from bosilca/topic/portable_avx
Allow fallback to a lesser AVX support during make
2021-01-10 11:03:07 -05:00
Joshua Hursey
ed0697f01e Update hook component to use enum MCA parameter
* `--mca ompi_display_comm VALUE` where `VALUE` is one or more of:
   - `mpi_init` : Display during `MPI_Init`
   - `mpi_finalize` : Display during `MPI_Finalize`
 * hook/comm_method: Use enum flags to select protocols

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2021-01-08 12:25:42 -05:00
Goldman, Adam
1e64da9a84 mtl/ofi: Add missing cq_data_size in hints for ofi mtl
Fixes 
Signed-off-by: Goldman, Adam <adam.goldman@intel.com>
2021-01-06 10:12:28 -05:00
George Bosilca
fcf2766a03
AVX code generation improvements
1. Allow fallback to a lesser AVX support during make

Because some distros restrict the compile architecture during make
(while not setting any restrictions during configure), we need to
detect the target architecture during make as well in order to
restrict the code we generate.

2. Add comments and better protect the arch specific code.

Identify all the vector functions used and classify them according to
the necessary hardware capabilities.
Use these requirements to protect the code for loads and stores (the rest
of the code is automatically generated, so it is more difficult to
protect).

3. Correctly check for AVX* support.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2021-01-04 13:43:18 -05:00
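The idea of protecting architecture-specific loads and stores, as described in commit fcf2766a03, can be sketched as follows. This is a minimal illustration and not the op/avx code: the function name `add_int32_sketch` is hypothetical, and it relies only on the compiler-defined `__AVX2__` macro, whereas the real component also deals with configure-time and make-time detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical element-wise sum: inout[i] += in[i].
 * The vector loads and stores are only compiled when the compiler
 * targets AVX2; otherwise the scalar loop handles every element. */
void add_int32_sketch(const int32_t *in, int32_t *inout, size_t n)
{
    assert(n == 0 || (in != NULL && inout != NULL));
    size_t i = 0;
#if defined(__AVX2__)
    for (; i + 8 <= n; i += 8) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(in + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(inout + i));
        _mm256_storeu_si256((__m256i *)(inout + i), _mm256_add_epi32(a, b));
    }
#endif
    /* Scalar tail, and the complete fallback when AVX2 is unavailable. */
    for (; i < n; ++i) {
        inout[i] += in[i];
    }
}
```

Because the guard is evaluated at compile time, the same source builds on targets without AVX2 and simply uses the scalar path there.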
George Bosilca
31068e063b
Major update to the AVX* detection and support
1. Consistent march flag order between configure and make.

2. op/avx: give the option to skip some tests

It is possible to skip some intrinsic tests by setting the following environment variables to "no" before invoking configure:
 - ompi_cv_op_avx_check_avx512
 - ompi_cv_op_avx_check_avx2
 - ompi_cv_op_avx_check_avx
 - ompi_cv_op_avx_check_sse41
 - ompi_cv_op_avx_check_sse3

3. op/avx: update AVX512 flags

try
-mavx512f -mavx512bw -mavx512vl -mavx512dq
instead of
-march=skylake-avx512

since the former is less likely to conflict with user provided CFLAGS
(e.g. -march=...)

Thanks to Bart Oldeman for pointing this out.

4. op/avx: have the op/avx library depend on libmpi.so

Refs. 

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2021-01-04 13:41:39 -05:00
Joshua Hursey
185a459995
Fix PGI compiler error with compare arg
* PGI was throwing the following error.
```
NVC++-S-0103-Illegal operand types for comparison operator (osc_rdma_frag.h: 75)
NVC++/power Linux 20.11-0: compilation completed with severe errors
```
 * It must not have liked the inline declaration of the NULL pointer.
   - So replace with a variable, as we do in other places in the code base.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-12-30 11:37:39 -06:00
Ralph Castain
194e66b3e2 Silence warnings
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-12-27 12:18:40 -08:00
Edgar Gabriel
56dbd096d3 io/ompio: remove the special handling of Lustre in the selection logic
ompio is now the default on Lustre as well

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-17 13:26:51 -06:00
Edgar Gabriel
aa2d21ee50 lustre_file_open: avoid explicit locking on lustre file systems
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-17 12:22:54 -06:00
Edgar Gabriel
2c61074739 dynamic_gen2: code cleanup
Remove a now-unused MCA parameter, get rid of an unnecessary if-else part,
and move setting the flag outside of the while loop.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-17 11:43:23 -06:00
Edgar Gabriel
d65480df35 fbtl_posix_pwritev: add datasieving support for write
It is, however, restricted to collective I/O operations, at this point
only from vulcan and dynamic_gen2. This required some more infrastructure
to be added to recognize individual I/O and multi-threaded environments.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-16 15:34:45 -06:00
Edgar Gabriel
90d8c8c39c fbtl_posix_preadv: limit the size of the temporary buffer
when using data sieving.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-16 11:17:15 -06:00
Edgar Gabriel
dbf0d6e5a3 fbtl_posix: add control logic for data sieving
only implemented for read at the moment, but the parameters
for write are also in place.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-15 18:12:07 -06:00
Edgar Gabriel
5385e5f85f fbtl/posix/preadv.c: first cut on adding data sieving
The lack of data sieving has been identified as a main reason for the poor performance in some instances on the Lustre file system. This commit introduces the fundamental ability to perform data sieving for read operations (which should not be controversial). The code itself is correct; what is still lacking is a) the logic for when and how to activate data sieving and b) the logic to limit the size of the temporary buffer when doing data sieving.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-10 14:16:07 -06:00
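Data sieving for reads, as introduced in commit 5385e5f85f, can be sketched roughly as below. This is an illustration under stated assumptions, not the fbtl/posix code: `read_req_sketch` and `sieved_read_sketch` are hypothetical names, the "file" is an in-memory byte array (a real implementation would issue one large read for the covering range), and the requests are assumed to be sorted by offset and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read request: where to read from and where to put it. */
struct read_req_sketch {
    size_t offset;  /* offset in the file */
    size_t length;  /* number of bytes wanted */
    void  *dest;    /* user buffer for this piece */
};

/* Read one contiguous range that covers all requests into `tmp`,
 * then copy each requested piece out of `tmp`. The gaps between
 * requests are read as well and simply discarded. */
void sieved_read_sketch(const unsigned char *file,
                        const struct read_req_sketch reqs[], size_t nreqs,
                        unsigned char *tmp, size_t tmp_size)
{
    if (nreqs == 0) {
        return;
    }
    size_t start = reqs[0].offset;
    size_t end = reqs[nreqs - 1].offset + reqs[nreqs - 1].length;

    /* The temporary buffer must hold the whole covering range. */
    assert(end - start <= tmp_size);

    memcpy(tmp, file + start, end - start);  /* stands in for one large read */

    for (size_t i = 0; i < nreqs; ++i) {
        memcpy(reqs[i].dest, tmp + (reqs[i].offset - start), reqs[i].length);
    }
}
```

The assertion marks the point where a size limit on the temporary buffer matters: a caller that cannot fit the covering range would have to split the requests into several sieved reads.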
Edgar Gabriel
f70bb4774a dynamic_gen2/file_write_all: fix chunk assignment per stride
The dynamic_gen_file_write_all component distinguishes between the amount of data communicated
to aggregators and the amount of data written in a cycle by the aggregator (in contrast to, e.g., the vulcan component).
There was a bug in calculating which chunks have to be written in a cycle by an aggregator: we added elements to the
io_array until we filled one stripe. Unfortunately, the metric used was the amount of data instead of ensuring that all offsets
fall within a single stripe. This commit fixes this issue. Note that the bug did not create a correctness problem, just a performance
problem in case there were gaps in the file view.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-10 08:53:11 -06:00
Jeff Squyres
a8f883a73a
Merge pull request from devreal/fix-han-commselect-new
coll/han: fix coll preference selection in mca_coll_han_comm_create_new
2020-11-24 09:21:57 -05:00
Joseph Schuchart
33105b031b coll/han: fix coll preference selection in mca_coll_han_comm_create_new
Exclude HAN, don't include it.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-24 10:09:14 +01:00
Jeff Squyres
3edd62e568
Merge pull request from jsquyres/pr/fix-warnings
Fix many compiler warnings
2020-11-23 10:15:15 -05:00
Raghu Raja
38d2f12112
Merge pull request from devreal/fix-coll-base-preference
Fix preference treatment in coll/base
2020-11-20 08:14:37 -08:00
Julien EMMANUEL
7d493c6bcd Uniform conditions in ob1 recv
In ob1 we have four similar conditions but they are not written
in a uniform way

Signed-off-by: Julien EMMANUEL <julien.emmanuel@inria.fr>
2020-11-19 22:27:49 +01:00
Joseph Schuchart
1cdc85564e coll/han: reduce default segment size for reduce/allreduce to 64k
This has been shown to be more effective in achieving overlap
of inter- and intra-node communication and reduces the initial
delay before hitting the network.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-19 19:23:32 +01:00
Joseph Schuchart
971d58c524 coll/han: remove references to experimental solo and shared collective components
Also make coll/tuned the default for shared memory communication
as coll/sm has shown performance issues that need investigation.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-19 19:22:06 +01:00
Joseph Schuchart
09c2f4af94 coll/[sm|han|adapt]: don't disqualify on priority 0
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-19 19:10:09 +01:00
Joseph Schuchart
dd54af9450 coll/base: Fix collective module selection preference treatment
The selectable list is sorted with lowest to highest priority so the
user-defined preferences should be appended to the list.
The preference treatment should also maintain the order provided by the user
(first item has highest priority) so switch the loop order.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-19 19:06:28 +01:00
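The ordering rule from commit dd54af9450 can be sketched as follows. This is not the coll/base code: `selectable_list_sketch`, `promote_to_tail`, and `apply_preferences` are hypothetical names, and the list is a plain array of component names ordered from lowest to highest priority.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMPONENTS 16

/* Hypothetical stand-in for the selectable list: names ordered from
 * lowest priority (index 0) to highest priority (index count - 1). */
struct selectable_list_sketch {
    const char *names[MAX_COMPONENTS];
    size_t count;
};

/* Move `name` to the tail (highest priority) if it is in the list. */
void promote_to_tail(struct selectable_list_sketch *list, const char *name)
{
    for (size_t i = 0; i < list->count; ++i) {
        if (strcmp(list->names[i], name) == 0) {
            const char *found = list->names[i];
            memmove(&list->names[i], &list->names[i + 1],
                    (list->count - i - 1) * sizeof(list->names[0]));
            list->names[list->count - 1] = found;
            return;
        }
    }
}

/* Apply user preferences; prefs[0] is the user's top choice. Walking the
 * preferences backwards means prefs[0] is appended last and therefore
 * ends up at the very tail of the list. */
void apply_preferences(struct selectable_list_sketch *list,
                       const char *const prefs[], size_t nprefs)
{
    assert(list->count <= MAX_COMPONENTS);
    for (size_t i = nprefs; i > 0; --i) {
        promote_to_tail(list, prefs[i - 1]);
    }
}
```

Starting from `basic, sm, tuned, han` with the preference list `sm, basic`, the result is `tuned, han, basic, sm`: both preferred components outrank the others, and `sm` outranks `basic`.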
Julien EMMANUEL
208c2d270b Typo in ob1 comments
Seems like a copy/pasted typo in ob1 comments

Signed-off-by: Julien EMMANUEL <julien.emmanuel@inria.fr>
2020-11-19 16:50:29 +01:00
Jeff Squyres
12796a4aad
Merge pull request from devreal/fix-tuned-allgatherv
COLL TUNED: Use per-rank data size instead of total size for decision in allgatherv
2020-11-17 11:46:35 -05:00
Jeff Squyres
282be20e6f
Merge pull request from vspetrov/master
coll/hcoll: scatterv inplace fix
2020-11-14 10:25:25 -05:00
Jeff Squyres
14aa5fae3c Fix many compiler warnings:
- Add some missing AC_CHECK_SIZEOF's in configure.ac
- Remove some unused variables
- Initialize some variables
- Fix some parameter types
- Cast where appropriate/safe to fix warnings
- Move ompi/mca/common/monitoring Fortran bindings to a separate .c
  file so that they can use different #define's than the C bindings,
  and therefore compile properly / without warnings.
- Fix signedness discrepancies
- Who knew?  Separated these into multiple #if's, instead:
  ```
  // This is undefined behavior
  #define HAVE_FOO defined(FOO)
  #define YOW (HAVE_FOO && defined(BAR))
  ```
- Fix some typos in OMPI_BUILD_HOST logic
- Don't "2>/dev/null" in OMPI_BUILD_HOST logic; it just hides errors

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-11-14 07:20:30 -08:00
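The `defined` pitfall noted in commit 14aa5fae3c comes from the C rule that the behavior is undefined when the `defined` operator is produced by macro expansion inside an `#if`. A minimal sketch of the portable form, with `FOO` and `BAR` as placeholder feature macros taken from the commit message and `yow_enabled` as a hypothetical helper:

```c
#include <assert.h>

/* Placeholder feature macros; a real build would get these from configure. */
#define FOO 1
#define BAR 1

/* Evaluate defined() directly in #if and store a plain 0/1 constant. */
#if defined(FOO)
#define HAVE_FOO 1
#else
#define HAVE_FOO 0
#endif

/* HAVE_FOO now expands to an integer, so this condition is well defined. */
#if HAVE_FOO && defined(BAR)
#define YOW 1
#else
#define YOW 0
#endif

/* Returns 1 when both FOO and BAR were defined at compile time. */
int yow_enabled(void)
{
    /* The helper macro expands to a plain integer constant. */
    assert(HAVE_FOO == 0 || HAVE_FOO == 1);
    return YOW;
}
```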
Jeff Squyres
67421c5d23
Merge pull request from ggouaillardet/topic/retain_datatypes_w
coll/base: do not drop const qualifier
2020-11-14 09:21:37 -05:00
Artem Polyakov
f9ef4b4ac0
Merge pull request from devreal/osc-ucx-progress
UCX osc: make progress on idle worker if none are active
2020-11-13 13:27:31 -08:00
Valentin Petrov
9fa00155f3 coll/hcoll: scatterv inplace fix
Signed-off-by: Valentin Petrov <valentinp@nvidia.com>
2020-11-13 19:15:37 +02:00
Joseph Schuchart
f670364d76 COLL TUNED: Use per-rank data size instead of total size for decision
The total size depends on number of ranks so the usual ranges don't work.
Thus, use the average across all ranks to make a decision.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-13 12:18:42 +01:00
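The per-rank metric from commit f670364d76 can be sketched as a simple average. The function name `per_rank_bytes` is hypothetical, and the sketch leaves out the actual decision thresholds used by coll/tuned.

```c
#include <assert.h>
#include <stddef.h>

/* Average number of bytes contributed per rank in an allgatherv-style
 * call. counts[i] is the element count contributed by rank i. */
size_t per_rank_bytes(const int counts[], int comm_size, size_t type_size)
{
    assert(comm_size > 0);
    size_t total = 0;
    for (int i = 0; i < comm_size; ++i) {
        total += (size_t)counts[i] * type_size;
    }
    /* The total grows with the number of ranks; the average does not. */
    return total / (size_t)comm_size;
}
```

For four ranks contributing 100, 300, 200, and 200 elements of 8 bytes each, the total is 6400 bytes but the value used for the decision is 1600 bytes.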
Gilles Gouaillardet
c49e5e5c4a coll/base: do not drop const qualifier
MPI_Ialltoallw() and friends take a const MPI_Datatype types[] argument.
In order to be able to call OBJ_RELEASE(types[0]), we used to simply
drop the const modifier. This change makes it right by introducing the
OBJ_RELEASE_NO_NULLIFY(object) macro, which no longer sets object = NULL
if the object is freed.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-11-13 17:31:04 +09:00
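Why a non-nullifying release helps with `const` handle arrays, as described in commit c49e5e5c4a, can be sketched with a simplified reference-counted object. Everything here is hypothetical (`sketch_object`, `SKETCH_RELEASE`, `SKETCH_RELEASE_NO_NULLIFY`, `release_all`) and deliberately much simpler than Open MPI's object system; it only illustrates the const issue.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a reference-counted object. */
struct sketch_object {
    int refcount;
};

struct sketch_object *sketch_new(void)
{
    struct sketch_object *obj = malloc(sizeof(*obj));
    assert(obj != NULL);
    obj->refcount = 1;
    return obj;
}

/* Drop one reference; free at zero and reset the caller's handle to NULL.
 * The assignment needs a modifiable lvalue, so this form cannot be used
 * on an element of a const array. */
#define SKETCH_RELEASE(obj)                 \
    do {                                    \
        if (--(obj)->refcount == 0) {       \
            free(obj);                      \
            (obj) = NULL;                   \
        }                                   \
    } while (0)

/* Same, but never writes to the handle, so const handles are fine. */
#define SKETCH_RELEASE_NO_NULLIFY(obj)      \
    do {                                    \
        if (--(obj)->refcount == 0) {       \
            free(obj);                      \
        }                                   \
    } while (0)

/* Release every handle in a const array, similar in shape to a
 * collective that receives `const MPI_Datatype types[]`. */
void release_all(struct sketch_object *const types[], int n)
{
    for (int i = 0; i < n; ++i) {
        SKETCH_RELEASE_NO_NULLIFY(types[i]);
    }
}
```

Using `SKETCH_RELEASE(types[i])` inside `release_all` would not compile, because `types[i]` is a const-qualified pointer and cannot be assigned.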
Raghu Raja
b7b9254488
Merge pull request from rajachan/whack-remote-cq-data-query
mtl/ofi: Check cq_data_size without querying providers again
2020-11-12 13:00:10 -08:00
Raghu Raja
30831fb7f0
Merge pull request from devreal/fix-tuned-dynamic
Fix some issues with dynamic algorithm selection in coll/tuned
2020-11-12 11:20:57 -08:00
Joseph Schuchart
581478dc91 UCX osc: make progress on default worker if none are active
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-11-11 23:23:31 +01:00
Joseph Schuchart
a15e5dc7f0 COLL TUNED: remove stray selection of linear algs for allreduce and allgather
These selections seem harmful in my measurements and don't seem to be
motivated by previous measurement data.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-11 18:40:24 +01:00
Jeff Squyres
c960d292ec Convert all README files to Markdown
A mindless task for a lazy weekend: convert all the README and
README.txt files to Markdown.  Paired with the slow conversion of all
of our man pages to Markdown, this gives a uniform language to the
Open MPI docs.

This commit moved a bunch of copyright headers out of the top-level
README.txt file, so I updated the relevant copyright header years in
the top-level LICENSE file to match what was removed from README.txt.

Additionally, this commit did (very) little to update the actual
content of the README files.  A very small number of updates were made
for topics that I found blatantly obvious while Markdown-izing the
content, but in general, I did not update content during this commit.
For example, there's still quite a bit of text about ORTE that was not
meaningfully updated.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Co-authored-by: Josh Hursey <jhursey@us.ibm.com>
2020-11-10 13:52:29 -05:00
Jeff Squyres
686c2142e2 ompi/mca/common/monitoring: add x perms to Perl scripts
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-11-10 13:52:28 -05:00
Raghu Raja
6233dea68d mtl/ofi: Check cq_data_size without querying providers again
This commit removes the unnecessary call to `fi_getinfo()` when
initializing the MTL. `cq_data_size` is a domain attribute that will be
available to the MTL from the initial query itself. FI_DIRECTED_RECV is
a primary capability that has to be requested for a provider to enable
it, so adding that to the initial requirement.  The redundant query was
also overwriting the contents of the prov object, which already had the
include/exclude filtering and multi-NIC logic applied to it.

Signed-off-by: Raghu Raja <craghun@amazon.com>
2020-11-09 19:22:25 +00:00
Raghu Raja
7a922c8774
Merge pull request from rajachan/coverity-fixes
Coverity fixes for recent OFI changes
2020-11-06 08:48:29 -08:00
Joseph Schuchart
22e289b742 coll/tuned: fix minor errors in comments
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-05 18:32:47 +01:00
Joseph Schuchart
04d198fc9f coll/tuned: don't select algorithms when it's clear they would fall back to linear
Bcast: scatter_allgather and scatter_allgather_ring expect N_elem >= N_procs
Allreduce: rabenseifner expects N_elem >= pow2 nearest to N_procs

In all cases, the implementations will fall back to a linear implementation,
which will most likely yield the worst performance (noted for 4B bcast on 128 ranks)

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-05 18:32:12 +01:00
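The preconditions listed in commit 04d198fc9f can be sketched as simple predicates. The function names are hypothetical, and "pow2 nearest to N_procs" is taken here to mean the largest power of two that does not exceed N_procs, which is an assumption about the commit's wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Largest power of two that does not exceed n (requires n >= 1). */
size_t pow2_floor(size_t n)
{
    assert(n >= 1);
    size_t p = 1;
    while (p <= n / 2) {
        p *= 2;
    }
    return p;
}

/* Bcast scatter_allgather variants need at least one element per process. */
bool bcast_scatter_allgather_usable(size_t n_elem, size_t n_procs)
{
    return n_elem >= n_procs;
}

/* Allreduce rabenseifner needs at least pow2_floor(n_procs) elements. */
bool allreduce_rabenseifner_usable(size_t n_elem, size_t n_procs)
{
    return n_elem >= pow2_floor(n_procs);
}
```

A decision function would check these predicates first and pick a different algorithm when they fail, instead of selecting one that silently falls back to the linear implementation.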
Joseph Schuchart
7261255b8d coll/tuned: Mark global static algorithm as const
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-05 18:25:59 +01:00
Joseph Schuchart
06f605c1e1 coll/tuned: add hint about dynamic rules to mca parameters
The mca parameters coll_tuned_*_algorithm are ignored unless coll_tuned_use_dynamic_rules is true so mention that in the description.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-05 18:20:24 +01:00
Gilles Gouaillardet
26e42f9a0c op/avx: check for _mm512_mullo_epi64() AVX512 intrinsic
The PGI (20.4) compiler does not define this intrinsic, so only build
AVX512 support if the _mm512_mullo_epi64() intrinsic is defined.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-11-04 14:45:03 +09:00
Raghu Raja
917269b699 Coverity fixes for recent OFI changes
8017f12 introduced a new function to get the package rank of a process,
which had a pass-by-value signature (opal_process_info_t); and coverity
was not happy about it. This commit changes the signature to take a
reference to opal_process_info_t instead.

Signed-off-by: Raghu Raja <craghun@amazon.com>
2020-11-04 00:18:54 +00:00
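The signature change described in commit 917269b699 can be sketched as follows. The struct and function here are hypothetical stand-ins (`proc_info_sketch`, `package_rank_sketch`), not the real `opal_process_info_t` or the real package-rank logic; the body is a placeholder that only shows the calling convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a large process-info struct. */
struct proc_info_sketch {
    char     hostname[256];
    uint32_t my_local_rank;
    uint32_t num_local_peers;
};

/* Taking the struct by const pointer avoids copying it on every call,
 * which is the kind of large pass-by-value parameter that static
 * analyzers tend to flag. */
uint32_t package_rank_sketch(const struct proc_info_sketch *info)
{
    assert(info != NULL);
    /* Placeholder: the real function derives the rank differently. */
    return info->my_local_rank;
}
```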
Yossi Itigin
1f3e33441c
Merge pull request from hoopoepg/topic/pml-ucx-recv-improved-errhandling
PML/UCX: improved error processing in MPI_Recv
2020-11-03 13:08:42 +02:00
Jeff Squyres
e9e5dab8b9
Merge pull request from dancejic/multi
Using package_rank to select between NIC of equal distance from the process.
2020-11-02 15:27:37 -05:00