
728 Commits

Author SHA1 Message Date
Jeff Squyres
8115bd29b7
Merge pull request #8322 from bosilca/topic/portable_avx
Allow fallback to a lesser AVX support during make
2021-01-10 11:03:07 -05:00
George Bosilca
20be3fc257
A better test for MPI_OP performance.
The test now has the ability to add a shift to all or to any of the
input and output buffers to assess the impact of unaligned operations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2021-01-05 22:40:26 -05:00
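The idea of shifting the buffers can be sketched as follows. This is a minimal illustration, not the actual Open MPI test: the helper names `sum_int32_unaligned` and `check_shifted_sum` are hypothetical, and a plain element-wise sum stands in for an MPI_OP. Each buffer starts `shift` bytes past its allocation, so the operation runs on deliberately misaligned addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an MPI_OP: inout[i] += in[i] for int32 values.
 * memcpy keeps the unaligned loads and stores well-defined in C. */
void sum_int32_unaligned(const void *in, void *inout, size_t count)
{
    const unsigned char *src = in;
    unsigned char *dst = inout;

    assert(in != NULL && inout != NULL);
    for (size_t i = 0; i < count; i++) {
        int32_t a, b;
        memcpy(&a, src + i * sizeof a, sizeof a);
        memcpy(&b, dst + i * sizeof b, sizeof b);
        b += a;
        memcpy(dst + i * sizeof b, &b, sizeof b);
    }
}

/* Run the sum with the input shifted by in_shift bytes and the output
 * shifted by out_shift bytes. Returns 1 if all results are correct. */
int check_shifted_sum(size_t count, size_t in_shift, size_t out_shift)
{
    unsigned char *in_base = malloc(count * sizeof(int32_t) + in_shift);
    unsigned char *out_base = malloc(count * sizeof(int32_t) + out_shift);
    int ok = (in_base != NULL && out_base != NULL);

    if (ok) {
        unsigned char *in = in_base + in_shift;    /* shifted input  */
        unsigned char *out = out_base + out_shift; /* shifted output */

        for (size_t i = 0; i < count; i++) {
            int32_t a = (int32_t)i, b = (int32_t)(2 * i);
            memcpy(in + i * sizeof a, &a, sizeof a);
            memcpy(out + i * sizeof b, &b, sizeof b);
        }
        sum_int32_unaligned(in, out, count);
        for (size_t i = 0; i < count && ok; i++) {
            int32_t r;
            memcpy(&r, out + i * sizeof r, sizeof r);
            ok = (r == (int32_t)(3 * i));          /* i + 2i */
        }
    }
    free(in_base);
    free(out_base);
    return ok;
}
```

Calling `check_shifted_sum(1000, 1, 3)` exercises a case where the input and output buffers are misaligned by different amounts; timing the same call for several shifts would give a rough picture of the cost of unaligned operations.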
Joseph Schuchart
d11f625ed5 SPC: allow counters to be attached solely through MPI_T and reduce overhead
- only make MCA parameters available if SPC is enabled

- do not compile SPC code if SPC is disabled

- move includes into ompi_spc.c

- allow counters to be enabled through MPI_T without setting MCA parameter

- inline counter update calls that are likely in the critical path

- fix test to succeed even if encountering invalid pvars

- move timer_[start|stop] to header and move attachment info into ompi_spc_t

There is no need to store the name in the ompi_spc_t struct too, we can use that space
for the attachment info instead to avoid accessing another cache line.

- make timer/watermark flags a property of the spc description

This is meant to make adding counters easier in the future by
centralizing the necessary information. Storing a copy of these flags
in the ompi_spc_t structure (without adding to its size) reduces
cache pollution for timer/watermark events.

- allocate ompi_spc_t objects with cache-alignment

This prevents objects from spanning multiple cache lines and thus
ensures that only one cache line is loaded per update.

- fix handling of timer and timer conversion

- only call opal_timer_base_get_cycles if necessary to reduce overhead

- Remove use of OPAL_UNLIKELY to improve code generated by GCC

It appears that GCC makes less effort in optimizing the unlikely path
and generates bloated code.

- Allocate ompi_spc_events statically to reduce loads in critical path

- duplicate comm_world only when dumping is requested

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
2020-11-12 21:17:56 +01:00
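The cache-alignment point in the commit above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the 64-byte line size, the `counter_t` layout, and the helper names are hypothetical and do not reflect the real `ompi_spc_t`. The sketch uses C11 `aligned_alloc` and relies on the object size dividing the line size, so no object straddles two lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64  /* assumption: common line size, not queried */

/* Hypothetical counter object; 16 bytes, so four fit in one line. */
typedef struct {
    uint64_t value;
    uint64_t flags;
} counter_t;

_Static_assert(CACHE_LINE_SIZE % sizeof(counter_t) == 0,
               "counter size must divide the cache line size");

/* Allocate n counters starting on a cache-line boundary.
 * aligned_alloc needs a size that is a multiple of the alignment. */
counter_t *alloc_counters(size_t n)
{
    size_t bytes = n * sizeof(counter_t);
    size_t rounded = (bytes + CACHE_LINE_SIZE - 1)
                     / CACHE_LINE_SIZE * CACHE_LINE_SIZE;

    if (n == 0) {
        return NULL;
    }
    return aligned_alloc(CACHE_LINE_SIZE, rounded);
}

/* Returns 1 if the object's first and last bytes fall in different lines. */
int spans_cache_line(const void *p, size_t size)
{
    uintptr_t first = (uintptr_t)p;

    assert(size > 0);
    return (first / CACHE_LINE_SIZE) != ((first + size - 1) / CACHE_LINE_SIZE);
}
```

With this layout, updating any single counter touches exactly one cache line; memory obtained from `alloc_counters` is released with `free`.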
Jeff Squyres
c960d292ec Convert all README files to Markdown
A mindless task for a lazy weekend: convert all the README and
README.txt files to Markdown.  Paired with the slow conversion of all
of our man pages to Markdown, this gives a uniform language to the
Open MPI docs.

This commit moved a bunch of copyright headers out of the top-level
README.txt file, so I updated the relevant copyright header years in
the top-level LICENSE file to match what was removed from README.txt.

Additionally, this commit did (very) little to update the actual
content of the README files.  A very small number of updates were made
for topics that I found blatantly obvious while Markdown-izing the
content, but in general, I did not update content during this commit.
For example, there's still quite a bit of text about ORTE that was not
meaningfully updated.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Co-authored-by: Josh Hursey <jhursey@us.ibm.com>
2020-11-10 13:52:29 -05:00
Brian Barrett
9ffac85650 build: Move libevent to a 3rd-party package
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves libevent from an MCA framework to a stand-alone
library built outside of OPAL.  A wrapper in opal/util is provided
to minimize the unnecessary changes in the rest of the code.  When
using the internal Libevent, it will be installed as a stand-alone
libevent.a, instead of bundled in OPAL.  Any pre-installed version
of Libevent at or after 2.0.21 is preferred over the internal
version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-10-01 16:55:58 +00:00
Aurelien Bouteiller
efbc6ff6a5
Merge pull request #7798 from abouteiller/mpi-next/unbounderr-self
MPI-4 error handling: 'unbound' errors to MPI_COMM_SELF
2020-08-03 15:59:14 -04:00
Aurelien Bouteiller
06c563625a
Add a test for mpi_errors_mpi3 behavior and non-catastrophic errors
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-23 05:09:29 -04:00
George Bosilca
c4e88a43a3
Check unaligned ops for correctness.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-22 11:26:07 -04:00
Aurelien Bouteiller
7118755ae8
Add a tester for the initial error handler
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-16 03:10:32 -04:00
bosilca
1f237f5fc9
Merge pull request #7419 from bosilca/topic/avx512
Add support for AVX512/AVX2/SSE/MMX
2020-07-13 11:56:50 -04:00
dongzhong
14b3c70628
Add support for MPI_OP using AVX512, AVX2 and MMX
Add logic to handle different architectural capabilities
Detect the compiler flags necessary to build specialized
versions of the MPI_OP. Once the different flavors (AVX512,
AVX2, AVX) are built, detect at runtime which is the best
match with the current processor capabilities.

Add validation checks for loadu 256 and 512 bits.
Add validation tests for MPI_Op.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-10 21:25:35 -04:00
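The runtime selection described in the commit above might look roughly like the sketch below. It is not the Open MPI implementation: the enum and function names are hypothetical, and the detection relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is only available on x86 targets. On other compilers or architectures the sketch falls back to the generic flavor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flavor identifiers, ordered from least to most capable. */
typedef enum {
    OP_FLAVOR_GENERIC = 0,
    OP_FLAVOR_AVX     = 1,
    OP_FLAVOR_AVX2    = 2,
    OP_FLAVOR_AVX512  = 3
} op_flavor_t;

/* Pick the most capable flavor the current processor supports. */
op_flavor_t detect_best_flavor(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return OP_FLAVOR_AVX512;
    if (__builtin_cpu_supports("avx2"))    return OP_FLAVOR_AVX2;
    if (__builtin_cpu_supports("avx"))     return OP_FLAVOR_AVX;
#endif
    return OP_FLAVOR_GENERIC;
}

/* Human-readable name, e.g. for logging which flavor was selected. */
const char *flavor_name(op_flavor_t f)
{
    switch (f) {
    case OP_FLAVOR_AVX512:  return "avx512";
    case OP_FLAVOR_AVX2:    return "avx2";
    case OP_FLAVOR_AVX:     return "avx";
    case OP_FLAVOR_GENERIC: return "generic";
    }
    assert(0 && "unknown flavor");
    return "unknown";
}
```

A caller would typically run `detect_best_flavor()` once at startup and use the result to index a table of function pointers, one per specialized build of the operation.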
Austen Lauria
3ed466e629
Merge pull request #7800 from abouteiller/mpi-next/errors_abort
MPI4: Add ERRORS_ABORT infrastructure
2020-06-29 15:45:29 -04:00
Jeff Squyres
e8277d9d06 tests/asm/run_tests: fix basename usage
Looks like this script was left over from quite a long time ago, and
was expecting CLI params from the "old"-style Automake test engine.
Update it to look for `--test-name` to get the test name, and update a
few other minor style things.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-06-17 10:23:13 -07:00
Aurélien Bouteiller
e2f53b76fb
Add a tester for the ERRORS_ABORT and communicator abort features
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
2020-06-10 12:01:24 -04:00
Rainer Keller
a8cdc0d38b Restore testing all datatypes.
Signed-off-by: Rainer Keller <rainer.keller@hs-esslingen.de>
2020-05-20 17:21:54 +02:00
Ralph Castain
bd29ab0ae9
Update dpm to handle deprecation of MPI_Info keys
Deprecate the current OMPI-specific MPI_Info key definitions for
MPI_Comm_spawn and replace them with their PMIx equivalents. Issue a
deprecation/conversion warning as this is done. Also issue deprecation
warnings for options such as "ompi_non_mpi" that are no longer used.

Handle both cases where the user might pass either the PMIx attribute
name itself (e.g., "PMIX_MAPBY") or the string value of the attribute
(e.g., PMIX_MAPBY, which translates to "pmix.mapby"). This can only be
done for PMIx v4 and above, so protect that code.

Silence a couple of Coverity warnings and add a test along the way.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-29 14:56:38 -07:00
Ralph Castain
6d29bbfde8
Cleanup heterogeneous builds
Consolidate the ompi_process_info and opal_process_info structs to
remove duplicate storage and conversion issues. Unwind some interweaving
of include files using opal.h. Silence a couple of warnings.

For now, set the arch to local if PMIX_ARCH is not found.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-22 12:46:27 -07:00
Jeff Squyres
b2e0957d6f
Merge pull request #7610 from bosilca/topic/fix_MPI_T
Follow the MPI_T guidelines on return errors.
2020-04-12 14:12:32 -04:00
Ralph Castain
a210f8046f
Cleanup ompi/dpm operations
Do some code cleanup in the connect/accept code. Ensure that the OMPI
layer has access to the PMIx identifier for the process. Add macros for
converting PMIx names to/from strings. Cleanup a few of the simple test
programs. Add a little more info to a btl/tcp error message.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-08 08:37:25 -07:00
George Bosilca
f4af1848c9
Follow the MPI_T guidelines on return errors.
As indicated in the MPI3.2 document 14.3.10 page 599 line 1, the only
MPI error code possible is MPI_SUCCESS. All other errors must be in the
error class MPI_T_ERR*.
Fix the return of a few pvar/cvar functions that failed to correctly
convert to an MPI error code.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-04-08 00:02:45 -04:00
Jeff Squyres
9687d5e867 Upgrade all www.open-mpi.org URLs to https
Found a handful of other URLs that weren't https-ized, so I updated
them, too (after verifying that they support https, of course).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-04-02 10:43:50 -04:00
Nathan Hjelm
160ff188b8
Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf
configure: use -iquote for non-system include paths
2020-03-30 09:22:54 -07:00
Shintaro Iwasaki
8cab081770 test/class: fix opal_fifo and opal_lifo
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
2020-03-27 10:16:03 -06:00
Noah Evans
ee3517427e Add threads framework
Add a framework to support different types of threading models including
user space thread packages such as Qthreads and Argobots:

https://github.com/pmodels/argobots

https://github.com/Qthreads/qthreads

The default threading model is pthreads.  Alternate thread models are
specified at configure time using the --with-threads=X option.

The framework is static.  The threading model to use is selected at
Open MPI configure/build time.

mca/threads: implement Argobots threading layer

config: fix thread configury

- Add double quotations
- Change Argobot to Argobots
config: implement Argobots check

If the poll time is too long, MPI hangs.

This quick fix just sets it to 0, but it is not good for the
Pthreads version. Need to find a good way to abstract it.

Note that even 1 (= 1 millisecond) causes disastrous performance
degradation.

rework threads MCA framework configury

It now works more like the ompi/mca/rte configury,
modulo some edge items that are special for threading package
linking, etc.

qthreads module
some argobots cleanup

Signed-off-by: Noah Evans <noah.evans@gmail.com>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-03-27 10:15:45 -06:00
Gilles Gouaillardet
69bc2e8372 misc: fix <> vs "" includes throughout the ompi codebase
This commit fixes an issue with the include usage in some
ompi source files. These source files are using the <> form
of include when the "" form is correct (as these are internal,
**not** system headers).

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-03-09 21:13:49 -04:00
Ralph Castain
dcf110d432
Add missing Makefile
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-22 13:17:34 -08:00
Ralph Castain
7e2874a83d
Save the old ORTE simple tests
Useful when debugging RTE-related issues

Not for inclusion in the tarball - just added to git repo for use by
developers.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-21 06:15:06 -08:00
Charles Shereda
cbc6feaab2 Created opal_gethostname() as safer gethostname substitute.
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.

opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.

-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
 returns an int and directly sets opal_process_info.nodename per
 jsquyres' modifications.

Relates to open-mpi#6801

Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
2020-01-13 08:52:17 -08:00
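The pattern described in this commit (populate the hostname once during initialization, then hand out the cached, always-terminated value) can be sketched as follows. The names `init_hostname` and `get_hostname` and the 256-byte buffer are hypothetical; this is not the `opal_gethostname()` source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical cache, playing the role of a process-info nodename field. */
static char cached_nodename[256];
static int hostname_ready = 0;

/* Populate the cache once. Returns 0 on success, -1 on failure. */
int init_hostname(void)
{
    if (gethostname(cached_nodename, sizeof cached_nodename) != 0) {
        cached_nodename[0] = '\0';
        return -1;
    }
    /* POSIX does not guarantee termination when the name is truncated,
     * so terminate explicitly. */
    cached_nodename[sizeof cached_nodename - 1] = '\0';
    hostname_ready = 1;
    return 0;
}

/* Return the cached name; init_hostname() must have succeeded first. */
const char *get_hostname(void)
{
    assert(hostname_ready);
    return cached_nodename;
}
```

Callers then never touch `gethostname()` directly, so the termination handling lives in exactly one place.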
Joseph Schuchart
c385c927fb Ensure proper alignment of memory provided by MPI
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-10-01 11:54:29 +02:00
Ralph Castain
373e816b37
Ensure buffer_unload leaves the buffer in a clean state
Silence a warning in orte/nidmap

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-09-04 08:32:27 -07:00
George Bosilca
82d632278a
Add a test for datatypes composed of multiple predefined
elements that can be merged into a larger UINT1 type.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-30 19:56:48 -04:00
Jeff Squyres
2ab8109be1 Update OPAL DDT variable names
These variables were renamed in
904276bb44caec207638247f23139bc21bc6a09e; update them to use the new
names.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-27 12:00:20 -07:00
George Bosilca
0a24f0374e
Small improvements on the test.
Rework the to_self test to be able to be used as a benchmark.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-07-09 14:50:09 -04:00
George Bosilca
6c75334162
Use the correct counter name in the example.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-29 00:54:56 -04:00
George Bosilca
d141bf7912 Update the datatype dump to match the actual types.
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 18:03:57 -04:00
George Bosilca
e42b573cd3
Fix the PVAR allocation usage.
According to the MPI standard the obj_handle is a pointer to an MPI
object, and therefore cannot be MPI_COMM_WORLD. The MPI standard example
14.6 highlights this usage.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-02-02 19:03:43 -05:00
George Bosilca
5a82c4fd07
Provide a better fix for #6285.
The issue was a little complicated due to the internal stack used in the
convertor. The main issue was that in the case where we run out of iov
space to save the raw description of the data while handling a
repetition (loop), instead of saving the current position and bailing out
directly we kept reading the next predefined type element. It worked in
most cases, except the one identified by the HDF5 test. However, the
biggest issue here was the drop in performance for all ensuing calls to
the convertor pack/unpack, as instead of handling contiguous loops as a
whole (and minimizing the number of memory copies) we copied data
description by data description.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-01-31 10:01:48 -05:00
Nathan Hjelm
ea40d48899
Merge pull request #6295 from ggouaillardet/topic/opal_convertor_raw
opal/datatype: fix opal_convertor_raw()
2019-01-29 10:57:29 -07:00
Gilles Gouaillardet
45fb69b2b9 ompi/datatype: fix how we compute the space needed for the args
Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-28 15:26:11 +09:00
Gilles Gouaillardet
0832ab5acc opal/datatype: fix opal_convertor_raw
correctly handle the case in which iovec is full and the
last accessed element of the datatype is the beginning of a loop

Refs. open-mpi/ompi#6285

Thanks Axel Huebl for reporting this

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-23 15:38:43 +09:00
bosilca
182a2db2a4
Merge pull request #6029 from ggouaillardet/topic/large_datatypes
opal/datatype: correctly handle large datatypes
2018-12-24 12:49:52 -05:00
Nathan Hjelm
46255d0790 test: call opal_init/finalize_util in ddt tests
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-12-18 14:37:04 -07:00
Nathan Hjelm
0edfd328f8 opal: clean up init/finalize
This commit contains the following changes:

 - Remove the unused opal_test_init/opal_test_finalize
   functions. These functions are not used by anything in the code
   base or MTT. Tests use opal_init_util/opal_finalize_util instead.

 - Get rid of gotos in opal_init_util and opal_init. Replaced them
   with a cleaner solution.

 - Automatically register cleanup functions in init functions. The
   cleanup functions are executed in the reverse order of the
   initialization functions. The cleanup functions are run in
   opal_finalize_util() before tearing down the class system.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-12-18 14:37:04 -07:00
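The reverse-order cleanup described in this commit amounts to a stack of callbacks. The sketch below is an illustration under assumed names (`register_cleanup`, `finalize_all`, the `init_*` functions, and the trace buffer are all hypothetical), not the code in `opal_init.c`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLEANUPS 16

typedef void (*cleanup_fn_t)(void);

static cleanup_fn_t cleanup_stack[MAX_CLEANUPS];
static size_t cleanup_count = 0;

/* Trace of executed cleanups, kept only so the order can be observed. */
char cleanup_trace[MAX_CLEANUPS + 1];
static size_t trace_len = 0;

/* Push a cleanup callback. Returns 0 on success, -1 if the stack is full. */
int register_cleanup(cleanup_fn_t fn)
{
    assert(fn != NULL);
    if (cleanup_count == MAX_CLEANUPS) {
        return -1;
    }
    cleanup_stack[cleanup_count++] = fn;
    return 0;
}

static void record(char tag)
{
    if (trace_len < MAX_CLEANUPS) {
        cleanup_trace[trace_len++] = tag;
        cleanup_trace[trace_len] = '\0';
    }
}

static void cleanup_a(void) { record('a'); }
static void cleanup_b(void) { record('b'); }
static void cleanup_c(void) { record('c'); }

/* Each init function registers its own cleanup as part of initializing. */
int init_a(void) { return register_cleanup(cleanup_a); }
int init_b(void) { return register_cleanup(cleanup_b); }
int init_c(void) { return register_cleanup(cleanup_c); }

/* Pop and run the callbacks: last registered, first executed. */
void finalize_all(void)
{
    while (cleanup_count > 0) {
        cleanup_stack[--cleanup_count]();
    }
}
```

Because each init function registers its own cleanup, a failure partway through initialization can simply call `finalize_all()` to undo whatever was set up so far, which is one way to avoid goto-based error paths.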
George Bosilca
1d8ad9281f Add more details about what is going on.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-12-06 13:30:58 +09:00
George Bosilca
88a693bf71 Add a test for very large data.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-12-06 13:30:58 +09:00
Brian Barrett
e9e4d2a4bc Handle asprintf errors with opal_asprintf wrapper
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error.  However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined.  Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
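The contract the commit describes (the output pointer is always NULL after a failure) can be sketched as below. This is not `opal_asprintf()` itself: `checked_asprintf` is a hypothetical name, and because `asprintf` is a non-standard extension, the sketch builds the same behavior on standard `vsnprintf` instead of wrapping the system function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated string.
 * On success: *ptr owns the string, return value is its length.
 * On any error: *ptr is NULL and the return value is -1. */
int checked_asprintf(char **ptr, const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int len;

    assert(ptr != NULL && fmt != NULL);
    *ptr = NULL;                         /* defined on every error path */

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return -1;
    }

    buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        va_end(ap2);
        return -1;
    }

    len = vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* second pass: write */
    va_end(ap2);
    if (len < 0) {
        free(buf);
        return -1;
    }

    *ptr = buf;
    return len;
}
```

With this guarantee, callers can test either the return value or the pointer, and `free(*ptr)` is safe even after a failed call.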
Nathan Hjelm
000f9eed4d opal: add types for atomic variables
This commit updates the entire codebase to use specific opal types for
all atomic variables. This is a change from the prior atomic support
which required the use of the volatile keyword. This is the first step
towards implementing support for C11 atomics as that interface
requires the use of types declared with the _Atomic keyword.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-14 10:48:55 -06:00
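As a rough illustration of why dedicated types matter for C11 atomics, the sketch below declares a counter with the `_Atomic` qualifier instead of `volatile`. The type name `my_atomic_int32_t` and the helper `counter_add` are hypothetical and are not the opal type names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dedicated type for atomic 32-bit integers. With C11 the
 * atomicity is part of the type, so `volatile` is no longer the marker. */
typedef _Atomic int32_t my_atomic_int32_t;

/* Atomically add delta and return the new value. */
int32_t counter_add(my_atomic_int32_t *counter, int32_t delta)
{
    assert(counter != NULL);
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(counter, delta) + delta;
}
```

Hiding the qualifier behind a typedef means the same source can map the type to a compiler-specific or assembly-backed implementation where C11 atomics are unavailable.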
Gilles Gouaillardet
a02be5e91a test: protect <sys/mount.h> with the HAVE_SYS_MOUNT_H macro
Thanks Zoltan Mizsei for bringing this to our attention.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-08-24 17:03:54 +09:00
Nathan Hjelm
1c84f48640 config: remove OPAL_ENABLE_MULTI_THREADS config macro
We long ago hard-coded this value to 1. This commit cleans it out
entirely.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-23 13:47:02 -06:00
Thananon Patinyasakdikul
390d72addd
Merge pull request #4885 from davideberius/spc_pr
Initial Software-based Performance Counters PR
2018-06-12 14:04:49 -07:00