1
1

7070 Коммитов

Автор SHA1 Сообщение Дата
Nikola Dancejic
167d75b42a common/ofi: Added multi-NIC support to provider selection
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs)

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
2020-05-01 01:05:13 +00:00
Yossi Itigin
5dcd1f4e6c
Merge pull request #7575 from yosefe/topic/pml-ucx-fix-usage-of-mca-pml
pml/ucx: Fix usage of mca_pml_base_pml_check_selected()
2020-03-30 20:06:12 +03:00
Nathan Hjelm
160ff188b8
Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf
configure: use -iquote for non-system include paths
2020-03-30 09:22:54 -07:00
Yossi Itigin
124f0c0d1f pml/ucx: Fix usage of mca_pml_base_pml_check_selected()
Pass the correct ompi_proc_t and array length to
mca_pml_base_pml_check_selected() during dynamic modex.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2020-03-29 17:46:45 +03:00
Howard Pritchard
f136a20cae
Merge pull request #6578 from hppritcha/topic/thread_framework2
Implement a MCA framework for threads
2020-03-27 15:55:48 -06:00
Austen Lauria
8a624ab613
Merge pull request #7523 from mkurnosov/fix-bcast-scatterallgather
Fix Bcast scatter_allgather
2020-03-27 14:17:53 -04:00
Noah Evans
ee3517427e Add threads framework
Add a framework to support different types of threading models including
user space thread packages such as Qthreads and argobot:

https://github.com/pmodels/argobots

https://github.com/Qthreads/qthreads

The default threading model is pthreads.  Alternate thread models are
specificed at configure time using the --with-threads=X option.

The framework is static.  The theading model to use is selected at
Open MPI configure/build time.

mca/threads: implement Argobots threading layer

config: fix thread configury

- Add double quotations
- Change Argobot to Argobots
config: implement Argobots check

If the poll time is too long, MPI hangs.

This quick fix just sets it to 0, but it is not good for the
Pthreads version. Need to find a good way to abstract it.

Note that even 1 (= 1 millisecond) causes disastrous performance
degradation.

rework threads MCA framework configury

It now works more like the ompi/mca/rte configury,
modulo some edge items that are special for threading package
linking, etc.

qthreads module
some argobots cleanup

Signed-off-by: Noah Evans <noah.evans@gmail.com>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-03-27 10:15:45 -06:00
Brian Barrett
64d70b3076 ofi: Call add_procs through PML
Change ompi_mtl_ofi_get_endpoint() to call the active PML's
add_procs() rather than the OFI MTL add_procs() directly when
discovering a new process during operation.

Functionally, this has no impact in correct operation.  However,
the current behavior means that the heterogenous and active PML
checks are not being executed in the dynamic discovery case.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-03-27 06:06:42 -07:00
Ralph Castain
33ab928e1b ompi_proc_t size reduction: part 1
We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we had to therefore store it somewhere proc-local. Obviously, this ccarried a memory penalty for storing all those strings, and so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs.

Unfortunately, this still results in an 8-byte/proc memory cost as we have a char* pointer in the opal_proc_t that is contained in the ompi_proc_t so that we can store the hostname of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory.

With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication.

All RM's are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving us 8-bytes/proc.

Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances:

(a) in an error message
(b) in a verbose output for debugging purposes

Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-23 12:49:44 -07:00
Mikhail Kurnosov
66b6b8d34e Fix Bcast scatter_allgather
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2020-03-11 12:47:47 +07:00
Gilles Gouaillardet
69bc2e8372 misc: fix <> vs "" includes throught the ompi codebase
This commit fixes an issue with the include usage in some
ompi source files. These source files are using the <> form
of include when the "" form is correct (as these are internal,
**not** system headers).

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-03-09 21:13:49 -04:00
Austen Lauria
04a3a28a74 Some memchecker cleanup and others.
- Port memchecker call from a1d502c.
- Remove unused memcheck macro variables.
- Some code readability improvements.
- Remove some stray +1's in dynamic comm cleanup.
- Re-add OPAL_ENABLE_DEBUG macro to osc header.
- Cleanup some printf's, and includes.
- Refactor cleanup of dpm_disconnect_objs.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-03-05 16:44:18 -05:00
Gilles Gouaillardet
e2ad184db5 pml/ob1: silence valgrind errors
always define and initialize padding in various structs
when OPAL_ENABLE_DEBUG is set

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-03-05 16:10:43 -05:00
Gilles Gouaillardet
fc2516457b osc/pt2pt: silence valgrind warnings
explicitly add and initialize padding to keep valgrind happy

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-03-05 16:10:42 -05:00
Harumi Kuno
397cc44aa4 Fix typo in mca_pml_base_pml_check_selected
Addresses issue https://github.com/open-mpi/ompi/issues/7494

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2020-03-02 22:49:40 -08:00
Ralph Castain
4fe9ae329c
Add missing include and remove stale PML
The "yalla" pml no longer exists

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-29 11:54:38 -08:00
Ralph Castain
cbbe67eff9
Merge pull request #7487 from bosilca/topic/pml_from_vpid0
Make sure the PML selection is consistent across the world.
2020-02-28 17:19:26 -08:00
bosilca
c4d36859ec
Merge pull request #7228 from devreal/progress-returns
Harmonize return values of progress callbacks
2020-02-28 20:15:37 -05:00
bosilca
806b35157d
Merge pull request #7261 from bosilca/fix/vprotocol
Fix/vprotocol initialization
2020-02-28 20:14:30 -05:00
George Bosilca
21d743393f
Make sure the PML is consistent across the world.
Temporary solution for the PML inconsistency issue discussed in #7475.
This patch address 2 things: first it make the PMIx key optional so that
if we are not in a full modex mode we don't do a direct modex, and
second it get the PML info from the vpid 0 instead of from the local
rank.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-02-28 17:53:48 -05:00
raafatfeki
f46cfc120d fs/gpfs: Solve issues while setting GPFS hints
1- Remove the common symbols issue: global variable not initialized. (#7424)
Move the variables to local scope within the set_info function.
2- Remove GPFS hints using datashipping: not used anymore
3- Redirect output stream to corresponding fs framework.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2020-02-25 18:14:20 -05:00
raafatfeki
9ba6ab8209 mca/fs: Check the existence of communicator in file query
The communicator might be not existent yet when mca_fs_gpfs_component_file_query() is called.
Therefore, we need to check it first before calling brodcast function.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2020-02-25 16:26:46 -05:00
Geoff Paulsen
207b267135
Merge pull request #7455 from rhc54/topic/warn
Silence a bunch of warnings
2020-02-24 08:55:30 -06:00
Edgar Gabriel
28776c5d95
Merge pull request #7448 from edgargabriel/topic/individual-as-dummy-module
sharedfp/individual: defer error when not being able to open datafile
2020-02-23 16:35:23 -06:00
Ralph Castain
86de81baca
Silence a bunch of warnings
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-22 13:05:28 -08:00
Edgar Gabriel
df6e3e503a sharedfp/individual: defer error when not being able to open datafile
This commit changes the behavior of the individual sharedfp component. If
the component cannot create either the datafile or the metadatafile during File_open,
no error is being raised going forward. This allows applications that do not use shared
file pointer operations to continue execution without any issue.

If the user however subsequently calls MPI_File_write_shared or similar operations, an error
will be raised.

Fixes issue #7429

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-02-21 12:13:39 -06:00
Austen Lauria
dd5991f513
Merge pull request #7204 from devreal/shmwin_contig
Correctly set baseptr in contiguous shared memory window with local size zero
2020-02-20 13:22:40 -05:00
Nathan Hjelm
8ee80d8855 osc/rdma: fix bug in attach for non-debug builds
This commit fixes an issue with non-debug builds where adding an
attachment to the attachment list doesn't actually happen. This
causes all MPI_Win_detach calls to fail. The call was within an
assert which is optimized out in optimized builds.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-18 12:59:46 -08:00
Nathan Hjelm
eeb3d7f845
Merge pull request #7387 from hjelmn/osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap
osc/rdma: modify attach to check for region overlap
2020-02-17 14:26:53 -07:00
Nathan Hjelm
54c8233f4f osc/rdma: bump the default max dynamic attachments to 64
This commit increaes the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachement value can be set by the user on other
systems.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-16 17:09:20 -08:00
Nathan Hjelm
6649aef8bd osc/rdma: modify attach to check for region overlap
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References #7384

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-16 17:09:06 -08:00
Tsubasa Yanagibashi
a07a83d189 osc/sm: fix typo and minor correction
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2020-02-14 17:52:56 +09:00
Gilles Gouaillardet
174e967dbc
Remove ORTE project
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.

remove orte: misc fixes

 - UCX fixes
 - VPATH issue
 - oshmem fixes
 - remove useless definition
 - Add PRRTE submodule
 - Get autogen.pl to traverse PRRTE submodule
 - Remove stale orcm reference
 - Configure embedded PRRTE
 - Correctly pass the prefix to PRRTE
 - Correctly set the OMPI_WANT_PRRTE am_conditional
 - Move prrte configuration to the end of OMPI's configure.ac
 - Make mpirun a symlink to prun, when available
 - Fix makedist with --no-orte/--no-prrte option
 - Add a `--no-prrte` option which is the same as the legacy
   `--no-orte` option.
 - Remove embedded PMIx tarball. Replace it with new submodule
   pointing to OpenPMIx master repo's master branch
 - Some cleanup in PRRTE integration and add config summary entry
 - Correctly set the hostname
 - Fix locality
 - Fix singleton operations
 - Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-02-07 18:20:06 -08:00
George Bosilca
72501f8f9c Consistent return from all progress functions.
This fix ensures that all progress functions return the number of
completed events.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-01-28 20:16:53 +01:00
Joseph Schuchart
2c97187ee0 Harmonize return values of progress callbacks
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-01-28 20:15:03 +01:00
Austen Lauria
3c0fa537c2
Merge pull request #7113 from devreal/osc_rdma_cas
RDMA osc: perform CAS in shared memory if possible
2020-01-28 10:59:17 -05:00
Austen Lauria
10f6a77640
Merge pull request #7315 from abouteiller/export/tcp_errors_v2
Handle error cases in TCP BTL (v2)
2020-01-27 17:03:07 -05:00
Aurelien Bouteiller
395fcc4253
Disable inband PML error reporting during MPI Finalize as it interferes with the Finalize process.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:32:34 -05:00
Austen Lauria
969eb0286c
Merge pull request #6970 from ggouaillardet/topic/cuda_no_orte
coll/cuda: remove unnecessary references to ORTE
2020-01-27 13:12:32 -05:00
Austen Lauria
4f6978466d
Merge pull request #7284 from bosilca/fix/monitoring_registration
Minor cleanup in the monitoring PML.
2020-01-27 13:01:30 -05:00
Jeff Squyres
f16d2903aa
Merge pull request #7328 from DDNStorage/ime_fs_missing_header
fs/ime: fix compilation errors due to missing header inclusion
2020-01-23 14:13:59 -05:00
Sylvain Didelot
01ae0c22b8 fs/ime: fix compilation errors due to missing header inclusion
OpenMPI doesn't compile anymore with IME because the header
file "ompi/mca/fs/base/base.h" needs to be include in every
file where mca_fs_base_get_mpi_err() is used.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
2020-01-22 17:56:55 +01:00
Austen Lauria
df9745a251
Merge pull request #7299 from awlauria/fix_warnings
Fix some compiler warnings.
2020-01-17 11:33:41 -05:00
Jeff Squyres
25931ea8bf
Merge pull request #7200 from cpshereda/master-opal_gethostname-change
Fix unsafe use of gethostname()
2020-01-13 16:07:43 -05:00
Charles Shereda
cbc6feaab2 Created opal_gethostname() as safer gethostname substitute.
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.

opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.

-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
 returns an int and directly sets opal_process_info.nodename per
 jsquyres' modifications.

Relates to open-mpi#6801

Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
2020-01-13 08:52:17 -08:00
Robert Wespetal
49128a7adb mtl/ofi: Add workaround for EFA local/remote capabilities bug
Some versions of Libfabric contain a bug in EFA where FI_REMOTE_COMM and
FI_LOCAL_COMM are not advertised. In order to workaround this, we need to call
fi_getinfo() without those capability bits to see if EFA is available first.

Also move around some of the provider include/exclude list logic so we can skip
this workaround if applicable.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
2020-01-13 08:26:01 -08:00
Jeff Squyres
21bc9042e1 mtl/ofi: check for FI_LOCAL_COMM+FI_REMOTE_COMM
Make sure to get an RDM provider that can provide both local and
remote communication.  We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.

Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version.  Use this check in two
places:

1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
   FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
   >= v1.1, but the usnic component was checking for >= v1.3.  So
   update the btl/usnic configury to use the new macro and check for
   >= v1.3.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-01-13 08:19:53 -08:00
George Bosilca
05093f9cb1
Minor cleanup in the monitoring PML.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-01-13 09:24:00 -05:00
Austen Lauria
b65ec27307 Fix some compiler warnings.
Silence unused variables, incompatible pointer types,
un-initialized variables, and signed/unsigned comparisons.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-01-10 13:10:53 -05:00
George Bosilca
7fac7a07ab
allow thread support in the pessimist vprotocol.
Add MCA parameter to allow support for message logging even when the MPI
library is initialized with support for thread multiple.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-01-08 08:00:06 -05:00