26984 Commits

Author SHA1 Message Date
Howard Pritchard
f2a27cc991 Merge pull request #3396 from hppritcha/topic/swat_compiler_warning
btl/sm: swat a compiler warning
2017-04-22 14:31:21 -06:00
Gilles Gouaillardet
ebe6125750 mpi/c: MPI_PROC_NULL is not a valid rank in MPI_Win_{lock,unlock}
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-22 11:13:13 +09:00
Ralph Castain
f2ed293ecd Merge pull request #3398 from rhc54/topic/modex
Implement a background fence that collects all data during modex operation
2017-04-21 15:15:49 -07:00
Jeff Squyres
dec707f018 Merge pull request #3397 from jsquyres/pr/usnic-moar-iov-limit-fixes
usnic: more iov_limit fixes
2017-04-21 16:19:45 -04:00
Ralph Castain
9fc3079ac2 Implement a background fence that collects all data during modex operation
The direct modex operation is slow, especially at scale for even modestly connected applications. Likewise, blocking in MPI_Init while we wait for a full modex to complete takes too long. However, as George pointed out, there is a middle ground here. We could kick off the modex operation in the background, and then trap any modex_recv calls until the modex completes and the data is delivered. For most non-benchmark apps, this may prove to be the best of the available options as they are likely to perform other (non-communicating) setup operations after MPI_Init, and so there is a reasonable chance that the modex will actually be done before the first modex_recv gets called.

Once we get instant-on-enabled hardware, this won't be necessary. Clearly, zero time will always out-perform the time spent doing a modex. However, this provides a decent compromise in the interim.

This PR changes the default settings of a few relevant params to make "background modex" the default behavior:

* pmix_base_async_modex -> defaults to true

* pmix_base_collect_data -> continues to default to true (no change)

* async_mpi_init -> defaults to true. Note that the prior code attempted to base the default setting of this value on the setting of pmix_base_async_modex. Unfortunately, the pmix value isn't set prior to setting async_mpi_init, and so that attempt failed to accomplish anything.

The logic in MPI_Init is:

* if async_modex AND collect_data are set, AND we have a non-blocking fence available, then we execute the background modex operation

* if async_modex is set, but collect_data is false, then we simply skip the modex entirely - no fence is performed

* if async_modex is not set, then we block until the fence completes (regardless of collecting data or not)

* if we do NOT have a non-blocking fence (e.g., we are not using PMIx), then we always perform the full blocking modex operation.

* if we do perform the background modex, and the user requested the barrier be performed at the end of MPI_Init, then we check to see if the modex has completed when we reach that point. If it has, then we execute the barrier. However, if the modex has NOT completed, then we block until the modex does complete and skip the extra barrier. So we never perform two barriers in that case.

HTH
Ralph

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-21 10:29:23 -07:00
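
The MPI_Init decision logic listed in commit 9fc3079ac2 can be read as a small decision function. The following is a minimal sketch of that reading, not the Open MPI implementation: every type and function name here (`modex_config_t`, `choose_modex_action`, `final_sync_after_background_modex`) is hypothetical, and the three booleans merely stand in for pmix_base_async_modex, pmix_base_collect_data, and the availability of a non-blocking fence. It also assumes that the "no non-blocking fence" rule takes precedence over the "skip the modex" rule, which the commit message does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these are Open MPI symbols. */
typedef enum {
    MODEX_BACKGROUND, /* start a non-blocking fence and continue MPI_Init */
    MODEX_SKIP,       /* no fence is performed at all */
    MODEX_BLOCKING    /* block until the fence completes */
} modex_action_t;

typedef struct {
    bool async_modex;   /* stands in for pmix_base_async_modex */
    bool collect_data;  /* stands in for pmix_base_collect_data */
    bool have_nb_fence; /* a non-blocking fence is available (e.g., PMIx) */
} modex_config_t;

/* Pick the modex strategy described in the commit message. */
static modex_action_t choose_modex_action(const modex_config_t *cfg)
{
    assert(cfg != NULL);
    if (!cfg->have_nb_fence) {
        /* Assumption: without a non-blocking fence we always block. */
        return MODEX_BLOCKING;
    }
    if (!cfg->async_modex) {
        /* Blocks regardless of whether data is collected. */
        return MODEX_BLOCKING;
    }
    return cfg->collect_data ? MODEX_BACKGROUND : MODEX_SKIP;
}

typedef enum {
    FINAL_BARRIER,       /* modex already done: run the requested barrier */
    FINAL_WAIT_FOR_MODEX /* modex pending: wait for it, skip the barrier */
} final_sync_t;

/*
 * Only applies when a background modex was started AND the user asked
 * for a barrier at the end of MPI_Init. Either way, at most one
 * synchronization is performed.
 */
static final_sync_t final_sync_after_background_modex(bool modex_done)
{
    return modex_done ? FINAL_BARRIER : FINAL_WAIT_FOR_MODEX;
}
```

With the new defaults described in the commit (async modex and data collection both enabled) and a non-blocking fence available, `choose_modex_action` selects the background modex.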
Jeff Squyres
1d5e08f44a usnic: more iov_limit fixes
Follow on to 7bd2de9960419422a4591f4b5d286f1f911a0a47: move setting
the iov_limit to 1 earlier in the startup sequence.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-21 09:14:28 -07:00
Howard Pritchard
782f1bb9af btl/sm: swat a compiler warning
GCC 6.3.1 complains about an uninitialized variable

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-21 10:02:56 -05:00
Jeff Squyres
e9e89e502b Merge pull request #3245 from hjelmn/auto_bool
mca/base: accept y and n for bool and auto bool enumerator
2017-04-21 10:41:10 -04:00
Howard Pritchard
462342d148 Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Jeff Squyres
68956ea100 Merge pull request #3386 from jsquyres/pr/usnic-set-iov-limit
usnic: ensure to set the iov_limit to 1
2017-04-20 21:51:19 -04:00
Jeff Squyres
7bd2de9960 usnic: ensure to set the iov_limit to 1
The usNIC BTL does not use more than 1 iov, so be sure to set it to 1
so that we don't allocate cq/rq/sq entries based on a default (i.e.,
>1) number of iovs per entry.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-20 13:28:15 -07:00
Howard Pritchard
841192645b common/libfabric: move libfabric to ofi
This PR renames the common library for OFI libfabric from
libfabric to ofi.  There are a number of reasons this
is good to do:

1) It is shorter, replacing nine characters with three in function
   names for what may eventually be a fairly extensive interface.
2) OFI is the term used for the MTL and RML components that use
   the OFI libfabric interface.
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
   the term "ofi" internally and in their configure options
   relevant to OFI libfabric (e.g., MPICH/CH4, Intel MPI, Sandia SHMEM).

Comments in several places in the Open MPI source code
suggest that this common library will be going away.
Far from it: we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.

This PR also adds synonyms for the --with-libfabric(-libdir)
configury options: --with-ofi and --with-ofi-libdir.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Ralph Castain
d2b603986d Merge pull request #3383 from rhc54/topic/timing
Increase fine grain of timing info
2017-04-20 05:37:51 -07:00
Ralph Castain
c86f71376a Increase fine grain of timing info
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-20 00:17:40 -07:00
Ralph Castain
46ea7bf841 Merge pull request #3382 from rhc54/topic/gadget
Update gadget platform file
2017-04-19 21:02:37 -07:00
Ralph Castain
243076dd8c Update gadget platform file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-19 21:45:13 -06:00
Gilles Gouaillardet
ded63c5e0c ompi: use ompi_coll_base_sendrecv_actual() whenever possible
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-20 10:01:28 +09:00
Gilles Gouaillardet
52551d96c1 Merge pull request #3285 from ggouaillardet/topic/coll_zerobyte_messages
coll/base: always send/recv zero-byte messages
2017-04-20 09:22:47 +09:00
Gilles Gouaillardet
b18745589f Merge pull request #1665 from ggouaillardet/topic/OMPI_DATATYPE_INIT_UNAVAILABLE_BASIC_TYPE
ompi/datatype: define OMPI_DATATYPE_INIT_UNAVAILABLE_BASIC_TYPE macro
2017-04-20 09:10:07 +09:00
Nathaniel Graham
34b4aeb17f Merge pull request #3339 from nrgraham23/mpirun_help_improvements
Additional mpirun --help changes
2017-04-19 14:05:07 -06:00
Nathaniel Graham
01312b2f90 Additional mpirun --help changes
This commit recategorizes several mpirun arguments,
and moves the information for mpirun --help arguments
to the bottom of the general help message.  I also
added the OPAL_CMD_LINE_OTYPE field to two commands
that were missed initially because they were not
in the same area as the others.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-19 11:43:45 -06:00
Gilles Gouaillardet
cc8a655fe6 configury: remove now obsolete reference to OPAL_PTRDIFF_TYPE
Since Open MPI now requires a C99 compiler, and ptrdiff_t is part of C99,
the abstract OPAL_PTRDIFF_TYPE type is no longer needed.

Thanks George, Nathan and Paul for the help.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:42:45 +09:00
Gilles Gouaillardet
fa5cd0dbe5 use ptrdiff_t instead of OPAL_PTRDIFF_TYPE
Since Open MPI now requires a C99 compiler, and ptrdiff_t is part of C99,
the abstract OPAL_PTRDIFF_TYPE type is no longer needed.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:41:56 +09:00
Gilles Gouaillardet
dcf9cca21f ompi/datatype: add the OMPI_DATATYPE_INIT_UNAVAILABLE_BASIC_TYPE macro
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 13:09:33 +09:00
bosilca
872cf44c28 Improve the opal_pointer_array & more (#3369)
* Complete rewrite of opal_pointer_array
Instead of a cache-oblivious linear search, use a bit array
to speed up management of the free space. As a result, we
slightly increase the memory used by the structure, but we get a
significant boost in performance.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Do not register datatypes in the f2c translation table.
The registration is now done in the Fortran layer, by
forcing a call to MPI_Type_c2f.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-04-18 21:41:26 -04:00
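
To illustrate the idea behind the opal_pointer_array rewrite in commit 872cf44c28, here is a minimal sketch of tracking free slots with a bit array instead of scanning the pointers themselves. It is not the opal_pointer_array code: the names (`slot_table_t`, `slot_table_add`, `slot_table_remove`) and the fixed capacity are hypothetical, chosen only to keep the example short. One bit per slot is the extra memory the commit message mentions; the gain is that 64 occupied slots can be skipped with a single comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size table; not the opal_pointer_array API. */
#define SLOT_CAPACITY 256
#define WORD_BITS 64
#define NUM_WORDS (SLOT_CAPACITY / WORD_BITS)

typedef struct {
    void *slots[SLOT_CAPACITY];
    uint64_t used[NUM_WORDS]; /* bit i set => slot i is occupied */
} slot_table_t;

static void slot_table_init(slot_table_t *t)
{
    for (size_t i = 0; i < SLOT_CAPACITY; ++i) {
        t->slots[i] = NULL;
    }
    for (size_t w = 0; w < NUM_WORDS; ++w) {
        t->used[w] = 0;
    }
}

/* Store ptr in the lowest free slot; return its index, or -1 if full. */
static int slot_table_add(slot_table_t *t, void *ptr)
{
    for (size_t w = 0; w < NUM_WORDS; ++w) {
        if (t->used[w] == UINT64_MAX) {
            continue; /* all 64 slots in this word are taken */
        }
        uint64_t free_bits = ~t->used[w]; /* non-zero here */
        unsigned bit = 0;
        while (!(free_bits & ((uint64_t)1 << bit))) {
            ++bit;
        }
        t->used[w] |= (uint64_t)1 << bit;
        size_t index = w * WORD_BITS + bit;
        t->slots[index] = ptr;
        return (int)index;
    }
    return -1;
}

/* Release a slot so that a later add can reuse it. */
static void slot_table_remove(slot_table_t *t, int index)
{
    assert(index >= 0 && index < SLOT_CAPACITY);
    t->used[index / WORD_BITS] &= ~((uint64_t)1 << (index % WORD_BITS));
    t->slots[index] = NULL;
}
```

A production version would likely grow the table on demand and use a hardware find-first-set instruction rather than the inner bit loop; both are omitted here for brevity.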
Gilles Gouaillardet
23dad50d51 mpi/c: allow MPI_PROC_NULL in MPI_Win_shared_query()
This fixes a regression introduced in open-mpi/ompi@b3a20100d3

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-19 10:06:41 +09:00
Howard Pritchard
3918b7a796 Merge pull request #3213 from hppritcha/topic/remove_loadleveer
orte/ras: remove loadleveler support
2017-04-18 09:18:54 -06:00
Yossi
9ebcafd6d6 Merge pull request #3260 from derbeyn/fix_yalla
Fix yalla PML: MPI_Recv does not return MPI_ERR_TRUNCATE upon overflow
2017-04-18 11:37:48 +03:00
Jeff Squyres
518cdd1603 Merge pull request #3361 from jsquyres/pr/fix-static-builds
dl/dlopen: add libs to wrapper LIBS
2017-04-16 06:57:08 -04:00
Jeff Squyres
a0543616ee dl/dlopen: add libs to wrapper LIBS
Without this, libs (e.g., "-ldl") are not added to the wrapper LIBS
flags.  That may work on some platforms, but on at least RHEL 7.3, it
does not (i.e., compiling MPI applications fails because the linker
can't find dlopen).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-15 09:30:18 -07:00
Ralph Castain
3f5a1dc56d Merge pull request #3356 from rhc54/topic/bind
Use the node index to compare to daemon vpid when identifying procs to bind
2017-04-14 06:12:04 -07:00
Ralph Castain
bb1aaa3286 Use the node index to compare to daemon vpid when identifying procs to bind
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-14 02:37:25 -07:00
Ralph Castain
31ad45312f Merge pull request #3350 from rhc54/topic/josh
On behalf of Josh, ensure we flag that the child is no longer alive since we are killing it with SIGKILL
2017-04-14 02:16:47 -07:00
Ralph Castain
67156556ce On behalf of Josh, ensure we flag that the child is no longer alive since we are killing it with SIGKILL
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-13 21:07:26 -07:00
Ralph Castain
3202de8787 Merge pull request #3349 from rhc54/topic/reg
Fix event registration - need to increment the event index and record the number of codes in the event handler
2017-04-13 18:24:32 -07:00
Ralph Castain
ffbfd22d84 Fix event registration - need to increment the event index and record the number of codes in the event handler
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-13 17:35:10 -07:00
Ralph Castain
fbf714d326 Merge pull request #3331 from artpol84/orte_cleanup/master
orte/pmix: Do not set orted exit status to one from proc abort
2017-04-13 15:14:14 -07:00
Ralph Castain
004d655e38 Merge pull request #3344 from rhc54/topic/cov
Minor coverity cleanups
2017-04-13 12:54:08 -07:00
Howard Pritchard
4b9840e886 Merge pull request #3332 from hppritcha/topic/contributing
CONTRIBUTING: add a CONTRIBUTING.md file
2017-04-13 11:54:49 -06:00
Howard Pritchard
6b1dbaa65e CONTRIBUTING: add a CONTRIBUTING.md file
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-13 10:53:43 -06:00
Jeff Squyres
38d712ab96 Merge pull request #3328 from jsquyres/pr/guidelines
Create a Github issue template
2017-04-13 10:55:38 -04:00
Yossi
3081d7e626 Merge pull request #3346 from alinask/topic/ucx_yalla_fix_msg_release
PML/UCX/YALLA: Fix the message release call.
2017-04-13 16:35:06 +03:00
Alina Sklarevich
eec310c99c PML/UCX/YALLA: Fix the message release call.
Set message to MPI_MESSAGE_NULL.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 14:41:13 +03:00
Ralph Castain
1585854335 Minor coverity cleanups
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 19:31:35 -07:00
Gilles Gouaillardet
6886c1229a Merge pull request #3327 from jeffhammond/fix-issue-3326
check for negative ranks in ompi_win_peer_invalid
2017-04-13 10:53:32 +09:00
Ralph Castain
f1403ac3c2 Merge pull request #3336 from rhc54/topic/launchmon
Update the debugger launch code to reflect the new backend mapping method.
2017-04-12 15:10:23 -07:00
Ralph Castain
5df9567a23 Merge pull request #3335 from rhc54/topic/backward
Update to latest PMIx master, including disabling the pmi-1 and pmi-2…
2017-04-12 13:48:19 -07:00
Ralph Castain
0500cc1c66 Update the debugger launch code to reflect the new backend mapping method.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 13:31:18 -07:00
Jeff Squyres
fa10e1ea97 Create a Github issue template
So that we can stop asking common questions like "What version of Open
MPI are you using?", etc.

[skip ci]
bot:notest

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-12 16:18:32 -04:00
Ralph Castain
9f73974fe1 Update to latest PMIx master, including disabling the pmi-1 and pmi-2 backward compatibility as these interfere with the s1,s2 components
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 12:34:27 -07:00