
9975 Commits

Author SHA1 Message Date
Gilles Gouaillardet
3f874c9857 spc: remove ompi_spc_get_count() prototype from ompi_spc.h
This function is only used in ompi_spc.c and is hence declared as static.
Remove its prototype from the header file to silence compiler warnings: compilers will typically treat ompi_spc_get_count() as a function that is declared but not defined.

Fixes open-mpi/ompi#5279
Fixes open-mpi/ompi#5273

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-18 16:07:11 +09:00
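The pattern behind 3f874c9857 can be sketched roughly as follows. This is an illustration only: `spc_get_count_sketch` and `spc_total_sketch` are hypothetical names, not the actual contents of ompi_spc.c. A helper used in a single source file is defined `static` there, and no header carries a prototype for it.

```c
#include <stddef.h>

/* File-local helper (hypothetical): 'static' gives it internal linkage,
 * so it is visible only in this source file and needs no prototype in
 * any header. A leftover header prototype would be seen by files that
 * never see the definition, which is what compilers tend to warn about. */
static long spc_get_count_sketch(const long *counters, size_t n, size_t idx)
{
    return (idx < n) ? counters[idx] : -1;
}

/* Public entry point (hypothetical): the only symbol a header would
 * need to declare. */
long spc_total_sketch(const long *counters, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += spc_get_count_sketch(counters, n, i);
    }
    return total;
}
```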
Gilles Gouaillardet
2caf1bf0e5
Merge pull request #5263 from ggouaillardet/topic/ompio_abstraction
ompio: fix abstraction
2018-06-16 23:29:29 +09:00
Matias A Cabral
e6674556aa MTL OFI: add support for FI_REMOTE_CQ_DATA.
Extend number of supported ranks with providers that support
FI_REMOTE_CQ_DATA. Add README file to OFI MTL
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-06-14 17:17:38 -07:00
Edgar Gabriel
d5bdcf8595 fs/pvfs2: fix compilation problem
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-14 09:30:45 -05:00
Howard Pritchard
7dcab6e4a4
Merge pull request #5269 from hppritcha/topic/squash_gcc7.3.0_warnings
topo/treematch - quash compiler warning
2018-06-13 21:13:04 -05:00
Gilles Gouaillardet
cd45c7abb6 ompio: misc renames
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-14 09:41:10 +09:00
Gilles Gouaillardet
36b35ae0db ompio: fix abstraction
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-14 09:41:10 +09:00
Howard Pritchard
64de269cc3 topo/treematch - quash compiler warning
quash a compiler warning showing up with gcc 7.3

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-06-13 16:34:17 -05:00
Thananon Patinyasakdikul
390d72addd
Merge pull request #4885 from davideberius/spc_pr
Initial Software-based Performance Counters PR
2018-06-12 14:04:49 -07:00
David Eberius
d377a6b6f4 Added Software-based Performance Counters driver code along with several counters.
This code is the implementation of Software-based Performance Counters as described in the paper 'Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information' in EuroMPI/USA '17 (http://icl.cs.utk.edu/news_pub/submissions/software-performance-counters.pdf).  More practical usage information can be found here: https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI.

All software event functions are wrapped in macros that become no-ops when SOFTWARE_EVENTS_ENABLE is not defined.  The internal timer units have been changed to cycles to avoid division operations, which were a large source of overhead as discussed in the paper.  Added a --with-spc configure option to enable SPCs in the Open MPI build; this defines SOFTWARE_EVENTS_ENABLE.  Added an MCA parameter, mpi_spc_enable, for turning on specific counters.  Added an MCA parameter, mpi_spc_dump_enabled, for turning on and off dumping SPC counters in MPI_Finalize.  Added an SPC test and example.

Signed-off-by: David Eberius <deberius@vols.utk.edu>
2018-06-11 22:48:16 -04:00
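The "macros that become no-ops" approach described in d377a6b6f4 might look roughly like the sketch below. Only SOFTWARE_EVENTS_ENABLE comes from the commit message; `SKETCH_SPC_RECORD`, `sketch_record`, `sketch_counters`, and `sketch_send` are hypothetical names used for illustration, and the real SPC driver is more elaborate.

```c
#include <stdint.h>

/* Hypothetical counter storage. */
static uint64_t sketch_counters[4];

/* Function that does the real work when counters are compiled in. */
static inline void sketch_record(int id, uint64_t value)
{
    sketch_counters[id] += value;
}

#ifdef SOFTWARE_EVENTS_ENABLE
/* Counters compiled in: the macro forwards to the real function. */
#define SKETCH_SPC_RECORD(id, value) sketch_record((id), (uint64_t)(value))
#else
/* Counters compiled out: the macro expands to a no-op expression, so
 * call sites need no #ifdef of their own and the arguments are not
 * evaluated. */
#define SKETCH_SPC_RECORD(id, value) ((void)0)
#endif

/* Example call site: identical source whether or not SPCs are enabled.
 * Returns the current value of counter 0. */
uint64_t sketch_send(uint64_t bytes)
{
    SKETCH_SPC_RECORD(0, bytes);
    return sketch_counters[0];
}
```

Compiling with `-DSOFTWARE_EVENTS_ENABLE` turns the counter on; without it the call site compiles to nothing.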
Jeff Squyres
84701cd2b0
Merge pull request #5204 from markalle/info_snprintf
fix info-subscribe to use snprintf() and warn on long key
2018-06-08 15:22:55 -04:00
Edgar Gabriel
2d8a769bfd fcoll/static: remove component
now that we have a shiny new fcoll component, no need
to keep the static component around. No use for it anymore.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-08 07:39:46 -05:00
Edgar Gabriel
b27a40cdf9
Merge pull request #5246 from edgargabriel/topic/ibm-testsuite-fixes
Topic/ibm testsuite fixes
2018-06-08 06:06:49 -05:00
Yossi Itigin
fd12540751
Merge pull request #5227 from hoopoepg/topic/pml-ucx-hang-on-finalize
PML/UCX: fixed hang on MPI_Finalize
2018-06-08 13:19:49 +03:00
KAWASHIMA Takahiro
317e53f83f
Merge pull request #5243 from t-kurita/pr/mpiext-mpi-f08-logical
Fortran: Enable using `LOGICAL` parameter in MPI extensions.
2018-06-08 13:11:13 +09:00
Edgar Gabriel
a1484ec69a io/ompio: check error conditions before executing file_sync
check for pending I/O operations and invalid modes
and return proper error codes before executing MPI_File_sync

makes the e_sync_1 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 19:30:27 -05:00
Edgar Gabriel
5f1e88d265 mpi/c: check for valid datatype in file_get_type_extent
the interface of file_get_type_extent did not check
whether the input datatype is valid or not.

Makes the e_get_type_extend_2 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 19:30:27 -05:00
Edgar Gabriel
14bd114973 common/ompio: return error code from file_delete operation in file_close
in case the user opened a file using the DELETE_ON_CLOSE flag,
return the error code generated in the delete operation.

Note, however, that this is just a partial fix for the e_close_1 test
from the ibm testsuite, since the object destructor that triggers
the file_close function does not currently have a mechanism to recognize
and return an error code.

Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
2018-06-07 19:30:14 -05:00
Edgar Gabriel
f7cae7731c io/ompio: return error code for invalid offset
in file_get_byte_offset, return an error code if the offset
leads to an invalid position in file.

Makes the e_get_byte_offset_1 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
2018-06-07 18:46:17 -05:00
Edgar Gabriel
deaeaa60de fcoll/vulcan: minor bugfix
when creating the groups_per_proc arrays

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 17:52:32 -05:00
Edgar Gabriel
8feb497dbe io/ompio: cleanup the aggregator selection logic
and some internal structure elements/components. Along the way,
add support for the cb_nodes Info object.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 16:47:10 -05:00
Edgar Gabriel
529d882ff0 io/ompio and common/ompio: relocate ompio_request code to common
since the request code is now being accessed also from the vulcan fcoll
component, the request code was relocated into the common/ompio
directory to avoid ld load problems.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 16:13:12 -05:00
raafatfeki
5ecb4a56e3 fcoll/vulcan: Support of asynchronous write in collective writeAll
We introduced a new mca_vulcan parameter that specifies the I/O synchronization
type (async/sync I/O) applied within the collective write operation.
The user can explicitly choose the async or sync write operation, or let
the choice be made automatically.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00
raafatfeki
4f7172ddf6 fcoll/vulcan: Support of larger offsets
For very large offsets, the data chunk size to be written by each aggregator
exceeds the capacity of an integer variable. Besides, some variables were
not large enough to hold intermediate values.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00
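The overflow described in 4f7172ddf6 can be illustrated with a small sketch. The function names below are hypothetical and the computation is a simplified stand-in for the vulcan code: the point is only that the per-aggregator chunk size and its intermediate product are kept in 64-bit variables.

```c
#include <stdint.h>
#include <limits.h>

/* Bytes each aggregator must write (hypothetical helper). Everything is
 * computed in 64 bits so that neither the product nor the result is
 * truncated to the range of an int. */
int64_t sketch_bytes_per_aggregator(int64_t total_elems, int64_t elem_size,
                                    int num_aggregators)
{
    int64_t total_bytes = total_elems * elem_size;
    /* Round up so the aggregators together cover any remainder. */
    return (total_bytes + num_aggregators - 1) / num_aggregators;
}

/* Returns 1 if the value would fit in a plain int, 0 otherwise. */
int sketch_fits_in_int(int64_t value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

For example, 2^30 elements of 8 bytes split across 2 aggregators gives 2^32 bytes per aggregator, which no longer fits in a 32-bit int.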
raafatfeki
4670fe50d7 fcoll/vulcan: Remove unnecessary calls to write
Identify the index of each aggregator process in order to restrict the call to the write_init function to the specific aggregator.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00
raafatfeki
bc6431bee9 fcoll/vulcan: use hindexed constructor on the sender side
Instead of using a temporary buffer and copying data into it before sending, use a derived datatype to describe the data that needs to be sent during a cycle in the collective I/O operation.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00
Edgar Gabriel
1c2c110824 fcoll/vulcan: add new fcoll component
import of the new vulcan component. It is an enhanced version
of the two_phase component which, however, uses the ompio internal
codes/loops to assemble the data arrays. It is therefore more in line
with the dynamic and dynamic_gen2 components, and will be easier to
maintain.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 16:13:12 -05:00
Kurita, Takehiro
f9ae932bfd Fortran: Enable using LOGICAL parameter in MPI extensions.
If a subroutine of the Fortran `use-mpi-f08` binding in an MPI extension
has a `LOGICAL` parameter and no `TYPE(MPI_Status)` parameter,
it needs to use the `mpi_ext` module and call its corresponding subroutine
in the `mpif-h` directory, as explained in
`ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h`.
However, as shown in the figure below, the required directories are dependent
on each other, and "Can't open module file" error occurs at build time.

             ompi/mpiext/{extension name}/use-mpi-f08
                A                               |
                |                               |
                |                               V
   ompi/mpi/fortran/use-mpi-f08  <---  ompi/mpi/fortran/mpiext (mpi_ext.mod)

In order to solve this problem, change the configuration and the build order.
- divide the Fortran extension directory (`ompi/mpi/fortran/mpiext`)
  into separate directories for `use-mpi` and for `use-mpi-f08`
    - `ompi/mpi/fortran/mpiext-use-mpi`     : for `use-mpi` (mpi_ext.mod)
    - `ompi/mpi/fortran/mpiext-use-mpi-f08` : for `use-mpi-f08` (mpi_f08_ext.mod)

- change the build order of the Fortran `use-mpi` and
  `use-mpi-f08` bindings in `ompi` to the following
    1. mpi_ext bindings of MPI extensions (`mpiext/{extension name}/use-mpi` directory)
    2. Fortran use-mpi (`mpi/fortran/use-mpi-[ignore-]tkr` directory)
    3. Fortran extension for use-mpi (`mpi/fortran/mpiext-use-mpi` directory)
    4. Fortran use-mpi-f08 modules only (`mpi/fortran/use-mpi-f08/mod` directory)
    5. mpi_f08_ext bindings of MPI extensions (`mpiext/{extension name}/use-mpi-f08` directory)
    6. Fortran use-mpi-f08 (`mpi/fortran/use-mpi-f08` directory)
    7. Fortran extension for use-mpi-f08 (`mpi/fortran/mpiext-use-mpi-f08` directory)

Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>
2018-06-07 15:02:17 +09:00
Nathan Hjelm
63ded4d083
Merge pull request #5224 from benmenadue/master
io/romio314: Replace deprecated MPI-1 functions
2018-06-06 15:41:53 -06:00
Ralph Castain
5853ebee1a
Merge pull request #5240 from rhc54/topic/foo
Correct typo in name comparison flags
2018-06-06 13:25:20 -07:00
Ralph Castain
86d699d42e Correct typo in name comparison flags
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-06 12:18:52 -07:00
bosilca
fa1386768f
Merge pull request #5234 from jsquyres/pr/oshmem-init-race
ompi_mpi_init: fix race condition
2018-06-06 12:14:00 -04:00
Jeff Squyres
9b9cb5fef0 to be squashed: move wait-for-init loop to ompi_mpi_init()
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-06 05:35:19 -07:00
Ralph Castain
840fb42f93 PMIx rte component does support dynamics
Minor cleanups

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-05 21:55:19 -07:00
Jeff Squyres
67ba8da76f ompi_mpi_init: fix race condition
There was a race condition in 35438ae9b5: if multiple threads invoked
ompi_mpi_init() simultaneously (which could happen from both MPI and
OSHMEM), the code did not catch this condition -- Bad Things would
happen.

Now use an atomic cmp/set to ensure that only one thread is able to
advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED.

Additionally, change the prototype of ompi_mpi_init() so that
oshmem_init() can safely invoke ompi_mpi_init() multiple times (as
long as MPI_FINALIZE has not started) without displaying an error.  If
multiple threads invoke oshmem_init() simultaneously, one of them will
actually do the initialization, and the rest will loop waiting for it
to complete.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-05 18:09:13 -07:00
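The atomic cmp/set described in 67ba8da76f can be sketched with C11 atomics. This is an assumption-laden illustration, not the Open MPI code: the real implementation uses its own opal atomics and the ompi_mpi_state enum, whereas every `sketch_*` name below is hypothetical. Exactly one caller wins the transition from NOT_INITIALIZED to INIT_STARTED; every other caller loops until initialization completes.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum sketch_init_phase {
    SKETCH_NOT_INITIALIZED = 0,
    SKETCH_INIT_STARTED,
    SKETCH_INIT_COMPLETED
};

static _Atomic int sketch_init_state = SKETCH_NOT_INITIALIZED;

/* Returns true for exactly one caller: the one whose compare-and-swap
 * moves the state from NOT_INITIALIZED to INIT_STARTED. */
static bool sketch_try_begin_init(void)
{
    int expected = SKETCH_NOT_INITIALIZED;
    return atomic_compare_exchange_strong(&sketch_init_state, &expected,
                                          SKETCH_INIT_STARTED);
}

/* Safe to call from several threads, and more than once.
 * Returns true if this call performed the initialization. */
bool sketch_init(void)
{
    if (sketch_try_begin_init()) {
        /* ... one-time initialization work would go here ... */
        atomic_store(&sketch_init_state, SKETCH_INIT_COMPLETED);
        return true;
    }
    /* Another caller won the race: wait until it has finished. */
    while (atomic_load(&sketch_init_state) < SKETCH_INIT_COMPLETED) {
        /* spin; a real implementation would yield or sleep here */
    }
    return false;
}
```

A second call after initialization has completed simply returns false without displaying an error, which mirrors the behaviour the commit describes for repeated oshmem_init() calls.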
Nathan Hjelm
64a5baaa28
Merge pull request #5193 from hjelmn/osc_sm_location
Use /dev/shm for shared memory files in osc components
2018-06-05 09:42:14 -06:00
Sergey Oblomov
0a8261f3b0 PML/UCX: fixed hang on MPI_Finalize
fixes issue https://github.com/openucx/ucx/issues/2656

added flush for worker object to complete all pending operations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-05 17:22:03 +03:00
Mikhail Kurnosov
3adf96fdb8 coll/base: add butterfly algorithm for MPI_Reduce_scatter
Implements butterfly algorithm for MPI_Reduce_scatter.
The algorithm can be used for both commutative and non-commutative operations, and for both power-of-two and non-power-of-two numbers of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-05 15:53:13 +07:00
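For orientation, the pairwise-exchange idea behind a butterfly reduce-scatter can be simulated in plain C. The sketch below is deliberately much simpler than the algorithm in 3adf96fdb8: it assumes a power-of-two number of ranks, a commutative sum, and one element per block, and it simulates all ranks in a single process rather than calling MPI. `SKETCH_P` and `sketch_reduce_scatter_sum` are hypothetical names.

```c
#include <string.h>

#define SKETCH_P 8   /* number of simulated ranks; must be a power of two */

/* data[r][b] is rank r's contribution to block b.
 * On return, data[r][r] holds the sum over all ranks of block r. */
void sketch_reduce_scatter_sum(long data[SKETCH_P][SKETCH_P])
{
    int lo[SKETCH_P], hi[SKETCH_P];   /* block range [lo, hi) each rank still owns */
    long snapshot[SKETCH_P][SKETCH_P];

    for (int r = 0; r < SKETCH_P; ++r) {
        lo[r] = 0;
        hi[r] = SKETCH_P;
    }

    /* log2(P) rounds; partners are SKETCH_P/2, SKETCH_P/4, ..., 1 apart. */
    for (int dist = SKETCH_P / 2; dist >= 1; dist /= 2) {
        /* Snapshot so every simulated message carries pre-round values. */
        memcpy(snapshot, data, sizeof(snapshot));
        for (int r = 0; r < SKETCH_P; ++r) {
            int partner = r ^ dist;
            int mid = lo[r] + (hi[r] - lo[r]) / 2;
            /* Keep the half of the range that contains this rank's block. */
            if (r & dist) {
                lo[r] = mid;
            } else {
                hi[r] = mid;
            }
            /* "Receive" the partner's values for the kept half and reduce. */
            for (int b = lo[r]; b < hi[r]; ++b) {
                data[r][b] += snapshot[partner][b];
            }
        }
    }
}
```

Each round halves the range of blocks a rank is responsible for, so after log2(P) rounds rank r holds only the fully reduced block r.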
Ben Menadue
34ec0bd8ab Replace MPI_Type_extent with MPI_Type_get_extent in ROMIO.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2018-06-05 15:27:58 +10:00
Ben Menadue
756cc67221 Replace MPI_Address with MPI_Get_address in ROMIO.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2018-06-05 15:27:25 +10:00
Gilles Gouaillardet
05b3546151 java: do not use MPI1 deprecated subroutines
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-04 10:49:52 +09:00
Ralph Castain
3020b699f3
Merge pull request #5213 from rhc54/topic/rte
Enable the PMIx ompi/rte component
2018-06-03 10:23:40 -07:00
Ralph Castain
55ac526a67 Enable the PMIx ompi/rte component
Get the OMPI rte/pmix component working. This was tested using PRRTE as the RM, configuring OMPI using:

* autogen --no-orte

* with external libevent, external hwloc, and external PMIx master

* configuring PMIx master with the same libevent and hwloc

* execute the application using PRRTE's "prun" launcher, which has the same cmd line as ORTE's mpirun

Note that PMIx master appears to have a bug in the event notification system that caches job termination events. Thus, the first execution runs fine, but subsequent executions cause an "abort" when the OMPI default error handler is invoked upon notification of the prior job's termination. Will work that separately.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 134cca9ac0de092d767999357573a31703f72292)
2018-06-03 07:25:12 -07:00
Mark Allen
93fefc4d70 fix info-subscribe to use snprintf() and warn on long key
This checkin mainly concerns our internal info keys that are registering
for callbacks via opal_infosubscribe_subscribe(). Those keys need to have
an extra __IN_<key>/val stored to preserve their pre-callback value. So
that means our internal keys are limited to 5 chars shorter than the usual
key length limit.

Previously the code would have been silently inactive if a long key happened
to come in; now it warns and also uses snprintf() to avoid compiler warnings.

I'm also making the top-level MPI_Info_set warn if the user uses our reserved
"__IN_" prefix. I had wanted the feature to be more invisible than that, but
it would require a more sophisticated approach to change that.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2018-06-01 18:31:32 -04:00
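The length check described in 93fefc4d70 could be sketched as below. The "__IN_" prefix is taken from the commit message; the buffer size `SKETCH_MAX_KEY` and both function names are assumptions made for illustration, not the actual opal_infosubscribe code.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_KEY 36        /* assumed key buffer size, including NUL */
#define SKETCH_PREFIX  "__IN_"   /* reserved prefix named in the commit */

/* Builds "__IN_<key>" into out. Returns 0 on success, or -1 (after
 * printing a warning) if the prefixed key would not fit. snprintf()
 * never writes past the buffer and reports the length it needed. */
int sketch_make_saved_key(char out[SKETCH_MAX_KEY], const char *key)
{
    int n = snprintf(out, SKETCH_MAX_KEY, "%s%s", SKETCH_PREFIX, key);
    if (n < 0 || n >= SKETCH_MAX_KEY) {
        fprintf(stderr,
                "warning: key \"%s\" is too long to store its pre-callback value\n",
                key);
        return -1;
    }
    return 0;
}

/* Returns 1 if a user-supplied key starts with the reserved prefix. */
int sketch_uses_reserved_prefix(const char *key)
{
    return strncmp(key, SKETCH_PREFIX, strlen(SKETCH_PREFIX)) == 0;
}
```

Because the prefix is 5 characters long, a key that subscribes to callbacks has to be 5 characters shorter than the usual limit, as the commit message notes.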
Jeff Squyres
38ed70de6f ompi_mpi_finalize: remove some dead code
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-01 13:37:20 -07:00
Jeff Squyres
35438ae9b5 mpi/finalized: revamp INITIALIZED/FINALIZED
Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE).  In such cases, MPI_FINALIZED must return "false".

Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE.  See
https://github.com/open-mpi/ompi/issues/5084.

This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:

- not initialized
- init started
- init completed
- finalize started
- finalize past COMM_SELF destruction
- finalize completed

The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.

Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-01 13:36:29 -07:00
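The single-enum state machine described in 35438ae9b5 can be sketched with C11 atomics. The six states follow the list in the commit message; the identifiers, the use of `<stdatomic.h>`, and the relaxed load on the read path are assumptions standing in for Open MPI's own ompi_mpi_state and opal atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Ordered so that a comparison expresses "at or past this phase". */
enum sketch_mpi_state {
    SKETCH_STATE_NOT_INITIALIZED = 0,
    SKETCH_STATE_INIT_STARTED,
    SKETCH_STATE_INIT_COMPLETED,
    SKETCH_STATE_FINALIZE_STARTED,
    SKETCH_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT,
    SKETCH_STATE_FINALIZE_COMPLETED
};

static _Atomic int sketch_state = SKETCH_STATE_NOT_INITIALIZED;

/* Writers (init and finalize) publish each transition atomically. */
void sketch_set_state(enum sketch_mpi_state s)
{
    atomic_store(&sketch_state, (int)s);
}

/* Cheap read path: reports "finalized" only once finalize has moved past
 * the COMM_SELF attribute destruction, so a callback running during that
 * destruction still sees false. */
bool sketch_finalized(void)
{
    return atomic_load_explicit(&sketch_state, memory_order_relaxed)
           >= SKETCH_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}
```

Keeping the states in one ordered enum means a single comparison answers the query, instead of combining several independent flags.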
Edgar Gabriel
52bd606294 fcoll/dynamic_gen2: make sure that intermediate variables can hold the offset
for very large offsets, some variables used in the fcoll/dynamic_gen2
code base were under certain circumstances not large enough to hold
intermediate values. This issue was detected more often in the vulcan component
but could happen in the dynamic_gen2 component as well.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-01 06:53:38 -05:00
Gilles Gouaillardet
9f7586465d fortran/mpif-h: fix MPI1 compatibility Makefile
appends MPI1 compatible source files instead of redefining all the source files
fix a typo from open-mpi/ompi@89da9651bb

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-01 09:52:22 +09:00
Nathan Hjelm
b323655809 mpi: make C++ bindings compile when MPI-1 compat is disabled
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-31 09:44:19 -06:00
Nathan Hjelm
89da9651bb ompi: disable functions removed from MPI-3.0 by default
This commit adds a new configure option: --enable-mpi1-compat. Without
this option we will no longer provide APIs, typedefs, and defines that
were removed from the standard in MPI-3.0. This option will exist for
one major release (Open MPI v4.x.x) and then the option and associated
code will be removed in Open MPI v5.x.x. Open MPI has already
internally prepared for this change. Please prepare your codes
accordingly.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-31 09:44:19 -06:00