1
1

10049 Коммитов

Автор SHA1 Сообщение Дата
Mikhail Kurnosov
6547b58316 coll/base: add knomial tree algorithm for MPI_Bcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-19 13:01:26 -06:00
Edgar Gabriel
c3ac06dc1b sharedfp/sm and lockedfile: fix coverty warnings
this commit fixes the coverty warnings CID 1437402 and
CID 1437401

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-19 10:04:51 -05:00
Yossi Itigin
c2fbf3a3e8 osc_ucx: register progress on-demand
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-06-19 12:47:08 +03:00
Edgar Gabriel
9986a15b57 sharedfp/individual: only complain about fseek if sharedfp operations are really in use
this component can only be used in very specific scenarios. However, since some file systems do not support file locking and processes might be distributed over multiple nodes (hence the sm sharedfp component is also inelligible), the component might be selected in some scenarios, even if an application does not intend to use shared file pointers.

Since the fseek_shared function is involved as part of the File_set_view operation, only complain about the inability to perform the seek_shared operation if actual shared file pointer operations are being used. This avoid spurious error values being returned.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-18 18:25:29 -05:00
Edgar Gabriel
bb1522472f
Merge pull request #5286 from edgargabriel/topic/sharedfp-revamp
sharedfp/all components: revamp internal operations
2018-06-18 16:09:54 -05:00
Edgar Gabriel
bc0f60dfd9 sharedfp/all components: revamp internal operations
this commit revamps the internal operations of the sharedfp components.
Specifically, it is focused around removing the second file_open
operation for shared file pointers. This makes the code more efficient.
Because of that, there is no necessity anymore for the sharedfp_lazy_open
mca parameter.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-18 14:34:05 -05:00
Yossi Itigin
733cac864a
Merge pull request #5282 from yosefe/topic/pml-ucx-opal-mem-hooks
pml_ucx: add option to use opal memhooks instead of ucx internal hooks
2018-06-18 19:07:01 +03:00
Gilles Gouaillardet
3f874c9857 spc: remove ompi_spc_get_count() prototype from ompi_spc.h
This function is only used in ompi_spc.c and is hence declared as static.
Remove its prototype from the header file in order to silence compiler warnings who will typically consider ompi_spc_get_count() as a declared but not defined function.

Fixes open-mpi/ompi#5279
Fixes open-mpi/ompi#5273

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-18 16:07:11 +09:00
Yossi Itigin
564f80d362 pml_ucx: add option to use opal memhooks instead of ucx internal hooks
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-06-17 15:30:44 +03:00
Gilles Gouaillardet
2caf1bf0e5
Merge pull request #5263 from ggouaillardet/topic/ompio_abstraction
ompio: fix abstraction
2018-06-16 23:29:29 +09:00
Matias A Cabral
e6674556aa MTL OFI: add support for FI_REMOTE_CQ_DATA.
Extend number of supported ranks with providers that support
FI_REMOTE_CQ_DATA. Add README file to OFI MTL
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-06-14 17:17:38 -07:00
Edgar Gabriel
d5bdcf8595 fs/pvfs2: fix compilation problem
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-14 09:30:45 -05:00
Howard Pritchard
7dcab6e4a4
Merge pull request #5269 from hppritcha/topic/squash_gcc7.3.0_warnings
topo/treematch - quash compiler warning
2018-06-13 21:13:04 -05:00
Gilles Gouaillardet
cd45c7abb6 ompio: misc renames
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-14 09:41:10 +09:00
Gilles Gouaillardet
36b35ae0db ompio: fix abstraction
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-14 09:41:10 +09:00
Howard Pritchard
64de269cc3 topo/treematch - quash compiler warning
quash a compiler warning showing up with gcc 7.3

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-06-13 16:34:17 -05:00
Thananon Patinyasakdikul
390d72addd
Merge pull request #4885 from davideberius/spc_pr
Initial Software-based Performance Counters PR
2018-06-12 14:04:49 -07:00
David Eberius
d377a6b6f4 Added Software-based Performance Counters driver code along with several counters.
This code is the implementation of Software-base Performance Counters as described in the paper 'Using Software-Base Performance Counters to Expose Low-Level Open MPI Performance Information' in EuroMPI/USA '17 (http://icl.cs.utk.edu/news_pub/submissions/software-performance-counters.pdf).  More practical usage information can be found here: https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI.

All software events functions are put in macros that become no-ops when SOFTWARE_EVENTS_ENABLE is not defined.  The internal timer units have been changed to cycles to avoid division operations which was a large source of overhead as discussed in the paper.  Added a --with-spc configure option to enable SPCs in the Open MPI build.  This defines SOFTWARE_EVENTS_ENABLE.  Added an MCA parameter, mpi_spc_enable, for turning on specific counters.  Added an MCA parameter, mpi_spc_dump_enabled, for turning on and off dumping SPC counters in MPI_Finalize.  Added an SPC test and example.

Signed-off-by: David Eberius <deberius@vols.utk.edu>
2018-06-11 22:48:16 -04:00
KAWASHIMA Takahiro
a38e9e064f coll: Update COLL module interface version to 2.3.0
Members for persistent operations are added to the module structure
in a prior commit.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
e12a5056f1 coll/libnbc: Rename internal functions
The `nbc_i*` functions don't start communication, but create a request.
`nbc_*_init` are appropriate names for them.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
5c21903477 coll/libnbc: Add assertion for NBC_A2A_DISS
Persistent operation for `NBC_A2A_DISS` is not supported currently.
Though the algorithm is not selected at all currently, I put an
assertion not to select it by mistake.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
0b8b0f8393 coll/libnbc: Implement MPI_STARTALL
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
ed0144bad4 coll/libnbc: Adapt local copy for persistent request
`NBC_Copy` shoud not be called in `MPI_*_INIT`.
`NBC_Sched_copy` should be called instead.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
5c5de3a4fb coll/libnbc: Fix handling of completed request
Because a persistent reuqest does not free its `schedule` object
when the communication completes, the `NBC_Progress` function cannot
determine the completion using `schedule`.

Without this change, a hang occurs when the `NBC_Progress` function
is called recursively through the `NBC_Start_round` function.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
8e5690bf5c coll/libnbc: Correct persistent request handling
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
e72f510daf ompi/request: Add ompi_request_persistent_noop_create
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
Gilles Gouaillardet
9a63dacf1c mpi: check MPI_Start[all] is invoked on persistent requests
and errors otherwise.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
545e9af896 mpiext/pcollreq: Add MPIX_Bcast_init etc.
Until the MPI Forum decides to add the persistent collective
communication request feature to the MPI Standard, these functions
are supported through MPI extensions with the `MPIX_` prefix.

Only C bindings are supported currently.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
e69e99575e coll: Enable func check in mca_coll_base_comm_select
Now libnbc COLL supports persistent collectives and all `*_init`
functions of the COLL interface are available. So let's enable the
check of availability of those functions on a communicator creation.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
a9609b6bf8 coll/libnbc: add persistent collectives implementation
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
KAWASHIMA Takahiro
a9fdea51aa coll: Add persistent collective communication request feature
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
c753e9baff coll/libnbc: code refactoring
prepare the upcoming persistent collectives by pre-factoring some code

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

fixup 808c3c62cd9475edd91ecde9d2d53b12e28b2c04
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
fe0bb6c310 coll/libnbc: misc revamp - merge NBC_Init_handle() into NBC_Schedule_request() - set schedule in NBC_Schedule_request instead of NBC_Start() - update NBC_Start() prototype
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
360a76f440 coll/libnbc: revamp ibcast and use NBC_Schedule_request()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
Jeff Squyres
84701cd2b0
Merge pull request #5204 from markalle/info_snprintf
fix info-subscribe to use snprintf() and warn on long key
2018-06-08 15:22:55 -04:00
Edgar Gabriel
2d8a769bfd fcoll/static: remove component
now that we have a shiny new fcoll component, no need
to keep the static component around. No use for it anymore.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-08 07:39:46 -05:00
Edgar Gabriel
b27a40cdf9
Merge pull request #5246 from edgargabriel/topic/ibm-testsuite-fixes
Topic/ibm testsuite fixes
2018-06-08 06:06:49 -05:00
Yossi Itigin
fd12540751
Merge pull request #5227 from hoopoepg/topic/pml-ucx-hang-on-finalize
PML/UCX: fixed hand on MPI_Finalize
2018-06-08 13:19:49 +03:00
KAWASHIMA Takahiro
317e53f83f
Merge pull request #5243 from t-kurita/pr/mpiext-mpi-f08-logical
Fortran: Enable using `LOGICAL` parameter in MPI extensions.
2018-06-08 13:11:13 +09:00
Edgar Gabriel
a1484ec69a io/ompio: check error conditions before executing file_sync
check for pending I/O operations and invalid modes
and return proper error codes before executing MPI_File_sync

makes the e_sync_1 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 19:30:27 -05:00
Edgar Gabriel
5f1e88d265 mpi/c: check for valid datatype in file_get_type_extend
the interface if file_get_type_extent did not check
whether the input datatype is valid or not.

Makes the e_get_type_extend_2 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 19:30:27 -05:00
Edgar Gabriel
14bd114973 common/ompio: return error code from file_delete operation in file_close
in case the user opened a file using the DELETE_ON_CLOSE flag,
return the error code generated in the delete operation.

Note, that this is however just a partial fix to the e_close_1 test
from the ibm testsuite, since the object destructor that triggers
the file_close function does not have a mechanism right now to recognize
and return an error code.

Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
2018-06-07 19:30:14 -05:00
Edgar Gabriel
f7cae7731c io/ompio: return error code for invalid offset
in file_get_byte_offset, return an error code if the offset
leads to an invalid position in file.

Makes the e_get_byte_offset_1 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
2018-06-07 18:46:17 -05:00
Edgar Gabriel
deaeaa60de fcoll/vulcan: minor bugfix
when creating the groups_per_proc arrays

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 17:52:32 -05:00
Edgar Gabriel
8feb497dbe io/ompio: cleanup the aggregator selection logic
and some internal structure elements/components. Along the way,
add support for the cb_nodes Info object.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 16:47:10 -05:00
Edgar Gabriel
529d882ff0 io/ompio and common/ompio: relocate ompio_request code to common
since the request code is now being accessed also from the vulcan fcoll
component, the request code was relocated into the common/ompio
directory to avoid ld load problems.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 16:13:12 -05:00
raafatfeki
5ecb4a56e3 fcoll/vulcan: Support of asynchronous write in collective writeAll
We introduced a new mca_vulcan parameter that specify the I/O synchronization
type (Async/sync I/O) applied within the collective write operation.
The user can explicitly choose to use async or sync write operation or make
the choice automatically made.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00
raafatfeki
4f7172ddf6 fcoll/vulcan: Support of larger offsets
For very large offsets, the data chunk size to be written by each aggregator
exceeds the capacity of an integer variable. Besides, some variables were
not large enough to hold intermediate values.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00
raafatfeki
4670fe50d7 fcoll/vulcan: Remove unnecessary calls to write
Identify the index of each aggregator process in order to restrict the call to write_init function by the specific aggregator.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00
raafatfeki
bc6431bee9 fcoll/vulcan: use hindexed constructor on the sender side
Instead of using a temporary buffer and copy data into the temp buffer before sending, use a derived datatype to describe the data that needs to be sent during a cycle in the collective I/O operation.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00