
5398 Commits

Author SHA1 Message Date
George Bosilca
e2b154327e Small optimization on the datatype commit.
This patch fixes the merge of contiguous elements into larger but more
compact datatypes, and allows for contiguous elements to have their
blocklen increasing instead of the count. The idea is to always maximize
the blocklen, aka. the contiguous part of the datatype.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 41e6f55807b01ad5c04e8387a3699cf743931f6a)
2019-09-03 15:09:33 -04:00
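
The idea in commit e2b154327e of growing blocklen instead of count can be sketched as follows. This is a minimal illustration, not the Open MPI code: `elem_desc_t` and `maximize_blocklen` are hypothetical names, and the layout is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element description; not Open MPI's internal layout. */
typedef struct {
    size_t count;      /* number of repetitions of the block */
    size_t blocklen;   /* contiguous basic elements per block */
    size_t elem_size;  /* size in bytes of one basic element */
    ptrdiff_t extent;  /* byte stride between consecutive blocks */
} elem_desc_t;

/* If consecutive blocks touch (stride equals block size in bytes), fold the
 * repetitions into one long block: blocklen grows and count drops to 1. */
void maximize_blocklen(elem_desc_t *e)
{
    assert(e != NULL);
    if (e->count > 1 && e->extent == (ptrdiff_t)(e->blocklen * e->elem_size)) {
        e->blocklen *= e->count;
        e->count = 1;
        e->extent = (ptrdiff_t)(e->blocklen * e->elem_size);
    }
}
```

A description with a gap between blocks (extent larger than the block size) is left untouched, since it is not contiguous.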
Geoff Paulsen
893ea3f91f
Merge pull request from rhc54/cmr40/pmix314
Remove unnecessary error log
2019-08-30 14:10:36 -05:00
Harumi Kuno
fbbacc1303 Fix mmap infinite recurse in memory patcher
This commit fixes issue  by removing
MacOS/Darwin-specific logic from intercept_mmap.

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2019-08-29 18:03:16 -07:00
Ralph Castain
8efc6e1dc1
Remove unnecessary error log
Refs https://github.com/pmix/pmix/pull/1413

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-26 23:48:34 -07:00
Sergey Oblomov
66e18563bf SPML/UCX: fixed hang in SHMEM_FINALIZE
- used MPI_Barrier to synchronize processes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 182023febb6f8f31ce34dc54c8aa409ad7e44fa2)
2019-08-22 11:41:52 +03:00
Geoff Paulsen
390e0bc5b2
Merge pull request from bosilca/topic/backport_6695
Refresh of the datatype engine from Topic/backport 6695
2019-08-21 10:49:37 -05:00
George Bosilca
8e6e826b54
Fix the variable names used for the datatype dump.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-16 10:27:35 -04:00
George Bosilca
83d40c1e14
Fix the stack displacement.
Fixes the convertor iovec description on the MPI-IO reported by Edgar.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-16 10:27:23 -04:00
Ralph Castain
167ca31a31
Update PMIx to official v3.1.4 release
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-09 13:14:48 -07:00
George Bosilca
f78d3d52cd
Optimize the pack/unpack.
Start optimizing the code.

This commit divides the operations in 2 parts, the first, outside the
critical part, deals with partial blocks of predefined elements, and the
second, inside the critical path, only deals with full blocks of
elements. This reduces the number of expensive operations in the
critical path and results in a decent performance increase.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:39:53 -04:00
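
The split described in commit f78d3d52cd (partial blocks handled outside the critical path, full blocks inside it) might look roughly like the sketch below. `pack_strided` and its parameters are hypothetical and assume a simple byte-strided source; the real pack routines operate on datatype descriptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack `total` bytes from a strided source into a contiguous destination.
 * The source consists of blocks of `block` bytes placed `stride` bytes apart;
 * `skip` bytes of the first block were already packed by an earlier call. */
size_t pack_strided(char *dst, const char *src, size_t block, size_t stride,
                    size_t skip, size_t total)
{
    assert(block > 0 && stride >= block && skip < block);
    size_t done = 0;

    /* Part 1, outside the hot loop: finish a partially packed block. */
    if (skip > 0) {
        size_t n = block - skip;
        if (n > total) n = total;
        memcpy(dst, src + skip, n);
        done = n;
        if (skip + n < block) return done;  /* still inside the first block */
        src += stride;
    }

    /* Part 2, the hot loop: full blocks only, no boundary checks inside. */
    size_t full = (total - done) / block;
    for (size_t i = 0; i < full; i++) {
        memcpy(dst + done, src, block);
        done += block;
        src += stride;
    }

    /* Part 3, outside the hot loop: trailing partial block, if any. */
    size_t tail = total - done;
    if (tail > 0) {
        memcpy(dst + done, src, tail);
        done += tail;
    }
    return done;
}
```

The single division that computes the number of full blocks happens once, before the loop, so the loop body is reduced to a copy and two pointer updates.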
George Bosilca
87299e0b1c
Get rid of the division in the critical path.
Amazing how bad instruction scheduling can have such a drastic impact
on the code performance. With this change, we get a boost of at least
50% on the performance of data with a small blocklen and/or count.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:39:44 -04:00
George Bosilca
fad707d3b0
Rework the datatype commit.
Optimize contiguous loops by collapsing them into a single element.
During datatype optimization collapse similar elements into larger
blocks.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:39:36 -04:00
George Bosilca
d5cdfe70ef
Optimize the position placement.
Upon detecting a datatype loop representation skip the entire loop
according to the remaining space.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:39:27 -04:00
George Bosilca
78cc0ff891
Disable checksum.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:39:19 -04:00
George Bosilca
012a004806
Clean and sync the pack and unpack functions.
- optimize handling of contiguous with gaps datatypes.
- fixes a performance issue for all datatypes with a count of 1.
- optimize the pack/unpack of contiguous with gaps datatype.
- optimize the case of blocklen == 1

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:39:11 -04:00
George Bosilca
0a00b02e48
Small improvements on the test.
Rework the to_self test to be able to be used as a benchmark.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:39:02 -04:00
George Bosilca
4cdc2155e5
Optimize the raw representation.
Merge contiguous iov in order to minimize the number of returned iovec.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:38:52 -04:00
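
Merging contiguous iovec entries, as commit 4cdc2155e5 describes, can be illustrated with a small post-processing pass. This is only a sketch: `merge_contiguous_iov` is a hypothetical helper, and the commit itself may merge entries while they are generated rather than afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Merge adjacent entries in place when one ends exactly where the next
 * begins. Returns the new number of entries. */
size_t merge_contiguous_iov(struct iovec *iov, size_t n)
{
    assert(iov != NULL || n == 0);
    if (n == 0) return 0;

    size_t out = 0;  /* index of the entry currently being extended */
    for (size_t i = 1; i < n; i++) {
        char *end = (char *)iov[out].iov_base + iov[out].iov_len;
        if (end == (char *)iov[i].iov_base) {
            iov[out].iov_len += iov[i].iov_len;  /* contiguous: extend */
        } else {
            iov[++out] = iov[i];                 /* gap: start a new entry */
        }
    }
    return out + 1;
}
```

Fewer entries mean fewer descriptors for whoever consumes the raw representation, for example an I/O layer issuing one operation per entry.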
George Bosilca
8b794235b8
Update the datatype dump to match the actual types.
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:37:47 -04:00
George Bosilca
4f754d0156
Optimized datatype description.
Move toward a base type of vector (count, type, blocklen, extent, disp)
with disp and extent applying toward the count repetition and blocklen
being a contiguous memory of type type.
Implement 2 optimizations on this description used during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy during pack/unpack.

Fixes at the OMPI datatype level including:
 - Fix the create_hindexed and vector creation.
 - Fix the handling of [get|set]_elements and _count.
 - Correctly compute the displacement for block indexed types.
 - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:35:07 -04:00
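
The "collapse" optimization named in commit 4f754d0156 can be sketched on the (count, type, blocklen, extent, disp) tuple from the message. The struct and function below are hypothetical stand-ins, with `type` reduced to an integer tag, and cover only the simplest case where both descriptions share the same extent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector-style description following the tuple in the message. */
typedef struct {
    size_t count;     /* number of repetitions */
    int type;         /* tag for the basic type (assumption: an int id) */
    size_t blocklen;  /* contiguous elements of `type` per repetition */
    ptrdiff_t extent; /* byte distance between repetitions */
    ptrdiff_t disp;   /* byte displacement of the first repetition */
} vec_desc_t;

/* Collapse: absorb b into a when b has the same shape and starts exactly
 * where the next repetition of a would. Returns 1 on success, 0 otherwise. */
int try_collapse(vec_desc_t *a, const vec_desc_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->type != b->type || a->blocklen != b->blocklen ||
        a->extent != b->extent)
        return 0;
    if (b->disp != a->disp + (ptrdiff_t)a->count * a->extent)
        return 0;
    a->count += b->count;
    return 1;
}
```

Fusion, the second optimization in the message, is not shown here.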
Howard Pritchard
71f240f078 btl/openib: fix issue 6785
Commit d7053a3 broke things for the case when Open MPI 4.0.x is built
without UCX support.  Problem was it was trying to partially initialize
the btl to try and delay printing of a help message till wireup.  Well
this sort of doesn't work in all cases.  Rather than keep piling on
changes to support a help message for a BTL that we are deprecating, take
a keep it simple stupid approach.

So, revert most of d7053a3 and instead put the help message back in the
original location, during scan of ports of the available HCAs to check
for whether or not link layer for that port is configured for ethernet or infiniband.
If Open MPI was built with UCX support, don't emit the help message, if
UCX was not linked in, emit the help message.

Verified on a system with connectX5 HCAs configured with two ports configured
for ethernet and two for infiniband.

relates to 

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-07-12 08:21:21 -06:00
Ralph Castain
1d0e0557b9
v4.0.x: Update PMIx to official v3.1.3 release
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-07-02 08:56:49 -07:00
Geoff Paulsen
7f26c6dc41
Merge pull request from rhc54/cmr40/pmix
Update to PMIx v3.1.3rc4
2019-06-26 13:21:31 -05:00
Howard Pritchard
6424857029
Merge pull request from jsquyres/pr/v4.0.x/ob1-fixes
v4.0.x: Cherry pick ob1 fixes from master
2019-06-26 10:49:32 -06:00
Ralph Castain
9d0adbc6bc
Update to track 32-bit support commit
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-06-26 09:31:43 -07:00
Ralph Castain
b353639573
Update to PMIx v3.1.3rc4
Will provide PR to update VERSION to final release once passes MTT

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-06-25 13:45:19 -07:00
Ralph Castain
05fa5845bc
Fix finalize of flux component
Per patches from @SteVwonder and @garlick

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d4070d5f58f0c65aef89eea5910b202b8402e48b)
2019-06-19 06:00:02 -07:00
Nathan Hjelm
b5428aaf71 btl/uct: add support for UCX 1.6.x
This commit updates the uct btl to support the v1.6.x release of
UCX. This release breaks API.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
(cherry picked from commit b78066720c3e3299bd76f2e22d2c0e415db572fc)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-06-07 15:54:47 -05:00
Geoff Paulsen
18f10377eb
Merge pull request from ggouaillardet/topic/v4.0.x/ucx_warning
btl/openib: delay UCX warning to add_procs()
2019-06-03 15:09:43 -05:00
Howard Pritchard
6c74d4031b
Merge pull request from markalle/patcher_additions_v40x
shmat/shmdt additions for patcher
2019-06-03 12:51:05 -07:00
Mark Allen
5f79dfaa0a shmat/shmdt additions for patcher
This is mostly based off recent UCX additions to their patcher:
    https://github.com/openucx/ucx/pull/2703

They added triggers for
* mmap when (flags & MAP_FIXED) && (addr != NULL)
* shmat when (shmflg & SHM_REMAP) && (shmaddr != NULL)

Beyond that I noticed they already had a trigger for
* madvise when (advice == MADV_FREE)
that we didn't so I added that.

And the other main thing is we didn't really have shmat/shmdt
active for some systems because we only had a path for
syscall(SYS_shmdt, ) but we needed to also have a path for
syscall(SYS_ipc, IPCOP_shmdt, ) and same for shmat.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit eb888118e83f56c131aff900b03eab34c92b7805)
2019-05-30 13:31:02 -04:00
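
The trigger conditions listed in commit 5f79dfaa0a can be written as small predicates. The function names are hypothetical; only the flag tests come from the commit message. `SHM_REMAP` and `MADV_FREE` are not available on every platform, so the sketch guards them with `#ifdef`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* mmap triggers when (flags & MAP_FIXED) && (addr != NULL). */
int mmap_triggers_hook(const void *addr, int flags)
{
    return (flags & MAP_FIXED) && addr != NULL;
}

/* shmat triggers when (shmflg & SHM_REMAP) && (shmaddr != NULL). */
int shmat_triggers_hook(const void *shmaddr, int shmflg)
{
#ifdef SHM_REMAP
    return (shmflg & SHM_REMAP) && shmaddr != NULL;
#else
    (void)shmaddr;
    (void)shmflg;
    return 0;  /* SHM_REMAP is Linux-specific */
#endif
}

/* madvise triggers when (advice == MADV_FREE). */
int madvise_triggers_hook(int advice)
{
#ifdef MADV_FREE
    return advice == MADV_FREE;
#else
    (void)advice;
    return 0;  /* MADV_FREE not exposed by these headers */
#endif
}
```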
Nathan Hjelm
11cb0f24a5 btl/uct: check for support before disabling UCX memory hooks
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
(cherry picked from commit 3e1dd362411f1da5564d3402f65e9b3b74f50759)
2019-05-20 16:42:38 -05:00
Sergey Oblomov
1944295da3 COMMON/UCX: removed ucs stuff
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ebc457baf5ded5dd46cd73918a2f69555f408c54)
2019-05-17 09:58:20 +03:00
Sergey Oblomov
fa0a0b1597 COMMON/UCX: init memhooks infra on external hooks only
- initialize memory hooks infrastructure only in case
  if external memory hooks are requested

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a0a93060668cd11a783cc94c753efb3129df9dde)
2019-05-17 09:58:12 +03:00
George Bosilca
4946570b24 Remove few warnings identified by @rhc in .
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@6d11a45f44)
2019-05-11 16:38:31 +09:00
Gilles Gouaillardet
70a864fce3 btl/vader: fix finalize sequence
free the component mpool in mca_btl_vader_component_close()
and after freeing some objects that depend on it such as
mca_btl_vader_component.vader_frags_user

Thanks Christoph Niethammer for reporting this.

Refs. 

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@77060cad07)
2019-05-11 13:04:23 +09:00
George Bosilca
48f824327c Fix the leak of fragments for persistent sends.
The rdma_frag attached to the send request was not correctly released
upon request completion, leaking until MPI_Finalize. A quick solution
would have been to add RDMA_FRAG_RETURN at different locations on the
send request completion, but it would have unnecessarily made the
sendreq completion path more complex. Instead, I added the length to
the RDMA fragment so that it can be completed during the remote ack.

Be more explicit on the comment.

The rdma_frag can only be freed once when the peer forced a protocol
change (from RDMA GET to send/recv). Otherwise the fragment will be
returned once all data pertaining to it has been transferred.

NOTE: Had to add a typedef for "opal_atomic_size_t" from master into
opal/threads/thread_usage.h into this cherry pick (it is in
opal/include/opal_stdatomic.h on master, but that file does not exist
here on the v4.0.x branch).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a16cf0e4dd6df4dea820fecedd5920df632935b8)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-03 06:20:02 -07:00
Mikhail Brinskii
e4ee56d1f3 SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 2ef5bd8b3671f1e10caf00d06d66d120eac9c5be)
2019-05-02 21:25:59 +03:00
Xin Zhao
69a80fce9f ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9c3d00b144641d2929f830279dcc9d163c38e9e1)
2019-03-21 23:59:58 +02:00
Xin Zhao
596997c194 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e1c1ab020227fc18d145379ab29ea86a3cdb66b1)
2019-03-21 23:58:23 +02:00
Howard Pritchard
15cfba5347
Merge pull request from jjhursey/v4x-rm-hash-pmix3
Do not force 'hash' gds on direct modex
2019-03-19 17:58:26 -05:00
Joshua Hursey
45526fadee Do not force 'hash' gds on direct modex
* Forcing the 'hash' gds component should not be necessary any more.

Port of PR  (component names changed so a cherry-pick would not work)

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-19 10:52:17 -05:00
Nysal Jan K.A
1329cef213 opal/atomics: Add acquire semantics back for spinlocks
This was introduced in commit 9d0b3fe9

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit 00f27a80fc63053db1aeb42140148d7a3d1379b3)
2019-03-19 19:45:20 +05:30
Gilles Gouaillardet
8da4605589 btl/openib: immediately release the device when no port is allowed
Many thanks to Sergey Oblomov for reporting this issue
and the countless traces provided when troubleshooting it.

This is a one-off commit for the v4.0.x branch since btl/openib has been removed
from master.

Refs. 

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-19 09:26:11 +09:00
Gilles Gouaillardet
c58c774981 btl/openib: have add_proc() return immediately when the port is disabled.
Fixes an issue introduced in open-mpi/ompi@0a2ce58040

This is a one-off commit for the v4.0.x branch since btl/openib has been removed from master.

Refs. 

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-19 09:24:25 +09:00
Gilles Gouaillardet
d7053a306a btl/openib: delay UCX warning to add_procs()
If UCX is available, then pml/ucx will be used instead of
pml/ob1 + btl/openib, so there is no need to warn about
btl/openib not supporting Infiniband.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@0a2ce58040)
2019-03-19 09:24:00 +09:00
Howard Pritchard
ceb93d7c03
Merge pull request from bosilca/v4.0.x
v4.0.x: Cherry-pick fixes for issue  from master (vader fixes)
2019-03-15 17:08:52 -06:00
Howard Pritchard
27899b0e8f
Merge pull request from hoopoepg/topic/check-ucx-params-v4.0
PML/SPML/UCX: added evaluation of mmap events - v4.0
2019-03-14 17:02:46 -06:00
Nathan Hjelm
3df8ed9cc0
btl/vader: fix fragment sizes used by free lists
This commit fixes a bug introduced in
f62d26ddbc8cda4d985cceee531a2ec32406d1f6. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-03-14 17:25:31 -04:00
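
The overrun fixed in commit 3df8ed9cc0 comes down to slot arithmetic: each slot must hold the header plus the payload. The sketch below uses a made-up header, `frag_hdr_t`, and hypothetical helpers; it is not the vader fragment layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment header; the real vader header differs. */
typedef struct {
    uint64_t next;
    uint32_t len;
    uint32_t flags;
} frag_hdr_t;

/* Slot size must include the header, not just the payload. */
size_t frag_slot_size(size_t payload_size)
{
    return sizeof(frag_hdr_t) + payload_size;
}

/* Address of the payload of fragment `index` inside a shared segment. */
unsigned char *frag_payload(unsigned char *segment, size_t index,
                            size_t payload_size)
{
    assert(segment != NULL);
    return segment + index * frag_slot_size(payload_size) + sizeof(frag_hdr_t);
}
```

If the slot size were only `payload_size`, writing a full payload into fragment 0 would run into the header of fragment 1, which is the failure the commit message describes.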
Nathan Hjelm
20017d345e
btl/vader: use basic mpool type to handle frag/fbox allocation
This commit updates btl/vader to use an mpool for handling all shared
memory allocations (frags, fboxes).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-03-14 17:21:12 -04:00
Nathan Hjelm
bac6024b5a
mpool: add new base module type "basic"
This commit adds a new mpool base module type: basic. This module can
be used with an opal_free_list_t to allocate space from a
pre-allocated block (such as a shared memory region). The new module
only supports allocation and is not meant for more dynamic use cases
at this time.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-03-14 17:20:30 -04:00
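
An allocation-only pool over a pre-allocated block, in the spirit of the "basic" mpool module from commit bac6024b5a, can be sketched as a bump allocator. `basic_pool_t` and `basic_pool_alloc` are hypothetical names and do not reflect the actual mpool interface; alignment here is relative to the start of the block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool over a caller-provided block (e.g. a shared segment). */
typedef struct {
    unsigned char *base;  /* start of the pre-allocated block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
} basic_pool_t;

/* Allocate `size` bytes at an offset that is a multiple of `align`
 * (a power of two). Returns NULL when the block is exhausted.
 * There is no free: the pool only supports allocation. */
void *basic_pool_alloc(basic_pool_t *p, size_t size, size_t align)
{
    assert(p != NULL && align != 0 && (align & (align - 1)) == 0);
    size_t start = (p->used + align - 1) & ~(align - 1);
    if (start > p->size || size > p->size - start) return NULL;
    p->used = start + size;
    return p->base + start;
}
```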