1
1

5543 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
d5360711fa btl/tcp: Skip printing error message in racy cleanup path
Avoid printing an error message about ENOTCONN return codes from
getpeername() when handling an incoming connection request.  At
this point in the receive state machine, the remote process has
been verified to be a valid OMPI instance.  In all-to-all startup
at 4k rank scale, we're seeing this error message when the remote
side drops the connection because it realizes it's the "loser"
in the connection race.  We were already doing all the right things,
other than printing a scary error message.  So skip the error
message and call it good.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2019-03-28 23:12:35 +00:00
Gilles Gouaillardet
77060cad07 btl/vader: fix finalize sequence
free the component mpool in mca_btl_vader_component_close()
and after freeing soem objects that depend on it such as
mca_btl_vader_component.vader_frags_user

Thanks Christoph Niethammer for reporting this.

Refs. open-mpi/ompi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-27 11:57:40 +09:00
Gilles Gouaillardet
e844f76725 pmix/pmix4x: refresh to the latest PMIx
refrest pmi4x to pmix/pmix@20cc9c041e

Fixes open-mpi/ompi#6513

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-25 13:33:18 +09:00
Joshua Ladd
9ab6ecba65
Merge pull request #6492 from janjust/oshmem-multiple-contexts-master
Oshmem multiple contexts
2019-03-22 17:34:46 -04:00
Xin Zhao
9c3d00b144 ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:45 +02:00
Xin Zhao
e1c1ab0202 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:37 +02:00
Josh Hursey
53cd31ed7e
Merge pull request #6504 from jjhursey/rm-hash-pmix4
Do not force 'hash' gds on direct modex in pmix4x
2019-03-19 20:35:12 -05:00
Ralph Castain
0f26d8c76b Silence warnings
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-19 10:27:39 -07:00
Ralph Castain
c4be211741 Sync to latest PMIx master
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-03-19 10:27:12 -07:00
Joshua Hursey
1314cf2640 Do not force 'hash' gds on direct modex in pmix4x
* Forcing the 'hash' gds component should not be necessary any more.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-19 11:53:26 -05:00
Josh Hursey
836c80c442
Merge pull request #6498 from jjhursey/rm-hash-pmix3
Do not force 'hash' gds on direct modex
2019-03-19 10:45:11 -05:00
Nysal Jan K.A
00f27a80fc opal/atomics: Add acquire semantics back for spinlocks
This was introduced in commit 9d0b3fe9

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-03-19 16:27:03 +05:30
Joshua Hursey
c2581d0e33 Do not force 'hash' gds on direct modex
* Forcing the 'hash' gds component should not be necessary any more.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-18 21:52:32 -05:00
Josh Hursey
ad8c842e7d
Merge pull request #6477 from markalle/report_bindings_strlen
opal_hwloc_base_cset2str() off-by-1 in its strncat()
2019-03-14 12:42:50 -05:00
Sergey Oblomov
c319cf9ade COMMON/UCX: rewording of hooks suggestion
- also updated output macro

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-14 11:00:57 +02:00
Sergey Oblomov
d8e3562bae PML/SPML/UCX: added evaluation of mmap events
- there was a set of UCX related issues reported which caused
  by mmap API hooks conflicts. We added diagnostic of such
  problems to simplify bug-resolving pipeline

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-12 21:14:27 +02:00
Mark Allen
30d60994d2 opal_hwloc_base_cset2str() off-by-1 in its strncat()
I think the strncat() calls here need to be of the form
    strncat(str, new_str_to_add, len - strlen(new_str_to_addstr) - 1);
since in the OMPI calls len is being used as total number of bytes
in str.

strncat(dest,src,n) on the other hand is documented as writing up to
n chars from the incoming string plus 1 for the null, for n+1 total
bytes it can write.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-03-11 14:35:53 -04:00
Jeff Squyres
14563770a1 btl/usnic: amend Makefile.am fix from b4097626ab
Use $(AM_CPPFLAGS) in $(usnic_btl_run_tests_CPPFLAGS) so that we don't
have to replicate hard-coded values.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-03-05 09:30:21 -08:00
Nysal Jan K.A
da6d038d13 opal/asm: Remove an unused variable
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-03-05 19:12:19 +05:30
Nysal Jan K.A
5ffde16d9b opal/asm: Fix a compiler warning on POWER arch
OPAL_XLC_INLINE_ASSEMBLY was removed in commit ebce88b7ad.
Removing dead code, which also fixes a compiler warning.

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-03-05 17:03:53 +05:30
Gilles Gouaillardet
b4097626ab btl/usnic: fix usnic_btl_run_tests CPPFLAGS
do define the OMPI_LIBMPI_NAME macro via the CPPFLAGS.
The issue occurs when Open MPI is configured with
--enable-opal-btl-usnic-unit-tests

Thanks George Marselis for reporting this issue

Refs. open-mpi/ompi#6441

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-05 09:57:55 +09:00
Ralph Castain
8fd6107987
Merge pull request #6418 from rhc54/topic/slurm
Update slurm pmi configury to account for pmix
2019-02-27 14:30:45 -08:00
Ben Menadue
17dcc7041a Hold off running hwloc:external feature tests until after we decide if we're using the internal or external component. This fixes #6430.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2019-02-25 16:58:11 +11:00
Ralph Castain
cd1b5641be Update slurm pmi configury to account for pmix
When Slurm is built against PMIx, some installations place a copy of the
PMIx library that Slurm is linking against in the Slurm PMI location.
Current configury ignores that location. The desired behavior is to look
for a PMIx lib in that location when --with-pmi is given. If the user
also specifies --with-pmix and gives a different location, then override
anything previously found and look for it where the user directed.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-21 11:33:35 -08:00
Artem Polyakov
13a8e42108
Merge pull request #6163 from artpol84/osc/mt_submission
Refactoring of osc/ucx component for MT
2019-02-20 09:41:27 -08:00
Jeff Squyres
170d5d119e
Merge pull request #6409 from dmitrygladkov/topic/btl/tcp
btl/tcp: Fix copy-paste misprint
2019-02-20 12:12:18 -05:00
Dmitry Gladkov
9920da4992 btl/tcp: Fix copy-paste misprint
Signed-off-by: Dmitry Gladkov <dmitrygla@mellanox.com>
2019-02-20 11:18:02 +02:00
Artem Polyakov
91d6115d99 opal/common/ucx: Adjust the threasholds for periodical flushes
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
3aadc2b5e1 opal/common/ucx: Fix periodical flush in the worker pool
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
84dfe1277c opal/common/ucx: Rename wpool recv_worker to dflt_worker
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
8a990c2b64 opal/common/ucx: Add comments clarifying data structures
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
19e2ae2efb opal/common/ucx: Switch to opal/tsd
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
7984d7d997 opal/common/ucx: Remove unused debugging macro
Will be reintroduced later if needed and after adaptation to the OMPI
infrastructure.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
43f16d8796 opal/common/ucx: Remove common_ucx_int.h
Place the content of common_ucx_int.h back to the common_ucx.h and
include common_ucx_wpool.h explicitly.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
bb7d360621 opal/common/ucx: add refcnt in tlocal_ctx_tbl entry to keep track of usage
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
101036651b opal/common/ucx: Fix the bug in wpool's periodical flush
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
bcb52ecade opal/common/ucx: add winfo ptr into req
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
33517428a1 opal/common/ucx: add periodical flush and counter to opal directory.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
1fa7054041 opal/common/ucx: use trylock in opal_common_progress
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
2d3cffe1a3 opal/common/ucx: replace opal_mutex_t with opal_recursive_mutex_t
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
aa26a724ed opal/common/ucx: introduce internal UCX request in wpool.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
07cb4134be opal/common/ucx: Set of bug fixes in wpool
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
344bb641a1 opal/common/ucx: Minor changes in wpool
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
9fb9cfbe8e opal/common/ucx: Simplify Worker Pool TLS structure
Get rid of unneeded context and memory region identifiers

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
1e7bf7085d opal/common/ucx: Improve/fix debug output macro's
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
fd98ee14eb opal/common/ucx: Code cleanup
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
f38c9f3e5f opal/common/ucx: Simplify Worker Pool memory handler
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
6b7acdf21f opal/common/ucx: Somplify Worker Pool context management
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Xin Zhao
8b7fa927ba opal/common/ucx: Add fetch primitives to wpool
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Xin Zhao
bfbf818fe1 opal/common/ucx: Complete initialization of the Worker Pool
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00