Ralph Castain
cd1b5641be
Update slurm pmi configury to account for pmix
...
When Slurm is built against PMIx, some installations place a copy of the
PMIx library that Slurm is linking against in the Slurm PMI location.
Current configury ignores that location. The desired behavior is to look
for a PMIx lib in that location when --with-pmi is given. If the user
also specifies --with-pmix and gives a different location, then override
anything previously found and look for it where the user directed.
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-21 11:33:35 -08:00
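The lookup order described in the commit above can be sketched as follows. This is a minimal Python model of the decision only, not the actual m4/shell configury; `choose_pmix_dir` and the `has_pmix_lib` probe are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional


def choose_pmix_dir(
    with_pmi: Optional[str],
    with_pmix: Optional[str],
    has_pmix_lib: Callable[[str], bool],
) -> Optional[str]:
    """Return the directory to take the PMIx library from, or None.

    with_pmi / with_pmix model the values given to --with-pmi and
    --with-pmix; has_pmix_lib is a hypothetical probe that reports
    whether a PMIx library exists under a directory.
    """
    found = None

    # --with-pmi given: also look for a PMIx lib in the Slurm PMI location.
    if with_pmi is not None and has_pmix_lib(with_pmi):
        found = with_pmi

    # --with-pmix pointing somewhere else: discard anything found so far
    # and look only where the user directed.
    if with_pmix is not None and with_pmix != with_pmi:
        found = with_pmix if has_pmix_lib(with_pmix) else None

    return found
```

With only `--with-pmi` set, the Slurm PMI location is used if it holds a PMIx library; adding a different `--with-pmix` location replaces that result, even when the new location turns out to hold no library.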
Ralph Castain
d2f8737f5a
Merge pull request #6415 from rhc54/topic/sing
...
Remove stale singularity/schizo component
2019-02-20 18:47:57 -08:00
Ralph Castain
2f15379171
Remove stale singularity/schizo component
...
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-20 17:38:24 -08:00
Artem Polyakov
13a8e42108
Merge pull request #6163 from artpol84/osc/mt_submission
...
Refactoring of osc/ucx component for MT
2019-02-20 09:41:27 -08:00
Jeff Squyres
170d5d119e
Merge pull request #6409 from dmitrygladkov/topic/btl/tcp
...
btl/tcp: Fix copy-paste misprint
2019-02-20 12:12:18 -05:00
Dmitry Gladkov
9920da4992
btl/tcp: Fix copy-paste misprint
...
Signed-off-by: Dmitry Gladkov <dmitrygla@mellanox.com>
2019-02-20 11:18:02 +02:00
Gilles Gouaillardet
8d12bb25c2
Merge pull request #6408 from ggouaillardet/topic/orte_cleanup
...
Misc ORTE related cleanups
2019-02-20 17:00:13 +09:00
Gilles Gouaillardet
ad114be28c
configury: automatically select rte/pmix runtime if ORTE project is not built
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 13:55:55 +09:00
Gilles Gouaillardet
69d136ae5e
ompi/pmix: fix misc OPAL function calls
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 13:55:55 +09:00
Gilles Gouaillardet
e0e924c4ed
oshmem/wrappers: only install ORTE based wrappers if ORTE is built
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 13:55:55 +09:00
Gilles Gouaillardet
10cb9f6f9e
oshmem: remove unnecessary dependencies to ORTE
...
Use either the OPAL or the OMPI layer instead, since the ORTE layer
is not present when the PMIx RTE is used.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 13:55:55 +09:00
Gilles Gouaillardet
18f679efac
Merge pull request #6401 from ggouaillardet/topic/osc_rdma_self
...
osc/rdma: correctly handle communications to self
2019-02-20 11:43:22 +09:00
KAWASHIMA Takahiro
19cbd00db0
Merge pull request #6403 from kawashima-fj/pr/man-typo-win-attach
...
man: fix more typos in MPI_Win_attach man page
2019-02-20 11:27:38 +09:00
KAWASHIMA Takahiro
7095ad10a5
man: fix more typos in MPI_Win_attach man page
...
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-20 11:22:38 +09:00
Gilles Gouaillardet
7694ecc13f
Merge pull request #6402 from ggouaillardet/topic/man_win_attach_detach
...
man: fix typos in MPI_Win_{attach,detach} man pages
2019-02-20 11:11:09 +09:00
Gilles Gouaillardet
7c0596819b
man: fix typos in MPI_Win_{attach,detach} man pages
...
no code change
[skip ci]
bot:notest
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 11:09:45 +09:00
Gilles Gouaillardet
fe05fcc11a
osc/rdma: correctly handle communications to self
...
Mark the "self" peer as OMPI_OSC_RDMA_PEER_LOCAL_BASE when
the window is dynamically created and use_cpu_atomics is set,
so that communications to self are handled correctly.
Thanks Bart Janssens for reporting this issue.
Refs. open-mpi/ompi#6394
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 09:52:17 +09:00
Artem Polyakov
91d6115d99
opal/common/ucx: Adjust the thresholds for periodical flushes
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
3aadc2b5e1
opal/common/ucx: Fix periodical flush in the worker pool
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
84dfe1277c
opal/common/ucx: Rename wpool recv_worker to dflt_worker
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
8a990c2b64
opal/common/ucx: Add comments clarifying data structures
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
19e2ae2efb
opal/common/ucx: Switch to opal/tsd
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
7984d7d997
opal/common/ucx: Remove unused debugging macro
...
Will be reintroduced later if needed and after adaptation to the OMPI
infrastructure.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
43f16d8796
opal/common/ucx: Remove common_ucx_int.h
...
Move the content of common_ucx_int.h back into common_ucx.h and
include common_ucx_wpool.h explicitly.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
c6de09940f
ompi/osc/ucx: Switch osc/ucx code to use Worker Pool.
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
bb7d360621
opal/common/ucx: add refcnt in tlocal_ctx_tbl entry to keep track of usage
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
101036651b
opal/common/ucx: Fix the bug in wpool's periodical flush
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
bcb52ecade
opal/common/ucx: add winfo ptr into req
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
33517428a1
opal/common/ucx: add periodical flush and counter to opal directory.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
1fa7054041
opal/common/ucx: use trylock in opal_common_progress
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
2d3cffe1a3
opal/common/ucx: replace opal_mutex_t with opal_recursive_mutex_t
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
aa26a724ed
opal/common/ucx: introduce internal UCX request in wpool.
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
07cb4134be
opal/common/ucx: Set of bug fixes in wpool
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
344bb641a1
opal/common/ucx: Minor changes in wpool
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
9fb9cfbe8e
opal/common/ucx: Simplify Worker Pool TLS structure
...
Get rid of unneeded context and memory region identifiers
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
1e7bf7085d
opal/common/ucx: Improve/fix debug output macros
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
fd98ee14eb
opal/common/ucx: Code cleanup
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
f38c9f3e5f
opal/common/ucx: Simplify Worker Pool memory handler
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
6b7acdf21f
opal/common/ucx: Simplify Worker Pool context management
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Xin Zhao
8b7fa927ba
opal/common/ucx: Add fetch primitives to wpool
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Xin Zhao
bfbf818fe1
opal/common/ucx: Complete initialization of the Worker Pool
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
e28fadb048
opal/common/ucx: Introduce Worker Pool (wpool) functionality
...
Worker Pool is an object containing/managing a set of UCX workers
and providing access to those workers through a small interface
to allow multi-threaded applications to access multiple HW contexts.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
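The Worker Pool idea introduced in the commit above can be illustrated with a small model. This is a sketch under stated assumptions, not the wpool implementation: `Worker` is a stand-in that makes no UCX calls, `WorkerPool.get_worker` is a hypothetical interface, and the one-worker-per-thread policy is an assumption chosen for simplicity.

```python
import threading


class Worker:
    """Stand-in for a UCX worker (hypothetical; makes no UCX calls)."""

    def __init__(self, wid: int) -> None:
        self.wid = wid


class WorkerPool:
    """Owns a set of workers and hands each thread its own worker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()    # protects the worker list
        self._workers = []               # every worker the pool manages
        self._tls = threading.local()    # per-thread cached worker

    def get_worker(self) -> Worker:
        # Fast path: this thread already has a worker assigned.
        worker = getattr(self._tls, "worker", None)
        if worker is None:
            # Slow path: create a worker and register it with the pool.
            with self._lock:
                worker = Worker(len(self._workers))
                self._workers.append(worker)
            self._tls.worker = worker
        return worker

    def size(self) -> int:
        with self._lock:
            return len(self._workers)
```

Each thread that calls `get_worker()` receives the same worker on every call, and different threads receive different workers, which is the property that lets concurrent threads drive separate contexts without sharing one.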
Ralph Castain
c9d0393158
Merge pull request #6397 from rhc54/topic/oops
...
Restore orted hnp_uri cmd line option
2019-02-18 15:13:18 -08:00
Ralph Castain
2da5651869
Restore orted hnp_uri cmd line option
...
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-18 13:24:03 -08:00
KAWASHIMA Takahiro
60b6626955
Merge pull request #6392 from kawashima-fj/pr/remove-have-long-long
...
config: Remove remaining HAVE_LONG_LONG
2019-02-18 14:14:27 +09:00
KAWASHIMA Takahiro
1b0fa56261
config: Remove remaining HAVE_LONG_LONG
...
In the commit cacd6f389c, I removed `#if HAVE_[TYPE]` lines for types
which are always available in C99 compilers. But I forgot to remove
this line. The `HAVE_LONG_LONG` macro is still defined in `confdefs.h`.
So this is not a bug but code cleanup.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-18 11:54:53 +09:00
Yossi Itigin
91d05f91e2
Merge pull request #6384 from brminich/topic/ucx_worker_net_address
...
PML/UCX: Use net worker address for remote peers
2019-02-17 12:21:00 +02:00
Matias Cabral
e3f213772d
Merge pull request #6385 from matcabral/ofi_mtl_non_blocking_cancel
...
MTL_OFI: Changed Recv cancel to be non-blocking
2019-02-15 08:44:00 -08:00
Matias A Cabral
25bdd118ac
MTL_OFI: Changed Recv cancel to be non-blocking
...
Updated the OFI MTL's Recv cancel to be a non-blocking call to match
the MPI spec. If fi_cancel succeeds, the user is expected to wait on
the request to find out whether the cancel has completed.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2019-02-14 17:07:20 -05:00
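The non-blocking cancel semantics described in the commit above can be modeled as follows. This is a toy sketch, not the OFI MTL code: `RecvRequest` and its methods are hypothetical, and the `matched` flag passed to `progress` stands in for whatever the network layer would report.

```python
from typing import Iterable


class RecvRequest:
    """Toy model of a receive request with a non-blocking cancel."""

    def __init__(self) -> None:
        self.cancel_requested = False
        self.completed = False
        self.cancelled = False

    def cancel(self) -> None:
        # Non-blocking: only record the request and return immediately.
        self.cancel_requested = True

    def progress(self, matched: bool) -> None:
        # One hypothetical progress step. A message that matched first
        # wins over a pending cancel.
        if self.completed:
            return
        if matched:
            self.completed = True
        elif self.cancel_requested:
            self.completed = True
            self.cancelled = True

    def wait(self, events: Iterable[bool]) -> bool:
        # Drive progress until the request completes, then report
        # whether it ended up cancelled.
        for matched in events:
            self.progress(matched)
            if self.completed:
                break
        return self.cancelled
```

The caller issues `cancel()`, which returns at once, and only learns the outcome from `wait()`, mirroring the MPI pattern of calling MPI_Cancel and then waiting on the request before checking whether it was cancelled.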
Mikhail Brinskii
751d88192d
PML/UCX: Use net worker address for remote peers
...
For peers on remote nodes, pack a smaller worker address that contains
network device addresses only. This reduces the amount of OOB traffic
during startup.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-02-14 18:06:36 +02:00