1
1

216 Коммитов

Автор SHA1 Сообщение Дата
Mikhail Brinskii
8e9d401938 OSHMEM/SMPL/UCX: Add real fence support
+ Add quiet method to SPML, so it can have different implementation with
fence.
+ Use ucp_worker_fence for spml_fence method of UCX SPML

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2018-05-25 22:43:06 +03:00
Brian Barrett
9fff40647d oshmem: disable if no spmls build
This patch disables the oshmem layer if there are no SPMLs that
will build.  With the limited set of SPMLs available to support
oshmem, many builds end up installing an oshmem library that we
know will not work.  There has been a bit of customer confusion
over oshmem, hopefully this will lead customers in the right
direction.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-05-25 08:48:50 -07:00
Yossi Itigin
385f38ab4e ucx: improve error messages during connection establishment
Also, unite common code calling ucp_ep_create()

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-04-30 15:45:05 +03:00
Gilles Gouaillardet
88e26c63e0 spml/ucx: fix a double free() issue
in mca_spml_ucx_add_procs() error path

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-22 13:42:16 +09:00
Alex Mikheev
ae326546f4
ompi/oshmem: ucx is selected over yalla/ikrit by default
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:08:04 +02:00
Yossi Itigin
1193e1eb83 spml_ucx: fix rkey leak
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-12-26 20:47:26 +02:00
Nathan Hjelm
1282e98a01 opal/asm: rename existing arithmetic atomic functions
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Alex Mikheev
7cb7af1685
OSHMEM: add ucx to the list of default spmls
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-10-18 10:41:00 +03:00
Yossi Itigin
3081576124 spml_ucx: fix typo in shmem_quiet() error message.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-09-24 19:20:55 +03:00
Alina Sklarevich
007b1803ec SPML_UCX: use ompi_proc_world_size() to set the estimated_num_eps value
before this fix, mca_spml_ucx_component_open was using
oshmem_num_procs() to set the value of params.estimated_num_eps for UCX.
The oshmem_num_procs() function uses oshmem_group_all which will be
initialized after the call to mca_spml_ucx_component_open and therefore,
cannot be used there.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-09-04 14:46:00 +03:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la"
```

Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
Alex Mikheev
1b5df76f8b
oshmem: shmem_ptr() implementation
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-08-03 13:56:34 +03:00
Boris Karasev
77c50efb95 Yoda SPML is removed
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-07-14 08:47:16 +03:00
Xin Zhao
ee952fcccd Passing estimated_num_procs to UCX init in PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Pavel Shamis (Pasha)
95c440683b OSHMEM: shmem_wait code cleanup
* updating naming convention for the arguments in order to ensure
that the name aligns with an actual meaning of the argument
* remove local variable references in the macro
* adding volatile for the poll variables

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2017-03-15 21:53:44 +00:00
Alex Mikheev
c63137e1c0 oshmem: sshmem ucx: minor code cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
132fbd9ae9 oshmem: sshmem: add UCX allocator
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
ea3ea4835b oshmem: mem use hook: apply code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
(cherry picked from commit a422154a141f0be5b92d2b6c26d7b2b4176dfe18)
2017-01-30 11:30:20 +02:00
Alex Mikheev
9da9e6260d
oshmem: spml ucx: on error print ucx error string
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-29 10:28:24 +02:00
Alex Mikheev
986ca000f8
oshmem: spml: add memory allocation hook
The hook is called from memheap when memory range
is going to be allocated by smalloc(), realloc() and others.

ucx spml uses this hook to call ucp_mem_advise in order to speedup
non blocking memory mapping.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-26 16:41:39 +02:00
Xin Zhao
2d77912c19 Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
This reverts commit 0ecf3c951c0a87ab5bdd76a541a69852af671ba9.

Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-19 18:57:48 +02:00
Xin Zhao
0ecf3c951c PML/SPML/UCX: add UCX MT support to PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-15 23:59:15 +02:00
Alina Sklarevich
e9d2d029c6 PML/SPML/UCX: Adapt to the API changes in the UCX lib.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2016-12-08 11:33:29 +02:00
Gilles Gouaillardet
062ed9c919 spml/yoda: fix support for BTLs that do not register memory in mca_spml_yoda_get()
Refs open-mpi/ompi#2499

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-08 15:56:25 +09:00
Yossi Itigin
0241a2697d spml_ucx: allow registering the heap in non-blocking mode.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2016-11-25 15:09:22 +02:00
Ralph Castain
1e2019ce2a Revert "Update to sync with OMPI master and cleanup to build"
This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.
2016-11-22 15:03:20 -08:00
Ralph Castain
cb55c88a8b Update to sync with OMPI master and cleanup to build
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-22 14:24:54 -08:00
Alex Mikheev
864904e8ab
oshmem: ucx: check status only if configured --with-oshmem-param-check
Current standard says that behaviour in the case of error is undefined

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-10 11:29:03 +02:00
Alex Mikheev
bf61961f8b
oshmem: code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-08 15:11:59 +02:00
Alex Mikheev
ff5095e533 OSHMEM: adds support for mkey caching by spml
It improves cpu cache hit ratio.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:43 +02:00
Alex Mikheev
defcc3ddc1 OSHMEM: spml ikrit: get/put request cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:41 +02:00
Alex Mikheev
23c3dc8345 OSHMEM: mxm: optimize mxm_peer layout.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:40 +02:00
Alex Mikheev
df74d549dc OSHMEM: spml ikrit: changes mxm_peers layout
use single array instead of array of pointers

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:39 +02:00
Alex Mikheev
0826e63363 OSHMEM: spml_ikrit: makes quiet wait for get_nbi requests
shmem_quit() shall complete all outstanding get_nbi() requests

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:38 +02:00
Alex Mikheev
2f91ce7281 OSHMEM: mxm versions less than 2.0 are no longer supported
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:38 +02:00
Pavel Shamis (Pasha)
92b0ebd7c3 For UCX it is legal to return UCS_INPROGRESS (1) code for non-blocking function
calls, which means that the operation was successfully started but not
immediately completed. This is a "good" return code that should not be handled
as an error.

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2016-11-03 15:36:13 -05:00
Boris Karasev
68b5acd9f4 oshmem/spml/yoda: fixed the btl operations
Fixed the shmem OOM error which is referenced on #2028

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2016-11-02 13:38:35 +02:00
Yossi Itigin
05ca466c6b ucx: adapt pml_ucx and spml_ucx to new UCX APIs
- pass field_mask to ucp_init().
- use non-blocking disconnect.
- recv() with pre-allocated request.
- call opal_progress() from iprobe() and improbe().
- use shift pattern in connect/disconnect.
2016-10-12 23:45:45 +03:00
Joshua Ladd
d5e65c4860 Merge pull request #2052 from alex-mikheev/topic/spml_ikrit_zcopy_fix
OSHMEM: spml ikrit: fixes zero copy
2016-09-12 12:35:32 -04:00
Alex Mikheev
439456ae96 OSHMEM: spml ikrit: fixes zero copy
Allow mxm to use zero copy in put() and get() for the large messages.
2016-09-04 12:16:09 +03:00
Gilles Gouaillardet
0a25420dac oshmem: get rid of oshmem_proc_t and use ompi_proc_t instead
store oshmem related per proc data in an oshmem_proc_data_t struct,
that is stored in the padding section of an ompi_proc_t

this data can be accessed via the OSHMEM_PROC_DATA(proc) macro

Fixes open-mpi/ompi#2023
2016-09-01 14:20:14 +09:00
Gilles Gouaillardet
6b7bc64101 spml/yoda: MCA_PML(add_procs) all procs from oshmem_comm_world
and fix oshmem_group_proc_{init,create} so they use the number of procs in oshmem_comm_world

Thanks Debendra Das for the report and Josh Ladd for the guidance

Fixes open-mpi/ompi#1966
2016-08-17 14:24:02 +09:00
Igor Ivanov
36c29b393b oshmem: Align OSHMEM API with spec v1.3 (update spml/yoda) 2016-03-17 19:06:39 +02:00
Igor Ivanov
b2700320a3 oshmem: Align OSHMEM API with spec v1.3 (update spml/ikrit) 2016-03-17 19:06:39 +02:00
Igor Ivanov
450ea6684c oshmem: Align OSHMEM API with spec v1.3 (update spml/ucx) 2016-03-17 19:06:38 +02:00
Igor Ivanov
8464b6147a oshmem: Align OSHMEM API with spec v1.3 (Add spml/get_nb interface) 2016-03-15 14:04:59 +02:00
George Bosilca
68c36ea9dc Fix two annoying warnings in our UCX support. 2016-02-14 00:02:16 -05:00
Alex Mikheev
f627608e42 OSHMEM/UCX: implements atomic support
ucx atomic component has a real code now.
fixes bug in spml ucx arr_procs
removes redundant parameter checks from atomic components.
2016-01-21 16:02:28 +02:00
igor.ivanov@itseez.com
6448bd07a4 oshmem/spml: Fix warnings in ikrit component 2015-12-16 17:36:54 +02:00
Mike Dubman
3e93ef49da Merge pull request #1134 from alex-mikheev/topic/ikrit_err_fix_fix
SPML/IKRIT: opal_progress and ud_only fixes
2015-11-15 19:20:55 -06:00