374 commits

Author SHA1 Message Date
Boris Karasev
77c50efb95 Yoda SPML is removed
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-07-14 08:47:16 +03:00
Xin Zhao
ee952fcccd Passing estimated_num_procs to UCX init in PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Pavel Shamis (Pasha)
95c440683b OSHMEM: shmem_wait code cleanup
* updating naming convention for the arguments in order to ensure
that the name aligns with an actual meaning of the argument
* remove local variable references in the macro
* adding volatile for the poll variables

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2017-03-15 21:53:44 +00:00
Yossi
1a95633e40 Merge pull request #2717 from alex-mikheev/topic/sshmem_ucx
oshmem: sshmem: adds UCX allocator
2017-03-09 12:58:06 +02:00
George Bosilca
366d64b7e5 Move the collective structure outside the communicator.
As we changed the ABI (forcing a major release), we can limit
the size of the predefined communicators by moving the collective
structure outside the communicator. This might have a minimal,
but unnoticeable, impact on performance. This approach has been
discussed during the January 2017 devel meeting.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-27 11:54:17 -06:00
Alex Mikheev
c9b5b12af4 oshmem: sshmem ucx: use fixed base address
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-26 15:16:28 +02:00
Alex Mikheev
c63137e1c0 oshmem: sshmem ucx: minor code cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
132fbd9ae9 oshmem: sshmem: add UCX allocator
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
e038e3f9e0 oshmem: sshmem: code cleanup
The commit removes unused code and interface function, moves
common code to the base.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:47:59 +02:00
Alex Mikheev
ea3ea4835b oshmem: mem use hook: apply code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
(cherry picked from commit a422154a141f0be5b92d2b6c26d7b2b4176dfe18)
2017-01-30 11:30:20 +02:00
Alex Mikheev
9da9e6260d oshmem: spml ucx: on error print ucx error string
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-29 10:28:24 +02:00
Alex Mikheev
986ca000f8 oshmem: spml: add memory allocation hook
The hook is called from memheap when a memory range
is going to be allocated by smalloc(), realloc() and others.

The ucx spml uses this hook to call ucp_mem_advise in order to speed up
non-blocking memory mapping.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-26 16:41:39 +02:00
Alex Mikheev
83c2ab76a5 oshmem: memheap: refactor component selection code
Do not call component's init function until the component has been
selected.

Use mca_base_select() instead of the custom component selection code.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-16 13:48:58 +02:00
Alex Mikheev
67d66c2326 oshmem: sshmem: make mmap allocator the default instead of verbs
By default use mmap() to allocate memory for the symmetric heap.
It is a safer and more portable choice than sysv and verbs.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-12-14 13:31:16 +02:00
Alina Sklarevich
e9d2d029c6 PML/SPML/UCX: Adapt to the API changes in the UCX lib.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2016-12-08 11:33:29 +02:00
Gilles Gouaillardet
062ed9c919 spml/yoda: fix support for BTLs that do not register memory in mca_spml_yoda_get()
Refs open-mpi/ompi#2499

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-08 15:56:25 +09:00
Mike Dubman
53a0c86c16 Merge pull request #2455 from yosefe/topic/ucp-uct-nonblock-mem-reg-api
spml_ucx: allow registering the heap in non-blocking mode.
2016-11-27 11:42:09 +02:00
Mike Dubman
f339632216 Merge pull request #2452 from alex-mikheev/topic/scoll_basic_fixes
oshmem: fixes scoll basic barrier and broadcast
2016-11-25 18:03:56 +02:00
Yossi Itigin
0241a2697d spml_ucx: allow registering the heap in non-blocking mode.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2016-11-25 15:09:22 +02:00
Alex Mikheev
0f83a1fd57 oshmem: scoll: fixes basic barrier broadcast and alltoall
Add missing fence() call to alltoall and central counter broadcast.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-24 16:56:55 +02:00
Alex Mikheev
d1723355d3 oshmem: memheap: removes find_offset
Reasons for removal are:
- the function is only used by the shmem_lock code
- only a subset of the function is used by the shmem_lock
- for the general case the function is not correct

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-21 16:09:17 +02:00
Gilles Gouaillardet
19bdd1d626 oshmem/memheap: initialize common symbol mca_memheap_base_map
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-21 09:32:27 +09:00
Alex Mikheev
864904e8ab oshmem: ucx: check status only if configured --with-oshmem-param-check
The current standard says that behaviour in case of an error is undefined.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-10 11:29:03 +02:00
Alex Mikheev
bf61961f8b oshmem: code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-08 15:11:59 +02:00
Alex Mikheev
f133d9b6c8 oshmem: fixes compilation errors in sshmem
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-08 15:11:07 +02:00
Alex Mikheev
ff5095e533 OSHMEM: adds support for mkey caching by spml
It improves the CPU cache hit ratio.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:43 +02:00
Alex Mikheev
defcc3ddc1 OSHMEM: spml ikrit: get/put request cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:41 +02:00
Alex Mikheev
61bd59a369 OSHMEM: fixes addr_accessible()
check every possible transport

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:41 +02:00
Alex Mikheev
23c3dc8345 OSHMEM: mxm: optimize mxm_peer layout.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:40 +02:00
Alex Mikheev
df74d549dc OSHMEM: spml ikrit: changes mxm_peers layout
use single array instead of array of pointers

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:39 +02:00
Alex Mikheev
b5c7c7de78 OSHMEM: memheap: disable oob if allgather mkey exchange is used
In this case there is no point in adding another progress callback.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:39 +02:00
Alex Mikheev
0826e63363 OSHMEM: spml_ikrit: makes quiet wait for get_nbi requests
shmem_quiet() shall complete all outstanding get_nbi() requests

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:38 +02:00
Alex Mikheev
2f91ce7281 OSHMEM: mxm versions less than 2.0 are no longer supported
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2016-11-06 11:56:38 +02:00
Pavel Shamis (Pasha)
92b0ebd7c3 For UCX it is legal to return UCS_INPROGRESS (1) code for non-blocking function
calls, which means that the operation was successfully started but not
immediately completed. This is a "good" return code that should not be handled
as an error.

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2016-11-03 15:36:13 -05:00
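The return-code rule in the commit above can be sketched in shell. This is only an illustration, not code from pml_ucx or spml_ucx: `classify_ucs_status` is a hypothetical helper, and it assumes the UCX convention that `UCS_OK` is 0, `UCS_INPROGRESS` is 1 and every error code is negative.

```shell
#!/bin/sh
# Hypothetical helper: classify a numeric UCX status code.
# Assumes UCS_OK = 0, UCS_INPROGRESS = 1 and negative values for errors.
classify_ucs_status() {
    if [ "$1" -lt 0 ]; then
        echo "error"          # any negative code is a real failure
    elif [ "$1" -eq 1 ]; then
        echo "in-progress"    # started but not yet complete: not an error
    else
        echo "ok"             # UCS_OK
    fi
}

classify_ucs_status 0    # prints: ok
classify_ucs_status 1    # prints: in-progress
classify_ucs_status -3   # prints: error
```

Testing for "less than zero" rather than "not equal to zero" is what keeps the in-progress code out of the error path.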
Boris Karasev
68b5acd9f4 oshmem/spml/yoda: fixed the btl operations
Fixed the shmem OOM error referenced in #2028

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2016-11-02 13:38:35 +02:00
Yossi Itigin
05ca466c6b ucx: adapt pml_ucx and spml_ucx to new UCX APIs
- pass field_mask to ucp_init().
- use non-blocking disconnect.
- recv() with pre-allocated request.
- call opal_progress() from iprobe() and improbe().
- use shift pattern in connect/disconnect.
2016-10-12 23:45:45 +03:00
Joshua Hursey
f6f24a4f67 build: Custom libmpi(_FOO) name option in configure
* Add a configure time option to rename libmpi(_FOO).*
   - `--with-libmpi-name=STRING`
* This commit only impacts the installed libraries.
   Internal, temporary libraries have not been renamed to limit the
   scope of the patch to only what is needed.

For example:
```shell
shell$ ./configure --with-libmpi-name=wookie
...
shell$ find . -name "libmpi*"
shell$ find . -name "libwookie*"
./lib/libwookie.so.0.0.0
./lib/libwookie.so.0
./lib/libwookie.so
./lib/libwookie.la
./lib/libwookie_mpifh.so.0.0.0
./lib/libwookie_mpifh.so.0
./lib/libwookie_mpifh.so
./lib/libwookie_mpifh.la
./lib/libwookie_usempi.so.0.0.0
./lib/libwookie_usempi.so.0
./lib/libwookie_usempi.so
./lib/libwookie_usempi.la
shell$
```
2016-09-29 21:47:24 -05:00
Joshua Ladd
d5e65c4860 Merge pull request #2052 from alex-mikheev/topic/spml_ikrit_zcopy_fix
OSHMEM: spml ikrit: fixes zero copy
2016-09-12 12:35:32 -04:00
Alex Mikheev
439456ae96 OSHMEM: spml ikrit: fixes zero copy
Allow mxm to use zero copy in put() and get() for the large messages.
2016-09-04 12:16:09 +03:00
Gilles Gouaillardet
0a25420dac oshmem: get rid of oshmem_proc_t and use ompi_proc_t instead
store oshmem related per proc data in an oshmem_proc_data_t struct,
that is stored in the padding section of an ompi_proc_t

this data can be accessed via the OSHMEM_PROC_DATA(proc) macro

Fixes open-mpi/ompi#2023
2016-09-01 14:20:14 +09:00
Gilles Gouaillardet
6b7bc64101 spml/yoda: MCA_PML(add_procs) all procs from oshmem_comm_world
and fix oshmem_group_proc_{init,create} so they use the number of procs in oshmem_comm_world

Thanks Debendra Das for the report and Josh Ladd for the guidance

Fixes open-mpi/ompi#1966
2016-08-17 14:24:02 +09:00
Igor Ivanov
a8ab5b55b9 oshmem: Fix double lock issue
Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
2016-06-10 15:52:31 +03:00
Nathan Hjelm
dbfab94ede atomic/mxm: rename symbol that is a duplicate of one in atomic/ucx
This fixes an error when building with --enable-static.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:34:40 -06:00
Jeff Squyres
2c5b39718d oshmem: fix scoll_null_alltoall() prototype
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-03-26 03:50:57 -07:00
Mike Dubman
1d8fbfefb0 Merge pull request #1478 from igor-ivanov/pr/oshmem-v1.3-alltoall
oshmem: Add alltoall
2016-03-22 07:51:36 +02:00
Mike Dubman
7483a66ef6 Merge pull request #1455 from igor-ivanov/pr/oshmem-v1.3
oshmem: Add Non-blocking Remote Memory Access Routines
2016-03-22 07:50:11 +02:00
Igor Ivanov
1bed5d8aee oshmem: Align OSHMEM API with spec v1.3 (update scoll/basic) 2016-03-21 11:46:01 +02:00
Igor Ivanov
50906b34b3 oshmem: Align OSHMEM API with spec v1.3 (Add scoll/alltoall interface) 2016-03-21 10:43:31 +02:00
Igor Ivanov
e690521cdd oshmem/scoll: Fix bug in basic/barrier algorithm 2016-03-21 10:34:55 +02:00
Igor Ivanov
36c29b393b oshmem: Align OSHMEM API with spec v1.3 (update spml/yoda) 2016-03-17 19:06:39 +02:00
Igor Ivanov
b2700320a3 oshmem: Align OSHMEM API with spec v1.3 (update spml/ikrit) 2016-03-17 19:06:39 +02:00
Igor Ivanov
450ea6684c oshmem: Align OSHMEM API with spec v1.3 (update spml/ucx) 2016-03-17 19:06:38 +02:00
Igor Ivanov
8464b6147a oshmem: Align OSHMEM API with spec v1.3 (Add spml/get_nb interface) 2016-03-15 14:04:59 +02:00
George Bosilca
68c36ea9dc Fix two annoying warnings in our UCX support. 2016-02-14 00:02:16 -05:00
Alex Mikheev
f627608e42 OSHMEM/UCX: implements atomic support
ucx atomic component has a real code now.
fixes bug in spml ucx arr_procs
removes redundant parameter checks from atomic components.
2016-01-21 16:02:28 +02:00
Gilles Gouaillardet
fec973efda configury: test portability
replace test ... -o ... with test ... || test ...
and test ... -a ... with test ... && test ...
2015-12-28 13:58:45 +09:00
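A minimal sketch of the rewrite described above. The variables `with_fca` and `have_fca_h` and the function `check_fca` are hypothetical; the point is only the pattern of chaining separate `test` invocations with the shell's `&&` and `||` instead of `test`'s `-a` and `-o` operators, which POSIX marks obsolescent.

```shell
#!/bin/sh
# Hypothetical configure-style check; with_fca and have_fca_h are
# illustration-only inputs, not real configure variables.
check_fca() {
    with_fca=$1
    have_fca_h=$2
    # Non-portable form: test "$with_fca" = "yes" -a "$have_fca_h" = "yes"
    if test "$with_fca" = "yes" && test "$have_fca_h" = "yes"; then
        echo "build"
    # Non-portable form: test "$with_fca" = "no" -o "$have_fca_h" = "no"
    elif test "$with_fca" = "no" || test "$have_fca_h" = "no"; then
        echo "skip"
    else
        echo "unknown"
    fi
}

check_fca yes yes   # prints: build
check_fca yes no    # prints: skip
```

Each `test` call evaluates a single comparison, so the result no longer depends on how a particular `test` implementation parses compound expressions.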
Gilles Gouaillardet
99d046d060 scoll/fca: add missing #include <alloca.h> 2015-12-24 14:33:58 +09:00
igor.ivanov@itseez.com
08c18195e7 oshmem/sshmem: Fix warnings in verbs component 2015-12-16 17:37:00 +02:00
igor.ivanov@itseez.com
6448bd07a4 oshmem/spml: Fix warnings in ikrit component 2015-12-16 17:36:54 +02:00
Igor Ivanov
5c061abf4e oshmem: Fix scan coverity issues
1324740 - Resource leak
1304562 - Unchecked return value
1340514 - Dereference before null check
1340515 - Use of untrusted scalar value
1340516 - Use of untrusted string value
2015-12-02 12:49:19 +02:00
Igor Ivanov
05d947d55a oshmem: Align OSHMEM API with spec v1.2 (support environment variables) 2015-11-24 18:57:56 +02:00
Mike Dubman
3e93ef49da Merge pull request #1134 from alex-mikheev/topic/ikrit_err_fix_fix
SPML/IKRIT: opal_progress and ud_only fixes
2015-11-15 19:20:55 -06:00
Mike Dubman
a7128af8c4 OSHMEM/ikrit: fix valgrind error 2015-11-15 14:51:41 +02:00
Alex Mikheev
0755a59091 SPML/IKRIT: opal_progress and ud_only fixes
Some MXM tls, such as self and shm, can complete requests immediately.
Make sure that opal_progress() is called before a request
is completed.

fix ud_only logic when hw rdma channel is using ud and main
transport is rc or dc.
2015-11-15 12:13:24 +02:00
Mike Dubman
8ec5c99412 Merge pull request #1126 from alex-mikheev/topic/ikrit_err_fix
Topic/ikrit err fix
2015-11-11 15:31:06 +02:00
Alex Mikheev
cd8ea438d3 OSHMEM/SPML/ikrit: memcheck support 2015-11-11 13:46:20 +02:00
Alex Mikheev
2a8de45b43 OSHMEM/SPML/IKRIT: check return of mxm_req_send correctly
do not force memory registration if main and additional comm
channels are both ud
2015-11-11 13:34:26 +02:00
Igor Ivanov
c0518c0417 oshmem: Enable force output for error messages
This change fixes an issue where oshmem-related error messages are not
visible to the user.
2015-11-11 13:26:10 +02:00
Alex Mikheev
b269dd59e3 OSHMEM/SPML/UCX: fixes typo in add_procs 2015-11-02 16:48:26 +02:00
Ralph Castain
e1778f5f9b Revert " changing the destruct function of list release API to release list items"
This reverts commit 720fa860ee.
2015-10-27 15:24:45 -07:00
rhc54
0bc51375f3 Merge pull request #1004 from rppendya/rppendya_list_release
Releasing the list items when list destructor is called
2015-10-21 14:34:19 -07:00
yosefe
cc76db8d39 ucx: reduce components priority to 5. 2015-10-21 17:38:25 +03:00
Raghavendra Pendyala
720fa860ee changing the destruct function of list release API to release list items
caused a bug in oshmem application. Fixing the bug with this patch
2015-10-20 12:58:23 -07:00
Alex Mikheev
f2b501a862 oshmem: Add UCX spml. 2015-10-20 19:46:02 +03:00
Alex Mikheev
b020b628fc oshmem/memheap: optimized mkey lookup.
Fast path lookup is done in an inline function.
2015-10-20 19:45:51 +03:00
yosefe
bd3f4c8cc7 spml/memheap: add support for mkey unpack. 2015-10-20 19:45:50 +03:00
Alex Mikheev
8fa14386ea spml_ikrit: fixes typo in .h file. 2015-10-20 19:36:41 +03:00
Gilles Gouaillardet
291a464efb configury: remove the --enable-mpi-profiling option
and directly call the PMPI_* symbols from C and Fortran bindings
2015-10-13 08:52:35 +09:00
Gilles Gouaillardet
53b952dc2b oshmem: invoke the C PMPI_* subroutines instead of the MPI_* ones
when profiling is built.
This prevents oshmem subroutines from being wrapped twice by third
party tools (e.g. once in oshmem and once in MPI)
see discussion starting at http://www.open-mpi.org/community/lists/devel/2015/08/17842.php

Thanks to Bert Wesarg for bringing this to our attention
2015-10-13 08:52:03 +09:00
Igor Ivanov
7de0537a1d oshmem: Add help message for fatal issues in scoll:mpi and scoll:fca
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-21 18:50:20 +03:00
Igor Ivanov
ec7cd13a81 oshmem: Fix compilation warnings 2015-09-21 18:50:20 +03:00
Igor Ivanov
ca8c3eebea oshmem: Abort application in case scoll:mpi is selected alone
scoll:mpi does not have barrier and should be selected with
any other scoll component.

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-21 10:42:54 +03:00
Igor Ivanov
f437f4012e Revert "scoll/mpi: work around bug in oshmem/proc design"
This workaround is unnecessary after the oshmem/proc refactoring

This reverts commit 202c6a38e4.
2015-09-17 19:01:24 +03:00
Igor Ivanov
4b8d9b8eff oshmem/proc: Refactor proc component
Most functionality of oshmem_proc duplicates ompi_proc. In addition,
the current logic does not allow oshmem initialization
without ompi startup.
This refactoring avoids code duplication, decreases memory usage
and makes oshmem support easier.
Now oshmem_proc is a transparent ompi_proc structure that can be
extended with oshmem-specific data.

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-17 18:49:00 +03:00
Nathan Hjelm
69b9bc2269 oshmem/memheap: correct usage of opal_dss functions
Any buffer given to opal_dss.load becomes the responsibility of the
opal_buffer_t object. It will be freed automatically if either the
opal_buffer_t is released or opal_dss.load is called again on the
buffer. opal_dss.unload will not prevent this unless no unpacking
takes place between the .load and .unload calls.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-14 13:54:56 -06:00
Gilles Gouaillardet
8f2d3aeb65 oshmem: do not include pml/ob1 headers
this is an abstraction violation that can cause linker failures
2015-09-11 09:34:10 +09:00
Nathan Hjelm
202c6a38e4 scoll/mpi: work around bug in oshmem/proc design
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:56 -06:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Gilles Gouaillardet
1a238d3a4f configury: fix fca detection
* do not add -I/.../include/fca -I /.../include/fca_core to CPPFLAGS
* allow configure --with-fca
* search fca libs in both DIR/lib and DIR/lib64
* fix the description of the --with-fca option
2015-08-13 11:09:15 +09:00
Jeff Squyres
5065978a1e oshmem: __FUNCTION__ -> __func__ fixes 2015-08-05 05:39:38 -07:00
yosefe
41f3b77e31 ikrit: set DC defaults. 2015-07-24 21:01:13 +03:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Gilles Gouaillardet
11e11e1be9 initialize common symbols from oshmem 2015-05-08 10:11:58 +09:00
Nathan Hjelm
033894b493 Merge pull request #541 from hjelmn/c99_components
C99 component initialization
2015-04-20 10:45:39 -06:00
Devendar Bureddy
3dbd95fa73 OSHMEM: enable mpi collective by default 2015-04-20 19:39:36 +03:00
Nathan Hjelm
c4a61969c0 oshmem: use C99 subobject naming for component initialization
This commit helps future-proof oshmem components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Jeff Squyres
8d04215741 coll: trivial spelling fix
s/Algoritm/Algorithm/g
2015-02-27 18:20:17 -08:00
Alina Sklarevich
e4c4e7df5e Fix the calls to ibv_fork_init and remove btl_openib_want_fork_support.
In order to have an effect, ibv_fork_init should be called in the
beginning of the verbs initialization flow - before the calls to the
ibv_create_qp and ibv_create_cq verbs.
These functions are called from the oob/ud code and by the time the
other verbs components (btl openib, pml yalla, ...) call ibv_fork_init,
it's too late. This commit forces the call to ibv_fork_init (if it's
requested) right at the beginning of all the components that are using
verbs.
(ibv_fork_init() can be safely called multiple times)

This commit also removes the btl_openib_want_fork_support mca parameter
and adds a new mca parameter instead - opal_verbs_want_fork_support.
Through this new parameter, fork support may be requested for ALL
components.
The default value for this parameter is set to 1.

Before this commit the btl_openib_want_fork_support parameter didn't
provide fork support for the openib btl if its value was set to 1.
(because when openib called ibv_fork_init, it was already after the
calls to ibv_create_* in oob/ud and therefore it failed).
2015-02-25 10:58:50 +02:00
igor-ivanov
0f44cdd779 Merge pull request #421 from igor-ivanov/pr/fix-oshmem-coverity
oshmem: Fix set of coverity issues
2015-02-24 21:40:06 +04:00
Nathan Hjelm
5f1254d710 Update code base to use the new opal_free_list_t
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.

This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.

Notes:

OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-24 10:05:45 -07:00
Igor Ivanov
3e2dd782ea oshmem: Fix set of coverity issues
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-02-24 19:03:10 +02:00
Igor Ivanov
010dce307a Fix set of coverity issues
List of CIDs (scan.coverity.com):
oshmem:
1269787, 1269907, 1270161, 1270162, 1270977, 1270978
ompi:
1270170, 1270172, 1270173

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-02-20 17:45:46 +04:00
Igor Ivanov
426d1ce146 oshmem: Fix set of coverity issues
List of CIDs (scan.coverity.com):
1269721, 1269725, 1269787, 1269907, 1269909, 1269910, 1269911, 1269912,
1269959, 1269960, 1269984, 1269985, 1270136, 1270157, 1269845, 1269875,
1269876, 1269877, 1269878, 1269884, 1269885, 1270161, 1270162, 1270175,
1269734, 1269739, 1269742, 1269743

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-02-19 23:00:17 +04:00
Nathan Hjelm
16ae7d97d1 spml/yoda: update for BTL 3.0 interface
This commit makes spml/yoda compatible with BTL 3.0. This is meant as a
starting point only. More work will be needed to make optimal use of
the new interface.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:38 -07:00
Mike Dubman
6611f4ce38 OSHMEM: fix warnings 2015-02-09 20:49:03 -08:00
Mike Dubman
54a072caaa OSHMEM: fix infinite recursion and stack size violation
Send the reply before posting the receive request again to limit the recursion depth to
the number of receive requests.
Send can call opal_progress, which calls this function again. If the receive request is already started,
the stack size will be proportional to the number of job ranks.
2015-01-04 16:31:19 +02:00
Alex Mikheev
c76261da07 OSHMEM: atomic mxm: fix mkey conversion
Correctly return mxm_empty_mem_key when shmem mkey is empty
2014-12-16 16:34:42 +02:00
Alex Mikheev
71ebbca26d OSHMEM: spml ikrit: fix spelling in help file 2014-12-16 16:18:38 +02:00
Alex Mikheev
3f7ed56548 OSHMEM: spml ikrit: fix mxm disconnect flow
Add out of band barrier before performing mxm disconnect.
It will make sure that every pe is ready to disconnect. Otherwise
bad things may happen.
2014-12-16 15:07:17 +02:00
Alex Mikheev
428add390e OSHMEM: spml ikrit: add skew to connect/disconnect
Each pe connects/disconnects starting from itself instead of pe=0. This
will distribute network traffic in a more friendly way.
2014-12-03 15:36:45 +02:00
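The skewed ordering can be illustrated with plain shell arithmetic. `skewed_order` is a hypothetical helper, not code from spml ikrit; it assumes that "starting from itself" means PE `my_pe` visits its peers in the order `(my_pe + i) % npes` for `i = 0 .. npes-1`.

```shell
#!/bin/sh
# Hypothetical helper: print the order in which PE $1 (out of $2 PEs)
# visits its peers when it starts from itself instead of from PE 0.
skewed_order() {
    my_pe=$1
    npes=$2
    i=0
    order=""
    while [ "$i" -lt "$npes" ]; do
        order="$order $(( (my_pe + i) % npes ))"
        i=$((i + 1))
    done
    echo $order   # unquoted on purpose: drops the leading space
}

skewed_order 0 4   # prints: 0 1 2 3
skewed_order 2 4   # prints: 2 3 0 1
```

With this order, at step `i` every PE targets a different peer, instead of all PEs contacting PE 0 first.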
Alex Mikheev
8de50d8420 OSHMEM: spml ikrit: add call to mxm_mq_destroy()
Make valgrind happy by calling mxm_mq_destroy() on module
close.
2014-12-01 12:36:46 +02:00
Nathan Hjelm
d495d49b1c Merge pull request #273 from open-mpi/topic/yoda_rdma_flags
OSHMEM: spml yoda: use flags to check if btl is RDMA capable
2014-11-16 12:04:04 -07:00
Alex Mikheev
fbb9dc5b1e OSHMEM: spml ikrit valgrind fix
always initialize request flags
2014-11-16 17:24:16 +02:00
Alex Mikheev
3443c1d5e5 OSHMEM: spml yoda: use flags to check if btl is RDMA capable 2014-11-16 17:20:20 +02:00
Gilles Gouaillardet
2177f9ec3e fix missing copyright, no code change 2014-11-13 14:56:09 +09:00
Gilles Gouaillardet
cd6e3ecb07 oshmem/yoda: fix a typo in mca_spml_yoda_get_completion 2014-11-13 14:53:32 +09:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Alex Mikheev
097b469f61 OSHMEM: sshmem verbs: fix shared_mr detection
It seems that 5ce2f10067
changed default flag values but it did not modify detection code.
2014-11-10 13:34:04 +02:00
Alex Mikheev
7327b13823 OSHMEM: sshmem mmap: removed unused help topics 2014-11-05 16:39:20 +02:00
Alex Mikheev
1f2ab43ba9 OSHMEM: spml ikrit: remove empty lines in helpfile 2014-11-04 11:26:09 +02:00
Alex Mikheev
d06fb85350 OSHMEM: fixes 'improve mxm transport sanity check'
The code that actually checks for valid transport combos
somehow did not make it to the original commit:

74ab30b738
2014-11-04 11:07:22 +02:00
Alex Mikheev
e1cf6f37ba OSHMEM: spml ikrit: disable rdma op DCI pool
Instead use single pool for both rdma and send receive ops.
2014-11-03 10:01:07 +02:00
Mike Dubman
5ce2f10067 OSHMEM: integrate review comments from open-mpi/ompi-release#7 2014-10-24 17:21:46 +03:00
Alex Mikheev
5af4d02bd3 OSHMEM: spml ikrit: complete puts before memheap destruction
Force completion of all puts before deregistering memheap/bss memory

Fixes a possible race condition where put request completion callback
is called when request context is already cleared.

Change-Id: I7ed887ec0b03a66ce5d3076a7edcf64061f57370
2014-10-19 14:04:34 +03:00
Mike Dubman
ab22dcb875 Merge pull request #229 from nkogteva/master
oshmem mmap: new mca parameters were introduced - sshmem_mmap_anonymous,...
2014-10-15 10:24:29 +03:00
Alex Mikheev
643e64497d OSHMEM: spml ikrit: hw rdma channel is disabled by default 2014-10-14 16:09:51 +03:00
Alex Mikheev
74ab30b738 OSHMEM: spml ikrit: improve mxm transport sanity check
Do not allow combinations of transports that are not compliant with
the shmem spec. In particular, do not allow a mix of hw and software atomic
ops

Issue: 4721
Change-Id: Ide382f7510495df3d385f2a5ae5f9def6ef5332c
2014-10-14 15:44:57 +03:00
Alex Mikheev
1bcc88cfb1 OSHMEM: spml ikrit: hardware rdma endpoint
Create additional endpoint that is capable of true
one sided RDMA transfers.

MXM atomics component now uses this endpoint
2014-10-14 15:31:09 +03:00
Alina Sklarevich
1eb6286547 OSHMEM: fix the makefile.
(oshmem/mca/sshmem/base/Makefile.am)
2014-10-14 11:57:46 +03:00
Nadezhda Kogteva
b2a93943dc oshmem mmap: set lvl4 for sshmem_mmap_anonymous and sshmem_mmap_fixed variables, define MAP_ANONYMOUS returned. 2014-10-14 08:54:44 +03:00
Mike Dubman
ec1f761d8e OSHMEM: add missing help file, got lost during merge. Thanks to Yossi/Igor for finding it.
Change-Id: I466e40a3fea70e8045dd1e897edcc50ccf0451a3

Conflicts:
	oshmem/mca/sshmem/base/Makefile.am
	oshmem/mca/sshmem/base/help-oshmem-sshmem.txt
2014-10-13 16:58:35 +03:00
Alex Mikheev
8fcbcba516 Merge branch 'topic/oshmem_shared_mr_fix' 2014-10-13 15:24:12 +03:00
Alex Mikheev
cd67642183 OSHMEM: sshmem verbs: workaround shared_mr procfs bug
dereg shared_mr before doing dereg on its mr.
2014-10-13 15:14:34 +03:00
Nadezhda Kogteva
c68c4b45b5 Merge remote-tracking branch 'upstream/master' 2014-10-13 15:12:39 +03:00
Nadezhda Kogteva
de68d58a9e oshmem: refactor of oshmem/mca/sshmem/*.[ch] files to use #if MACRO style 2014-10-13 13:12:16 +03:00
Nadezhda Kogteva
3e7002e8aa oshmem mmap: copyrights for memheap_base_alloc.c files updated 2014-10-13 11:41:35 +03:00
Nadezhda Kogteva
ce4ee2aa8d oshmem mmap: new mca parameters were introduced - sshmem_mmap_anonymous, sshmem_mmap_fixed and sshmem_base_backing_file_dir - for runtime mmap management.
(cherry picked from Mellanox-v1.8 repo commit 4c391a)
2014-10-13 11:39:26 +03:00
Mike Dubman
113f40b0ec OSHMEM: sshmem verbs: allocate memory at fixed address
Use experimental verbs to allocate memory at fixed base
virtual address.

verbs will disqualify itself if shared_mr is disabled
or not supported and it is impossible to allocate memory
starting at fixed base virtual address.

verbs contig pages allocator did not guarantee fixed va, now it does.
(cherry picked from commit fd77ebd452)

Apply Jeff's comments

Update with Jeff commits
(cherry picked from commit open-mpi/ompi-release@4dc487fc3d)
2014-10-12 09:53:48 +03:00
Alex Mikheev
89535a3272 OSHMEM: sshmem mmap: use MAP_PRIVATE instead of MAP_SHARED
It looks like using MAP_PRIVATE instead of MAP_SHARED greatly
speeds up infiniband memory registration.

Change-Id: Id7089f58458ef8fff4034a2c4707d31f7e8b6694
2014-10-06 11:41:06 +03:00
Mike Dubman
fd77ebd452 OSHMEM: sshmem verbs: allocate memory at fixed address
Use experimental verbs to allocate memory at fixed base
virtual address.

verbs will disqualify itself if shared_mr is disabled
or not supported and it is impossible to allocate memory
starting at fixed base virtual address.

verbs contig pages allocator did not guarantee fixed va, now it does.
2014-10-05 14:33:56 +03:00
Alex Mikheev
4ac5936257 OSHMEM: sshmem verbs: improve hca name parsing
If the user gives an HCA port, ignore the port and use only the HCA name.
Ex: mlx4_0:1 -> mlx4_0

fixed by @alex-mikheev reviewed by @miked-mellanox
2014-10-05 14:29:11 +03:00
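The parsing rule above (drop an optional `:port` suffix) can be expressed with POSIX parameter expansion. This helper is only an illustration; the actual change lives in the C code of the sshmem verbs component, and `hca_name` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the HCA name from a "name[:port]" string.
# ${1%%:*} removes the longest suffix matching ":*", i.e. everything from
# the first colon onward; a string without a colon is returned unchanged.
hca_name() {
    printf '%s\n' "${1%%:*}"
}

hca_name mlx4_0:1   # prints: mlx4_0
hca_name mlx4_0     # prints: mlx4_0
```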
Igor Ivanov
d82dc7f67f OSHMEM: Add two new mca variables
Added the use_hp variable to sshmem/sysv to control huge page usage;
Added the shared_mr variable to sshmem/verbs;
Both parameters are set to auto.
Fix help messages

fixed by Igor, reviewed by @miked-mellanox and @alex-mikheev
2014-10-05 14:25:39 +03:00
Alex Mikheev
067fa05209 OSHMEM: fixes bug in shmem_lock
Lock server pe computation was incorrect in cases when:

the lock virtual address is a signed long. In this case a negative pe
value was returned.

the lock has different virtual addresses on different pes.
This can happen when the memheap or static segment has different base
addresses. Use the offset instead of the absolute virtual address to
compute the server pe.

Fixed by @alex-mikheev, reviewed by @miked-mellanox
2014-10-05 09:31:03 +03:00
Alina Sklarevich
e974bec57e OSHMEM: fix check-help-string.pl errors and warnings.
This commit was SVN r32511.
2014-08-12 11:30:14 +00:00
Gilles Gouaillardet
03fbd9a12d check-help-strings cleanup
This commit was SVN r32490.
2014-08-11 03:19:01 +00:00
Mike Dubman
e819a45cee shmem: opal refactoring voices
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32489.
2014-08-10 08:06:37 +00:00
Mike Dubman
bb53dff57a oshmem: fix after opal refactoring
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32488.
2014-08-10 07:30:12 +00:00
Mike Dubman
b99fd08c3d oshmem: scoll/fca - opal refactoring voices
based on http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32487.
2014-08-10 04:54:38 +00:00