1
1

74 Коммитов

Автор SHA1 Сообщение Дата
Mike Dubman
3e93ef49da Merge pull request #1134 from alex-mikheev/topic/ikrit_err_fix_fix
SPML/IKRIT: opal_progress and ud_only fixes
2015-11-15 19:20:55 -06:00
Mike Dubman
a7128af8c4 OSHMEM/ikrit: fix valgrind error 2015-11-15 14:51:41 +02:00
Alex Mikheev
0755a59091 SPML/IKRIT: opal_progress and ud_only fixes
Some MXM tls such as self, shm can comlete requests immediately.
Make sure that opal_progress() is called before before request
is completed.

fix ud_only logic when hw rdma channel is using ud and main
transport is rc or dc.
2015-11-15 12:13:24 +02:00
Alex Mikheev
cd8ea438d3 OSHMEM/SPML/ikrit: memcheck support 2015-11-11 13:46:20 +02:00
Alex Mikheev
2a8de45b43 OSHMEM/SPML/IKRIT: check return of mxm_req_send correctly
do not force memory registration if main and additional comm
channels are both ud
2015-11-11 13:34:26 +02:00
Alex Mikheev
b020b628fc oshmem/memheap: optimized mkey lookup.
Fast path lookup is done in inline funcion.
2015-10-20 19:45:51 +03:00
yosefe
bd3f4c8cc7 spml/memheap: add support for mkey unpack. 2015-10-20 19:45:50 +03:00
Alex Mikheev
8fa14386ea spml_ikrit: fixes typo in .h file. 2015-10-20 19:36:41 +03:00
Igor Ivanov
ec7cd13a81 oshmem: Fix compilation warnings 2015-09-21 18:50:20 +03:00
Igor Ivanov
4b8d9b8eff oshmem/proc: Refactor proc component
Most functionality of oshmem_proc duplicates ompi_proc. In addition
to that, Current logic does not allow to do oshmem initialization
w/o ompi startup.
So this refactoring allows to  avoid code duplication, decrease used
memory and make oshmem support easier.
Now oshmem_proc is transparent ompi_proc structure, that can be
extended by oshmem specific data.

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-17 18:49:00 +03:00
yosefe
41f3b77e31 ikrit: set DC defaults. 2015-07-24 21:01:13 +03:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Nathan Hjelm
c4a61969c0 oshmem: use C99 subobject naming for component initialization
This commit helps future-proof oshmem components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
Nathan Hjelm
5f1254d710 Update code base to use the new opal_free_list_t
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.

This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.

Notes:

OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-24 10:05:45 -07:00
Igor Ivanov
010dce307a Fix set of coverity issues
List of CIDs (scan.coverity.com):
oshmem:
1269787, 1269907, 1270161, 1270162, 1270977, 1270978
ompi:
1270170, 1270172, 1270173

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-02-20 17:45:46 +04:00
Igor Ivanov
426d1ce146 oshmem: Fix set of coverity issues
List of CIDs (scan.coverity.com):
1269721, 1269725, 1269787, 1269907, 1269909, 1269910, 1269911, 1269912,
1269959, 1269960, 1269984, 1269985, 1270136, 1270157, 1269845, 1269875,
1269876, 1269877, 1269878, 1269884, 1269885, 1270161, 1270162, 1270175,
1269734, 1269739, 1269742, 1269743

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-02-19 23:00:17 +04:00
Alex Mikheev
71ebbca26d OSHMEM: spml ikrit: fix spelling in help file 2014-12-16 16:18:38 +02:00
Alex Mikheev
3f7ed56548 OSHMEM: spml ikrit: fix mxm disconnect flow
Add out of band barrier before performing mxm disconnect.
It will make sure that every pe is ready to disconnect. Otherwise
bad things may happen.
2014-12-16 15:07:17 +02:00
Alex Mikheev
428add390e OSHMEM: spml ikrit: add skew to connect/disconnect
Each pe connects/disconnects starting from itself instead of pe=0. This
will distribute network traffic in a more friendly way.
2014-12-03 15:36:45 +02:00
Alex Mikheev
8de50d8420 OSHMEM: spml ikrit: add call to mxm_mq_destroy()
Make valgrind happy by calling mxm_mq_destroy() on module
close.
2014-12-01 12:36:46 +02:00
Alex Mikheev
fbb9dc5b1e OSHMEM: spml ikrit valgrind fix
always initialize request flags
2014-11-16 17:24:16 +02:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Alex Mikheev
1f2ab43ba9 OSHMEM: spml ikrit: remove empty lines in helpfile 2014-11-04 11:26:09 +02:00
Alex Mikheev
d06fb85350 OSHMEM: fixes 'improve mxm transport sanity check'
The code that actually checks for valid transport combos
somehow did not make it to the original commit:

74ab30b73855fdb4559ccdcdd4cae6545ac75236
2014-11-04 11:07:22 +02:00
Alex Mikheev
e1cf6f37ba OSHMEM: spml ikrit: disable rdmap op DCI pool
Instead use single pool for both rdma and send receive ops.
2014-11-03 10:01:07 +02:00
Alex Mikheev
5af4d02bd3 OSHMEM: spml ikrit: complete puts b4 memheap destruction
Force completion of all puts before deregestering memheap/bss memory

Fixes a possible race condition where put request completion callback
is called when request context is already cleared.

Change-Id: I7ed887ec0b03a66ce5d3076a7edcf64061f57370
2014-10-19 14:04:34 +03:00
Alex Mikheev
643e64497d OSHMEM: spml ikrit: hw rdma channel is disabled by default 2014-10-14 16:09:51 +03:00
Alex Mikheev
74ab30b738 OSHMEM: spml ikrit: improve mxm transport sanity check
Do not allow combination of transports that is not compliant with
shmem spec. Especially do not allow mix of hw and software atomic
ops

Issue: 4721
Change-Id: Ide382f7510495df3d385f2a5ae5f9def6ef5332c
2014-10-14 15:44:57 +03:00
Alex Mikheev
1bcc88cfb1 OSHMEM: spml ikrit: hardware rdma endpoint
Create additional endpoint that is capable of true
one sided RDMA transfers.

MXM atomics component now uses this endpoint
2014-10-14 15:31:09 +03:00
Alina Sklarevich
e974bec57e OSHMEM: fix check-help-string.pl errors and warnings.
This commit was SVN r32511.
2014-08-12 11:30:14 +00:00
Gilles Gouaillardet
03fbd9a12d check-help-strings cleanup
This commit was SVN r32490.
2014-08-11 03:19:01 +00:00
Mike Dubman
e819a45cee shmem: opal refactoring voices
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32489.
2014-08-10 08:06:37 +00:00
Mike Dubman
bb53dff57a oshmem: fix after opal refactoring
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32488.
2014-08-10 07:30:12 +00:00
Alex Mikheev
c3e017c190 OSHMEM: refactoring of fix wrong btl/sm processing
Use exising fields of mkey struct to identify 'shared memory'
segments.

mkey.u.key is now always initialized to MAP_SEGMENT_SHM_INVALID instead
of 0

reviewed by Mike and Igor
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32174.
2014-07-09 08:57:27 +00:00
Mike Dubman
247da2819f OSHMEM: fix wrong btl/sm processing and typo
fixed by Igor reviewed by Alex,Mike,Yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32100.
2014-06-28 18:40:28 +00:00
Alex Mikheev
3b5fa97790 OSHMEM: fixes problem with local heap2heap copy
check for possibility of heap2heap copy was incorrect
in case when shared heaps have different virtual
addresses on same host.

It seems that ibv_exp_reg_mr() on CIB cards may return
different VAs for heap on same node. On CX3 addresses are
the same.

reviewed by miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31969.
2014-06-09 09:41:44 +00:00
Mike Dubman
55e35e0f6e OSHMEM: Fix issue with incorrect mca variables registration
Few components had wrong mca variables registration procedure
List of them:
- atomic basic and mxm
- spml yoda and ikrit
Two mca variables as runtime_api_verbose and runtime_lock_recursive change
names to oshmem_api_verbose and oshmem_lock_recursive otherwise they
were not shown by oshmem_info tool.

fixed by Igor, reviewed by Miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31962.
2014-06-06 17:36:47 +00:00
Mike Dubman
5dd4c68a0f OSHMEM: use bulk connect/disconnect API
fixed by Yossi, reviewed by MikeD

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31856.
2014-05-21 08:00:21 +00:00
Mike Dubman
95e637f5ba OSHMEM: fix error message when aborting on OOM
fixed by Roman, reviewed by Miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31752.
2014-05-14 13:45:16 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Alex Mikheev
c29d426153 OSHMEM: fixes mxm rc transport mkey ecxhange
cmr=v1.8.2:reviewer=ompi-rm1.8
reviewed by miked

This commit was SVN r31627.
2014-05-04 14:26:54 +00:00
Mike Dubman
323e4418b9 OSHMEM: extract memheap allocate methods into separate framework
- similar to opal/shmem
- next step is some refactoring and merge into opal/shmem
 Developed by Igor, reviewed by AlexM, MikeD

This commit fixes trac:4261.

This commit was SVN r30855.

The following Trac tickets were found above:
  Ticket 4261 --> https://svn.open-mpi.org/trac/ompi/ticket/4261
2014-02-26 16:32:23 +00:00
Mike Dubman
202bf90287 OSHMEM: ikrit optimization
no need pre-register heap if MXM/ud is used
fixed by Alex, reviewed by Miked

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30809.
2014-02-25 15:03:53 +00:00
Mike Dubman
5ed50793d5 OSHMEM: misc ikrit/mxm enhancements
- fix mxm/tl selection logic
- do not require memory registration if mxm/ud was selected

fixed by Alex, reviewed by Miked

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30802.
2014-02-24 07:06:57 +00:00
Mike Dubman
49ee63f4b8 MXM: do not enforce version check
- MXM uses libtool versioning scheme which is enough, no need additional in OMPI

reviewed by yossi

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30768.
2014-02-18 19:44:37 +00:00
Mike Dubman
fe692eb107 OSHMEM: remove unused code, rename mca param
- remove old, unused code
- rename mca param for oshmem preconnect to match mpi naming scheme.

fixed by Alex, reviewed by Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30748.
2014-02-17 14:57:12 +00:00
Mike Dubman
96142b31bd shmem: remove unused defines
fixed by Roman, reviewed by MikeD
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30735.
2014-02-15 06:43:08 +00:00
Mike Dubman
28949efcaf OSHMEM: MXM2: a2a perf improvement on large scale
Allow only limited number of coonections to have 'puts'
that do not require remote completion ack. That will
greatly improve performance of shmem_fence()/shmem_quiet()
and shmem_barrier() when there are many active connections.

fixed by Alex, reviewed by Miked

Refs trac:3763

This commit was SVN r30573.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2014-02-06 08:42:45 +00:00
Mike Dubman
27a763c86c OSHMEM: finalization fixes
Fixes mxm endpoint destruction and hung during
SHMEM finalization.

Add a barrier between spml del procs and finalization.
Not having it caused hungs because ikrit spml can not properly
disconnect if its peer already finizalized.

Refs trac:3763

This commit was SVN r30572.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2014-02-06 08:40:43 +00:00