1
1

57 Коммитов

Автор SHA1 Сообщение Дата
Artem Polyakov
2abb2972ac Fix Mellanox copyrights with respect to the following PRs:
* https://github.com/open-mpi/ompi/pull/1184
* https://github.com/open-mpi/ompi/pull/1188
* https://github.com/open-mpi/ompi/pull/1197
* https://github.com/open-mpi/ompi/pull/1202
* https://github.com/open-mpi/ompi/pull/1210
* https://github.com/open-mpi/ompi/pull/1216
* https://github.com/open-mpi/ompi/pull/1236
* https://github.com/open-mpi/ompi/pull/1237
* https://github.com/open-mpi/ompi/pull/1248
* https://github.com/open-mpi/ompi/pull/1260
* https://github.com/open-mpi/ompi/pull/1264
2015-12-30 00:12:19 +06:00
Artem Polyakov
3031affdb7 Fix openib process accounting if procs was dynamically added. 2015-12-24 17:56:35 +06:00
Artem Polyakov
400af6c52d openib addproc improvements:
1. finer grained locks;
2. separate srq creation from cq adjustments.
2015-12-24 17:56:35 +06:00
Artem Polyakov
41c325f15a Shift common code for calculating a port count and btl_rank in openib
into the static function
2015-12-24 17:56:35 +06:00
Artem Polyakov
08ad8357a8 Fix local process accounting in openib when dynamic add_proc is on. 2015-12-22 22:44:46 +06:00
Artem Polyakov
3c2f6d5560 Protect openib_btl->device data with explicit opal_mitex locks. 2015-12-22 18:33:26 +06:00
Artem Polyakov
3eb4756a17 Force locking regardles to the opal_using_threads() setting. 2015-12-21 18:52:31 +06:00
Artem Polyakov
86c0c3ec52 Provide additional information: whether ib_proc was newly created or
it was already existing.
2015-12-21 18:52:31 +06:00
Artem Polyakov
9325bd3d69 Protect device initialization 2015-12-21 18:52:31 +06:00
Artem Polyakov
0f77bc7ea7 Perform endpoint initialization atomically. 2015-12-21 18:52:31 +06:00
Artem Polyakov
afaf9c9ea6 Shift ib_proc initialization to the separate function. 2015-12-21 18:52:31 +06:00
Artem Polyakov
0951a34e95 Fix openib memory registration limit calculation if cutoff = 0. 2015-12-17 13:45:19 +06:00
Nathan Hjelm
90db00e37f Merge pull request #996 from hjelmn/openib_progress_thread
btl/openib: remove extra threads
2015-10-08 07:31:27 -06:00
Nathan Hjelm
b8af310efa btl/openib: remove extra threads
This commit removes the service and async event threads from the
openib btl. Both threads are replaced by opal progress thread
support. The run_in_main function is now supported by allocating an
event and adding it to the sync event base. This ensures that the
requested function is called as part of opal_progress.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-07 12:30:41 -06:00
Nathan Hjelm
59aa93e1b6 opal/mpool: add support for passing access flags to register
This commit adds a access_flags argument to the mpool registration
function. This flag indicates what kind of access is being requested:
local write, remote read, remote write, and remote atomic. The values
of the registration access flags in the btl are tied to the new flags
in the mpool. All mpools have been updated to include the new argument
but only the grdma and udreg mpools have been updated to make use of
the access flags. In both mpools existing registrations are checked
for sufficient access before being returned. If a registration does
not contain sufficient access it is marked as invalid and a new
registration is generated.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-05 13:53:55 -06:00
Nathan Hjelm
2041aac4e4 btl/openib: add support for dynamic add_procs
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:55 -06:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Jeff Squyres
8ab2b11f88 btl_openib.c: fix another compiler warning
Remove this unused variable
2015-06-17 09:00:12 -07:00
Jeff Squyres
f688289aaf btl_openib.c: fix compiler warning
This return code is not used; tell the compiler we're not going to
use it.
2015-06-17 08:56:56 -07:00
Nathan Hjelm
43d678e7ca btl/openib: fix more coverity issues
CID 1269931 Uninitialized scalar variable (UNINIT)

Initialize complete async message. This was not a bug but the fix
contributes to valgrind cleanness (uninitialed write).

CID 1269915 Unintended sign extension (SIGN_EXTENSION)

Should never happen. Quieting this by explicitly casting to uint64_t.

CID 1269824 Dereference null return value (NULL_RETURNS)

It is impossible for opal_list_remove_first to return NULL if
opal_list_is_empty returns false. I refactored the code in question to
not use opal_list_is_empty but loop until NULL is returned by
opal_list_remove_first. That will quiet the issue.

CID 1269913 Dereference before null check (REVERSE_INULL)

The storage parameter should never be NULL. The check intended to
check if *storage was NULL not storage.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-28 08:38:10 -06:00
adrianreber
714d9aa67e Merge pull request #348 from adrianreber/topic/orte_cr_continue_like_restart
Topic/orte cr continue like restart
2015-03-12 14:54:02 +01:00
Adrian Reber
f45dd069bd FT: fix compilation using --with-ft (1/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This first patch moves orte_cr_continue_like_restart from ORTE
to opal_cr_continue_like_restart in OPAL. This only leaves three
calls from OPAL to ORTE in the FT code. As it is not yet 100%
clear how to handle these calls the code orte_sstore.set_attr()
has been #ifdef'd out for now.
2015-03-11 14:23:33 +01:00
Nathan Hjelm
b308afa8fd btl/openib: remove derived btl segment type
The derived segment type (btl_openib_segment_t) was intended to store
the registration info needed for put and get. In BTL 3.0 this is no
longer required. I intended to remove this type as part of
open-mpi/ompi@74f1af4548 .

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-03-10 14:41:15 -06:00
Nathan Hjelm
5f1254d710 Update code base to use the new opal_free_list_t
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.

This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.

Notes:

OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-24 10:05:45 -07:00
Nathan Hjelm
cf91156105 btl/openib: add atomic operation support
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:36 -07:00
Nathan Hjelm
74f1af4548 btl/openib: update for BTL 3.0 interface
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:36 -07:00
Nathan Hjelm
fc7397949c btl: require that btls handle descriptor = NULL in the btl_sendi function
The send inline optimization uses the btl_sendi function to achieve lower
latency and higher message rates. Before this commit BTLs were allowed to
assume the descriptor was non-NULL and were expected to return a valid
descriptor if the send could not be completed using btl_sendi. This
behavior was fine until the usage of btl_sendi was changed in ob1. This
commit allows the caller to specify NULL for the descriptor. The affected
btls have been updated to handle this case.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:36 -07:00
Rolf vandeVaart
08dceda2c0 Fix logic for handling priority and eager RDMA. There was some refactoring that was done
in this code and it ended up changing the logic that is used to set up eager RDMA.
Rather than setting up eager RDMA with a high priority message, it did it the other
way around.  For some reason, CUDA-aware support did not like this.  So, basically,
restore the logic to the way it was prior to the refactoring.  The refactoring did not
intend to change this.  Lightly reviewed by hjelmn.
2015-02-11 16:38:36 -05:00
Gilles Gouaillardet
661c35ca67 cleanup dead code caused by the removal of the --with-threads configure option 2015-01-16 19:13:59 +09:00
Nathan Hjelm
006074c48d Merge pull request #332 from hjelmn/openib_updates
Openib updates
2015-01-15 15:05:18 -06:00
Gilles Gouaillardet
135ecce0eb btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS 2015-01-07 13:27:25 +09:00
Nathan Hjelm
cde79bfa60 btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths) 2015-01-06 11:39:23 -07:00
Nathan Hjelm
9bae131589 btl/openib: fix message coalescing
There was a bug in the openib btl handling this valid sequence of
calls:

desc = btl_alloc ();
btl_free (desc);

When triggered the bug would cause either fragment loss or undefined
behavior (SEGV, etc). The problem occured because btl_alloc contained
the logic to modify the pending fragment (length, etc) and these
changes were not corrected if the fragment was freed instead of sent.

To fix this issue I 1) moved some of the coalescing logic to the
btl_send function, and 2) retry the coalesced fragment on btl_free
if it was never sent. This appears to completely address the issue.
2015-01-06 11:39:16 -07:00
Nathan Hjelm
9aaac11648 btl/openib: fix recieve queue source detection 2015-01-06 11:39:11 -07:00
Gilles Gouaillardet
b3617e736e btl/openib: add XRC support with OFED 3.12+
based on an original patch contributed by Bull.
2015-01-06 15:30:52 +09:00
Devendar Bureddy
ccafc62c07 OMPI: btl openib: fix max registarable memory caluclation
- by default allow to register maximum possible (i.e 2 * total_memory)
      memory. This beheviour can be turned off using mca parameter
      "btl_openib_allow_max_memory_registration"

    - In fallback case, use device specific parameters to calulate
      memory limit.
2014-12-23 23:35:54 +02:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing
changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
b1f9569b7d Revert "btl/openib: fix warnings"
This reverts commit 6e6c786b49dc9df70c87bb8348c12b6f68de173b.
2014-11-19 23:16:16 -07:00
Nathan Hjelm
6e6c786b49 btl/openib: fix warnings 2014-11-19 15:57:01 -07:00
Nathan Hjelm
bf7daac388 btl/openib: add atomic operation support 2014-11-19 11:33:04 -07:00
Nathan Hjelm
90554d0f95 btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths) 2014-11-19 11:33:03 -07:00
Nathan Hjelm
4122067236 btl/openib: fix message coalescing 2014-11-19 11:33:03 -07:00
Nathan Hjelm
38e9611930 btl/openib: fix recieve queue source detection 2014-11-19 11:33:03 -07:00
Nathan Hjelm
7c43b566d2 more openib updates 2014-11-19 11:33:03 -07:00
Nathan Hjelm
e03956e099 Update the scif and openib btls for the new btl interface
Other changes:
 - Remove the registration argument from prepare_src since it no
   longer is meant for RDMA buffers.

 - Additional cleanup and bugfixes.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
ec33374339 btl: remove des_remote/des_remote_count from the mca_btl_base_descriptor_t
structure

This structure member was originally used to specify the remote segment
for an RDMA operation. Since the new btl interface no longer uses
desriptors for RDMA this member no longer has a purpose. In addition
to removing these members the local segment information has been
renamed to des_segments/des_segment_count.
2014-11-19 11:33:02 -07:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
rolfv
9134f48d4c Do not use sendi path with GPU buffer 2014-10-24 13:35:01 -07:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00