1
1

553 Коммитов

Автор SHA1 Сообщение Дата
Matias A Cabral
25bdd118ac MTL_OFI: Changed Recv cancel to be non-blocking
Updated the OFI MTL's Recv cancel to be a non-blocking call to match
the MPI spec. Given fi_cancel succeeded, then it is expected that the
user will wait on the request to read the result of if the cancel has
completed.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com
2019-02-14 17:07:20 -05:00
Aravind Gopalakrishnan
6edcc479c4 mtl/ofi: Fix segfault when not using Thread-Grouping feature
For the non thread-grouping paths, only the first (0th) OFI context
should be used for communication. Otherwise this would access a non existant
array item and cause segfault.

While at it, clarifiy some content regarding SEPs in README (Credit to Matias Cabral
for README edits).

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-02-07 11:52:53 -08:00
Jeff Squyres
f5e1a672cc ofi: revamp OPAL_CHECK_OFI configury
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - Subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
aba2571881 mtl/ofi/Makefile.am: down with tabs!
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Gilles Gouaillardet
945f830f7a mtl/ofi: fix configury when VPATH is used
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-07 06:29:58 -08:00
Aravind Gopalakrishnan
9cabcfdbba mtl/ofi: Fix reference to help text object
When we exceed the threshold number of contexts created, print appropriate help
text

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-01-29 15:10:06 -08:00
Brian Barrett
23da9fac23
Merge pull request #6294 from bwbarrett/mtl-ofi-no-device-warning
mtl/ofi: Print descriptive error message on modex failure
2019-01-29 08:32:49 -08:00
Brian Barrett
44be7f139a mtl/ofi: Provide av count hint during initialization
Provide the av_attr.count hint (number of addresses that will be
inserted into the address vector through the life of the process)
at initialization of the address vector.  It's ok to be a bit
wrong, but some endpoints (RxR) can benefit by not going through
the slow growth realloc churn.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2019-01-24 15:47:24 -08:00
Brian Barrett
fe25097194 mtl/ofi: Print descriptive error message on modex failure
With MTLs, there's no "other transport" when the remote side
does not have an active NIC, so we should print a useful error
message when the modex failed (indicating lack of a NIC on
the remote side).

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2019-01-21 23:50:31 +00:00
Aravind Gopalakrishnan
37f9aff2a0 mtl/ofi: Add MCA variables to enable SEP and to request number of OFI contexts
Moving to a model where we have users actively _enable_ SEP feature for use
rather than opening SEP by default if provider supports it. This allows us to
not regress (either functionally or for performance reasons) any apps that were
working correctly on regular endpoints.

Also, providing MCA to specify number of OFI contexts to create and default
this value to 1 (Given btl/ofi also creates one by default, this reduces the
incidence of a scenario where we allocate all available contexts by default and
if btl/ofi asks for one more, then provider breaks as it doesn't support it).

While at it, spruce up README on SEP content.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-01-14 09:58:36 -08:00
Spruit, Neil R
bef5f50a42 MTL_OFI: Generation of specialized functions at build time
-> Added new targets in Makefile.am to call a new build script
   generate-opt-funcs.pl to generate specialized functions for
   each *.pm file.

-> Added new perl module *.pm files for send,isend,irecv,iprobe,improbe
   which are loaded by generate-opt-funcs.pl to create new source files
   that correspond to the name of the .pm file to be used as part of
   MTL OFI.

-> Added mtl_ofi_opt.pm.template and updated README with details on the
   specialization features and how to add additional specialization
   support.

-> Added new opt_common/mtl_ofi_opt_common.pm containing common
   functions for generating the specialized functions used by
   all other *.pm modules.

-> Added new mtl_ofi.h which includes the definitions for the
   function symbol table for storing the specialized functions along
   with the definitions for the initialization functions for the
   corresponding function pointers.

-> Based off the OFI provider capabilities the specialized function
   pointers are assigned at mtl_ofi_component_init to the corresponding
   MTL OFI function.

-> mca_mtl_ofi_module_t has been updated with the symbol table
   struct which is assigned at component init.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-12-13 00:35:19 -08:00
Aravind Gopalakrishnan
e5e19dfcf7 Fix for SEP when num local procs is greater than available contexts
For cases when the number of local processes is greater than the number of
available contexts, the SEP initialization phase would calculate the number of
contexts to provision for each rank to be 0 and would eventually crash.

Fix the issue here by using regular endpoints in the event the number of local
processes is more than available contexts. This fixes issue #6182.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-12-12 16:49:04 -08:00
Brian Barrett
6e15128d96 mtl/ofi: Fix crash if no providers found
Commit 109d0569ffd introduced a crash when an error occurred
before ofi_ctxt was allocated, including when no providers
passed the selection logic.  Properly check that the pointer
is not NULL in the error cleanup code before dereferencing
the pointer.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-12-11 15:46:18 -08:00
Matias A Cabral
c76c6d8b28 MTL/PSM2: add missing default priority
Missing default priority after PR #6153

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-12-07 14:46:34 -08:00
Matias A Cabral
fc8582c560 MTL/PSM2: Do not lower the priority when all processes are local.
The intention of lowering the priority when all processes are local
was to favor Vader BTL. However, in builds including the OFI MTL it
gets selected instead.

Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com>
Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com>
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-12-04 15:31:09 -08:00
Aravind Gopalakrishnan
109d0569ff MTL/OFI: Add OFI Scalable Endpoint support
OFI MTL supports OFI Scalable Endpoints feature as means to improve
multi-threaded application throughput and message rate. Currently the feature
is designed to utilize multiple TX/RX contexts exposed by the OFI provider in
conjunction with a multi-communicator MPI application model. For more
information, refer to README under mtl/ofi.

Reviewed-by: Matias Cabral <matias.a.cabral@intel.com>
Reviewed-by: Neil Spruit <neil.r.spruit@intel.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-12-03 09:56:52 -08:00
matcabral
6a15712df5 MTL/OFI: revert PR 6082
Revert to avoid issues with dynamic processes.

Signed-off-by: matcabral <matias.a.cabral@intel.com>
2018-11-30 13:44:39 -08:00
matcabral
5f58453e63 MTL/OFI: Lower priority when all procs are local
So far Vader is faster than OFI MTL for doing shared memory.
Therefore, let it run by default when all procs are local.

Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com>
Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com>
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-11-14 11:01:33 -08:00
Aravind Gopalakrishnan
5cf43de445 MTL/OFI: Check threshold number of peers allowed per rank
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-11-01 14:03:00 -07:00
Brian Barrett
e9e4d2a4bc Handle asprintf errors with opal_asprintf wrapper
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error.  However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined.  Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Brian Barrett
c5eaa38491 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of thse blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, the can set a new
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-09-27 11:02:18 -07:00
Nathan Hjelm
000f9eed4d opal: add types for atomic variables
This commit updates the entire codebase to use specific opal types for
all atomic variables. This is a change from the prior atomic support
which required the use of the volatile keyword. This is the first step
towards implementing support for C11 atomics as that interface
requires the use of types declared with the _Atomic keyword.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-14 10:48:55 -06:00
Gilles Gouaillardet
316e4e38f4 mtl/psm2: fix a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-08-30 10:07:17 +09:00
Aravind Gopalakrishnan
5cbcae79d8 MTL OFI: Ask for FI_THREAD_DOMAIN support when not using MPI_THREAD_MULTIPLE
When an application is not using multiple threads to call into MPI, we can
safely ask for FI_THREAD_DOMAIN setting from the provider as it should
translate to the least amount of locking in provider.

Conversely, for applications using THREAD_MULTIPLE, explicitly ask for
FI_THREAD_SAFE to prevent race conditions.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-08-23 14:18:32 -07:00
Aravind Gopalakrishnan
ed2343034d MTL OFI: Fix race condition due to global progress entries array
Since progress entries array is globally allocated, it is susceptible
to race conditions when using multi-threaded applications. Allocating it
on the stack resolves any potential races as it is thread local by default.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-08-09 10:52:28 -07:00
Ralph Castain
1aef0a64aa
Merge pull request #5477 from nrspruit/ns_mtl_send_isend
MTL OFI: send/isend split into blocking/non-blocking paths
2018-07-31 13:08:37 -07:00
Spruit, Neil R
7dc8c8ba3f MTL OFI: send/isend split into blocking/non-blocking paths
-Updated blocking send to directly call functionality and
set completion events expected to 0 initally. This allows for optimization for
providers that support fi_tinject up to larger sizes. This also reduces
latency on running the OFI mtl with smaller sizes without requiring
calls to progress given fi_tinject is required to complete the messaging
before returning and will not create any events in the Completion Queue.

-Updated non-blocking send to directly call fi_tsend and avoid calling
fi_tinject as the functionality should not wait on completions. This
resolves a bug where applications calling MPI_Isend can overrun the
TX buffer with small (inject) messages causing a deadlock. In addition
this improves performance in message rates by preventing
waiting on any size message to complete in non-blocking send messages.

-Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv
which is common between isend and send code paths.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-24 07:54:24 -07:00
Spruit, Neil R
767135c580 MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
- If a message for a recv that is being cancelled gets completed after
the call to fi_cancel, then the OFI mtl will enter a deadlock state
waiting for ofi_req->super.ompi_req->req_status._cancelled which will
never happen since the recv was successfully finished.

- To resolve this issue, the OFI mtl now checks ofi_req->req_started
to see if the request has been started within the loop waiting for the
event to be cancelled. If the request is being completed, then the loop
is broken and fi_cancel exits setting
ofi_req->super.ompi_req->req_status._cancelled = false;

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-24 03:12:44 -07:00
Matias Cabral
d996f529c0 MTL OFI: Add support for mem_tag_format
OFI providers may reserve some of the upper bits of the tag for
internal usage and expose it using mem_tag_format. Check for that
and adjust communicator bits as needed.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-07-23 11:39:40 -07:00
Spruit, Neil R
d4f408a7f8 MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
- Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN
  from the provider and correctly attempt to progress the OFI Completion
  queue by calling ompi_mtl_ofi_progress.

- If events were pending that blocked OFI operations from being enqueued
  they will be completed and the OFI operation will be retried once
  ompi_mtl_ofi_progress has successfully completed.

- Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of
  requiring the existance of a "ret" variable to pass back the return
  value from completing the OFI operation.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-17 03:00:38 -07:00
Spruit, Neil R
9a17864278 MTL OFI: Redesign sync send with reduced tag bits and quick ack
-Updated the design for sync send MPI calls to use 2 protocol bits for
denoting "sync_send" or "sync_send_ack".

-"Sync_send" is added to the send tag only and is masked out in receives
such that it can be read by the original Recv posted in the send/recv
operation.

-"Sync_send_ack" is sent from the recv callback to the send side. This 0
byte send does not generate a completion entry and instead sends the
message and immediately completes the opal completion in the recv.

-Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2
more tag bits per format type due to the reduced protocal bits required
by OMPI.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-09 06:50:21 -07:00
Matias A Cabral
e6674556aa MTL OFI: add support for FI_REMOTE_CQ_DATA.
Extend number of supported ranks with providers that support
FI_REMOTE_CQ_DATA. Add README file to OFI MTL
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-06-14 17:17:38 -07:00
Brian Barrett
09e4c40ce9 mtl: remove MXM MTL
Remove the MXM MTL, which has been deprecated in preference for
the Yalla PML.  This was discussed at the last developers meeting
and somehow I ended up with the action item to do the removal.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-05-21 14:18:30 -07:00
Nathan Hjelm
f432d07844 mtl: reset ompi_mtl_base_selected_component on framework close
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-02 14:53:34 -06:00
Todd Kordenbrock
d646a00cd9
Merge pull request #5054 from tkordenbrock/topic/master/mtl-portals4.finalize.fix
master: mtl-portals4: don't call progress() in finalize() if Portals4 was not initialized
2018-04-12 12:12:05 -05:00
Todd Kordenbrock
90659671bc mtl-portals4: don't call progress() in finalize() if Portals4 was not initialized
This commit fixes a segfault in mtl-portals4 finalize().  The segfault
occurs if finalize() is called without any calls to add_procs().  This
commit resolves the segfault by skipping the progress() loop in
finalize() if the Portals was not initialized.

Signed-off-by: Todd Kordenbrock (thkgcode@gmail.com)
2018-04-10 14:22:32 -05:00
Spruit, Neil R
e7bff501cd MTL OFI: Added support for reading multiple CQ events in ofi progress
-Updated ompi_mtl_ofi_progress to use an array to read CQ events up to a
threshold that can be set by the Open MPI User.

-Users can adjust the number of events that can be handled in the
ompi_mtl_ofi_progress by setting "--mca mtl_ofi_progress_event_cnt #".

-The default value for the the number of CQ events that can be read in a
single call to ofi progress is 100 which is an average
based off workload usecase anaylsis showing 70-128 as the range of
multiple events returned during ofi progress.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-02-15 09:41:14 -05:00
Aravind Gopalakrishnan
fb68726baf MTL OFI: Allow retries in MTL progress for interrupted syscalls
This fixes a regression in sockets provider which could return -EINTR value
from fi_cq_read() due to a syscall being interrupted. The error value is
currently interpreted as fatal condition. Relax the rule so that we can retry
fi_cq_read() operation.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2017-12-20 14:58:49 -08:00
Matias Cabral
2c86b8723d
Merge pull request #4510 from matcabral/mtl_psm2_shadow_vars
New flag for MCA parameters that allows a behaving with a default value of "unset".
2017-12-04 12:25:37 -08:00
Howard Pritchard
b160cf6339
Merge pull request #4533 from hppritcha/topic/ofi_mtl_mprobe_fixes
mtl/ofi: fix problem with mprobe/mrecv
2017-12-04 09:11:47 -07:00
Nathan Hjelm
1282e98a01 opal/asm: rename existing arithmetic atomic functions
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Nathan Hjelm
9d0b3fe9f4 opal/asm: remove opal_atomic_bool_cmpset functions
This commit eliminates the old opal_atomic_bool_cmpset functions. They
have been replaced by the opal_atomic_compare_exchange_strong
functions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Howard Pritchard
cd48eccbae mtl/ofi: fix problem with mprobe/mrecv
At least with some providers (sockets and GNI), the mprobe/mrecv
ofi mtl methods were incorrect.  For these two providers at least
one must supply the original tag and mask bits used with the
prior FI_PEEK | FI_CLAIM request that had been used to probe for
the message.

These providers take a strict interpretation of the following sentence
from the libfabric fi_tagged man page:

```
 Claimed messages can only be retrieved using a subsequent, paired receive  operation  with  the  FI_CLAIM  flag  set.
```

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-11-24 08:11:18 -07:00
Matias A Cabral
1fad59465f New flag for MCA parameters that allows a behaving with a default value
of "unset".
mtl/psm2: Update some shadow mca parameters to use the default "unset".
mtl/psm2: Add new shadow parameter to allow specifying the service level.

Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>
2017-11-16 16:28:50 -08:00
Matias Cabral
d1869a725a
Merge pull request #4467 from matcabral/master
mtl/ofi: Set data and control progress options default values to FI_PROGRESS_UNSPEC
2017-11-13 07:35:39 -08:00
Jeff Squyres
a8686a6813 mtl ofi: squelch compiler warnings
gcc 5.2 complains:

```
mtl_ofi_component.c: In function ‘ompi_mtl_ofi_finalize’:
mtl_ofi_component.c:613:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
     if (ret = fi_close((fid_t)ompi_mtl_ofi.fabric)) {
     ^
```

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:11 -08:00
Jeff Squyres
5a6ddf42d6 mtl ofi: it is not an error to return no data from fi_getinfo()
Before this commit, the presence of usNIC devices -- which will
(currently) return no data when fi_getinfo() is queried for tagged
matching providers -- would cause an error message to be displayed.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:11 -08:00
Jeff Squyres
f910f554f7 mtl ofi: show the positive value of the error
The value of ret is negative (e.g., -61), but it is displayed in the
help message as `%zd`, which renders as unsigned (i.e., a giant
positive value).  So make sure to negate the negative value before
rendering it (e.g., so we display "61", not "4294967235").

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:11 -08:00
Jeff Squyres
e8c13ef286 mtl ofi: fix trivial comment whitespace
No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:10 -08:00
Jeff Squyres
bed1930df8 mtl ofi: fix formatting of help message
No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:05 -08:00