the sharedfp component has to be selected and opened before
we set the default file view during file_open. Otherwise
there is a sperious error message from the sharefp_file_seek
operation that is called during the file_set_view.
Fixes Issue #5560
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
The write memory barrier was intended to precede setting a fast-box
header but instead follows it. This commit moves the memory barrier to
the intended location.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit dca3516765)
The Autoconf AC_CONFIG_* macros can only be instantiated exacly once
for any given file, *and* they must be in a code execution path at run
time for the target file to be generated at the end of configure.
For example, if you want to generate file ABC at the end of configure,
you must invoke the AC_CONFIG_FILES(ABC) macro in a code path that
will get executed when configure is run.
That's pretty straightforward.
What's not straightforward is two corner cases:
1. You cannot invoke the AC_CONFIG_FILES(ABC) macro for the same file
more than once. If you do, autoreconf will fail (even before you
can run configure).
2. If AC_CONFIG_FILES(ABC) is not in a code path that is executed by
configure, the file ABC is not registered properly, and ABC will
not be generated at the end of configure.
This applies to hwloc because hwloc's HWLOC_SETUP_CORE macro calls
both AC_CONFIG_FILES and AC_CONFIG_HEADER to setup its Makefiles
(etc.) so that targets like "make distclean" and "make distcheck" will
work properly. Hence, we *have* to invoke HWLOC_SETUP_CORE.
However, the MCA_opal_hwloc_hwloc201_CONFIG macro has a few side
effects. It would be nice to do able to do something like this:
```
if hwloc:extern is going to be used:
Invoke minimal HWLOC_SETUP_CORE (with no side effects)
else
Invoke full HWLOC_SETUP_CORE (with side effects)
fi
```
But we can't, because autoreconf will detect that AC_CONFIG_FILES has
been invoked on the same files more than once (regardless of whether
those code paths will be executed at run time or not). Kaboom.
Similarly, we can't do this:
```
if hwloc:extern is not going to be used:
Invoke full HWLOC_SETUP_CORE (with side effects)
fi
```
Because then hwloc's AC_CONFIG_FILES won't be registered properly when
hwloc:external *is* used (i.e., when the HWLOC_SETUP_CORE macro is not
in a code path that is executed at run time), and targets like "make
distclean" will fail because hwloc's Makefiles won't have been setup.
Kaboom.
But remember that the hwloc framework is a bit special: there will
only ever be 2 comoponents: external and internal. External is
guaranteed to be configured first because of its priority. So the
internal component (i.e., this component) immediately knows if it is
going to be used or not based on whether the external component
configuration succeeded or failed.
Specifically: regardless of whether the internal component (i.e., this
component) is going to be used, we have to invoke HWLOC_SETUP_CORE.
But we can manage the side effects: allow the side effects when
this/internal component is going to be used, and avoid the side
effects when this/internal component is not going to be used.
This is a little less clean than I would have liked, but because of
Autoconf's oddity about its AC_CONFIG_* macros, this is the only
solution I could come up with.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 01e4570af7)
In order to make "make distclean" (and friends) work, we need to
*always* invoke the embedded configure script -- even if we know that
we're not going to use this component.
But in cases where we know we're not going to use this component, we
also need to avoid the side effects of the code path that is used when
we *do* want to use this component. So split the two possibilities
into two different macros:
1. MCA_opal_event_libevent2022_FAKE_CONFIG: which does almost nothing
except invoke the underlying "configure" script.
2. MCA_opal_event_libevent2022_REAL_CONFIG: which does all the real
work (including invoking the underlying "configure" script).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 69aa46e167)
Put argument to AM_CONDITIONAL inside []. No code or logic changes.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 80df3f040b)
We know that event:external will be configured first (because of its
priority). Take advantage of that here in libevent2022 by having it
refuse to configure / politely fail if event:external succeeded.
Also print out some additional lines in configure output indicating
what is going on (i.e., event:external succeeded, so this component
will be skipped, or event:external failed, so this component will be
used).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b063cb6b0f)
We know that hwloc:external will be configured first (because of its
priority). Take advantage of that here in hwloc201 by having it
refuse to configure / politely fail if hwloc:external succeeded.
Also print out some additional lines in configure output indicating
what is going on (i.e., hwloc:external succeeded, so this component
will be skipped, or hwloc:external failed, so this component will be
used).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 4e5f432786)
In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME. This should not
be considered an error. This commit adds a retry loop around
PtlMEUnlink().
In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all. Check them with the same retry loop as
above.
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit f3f2a826b4)
-Updated blocking send to directly call functionality and
set completion events expected to 0 initally. This allows for optimization for
providers that support fi_tinject up to larger sizes. This also reduces
latency on running the OFI mtl with smaller sizes without requiring
calls to progress given fi_tinject is required to complete the messaging
before returning and will not create any events in the Completion Queue.
-Updated non-blocking send to directly call fi_tsend and avoid calling
fi_tinject as the functionality should not wait on completions. This
resolves a bug where applications calling MPI_Isend can overrun the
TX buffer with small (inject) messages causing a deadlock. In addition
this improves performance in message rates by preventing
waiting on any size message to complete in non-blocking send messages.
-Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv
which is common between isend and send code paths.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 7dc8c8ba3f)
- If a message for a recv that is being cancelled gets completed after
the call to fi_cancel, then the OFI mtl will enter a deadlock state
waiting for ofi_req->super.ompi_req->req_status._cancelled which will
never happen since the recv was successfully finished.
- To resolve this issue, the OFI mtl now checks ofi_req->req_started
to see if the request has been started within the loop waiting for the
event to be cancelled. If the request is being completed, then the loop
is broken and fi_cancel exits setting
ofi_req->super.ompi_req->req_status._cancelled = false;
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 767135c580)
- to avoid value mix used C99 style of object initializations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290)
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.
Thanks to @tonyreina for the error report
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit bcdb1f45ac)
The call of MPI_Allgather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault.
The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 540c2d1)
- there is C11 naming conflict - atomic_init is C macro
which cause building issue
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 3295b23800)
- Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN
from the provider and correctly attempt to progress the OFI Completion
queue by calling ompi_mtl_ofi_progress.
- If events were pending that blocked OFI operations from being enqueued
they will be completed and the OFI operation will be retried once
ompi_mtl_ofi_progress has successfully completed.
- Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of
requiring the existance of a "ret" variable to pass back the return
value from completing the OFI operation.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit d4f408a7f8)
- in sine cases persistent request was deleted during completion
callback, this cause double free of linked UCX request (assert
in debug build or hang in release build)
- UCX request is freed prior completion callback
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6fe0a73861)
Flag that we provided a notification and ignore it if it attempts to come back up.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit ea0d70bc9396def61545e2ce492a55c4c3aa7772)
Per https://github.com/open-mpi/ompi/issues/5031, if the user didn't specify a particular PMIx installation, then default back to the internal version if it is newer than the discovered external one. PMIx doesn't yet provide a full signature so we have to just get as close as possible for now.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1e6aaf7f22)
- there worked progress was missed on startup which caused hang
on one of ranks
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a081fba046)
Due to decreasing support by vendors/other orgs for the OpenIB BTL,
only look for iWarp/RoCE devices by default. Allow IB HCAs
with ports configured for ethernet.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>