This commit fixes a typo in compare-and-swap when retrieving the
memory region associated with a displacement. It was erroneously 8
bytes instead of the datatype size. This can cause an incorrect RMA
range error when the compare-and-swap is less than 4 bytes from the
end of the region.
Fixedopen-mpi/ompi#2080
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds support for using network AMOs for MPI_Accumulate,
MPI_Fetch_and_op, and MPI_Compare_and_swap. This support is only
enabled if the ompi_single_intrinsic info key is specified or the
acc_single_interinsic MCA variable is set. This configuration
indicates to this implementation that no long accumulates will be
performed since these do not currently mix with the AMO
implementation.
This commit also cleans up the code somwhat. This includes removing
unnecessary struct keywords where the type is also typedef'd.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit cleans up some code in the passive target path. The code
used the buffered frag control send path but it is more appropriate to
use the unbuffered one. This avoids checking structures that are
should not be in use in this path.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes an ordering bug in the code that keeps track of all
attached memory windows. The code is intended to keep the memory
regions sorted but was often inserting at the wrong index. Thanks to
Christoph Niethammer for reporting the issue. The reproducer will be
added to nightly MTT testing.
Fixesopen-mpi/ompi#2012
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
It is possible for another thread to process a lock ack before the
peer is set as locked. In this case either setting the locked or the
eager active flag might clobber the other thread. To address this the
flags have been made volatile and are set atomically. Since there is
no a opal_atomic_or or opal_atomic_and function just use cmpset for
now.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes some bugs uncovered during thread testing of
2.0.1rc1. With these fixes the component is running cleanly with
threads.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit changes the sematics of ompi request callbacks. If a
request's callback has freed or re-posted (using start) a request
the callback must return 1 instead of OMPI_SUCCESS. This indicates
to ompi_request_complete that the request should not be modified
further. This fixes a race condition in osc/pt2pt that could lead
to the req_state being inconsistent if a request is freed between
the callback and setting the request as complete.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The original lock_all algorithm in osc/pt2pt sent a lock message to
each peer in the communicator even if the peer is never the target of
an operation. Since this scales very poorly the implementation has
been replaced by one that locks the remote peer on first communication
after a call to MPI_Win_lock_all.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes an issue that can occur if a target gets overwhelmed with
requests. This can cause osc/pt2pt to go into deep recursion with a stack
like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb
-> ... . At small scale this is fine as the recursion depth stays small but
at larger scale we can quickly exhaust the stack processing frag requests.
To fix the issue the request callback now simply puts the request on a
list and returns. The osc/pt2pt progress function then handles the
processing and reposting of the request.
As part of this change osc/pt2pt can now post multiple fragment receive
requests per window. This should help prevent a target from being overwhelmed.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
It is possible for the start call to complete the requests. For this
reason the module rdma_frag field should be filled in before start is
called. If the request completes the completion callback will reset
the rdma_frag field to NULL. Fixes a bug discovered by @tkordenbrock.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit expands the OPAL_THREAD macros to include 32- and 64-bit
atomic swap. Additionally, macro declararations have been updated to
include both OPAL_THREAD_* and OPAL_ATOMIC_*. Before this commit the
former was used with add and the later with cmpset.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit fixes a bug in the RDMA compare-and-swap implementation
that caused the origin value to always be written even if the compare
should have failed.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Need to increment the total size after checking the local offset not
before. This typo causes large allocations with MPI_Win_allocate() to
fail.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* mpi/start: fix bugs in cm and ob1 start functions
There were several problems with the implementation of start in Open
MPI:
- There are no checks whatsoever on the state of the request(s)
provided to MPI_Start/MPI_Start_all. It is erroneous to provide an
active request to either of these calls. Since we are already
looping over the provided requests there is little overhead in
verifying that the request can be started.
- Both ob1 and cm were always throwing away the request on the
initial call to start and start_all with a particular
request. Subsequent calls would see that the request was
pml_complete and reuse it. This introduced a leak as the initial
request was never freed. Since the only pml request that can
be mpi complete but not pml complete is a buffered send the
code to reallocate the request has been moved. To detect that
a request is indeed mpi complete but not pml complete isend_init
in both cm and ob1 now marks the new request as pml complete.
- If a new request was needed the callbacks on the original request
were not copied over to the new request. This can cause osc/pt2pt
to hang as the incoming message callback is never called.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* osc/pt2pt: add request for gc after starting a new request
Starting a new receive may cause a recursive call into the pt2pt
frag receive function. If this happens and the prior request is
on the garbage collection list it could cause problems. This commit
moves the gc insert until after the new request has been posted.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).
* Fix the condition to release the request.
This commit fixes a bad synchronization detection bug that occurs when
mixing MPI_Win_fence() and MPI_Win_lock(). If no communication has
occurred in the fence epoch it is safe to just clear the all_sync
object (it was set up by fence).
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug that occurs when ranks are either not mapped
evenly or by something other than core.
Fixesopen-mpi/ompi#1599
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Before this commit, a same PML tag may be used for distinct
communications for long messages. For example, consider a condition
where rank A calls ```MPI_PUT``` targeting rank B and rank B calls
```MPI_GET``` targeting rank A simultaneously.
A PML tag for the ```MPI_PUT``` is acquired on rank A and is used
for the long-message communication from rank A to rank B.
A PML tag for the ```MPI_GET``` is acquired on rank B and is used
for the long-message communication from rank A to rank B.
These two tags may become a same value because they are managed
independently on each rank. This will cause a data corruption.
This commit separates the tag used in a single RMA communication
call, one for communication from an origin to a target, and one
for communication from a target to an origin. A "base" tag
is acquired using ```get_tag``` function and PML tag is caluculated
from the base tag by ```tag_to_target``` and ```tag_to_origin```
function.
Fix CID 1324726 (#1 of 1): Free of address-of expression (BAD_FREE):
Indeed, if a lock conflicts with the lock_all we will end up trying to
free an invalid pointer.
Fix CID 1328826 (#1 of 1): Dereference after null check (FORWARD_NULL):
This was intentional but it would be a good idea to check for
module->comm being non_NULL to be safe. Also cleaned out some checks
for NULL before free().
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by @ggouaillardet and MTT:
- Fix SEGV in long send completion caused by missing update to the
request callback data.
- Add an MPI_Barrier to the fence short-cut. This fixes potential
semantic issues where messages may be received before fence is
reached.
- Ensure fragments are flushed when using request-based RMA. This
allows MPI_Test/MPI_Wait/etc to work as expected.
- Restore the tag space back to 16-bits. It was intended that the
space be expanded to 32-bits but the required change to the
fragment headers was not committed. The tag space may be expanded
in a later commit.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:
- The code that signaled the actual start of an access epoch changed
the eager_send_active flag on a synchronization object without
holding the object's lock. This could cause another thread waiting
on eager sends to block indefinitely because the entirety of
ompi_osc_pt2pt_sync_expected could exectute between the check of
eager_send_active and the conditon wait of
ompi_osc_pt2pt_sync_wait.
- The bookkeeping of fragments could get screwed up when performing
long put/accumulate operations from different threads. This was
caused by the fragment flush code at the end of both put and
accumulate. This code was put in place to avoid sending a large
number of unexpected messages to a peer. To fix the bookkeeping
issue we now 1) wait for eager sends to be active before stating
any large isend's, and 2) keep track of the number of large isends
associated with a fragment. If the number of large isends reaches
32 the active fragment is flushed.
- Use atomics to update the large receive/send tag counters. This
prevents duplicate tags from being used. The tag space has also
been updated to use the entire 16-bits of the tag space.
These changes should also fixopen-mpi/ompi#1299.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds code to handle large unaligned gets. There are two
possible code paths for these transactions:
1) The remote region and local region have the same alignment. In
this case the get will be broken down into at most three get
transactions: 1 transaction to get the unaligned start of the region
(buffered), 1 transaction to get the aligned portion of the region,
and 1 transaction to get the end of the region.
2) The remote and local regions do not have the same alignment. This
should be an uncommon case and is not optimized. In this case a
buffer is allocated and registered locally to hold the aligned data
from the remote region. There may be cases where this fails (low
memory, can't register memory). Those conditions are unlikely and
will be handled later.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
If atomics are not globally visible (cpu and nic atomics do not mix)
then a btl endpoint must be used to access local ranks. To avoid
issues that are caused by having the same region registered with
multiple handles osc/rdma was updated to always use the handle for
rank 0. There was a bug in the update that caused osc/rdma to continue
using the local endpoint for accessing the state even though the
pointer/handle are not valid for that endpoint. This commit fixes the
bug.
Fixesopen-mpi/ompi#1241.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Optimizing put aggregation in the presence of threads will require a
redesign of the code. For now just ensure that put aggregation is
turned off when MPI_THREAD_MULTIPLE is enabled.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
A bus error occurs in sm OSC under the following conditions.
- sparc64 or any other architectures which need strict alignment.
- `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created
by sm OSC.
- The communicator size is odd and greater than 3.
The lines 283-285 in current `ompi/mca/osc/sm/osc_sm_component.c` has
the following code.
```c
module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
module->posts[0] = (uint64_t *) (module->node_states + comm_size);
```
The size of `ompi_osc_sm_node_state_t` is multiples of 4 but not
multiples of 8. So if `comm_size` is odd, `module->posts[0]` does
not aligned to 8. This causes a bus error when accessing
`module->posts[i][j]`.
This patch fixes the alignment of `module->posts[0]` by setting
`module->posts[0]` first.
NOTE: Building with external pmix *requires* that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated.
Closes#1204 (replaces it)
Fixes#1064
A previous commit updated the one-sided code to register the state
region only once. This created an issue when using the scratch lock
with fetching atomics. In this case on any rank that isn't local rank
0 the module->state_handle is NULL. This commit fixes the issue by
removing the scratch lock and using a fragment pointer instead.
Fixesopen-mpi/ompi#1290
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
`MPI_WIN_TEST` must update the `flag` parameter to 0 when not all
origin processes called `MPI_WIN_COMPLETE`. But sm OSC doesn't.
If the caller initialize the `flag` argument to a non-0 value,
the caller will receive the non-0 `flag` value.
This commit changes the way ompi_proc_t's are retained/released by
ompi_group_t's. Before this change ompi_proc_t's were retained once
for the group and then once for each retain of a group. This method
adds unnecessary overhead (need to traverse the group list each time
the group is retained) and causes problems when using an async
add_procs.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
There were two bugs in osc/rdma when using threads:
- Deadlock is ompi_osc_rdma_start_atomic. This occurs because
ompi_osc_rdma_frag_alloc is called with the module lock. To fix the
issue the module lock is now recursive. In the future I will add a
new lock to protect just the current rdma fragment.
- Do not drop the lock in ompi_osc_rdma_frag_alloc when calling
ompi_osc_rdma_frag_complete. Not only is it not needed but dropping
the lock at this point can cause a competing thread to mess up the
state.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug that can occur on Cray Gemini networks. If
multiple registrations are used for the local state then we looks the
atomicity guarantees. To avoid issues like this use only a single
registration handle for all local state on a node.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes the following:
- CIDs 1328491, 1328492: Dead code caused by typos in a prior
commit.
- Fix the calculation of dynamic memory regions. This was causes
incorrect RMA range errors when accessing the last partial page of
an attachment.
- Fix a SEGV when using dynamic memory windows with local state (all
processes on the same node).
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug that occurs when a post message comes in when
sending complete messages or while waiting for all outgoing messages
to flush. In that case the post message might get incorrecly
associated with the ending sync object.
References open-mpi/ompi#1012
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs in the osc/rdma component:
- Complete aggregated requests immediately. Completion of RMA
requests indicates local completion anyway. This fixes a hang in
the c_reqops test.
- Correctly mark Rget_accumulate requests.
- Set the local base flag correctly on the local peer.
- Clear or set the no locks flag on the window if the value is
changed by MPI_Win_set_info.
- Actually update the target when using MPI_OP_REPLACE.
Fixesopen-mpi/ompi#1010
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes an issue identified by @rolfv. The local peer was
not being correctly initialized when running with a single process on
a node.
This fixesopen-mpi/ompi#1010
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix code paths that didn't convert the MPI datatype to the
corresponding Portals4 datatype.
Thanks to Nicolas Chevalier (@shawone) for finding this bug and
submitting a patch.
Fixed CID 1269712, 1269709, 1269706, 1269703, 1269694: Logically dead code
Remove extra NULL check as OMPI_OSC_PT2PT_REQUEST_ALLOC can never set the
request to NULL.
Fixes CID 1269668: Unchecked return value
False positive. Add (void) to indicate we do not care about the return code
from opal_hash_table_get_uint32.
Fixes CID 1324726: Free of address-of expression
Do not free lock if it was not allocated.
Fixes CID 1269658: Free of address-of expression
Never will happen but because op is always a built-in op there is no
reason to retain/release it anyway.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
In the default mode of operation, the Portals4 components support
dynamic add_procs().
The Portals4 components have two alternate modes (flow control and
logical-to-physical) that require knowledge of all procs at startup.
In these modes, mtl-portals4 sets the MCA_MTL_BASE_FLAG_REQUIRE_WORLD
flag and btl-portals4 sets the MCA_BTL_FLAGS_SINGLE_ADD_PROCS flag
to tell the PML that we need all the procs in one add_procs() call.
The osc/sm component was using a simple counter to determine if all
expected posts had arrived to start a PSCW access epoch. This is
incorrect as a post may arrive from a peer that isn't part of the
current start group. There are many ways this could have been fixed.
This commit adds an n^2 bitmap. When a process posts it sets a bit in
the bitmap associated with the access rank to indicate the post is
complete. The access rank checks for and clears the bits associated
with all the processes in the start group.
The bitmap requires comm_size ^ 2 bits of space. This should be
managable as most nodes have relatively small numbers of processes. If
this changes another algorigthm can be implemented.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fix CID 1324733: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324734: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324735: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324736: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324737: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324751: Memory - illegal accesses (USE_AFTER_FREE)
Fix CID 1324750: (USE_AFTER_FREE)
Fix CID 1324749: Memory - corruptions (USE_AFTER_FREE)
Fix CID 1324748: Memory - illegal accesses (USE_AFTER_FREE)
Fix CID 1324747: (USE_AFTER_FREE)
Fix CID 1324746: Memory - corruptions (USE_AFTER_FREE)
Add missing return on an error path.
Fix CID 1324745: Code maintainability issues (UNUSED_VALUE)
Ignore return code from barrier. It was not being used anyway.
Fix CID 1324738: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324741: Null pointer dereferences (REVERSE_INULL)
module->selected_btl can not be NULL in osc/rdma during normal
operation. Removed the unnecessary NULL check.
Fix CID 1324752: Memory - illegal accesses (USE_AFTER_FREE)
Move ompi_osc_pt2pt_module_lock_remove to before the lock is freed.
Fix CID 1324744: Uninitialized variables (UNINIT)
Fix CID 1324743: Uninitialized variables (UNINIT)
This array is not used unitialized but there is no reason not to use
calloc here to silence the warning.
The following CID is a false positive: 1324742. I will mark it such in
coverity.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds support for performing one-sided operations over
supported hardware (currently Infiniband and Cray Gemini/Aries). This
component is still undergoing active development.
Current features:
- Use network atomic operations (fadd, cswap) for implementing
locking and PSCW synchronization.
- Aggregate small contiguous puts.
- Reduced memory footprint by storing window data (pointer, keys,
etc) at the lowest rank on each node. The data is fetched as each
process needs to communicate with a new peer. This is a trade-off
between the performance of the first operation on a peer and the
memory utilization of a window.
TODO:
- Add support for the accumulate_ops info key. If it is known that
the same op or same op/no op is used it may be possible to use
hardware atomics for fetch-and-op and compare-and-swap.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit updates osc/pt2pt to allocate peer object as they are
needed rather than all at once. Additionally, to help improve the
memory footprint a new synchronization structure has been added.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit modifies the ompi_group_t union/difference code to compare/copy the
raw group values. This will either be a ompi_proc_t or a sentinel value. This
commit also adds helper functions to convert between opal process names and
sentinel values.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
- MPI_Compare_and_swap
- MPI_Fetch_and_op
- MPI_Raccumulate
- MPI_Win_detach
Thanks to Michael Knobloch and Takahiro Kawashima for bringing this
to our attention
Portals4 supports atomic ops on datatypes less than or equal to
max_fetch_atomic_size bytes. This commit fixes a bug that required
the datatype to be less than max_fetch_atomic_size bytes.
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
In days past, some implementations of Portals4 could not cover all
of memory with a single Memory Descriptor so multiple large
overlapping Memory Descriptors were created. Because none of the
current implementations have this limitation (and no future
implementations should either), this commit removes the overlapping
Memory Descriptors code.
This commit fixes a bug identified by MTT that occurred when mixing
passive and active target synchronization. The bugs fixed in this
commit are:
- Do not update incoming fragment counts for any type of unbuffered
control message. These messages are out-of-band and should not be
considered towards the signal counts.
- Complete a change from using received counts to expected counts for
lock, unlock, and flush acks. Part of the change made it into
master before the rest was ready. This was preventing wakeups in
some cases.
- Turn the passive_target_access_epoch module member into a
counter. As long as at least one peer is locked we are in a
passive-target epoch and not an active target one. This fix will
ensure that fragment flags are set appropriately.
fixes#538
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Based on some on-list and IM discussion with @hjelmn about
open-mpi/ompi@40b7643119, change the testing to a switch/case. If we
fall into the default case, assert() error (because it's an OMPI
developer programming error).
The fragment flush code tries to send the active fragment before
sending any queued fragments. This could cause osc messages to arrive
out-of-order at the target (bad). Ensure ordering by alway sending
the active fragment after sending queued fragments.
This commit also fixes a bug when a synchronization message (unlock,
flush, complete) can not be packed at the end of an existing active
fragment. In this case the source process will end up sending 1 more
fragment than claimed in the synchronization message. To fix the issue
a check has been added that fixes the fragment count if this situation
is detected.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.
All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit fixes a bug in osc/pt2pt which causes MPI_Win_unlock_all
to hang. The problem was caused by code refactoring that moved the
unlock of the local process to after the loop that waits for unlock
acks. This will cause the code to loop forever waiting on the self
ack.
Fixes#444
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.
This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.
Notes:
OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds an owner file in each of the component directories
for each framework. This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page. Currently there are two
"fields" in the file, an owner and a status. A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
Commit open-mpi/ompi@1a3597aam changed the type of the `convertor`
variable from `ompi_osc_base_convertor_t` (which contained an
`opal_convertor_t`) to an `opal_convertor_t`. Hence, using memchecker
to ensure that the inner convertor of the `ompi_osc_base_convertor_t`
is considered initialized is now unnecessary.
With certain datatypes the opal_datatype_unpack method for performing
the accumulate operation does not work. This commit modifies the
accumulate code in the osc base to use opal_convertor_raw instead.
Fixes#385
The ompi_osc_signal_outgoing was moved from ompi_osc_rdma_frag_start to frag_send
which gave correct results for the bug reproducer but hangs with simple OSC
tests. Moved the ompi_osc_signal_outgoing back and it now passes all tests.
Closes#256
Some of the counters used by the "rdma" one-sided component are intended
to overflow. Since overflow behavior is undefined for signed integers in
C it is safer to use unsigned integers here.
osc/rdma uses counters to determine if all messages have been received
before exiting synchronization calls. The problem is that the active
target counter is always increasing (never zeroed). If over 2^31-1
messages are sent this causes the counter to overflow (in itself this
isn't an error). This causes test/wait to return before the communication
is complete. There is an additional error in the use of the fragment
flush function. If PSCW synchronization is in use this function CAN NOT
be called unless a post message has arrived.
Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php
This commit fixes both issues. Tested against MTT and issue reproducer.
Closes#224.
Initialize the blocking_fence flag to false as the code logic indicates that it should only be set if someone provides that flag.
Thanks to Lisandro Dalcin for reporting it
cmr=v1.8.4:reviewer=hjelmn
This commit was SVN r32812.
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL
All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.
- Portals4/OSC was unable to acquire an exclusive lock due to an invalid
local address in the atomic operation. This caused the reported hang.
- After fixing the hang, the test continued to fail because
ompi_datatype_is_contiguous_memory_layout() reports that MPI_EMPTY (the
origin datatype) is noncontiguous and Portals4/OSC does not support
noncontiguous datatypes at this time. However, in this case the origin
count is zero so contiguous/noncontiguous is irrelevant. Now we skip
the contiguous check if the count is zero.
cmr=v1.8.3:reviewer=regrant:subject=Fix for "Portals4/MTL hangs in c_get_accumulate test"
This commit was SVN r32295.
The following Trac tickets were found above:
Ticket 4662 --> https://svn.open-mpi.org/trac/ompi/ticket/4662
This commit adds a check to see if the target is in an access epoch. If
not we return OMPI_ERR_RMA_SYNC. This fixes test_start3 in the onesided
test suite. The cost of this extra check is 1 byte/peer for the boolean
flag indicating that the peer is in an access epoch.
I also fixed a problem where mupliple unexpected post messages are not
correctly handled.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r32160.
cmr=v1.8.2:reviewer=tkordenbrock:subject=Portals4/MTL hanging fix
This commit was SVN r32113.
The following Trac tickets were found above:
Ticket 4681 --> https://svn.open-mpi.org/trac/ompi/ticket/4681
cmr=v1.8.2:reviewer=tkordenbrock:subject=Move r32112 to v1.8.2 branch
This commit was SVN r32112.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r32112
The following Trac tickets were found above:
Ticket 4682 --> https://svn.open-mpi.org/trac/ompi/ticket/4682
The post and start window calls are supposed to be matching. The code
did not check to see that an incoming post matched with the start call.
This commit fixes the bug by placing the post on a pending list that
will be checked by the next call to start.
cmr=v1.8.2:reviewer=dgoodell
This commit was SVN r32017.
The replace callback did not increment the incoming frag counter. This
leads to a hang during synchronization. This commit adds the increment
and also puts the request on the garbage collection list to fix a leak.
This fixes a hang found when running the mpich test suite.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32016.
The wrong type was used when calculating the amount of space needed
for an accumulate fragment. Fixed the calculation and took the
opportunity to eliminate the get_acc header as it is identical to the
acc header.
This fixes trac:4719 and #4718
Tracking these fixes for 1.8.2 in this CMR.
Throwing this to Brad for review as he is the one who ran into the issue.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32015.
The following Trac tickets were found above:
Ticket 4719 --> https://svn.open-mpi.org/trac/ompi/ticket/4719
in self optimization
This addresses an issue found with the MPICH pscw_ordering test. Eager sends
were not yet active (which is ok for the standard path) but not ok for the
self optimization. Fixed by waiting for all post messages before checking
the sync state.
Fixes trac:4724
Tracking the 1.8.2 issue in this CMR.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32012.
The following Trac tickets were found above:
Ticket 4724 --> https://svn.open-mpi.org/trac/ompi/ticket/4724
Brad correctly pointed out that the total window size should not be an
int. Changed it to an unsigned long.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32010.
Only one field is valid for RMA requests: MPI_ERROR. This field is set
to the correct value in ompi_request_empty so there is no reason to
allocate and keep track of osc/sm requests because they are always
complete on return. Since we are no longer using the osc/sm request
structure or free list they are now removed.
Closes trac:4723
Tracking this issue with the CMR. Brad, can you verify the issue is indeed fixed.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32009.
The following Trac tickets were found above:
Ticket 4723 --> https://svn.open-mpi.org/trac/ompi/ticket/4723
This commit fixes a bug that can cause request and communicator leaks
when cleaning up an OSC window. The should prevent a hang seen with
IMB-EXT.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31539.
When sending PUT_LONG, the data is sent before headers, and sometimes
the header is not flushed immediately. This creates a lot of unexpected
receives in the peer, since it would posts a receive only when gets the
header, which makes it run out of receive buffers. When the sender
eventually flushes the window, the receiver already has no buffers to
receive the header, which causes a deadlock.
The fix is to always flush the headers when doing put_long.
cmr=v1.8.1:reviewer=hjelmn
This commit was SVN r31378.
While testing one-sided on LANL systems I found a couple more OSC
bugs that were not caught during the initial testing:
- In the passive target code we read the read lock count as a
char instead of the intended uint32_t. This causes lock to
lockup when using shared locks after 127 iterations.
- The post code used the wrong group when trying to increment post
counters. This causes a segmentation fault.
- Both the post and wait code used the wrong check in the inner
loop leading to an infinite loop.
cmr=v1.8.1:reviewer=jsquyres
This commit was SVN r31354.
There was a typo in the ompi_osc_gacc_long_start that was causing a
segmentation fault when executing long get accumulate operations.
cmr=v1.8.1:reviewer=jsquyres
This commit was SVN r31353.
The last fix prevented a hang but had some cases where the results were
wrong. Fixed. Tested with armci, openmpi/ibm, openmpi/onesided.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31284.
It might be possible (don't know) for a datatype to made of a contiguous block
of a primitive datatype and have an lb. If this is ever the case the code
would have done the wrong thing. Add the lb in to be safe.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31283.
There are differences between how active and passive messages are
accounted for in this component. Active message counts on the sender
side are set to zero before the control message is sent so we do not
have to add one to the expected number of messages or we end up
double counting the control message. This commit should fix that error.
Fixes regression in one-sided/test_rma1
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31281.
This fixes more issues identified by armci. More issues still remain and fixes are
coming for those as well.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31272.
the case fix in ompi_osc_base_process_op in r31204.
There are two cases that needed to be handled:
- The target is a simple datatype (contiguous block of a primitive
type) but the origin is not. In this case we still need to pack
the origin data but we can not rely on the convertor to do the
unpack (see r31204).
- Both the origin and target datatypes are simple datatypes. In this
case we can use ompi_op_reduce to do the accumulation without having
to pack the origin data.
cmr=v1.8:ticket=trac:4449
This commit was SVN r31231.
The following SVN revision numbers were found above:
r31204 --> open-mpi/ompi@949abe45cd
The following Trac tickets were found above:
Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
Fixed two bugs:
- Use module->comm NOT comm to get the CID for the shared memory backing
file. This fixes the case where there are multiple shared memory windows
at the same time.
- Remember to unlink the shared memory backing file.
Refs trac:4438
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31227.
The following Trac tickets were found above:
Ticket 4438 --> https://svn.open-mpi.org/trac/ompi/ticket/4438
This commit fixes two bugs:
- We were not correctly setting the lock type in the outstanding lock
for lock_all. This caused undefined behavior.
- flush_all was incorrectly checking for comm size - 1 lock acks but
comm size flush acks. This is the reverse of what was intended.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31226.
It is possible to get into a situation where a small accumulate operation
can not be completed because a large accumulate operation holds the lock.
In this case we may return from wait/flush/etc before the operation is
complete. To handle this case increment the expected incoming fragment
count when queuing an accumulate operation and increment the incoming
fragment count after processing the accumulate operation.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31224.
of the primitive datatype
In this case we can not use the convertor to run the accumulate operation
since the datatype is a more or less a primitive type.
cmr=v1.8:ticket=trac:4449
This commit was SVN r31222.
The following Trac tickets were found above:
Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
This commit fixes two issues:
- osc/rdma: The target side of an accumulate was using the target datatype
in the receive to the packed buffer. This was conflicting with the way
the reduction is done into the target buffer. Changed the receive to use
the primitive datatype.
- osc/base: The copy table was completely wrong. Fixed the table to match
the underlying datatypes (which are opal not ompi datatypes).
- osc/base: There is a problem using the optimized description. Fall back
on using the non-optimized description until we can understand what is
going wrong.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31204.
results
The code to handle completion messages did not correctly increment the
number of expected messages. This could cause wait to return before all
incoming messages are complete.
I also added a check to ensure that start returns an error if we are in
a passive access epoch.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31203.
This commit adds large datatype description support to the osc/rdma
component. Support is provided by an additional send/recv of the datatype
description if the description does not fit in an eager buffer. The
code is designed to require minimal new code and not for speed. We
consider this code path to be a slow path.
Refs trac:1905
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31197.
The following Trac tickets were found above:
Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
It is not valid to call flush outside a passive target epoch nor is
it valid to call lock/lock_all when no_locks is set. In the former
we were just semantically incorrect and the later would crash and
burn.
cmr=v1.7.5:ticket=trac:4382
This commit was SVN r31046.
The following Trac tickets were found above:
Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
This fixes a bug in r31029 which removes the use of the pml base request
(also not a good way since cm doesn't use the base request). We now allocate
a data structure (ugh) to determine the needed information. Tested with
mtt/onesided.
cmr=v1.7.5:ticket=trac:4379
This commit was SVN r31044.
The following SVN revision numbers were found above:
r31029 --> open-mpi/ompi@29e00f9161
The following Trac tickets were found above:
Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
- Return an error if the caller specified both MPI_MODE_NOPRECEDE and
MPI_MODE_NOSUCCEED to MPI_Win_fence.
- Return an error if the caller attempts to enter an active target
epoch while already in a passive target epoch.
- End an active target epoch if MPI_Win_fence is called with
MPI_MODE_NOSUCCEED.
cmr=v1.7.5:ticket=trac:4382
This commit was SVN r31043.
The following Trac tickets were found above:
Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
need to enable the access epoch in MPI_Win_fence.
I missed this change when I fixed the semantics of MPI_Win_create. With
this commit our one-sided MTT runs are now running clean.
cmr=v1.7.5:reviewer=dgoodell
This commit was SVN r31041.
It seems we can't release accumulate buffers in completion callbacks
because the btls don't release registration resources until after the
callback has fired. The fix is to keep track of the unused buffers and
free them later. This should resolve issues when running IMB-EXT and
IMB-RMA.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31029.
Dave Goodell correctly pointed out that it is unusual to return MPI
error classes from internal ompi functions. Correct this in the RMA
case by adding an internal error code to match MPI_ERR_RMA_SYNC.
This does change OMPI_ERR_MAX. I don't think this will cause any
problems with ABI.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31012.
This commit resolves a number of crashed discovered my the onesided
tests in MTT. The functions in question were operating on the assumption
the user was calling RMA functions correctly.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31008.
The datatype unpacking code assumes that the packed datatype buffer has the
same alignment as an OPAL_PTRDIFF_TYPE. This was not enforced by the rdma
one-sided component. I changed the ordering and sized of various osc/rdma
headers to ensure their sizes are a multiple of 8-bytes and modified the
fragment allocation call to ensure all headers are 8-byte aligned. While
not the cleanest way to handle this situation it should resolve the issue.
Fixes trac:4315
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30974.
The following Trac tickets were found above:
Ticket 4315 --> https://svn.open-mpi.org/trac/ompi/ticket/4315