Commit open-mpi/ompi@1a3597aam changed the type of the `convertor`
variable from `ompi_osc_base_convertor_t` (which contained an
`opal_convertor_t`) to an `opal_convertor_t`. Hence, using memchecker
to ensure that the inner convertor of the `ompi_osc_base_convertor_t`
is considered initialized is now unnecessary.
With certain datatypes the opal_datatype_unpack method for performing
the accumulate operation does not work. This commit modifies the
accumulate code in the osc base to use opal_convertor_raw instead.
Fixes #385
The ompi_osc_signal_outgoing was moved from ompi_osc_rdma_frag_start to frag_send
which gave correct results for the bug reproducer but hangs with simple OSC
tests. Moved the ompi_osc_signal_outgoing back and it now passes all tests.
Closes #256
Some of the counters used by the "rdma" one-sided component are intended
to overflow. Since overflow behavior is undefined for signed integers in
C it is safer to use unsigned integers here.
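A minimal sketch of why unsigned counters tolerate wraparound. The struct and function names below are hypothetical and are not the actual osc/rdma fields; the sketch only assumes that completion is decided by comparing an expected count against a received count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter pair, not the real osc/rdma data structure. */
typedef struct {
    uint32_t expected;  /* fragments we expect to receive */
    uint32_t received;  /* fragments received so far */
} frag_counters_t;

/* Unsigned arithmetic is defined to wrap modulo 2^32, so the
 * difference stays meaningful even after a counter passes UINT32_MAX. */
static uint32_t frags_outstanding(const frag_counters_t *c)
{
    return c->expected - c->received;
}

/* All expected fragments have arrived when the difference is zero. */
static bool frags_complete(const frag_counters_t *c)
{
    return frags_outstanding(c) == 0;
}
```

With signed counters the same increment past the maximum value would be undefined behavior, so the compiler would be free to assume it never happens.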
osc/rdma uses counters to determine if all messages have been received
before exiting synchronization calls. The problem is that the active
target counter is always increasing (never zeroed). If over 2^31-1
messages are sent this causes the counter to overflow (in itself this
isn't an error). This causes test/wait to return before the communication
is complete. There is an additional error in the use of the fragment
flush function. If PSCW synchronization is in use this function CAN NOT
be called unless a post message has arrived.
Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php
This commit fixes both issues. Tested against MTT and issue reproducer.
Closes #224.
Initialize the blocking_fence flag to false as the code logic indicates that it should only be set if someone provides that flag.
Thanks to Lisandro Dalcin for reporting it
cmr=v1.8.4:reviewer=hjelmn
This commit was SVN r32812.
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL
All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.
- Portals4/OSC was unable to acquire an exclusive lock due to an invalid
local address in the atomic operation. This caused the reported hang.
- After fixing the hang, the test continued to fail because
ompi_datatype_is_contiguous_memory_layout() reports that MPI_EMPTY (the
origin datatype) is noncontiguous and Portals4/OSC does not support
noncontiguous datatypes at this time. However, in this case the origin
count is zero so contiguous/noncontiguous is irrelevant. Now we skip
the contiguous check if the count is zero.
cmr=v1.8.3:reviewer=regrant:subject=Fix for "Portals4/MTL hangs in c_get_accumulate test"
This commit was SVN r32295.
The following Trac tickets were found above:
Ticket 4662 --> https://svn.open-mpi.org/trac/ompi/ticket/4662
This commit adds a check to see if the target is in an access epoch. If
not we return OMPI_ERR_RMA_SYNC. This fixes test_start3 in the onesided
test suite. The cost of this extra check is 1 byte/peer for the boolean
flag indicating that the peer is in an access epoch.
I also fixed a problem where multiple unexpected post messages are not
correctly handled.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r32160.
cmr=v1.8.2:reviewer=tkordenbrock:subject=Portals4/MTL hanging fix
This commit was SVN r32113.
The following Trac tickets were found above:
Ticket 4681 --> https://svn.open-mpi.org/trac/ompi/ticket/4681
cmr=v1.8.2:reviewer=tkordenbrock:subject=Move r32112 to v1.8.2 branch
This commit was SVN r32112.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r32112
The following Trac tickets were found above:
Ticket 4682 --> https://svn.open-mpi.org/trac/ompi/ticket/4682
The post and start window calls are supposed to be matching. The code
did not check to see that an incoming post matched with the start call.
This commit fixes the bug by placing the post on a pending list that
will be checked by the next call to start.
cmr=v1.8.2:reviewer=dgoodell
This commit was SVN r32017.
The replace callback did not increment the incoming frag counter. This
leads to a hang during synchronization. This commit adds the increment
and also puts the request on the garbage collection list to fix a leak.
This fixes a hang found when running the mpich test suite.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32016.
The wrong type was used when calculating the amount of space needed
for an accumulate fragment. Fixed the calculation and took the
opportunity to eliminate the get_acc header as it is identical to the
acc header.
This fixes trac:4719 and #4718
Tracking these fixes for 1.8.2 in this CMR.
Throwing this to Brad for review as he is the one who ran into the issue.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32015.
The following Trac tickets were found above:
Ticket 4719 --> https://svn.open-mpi.org/trac/ompi/ticket/4719
in self optimization
This addresses an issue found with the MPICH pscw_ordering test. Eager sends
were not yet active (which is ok for the standard path) but not ok for the
self optimization. Fixed by waiting for all post messages before checking
the sync state.
Fixes trac:4724
Tracking the 1.8.2 issue in this CMR.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32012.
The following Trac tickets were found above:
Ticket 4724 --> https://svn.open-mpi.org/trac/ompi/ticket/4724
Brad correctly pointed out that the total window size should not be an
int. Changed it to an unsigned long.
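As an illustration of the size concern, the sketch below sums per-rank window sizes in an unsigned long. The helper name and its arguments are hypothetical; the actual code computes the total differently.

```c
#include <stddef.h>

/* Hypothetical helper: accumulate per-rank window sizes in an
 * unsigned long so that totals above INT_MAX are representable. */
static unsigned long total_window_size(const unsigned long *sizes,
                                       size_t nranks)
{
    unsigned long total = 0;
    for (size_t i = 0; i < nranks; ++i) {
        total += sizes[i];
    }
    return total;
}
```

Three ranks exposing 1 GiB each already give a total of 3221225472 bytes, which does not fit in a 32-bit int.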
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32010.
Only one field is valid for RMA requests: MPI_ERROR. This field is set
to the correct value in ompi_request_empty so there is no reason to
allocate and keep track of osc/sm requests because they are always
complete on return. Since we are no longer using the osc/sm request
structure or free list they are now removed.
Closes trac:4723
Tracking this issue with the CMR. Brad, can you verify the issue is indeed fixed.
cmr=v1.8.2:reviewer=bbenton
This commit was SVN r32009.
The following Trac tickets were found above:
Ticket 4723 --> https://svn.open-mpi.org/trac/ompi/ticket/4723
This commit fixes a bug that can cause request and communicator leaks
when cleaning up an OSC window. This should prevent a hang seen with
IMB-EXT.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31539.
When sending PUT_LONG, the data is sent before headers, and sometimes
the header is not flushed immediately. This creates a lot of unexpected
receives in the peer, since it posts a receive only when it gets the
header, which makes it run out of receive buffers. When the sender
eventually flushes the window, the receiver already has no buffers to
receive the header, which causes a deadlock.
The fix is to always flush the headers when doing put_long.
cmr=v1.8.1:reviewer=hjelmn
This commit was SVN r31378.
While testing one-sided on LANL systems I found a couple more OSC
bugs that were not caught during the initial testing:
- In the passive target code we read the read lock count as a
char instead of the intended uint32_t. This causes lock to
lock up when using shared locks after 127 iterations.
- The post code used the wrong group when trying to increment post
counters. This causes a segmentation fault.
- Both the post and wait code used the wrong check in the inner
loop leading to an infinite loop.
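The first item can be illustrated with a small sketch. The struct and function names are hypothetical and only assume that the shared (read) lock count is stored as a uint32_t; they do not reproduce the actual passive target code.

```c
#include <stdint.h>

/* Hypothetical lock state; the real osc structure differs. */
typedef struct {
    uint32_t shared_count;  /* number of readers holding the lock */
} lock_state_t;

/* Buggy pattern: the count is narrowed to a signed 8-bit value.
 * Values above 127 cannot be represented; the result of the
 * conversion is implementation-defined (commonly 128 becomes -128). */
static int read_count_as_char(const lock_state_t *s)
{
    int8_t narrow = (int8_t) s->shared_count;
    return (int) narrow;
}

/* Intended pattern: read the full 32-bit count. */
static uint32_t read_count_as_u32(const lock_state_t *s)
{
    return s->shared_count;
}
```

Up to 127 both readers agree, which is consistent with the failure only appearing after 127 iterations.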
cmr=v1.8.1:reviewer=jsquyres
This commit was SVN r31354.
There was a typo in the ompi_osc_gacc_long_start that was causing a
segmentation fault when executing long get accumulate operations.
cmr=v1.8.1:reviewer=jsquyres
This commit was SVN r31353.
The last fix prevented a hang but had some cases where the results were
wrong. Fixed. Tested with armci, openmpi/ibm, openmpi/onesided.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31284.
It might be possible (don't know) for a datatype to be made of a contiguous block
of a primitive datatype and have an lb. If this is ever the case the code
would have done the wrong thing. Add the lb in to be safe.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31283.
There are differences between how active and passive messages are
accounted for in this component. Active message counts on the sender
side are set to zero before the control message is sent so we do not
have to add one to the expected number of messages or we end up
double counting the control message. This commit should fix that error.
Fixes regression in one-sided/test_rma1
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31281.
This fixes more issues identified by armci. More issues still remain and fixes are
coming for those as well.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31272.
the case fix in ompi_osc_base_process_op in r31204.
There are two cases that needed to be handled:
- The target is a simple datatype (contiguous block of a primitive
type) but the origin is not. In this case we still need to pack
the origin data but we can not rely on the convertor to do the
unpack (see r31204).
- Both the origin and target datatypes are simple datatypes. In this
case we can use ompi_op_reduce to do the accumulation without having
to pack the origin data.
cmr=v1.8:ticket=trac:4449
This commit was SVN r31231.
The following SVN revision numbers were found above:
r31204 --> open-mpi/ompi@949abe45cd
The following Trac tickets were found above:
Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
Fixed two bugs:
- Use module->comm NOT comm to get the CID for the shared memory backing
file. This fixes the case where there are multiple shared memory windows
at the same time.
- Remember to unlink the shared memory backing file.
Refs trac:4438
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31227.
The following Trac tickets were found above:
Ticket 4438 --> https://svn.open-mpi.org/trac/ompi/ticket/4438
This commit fixes two bugs:
- We were not correctly setting the lock type in the outstanding lock
for lock_all. This caused undefined behavior.
- flush_all was incorrectly checking for comm size - 1 lock acks but
comm size flush acks. This is the reverse of what was intended.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31226.
It is possible to get into a situation where a small accumulate operation
can not be completed because a large accumulate operation holds the lock.
In this case we may return from wait/flush/etc before the operation is
complete. To handle this case increment the expected incoming fragment
count when queuing an accumulate operation and increment the incoming
fragment count after processing the accumulate operation.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31224.
of the primitive datatype
In this case we can not use the convertor to run the accumulate operation
since the datatype is more or less a primitive type.
cmr=v1.8:ticket=trac:4449
This commit was SVN r31222.
The following Trac tickets were found above:
Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
This commit fixes three issues:
- osc/rdma: The target side of an accumulate was using the target datatype
in the receive to the packed buffer. This was conflicting with the way
the reduction is done into the target buffer. Changed the receive to use
the primitive datatype.
- osc/base: The copy table was completely wrong. Fixed the table to match
the underlying datatypes (which are opal not ompi datatypes).
- osc/base: There is a problem using the optimized description. Fall back
on using the non-optimized description until we can understand what is
going wrong.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31204.
results
The code to handle completion messages did not correctly increment the
number of expected messages. This could cause wait to return before all
incoming messages are complete.
I also added a check to ensure that start returns an error if we are in
a passive access epoch.
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31203.
This commit adds large datatype description support to the osc/rdma
component. Support is provided by an additional send/recv of the datatype
description if the description does not fit in an eager buffer. The
code is designed to require minimal new code and not for speed. We
consider this code path to be a slow path.
Refs trac:1905
cmr=v1.8:reviewer=jsquyres
This commit was SVN r31197.
The following Trac tickets were found above:
Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
It is not valid to call flush outside a passive target epoch nor is
it valid to call lock/lock_all when no_locks is set. In the former
we were just semantically incorrect and the latter would crash and
burn.
cmr=v1.7.5:ticket=trac:4382
This commit was SVN r31046.
The following Trac tickets were found above:
Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
This fixes a bug in r31029 which removes the use of the pml base request
(also not a good way since cm doesn't use the base request). We now allocate
a data structure (ugh) to determine the needed information. Tested with
mtt/onesided.
cmr=v1.7.5:ticket=trac:4379
This commit was SVN r31044.
The following SVN revision numbers were found above:
r31029 --> open-mpi/ompi@29e00f9161
The following Trac tickets were found above:
Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
- Return an error if the caller specified both MPI_MODE_NOPRECEDE and
MPI_MODE_NOSUCCEED to MPI_Win_fence.
- Return an error if the caller attempts to enter an active target
epoch while already in a passive target epoch.
- End an active target epoch if MPI_Win_fence is called with
MPI_MODE_NOSUCCEED.
cmr=v1.7.5:ticket=trac:4382
This commit was SVN r31043.
The following Trac tickets were found above:
Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
need to enable the access epoch in MPI_Win_fence.
I missed this change when I fixed the semantics of MPI_Win_create. With
this commit our one-sided MTT runs are now running clean.
cmr=v1.7.5:reviewer=dgoodell
This commit was SVN r31041.
It seems we can't release accumulate buffers in completion callbacks
because the btls don't release registration resources until after the
callback has fired. The fix is to keep track of the unused buffers and
free them later. This should resolve issues when running IMB-EXT and
IMB-RMA.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31029.
Dave Goodell correctly pointed out that it is unusual to return MPI
error classes from internal ompi functions. Correct this in the RMA
case by adding an internal error code to match MPI_ERR_RMA_SYNC.
This does change OMPI_ERR_MAX. I don't think this will cause any
problems with ABI.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31012.
This commit resolves a number of crashes discovered by the onesided
tests in MTT. The functions in question were operating on the assumption
the user was calling RMA functions correctly.
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r31008.
The datatype unpacking code assumes that the packed datatype buffer has the
same alignment as an OPAL_PTRDIFF_TYPE. This was not enforced by the rdma
one-sided component. I changed the ordering and sizes of various osc/rdma
headers to ensure their sizes are a multiple of 8 bytes and modified the
fragment allocation call to ensure all headers are 8-byte aligned. While
not the cleanest way to handle this situation it should resolve the issue.
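A rough sketch of the two ideas described above: a header whose size is a multiple of 8 bytes, and an allocation length rounded up to the next 8-byte boundary. The header layout, the macro, and the helper name are hypothetical and do not match the real osc/rdma headers.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_ALIGN ((size_t) 8)  /* assumed alignment requirement */

/* Hypothetical header: fields are ordered so no padding is needed
 * and the total size (16 bytes) is a multiple of 8. */
typedef struct {
    uint8_t  type;
    uint8_t  flags;
    uint16_t tag;
    uint32_t count;
    uint64_t displacement;
} example_header_t;

_Static_assert(sizeof(example_header_t) % 8 == 0,
               "header size must be a multiple of 8 bytes");

/* Round a requested length up to the next multiple of 8 so that
 * whatever follows it in the fragment stays 8-byte aligned. */
static size_t align_up8(size_t len)
{
    return (len + EXAMPLE_ALIGN - 1) & ~(EXAMPLE_ALIGN - 1);
}
```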
Fixes trac:4315
cmr=v1.7.5:reviewer=jsquyres
This commit was SVN r30974.
The following Trac tickets were found above:
Ticket 4315 --> https://svn.open-mpi.org/trac/ompi/ticket/4315
assumes the send request is derived from mca_pml_base_send_request_t,
but this is not true for pml cm, so we end up freeing an invalid pointer.
We cannot take the data pointer from the pml send request, so we pass
the allocated buffer pointer in req_complete_cb_data, and put the
osc_rdma_module pointer in that buffer as well.
Previously, osc_pt2pt was used with pml_cm which didn't have this
problem.
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r30967.
- Fix several typos in osc/rdma.
- Fix a locking issue in osc/sm that was caused by an incorrect
assumption about the semantics of opal_atomic_add_32.
- Always unlock the accumulation lock in osc/sm.
- The base of a process's shared memory window should be NULL if
the size is zero. Fixed.
cmr=v1.7.5:ticket=trac:4304
This commit was SVN r30853.
The following Trac tickets were found above:
Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
configure-time dynamic allocation of flags. The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process. Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.
This commit was SVN r29100.
osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb':
osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb':
osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable]
This commit was SVN r29092.
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.
The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and changes callers to check if the item is NULL
instead of using the return code.
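The calling convention after this change can be sketched with a toy free list. This is not the OMPI_FREE_LIST_GET macro; the types and functions below are hypothetical and only show the "NULL means failure, no separate return code" pattern.

```c
#include <stddef.h>

/* Toy singly linked free list, for illustration only. */
typedef struct item {
    struct item *next;
} item_t;

typedef struct {
    item_t *head;
} free_list_t;

/* Returns an item, or NULL if the list is empty. There is no
 * separate error code: the NULL pointer is the failure signal. */
static item_t *free_list_get(free_list_t *fl)
{
    item_t *it = fl->head;
    if (NULL != it) {
        fl->head = it->next;
        it->next = NULL;
    }
    return it;
}

/* Push an item back onto the list. */
static void free_list_return(free_list_t *fl, item_t *it)
{
    it->next = fl->head;
    fl->head = it;
}
```

A caller then tests the returned pointer directly, for example `if (NULL == (it = free_list_get(&fl))) { /* handle exhaustion */ }`.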
This commit was SVN r28722.
Notes:
- This commit also eliminates the need for an available components list in use
in several frameworks. None of the code in question was making use of the
priority field of the priority component list item so these extra lists were
removed.
- Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
- Cleans up the ompi/orte-info functions. Expose the functions that construct the
list of params so they can be used elsewhere.
patches for mtl/portals4 from brian
missed a few output variables in openib
This commit was SVN r28241.
Features:
- Support for an override parameter file (openmpi-mca-param-override.conf).
Variable values in this file can not be overridden by any file or environment
value.
- Support for boolean, unsigned, and unsigned long long variables.
- Support for true/false values.
- Support for enumerations on integer variables.
- Support for MPIT scope, verbosity, and binding.
- Support for command line source.
- Support for setting variable source via the environment using
OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
- Cleaner API.
- Support for variable groups (equivalent to MPIT categories).
Notes:
- Variables must be created with a backing store (char **, int *, or bool *)
that must live at least as long as the variable.
- Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
mca_base_var_set_value() to change the value.
- String values are duplicated when the variable is registered. It is up to
the caller to free the original value if necessary. The new value will be
freed by the mca_base_var system and must not be freed by the user.
- Variables with constant scope may not be settable.
- Variable groups (and all associated variables) are deregistered when the
component is closed or the component repository item is freed. This
prevents a segmentation fault from accessing a variable after its component
is unloaded.
- After some discussion we decided we should remove the automatic registration
of component priority variables. Few components actually made use of this
feature.
- The enumerator interface was updated to be general enough to handle
future uses of the interface.
- The code to generate ompi_info output has been moved into the MCA variable
system. See mca_base_var_dump().
opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system
This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.
This commit was SVN r28236.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream, so it is a little unexpected that mca_base_components_close closes one. To add to this, several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the semantics of mca_base_components_open/close as they are now symmetric in their functionality.
This commit was SVN r27570.