Fix CID 1324726 (#1 of 1): Free of address-of expression (BAD_FREE):
Indeed, if a lock conflicts with the lock_all we will end up trying to
free an invalid pointer.
Fix CID 1328826 (#1 of 1): Dereference after null check (FORWARD_NULL):
This was intentional but it would be a good idea to check for
module->comm being non_NULL to be safe. Also cleaned out some checks
for NULL before free().
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by @ggouaillardet and MTT:
- Fix SEGV in long send completion caused by missing update to the
request callback data.
- Add an MPI_Barrier to the fence short-cut. This fixes potential
semantic issues where messages may be received before fence is
reached.
- Ensure fragments are flushed when using request-based RMA. This
allows MPI_Test/MPI_Wait/etc to work as expected.
- Restore the tag space back to 16-bits. It was intended that the
space be expanded to 32-bits but the required change to the
fragment headers was not committed. The tag space may be expanded
in a later commit.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:
- The code that signaled the actual start of an access epoch changed
the eager_send_active flag on a synchronization object without
holding the object's lock. This could cause another thread waiting
on eager sends to block indefinitely because the entirety of
ompi_osc_pt2pt_sync_expected could exectute between the check of
eager_send_active and the conditon wait of
ompi_osc_pt2pt_sync_wait.
- The bookkeeping of fragments could get screwed up when performing
long put/accumulate operations from different threads. This was
caused by the fragment flush code at the end of both put and
accumulate. This code was put in place to avoid sending a large
number of unexpected messages to a peer. To fix the bookkeeping
issue we now 1) wait for eager sends to be active before stating
any large isend's, and 2) keep track of the number of large isends
associated with a fragment. If the number of large isends reaches
32 the active fragment is flushed.
- Use atomics to update the large receive/send tag counters. This
prevents duplicate tags from being used. The tag space has also
been updated to use the entire 16-bits of the tag space.
These changes should also fixopen-mpi/ompi#1299.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug that occurs when a post message comes in when
sending complete messages or while waiting for all outgoing messages
to flush. In that case the post message might get incorrecly
associated with the ending sync object.
References open-mpi/ompi#1012
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fixed CID 1269712, 1269709, 1269706, 1269703, 1269694: Logically dead code
Remove extra NULL check as OMPI_OSC_PT2PT_REQUEST_ALLOC can never set the
request to NULL.
Fixes CID 1269668: Unchecked return value
False positive. Add (void) to indicate we do not care about the return code
from opal_hash_table_get_uint32.
Fixes CID 1324726: Free of address-of expression
Do not free lock if it was not allocated.
Fixes CID 1269658: Free of address-of expression
Never will happen but because op is always a built-in op there is no
reason to retain/release it anyway.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Fix CID 1324733: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324734: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324735: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324736: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324737: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324751: Memory - illegal accesses (USE_AFTER_FREE)
Fix CID 1324750: (USE_AFTER_FREE)
Fix CID 1324749: Memory - corruptions (USE_AFTER_FREE)
Fix CID 1324748: Memory - illegal accesses (USE_AFTER_FREE)
Fix CID 1324747: (USE_AFTER_FREE)
Fix CID 1324746: Memory - corruptions (USE_AFTER_FREE)
Add missing return on an error path.
Fix CID 1324745: Code maintainability issues (UNUSED_VALUE)
Ignore return code from barrier. It was not being used anyway.
Fix CID 1324738: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324741: Null pointer dereferences (REVERSE_INULL)
module->selected_btl can not be NULL in osc/rdma during normal
operation. Removed the unnecessary NULL check.
Fix CID 1324752: Memory - illegal accesses (USE_AFTER_FREE)
Move ompi_osc_pt2pt_module_lock_remove to before the lock is freed.
Fix CID 1324744: Uninitialized variables (UNINIT)
Fix CID 1324743: Uninitialized variables (UNINIT)
This array is not used unitialized but there is no reason not to use
calloc here to silence the warning.
The following CID is a false positive: 1324742. I will mark it such in
coverity.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit updates osc/pt2pt to allocate peer object as they are
needed rather than all at once. Additionally, to help improve the
memory footprint a new synchronization structure has been added.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
- MPI_Compare_and_swap
- MPI_Fetch_and_op
- MPI_Raccumulate
- MPI_Win_detach
Thanks to Michael Knobloch and Takahiro Kawashima for bringing this
to our attention
This commit fixes a bug identified by MTT that occurred when mixing
passive and active target synchronization. The bugs fixed in this
commit are:
- Do not update incoming fragment counts for any type of unbuffered
control message. These messages are out-of-band and should not be
considered towards the signal counts.
- Complete a change from using received counts to expected counts for
lock, unlock, and flush acks. Part of the change made it into
master before the rest was ready. This was preventing wakeups in
some cases.
- Turn the passive_target_access_epoch module member into a
counter. As long as at least one peer is locked we are in a
passive-target epoch and not an active target one. This fix will
ensure that fragment flags are set appropriately.
fixes#538
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The fragment flush code tries to send the active fragment before
sending any queued fragments. This could cause osc messages to arrive
out-of-order at the target (bad). Ensure ordering by alway sending
the active fragment after sending queued fragments.
This commit also fixes a bug when a synchronization message (unlock,
flush, complete) can not be packed at the end of an existing active
fragment. In this case the source process will end up sending 1 more
fragment than claimed in the synchronization message. To fix the issue
a check has been added that fixes the fragment count if this situation
is detected.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug in osc/pt2pt which causes MPI_Win_unlock_all
to hang. The problem was caused by code refactoring that moved the
unlock of the local process to after the loop that waits for unlock
acks. This will cause the code to loop forever waiting on the self
ack.
Fixes#444
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.
This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.
Notes:
OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds an owner file in each of the component directories
for each framework. This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page. Currently there are two
"fields" in the file, an owner and a status. A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb':
osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb':
osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable]
This commit was SVN r29092.
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.
The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.
This commit was SVN r28722.
Notes:
- This commit also eliminates the need for an available components list in use
in several frameworks. None of the code in question was making use of the
priority field of the priority component list item so these extra lists were
removed.
- Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
- Cleans up the ompi/orte-info functions. Expose the functions that construct the
list of params so they can be used elsewhere.
patches for mtl/portals4 from brian
missed a few output variables in openib
This commit was SVN r28241.
Features:
- Support for an override parameter file (openmpi-mca-param-override.conf).
Variable values in this file can not be overridden by any file or environment
value.
- Support for boolean, unsigned, and unsigned long long variables.
- Support for true/false values.
- Support for enumerations on integer variables.
- Support for MPIT scope, verbosity, and binding.
- Support for command line source.
- Support for setting variable source via the environment using
OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
- Cleaner API.
- Support for variable groups (equivalent to MPIT categories).
Notes:
- Variables must be created with a backing store (char **, int *, or bool *)
that must live at least as long as the variable.
- Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
mca_base_var_set_value() to change the value.
- String values are duplicated when the variable is registered. It is up to
the caller to free the original value if necessary. The new value will be
freed by the mca_base_var system and must not be freed by the user.
- Variables with constant scope may not be settable.
- Variable groups (and all associated variables) are deregistered when the
component is closed or the component repository item is freed. This
prevents a segmentation fault from accessing a variable after its component
is unloaded.
- After some discussion we decided we should remove the automatic registration
of component priority variables. Few component actually made use of this
feature.
- The enumerator interface was updated to be general enough to handle
future uses of the interface.
- The code to generate ompi_info output has been moved into the MCA variable
system. See mca_base_var_dump().
opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system
This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.
This commit was SVN r28236.