Commit graph

436 commits

Author SHA1 Message Date
Nathan Hjelm
34ff6293bd osc/pt2pt: do not drop/reacquire the ompi_request_lock
This lock is now recursive so it is safe to call into the pml without
dropping the lock.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-26 14:19:38 -06:00
Nathan Hjelm
3245428e82 Merge pull request #1535 from kawashima-fj/pr/osc-pt2pt-header-fix
osc/pt2pt: Fix a struct name typo
2016-04-14 15:55:25 -06:00
KAWASHIMA Takahiro
35ea9e5c3c Add FUJITSU copyright 2016-04-12 13:47:53 +09:00
KAWASHIMA Takahiro
39bcbe439a osc/pt2pt: Fix a struct name typo
Fortunately the sizes of `ompi_osc_pt2pt_header_put_t` and
`ompi_osc_pt2pt_header_get_t` are the same, so this doesn't affect
the behavior.
2016-04-11 20:55:22 +09:00
KAWASHIMA Takahiro
28a0577364 osc/pt2pt: Insert breaks in long lines 2016-04-11 19:06:01 +09:00
KAWASHIMA Takahiro
5ac95df9dc osc/pt2pt: use two distinct "namespaces" for tags - revised
Before this commit, the same PML tag could be used for distinct
communications for long messages. For example, consider a case
where rank A calls `MPI_PUT` targeting rank B and rank B calls
`MPI_GET` targeting rank A simultaneously.
A PML tag for the `MPI_PUT` is acquired on rank A and is used
for the long-message communication from rank A to rank B.
A PML tag for the `MPI_GET` is acquired on rank B and is used
for the long-message communication from rank A to rank B.
These two tags may have the same value because they are managed
independently on each rank. This causes data corruption.

This commit separates the tags used in a single RMA communication
call: one for communication from an origin to a target, and one
for communication from a target to an origin. A "base" tag
is acquired using the `get_tag` function, and the PML tags are calculated
from the base tag by the `tag_to_target` and `tag_to_origin`
functions.
2016-04-11 19:05:20 +09:00
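The tag split described in 5ac95df9dc can be illustrated with a minimal sketch. It assumes that base tags are handed out in steps of two from an atomic counter and that the two directions use `base` and `base + 1`; the counter name, the step, and the 16-bit mask are assumptions made for illustration, not the actual osc/pt2pt code. Only the function names `get_tag`, `tag_to_target`, and `tag_to_origin` come from the commit message.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-window counter; not the actual Open MPI field. */
static atomic_uint_fast32_t tag_counter;

/* Hand out even base tags so that base and base + 1 both belong to a
 * single RMA call. The mask keeps the value inside a 16-bit tag space
 * (an assumption for this sketch). */
static inline int get_tag(void)
{
    uint_fast32_t t = atomic_fetch_add(&tag_counter, 2);
    return (int) (t & 0xfffe);
}

/* Tag for traffic flowing from the origin to the target. */
static inline int tag_to_target(int base) { return base; }

/* Tag for traffic flowing from the target back to the origin. */
static inline int tag_to_origin(int base) { return base + 1; }
```

With this layout, a put issued by rank A and a get issued by rank B can draw the same base tag and still use different PML tags for their A-to-B transfers, because the put uses `tag_to_target` and the get uses `tag_to_origin`.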
KAWASHIMA Takahiro
3576ecafa7 Revert "osc/pt2pt: use two distinct "namespaces" for tags"
This reverts commit 06ecdb6aa7
to reimplement the fix completely.
2016-04-11 19:04:11 +09:00
Ryan Grant
7cdf50533c Merge pull request #1314 from francois-wellenreiter/osc_disable_portals4_evt_send
OSC portals4: do not generate an EVENT_SEND, to avoid having to filter it
2016-04-07 10:04:27 -06:00
Nathan Hjelm
2ed4501490 osc: fix coverity issues
Fix CID 1324726 (#1 of 1): Free of address-of expression (BAD_FREE):

Indeed, if a lock conflicts with the lock_all we will end up trying to
free an invalid pointer.

Fix CID 1328826 (#1 of 1): Dereference after null check (FORWARD_NULL):

This was intentional but it would be a good idea to check for
module->comm being non-NULL to be safe. Also cleaned out some checks
for NULL before free().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 09:11:48 -06:00
Nathan Hjelm
deae9e52bf Merge pull request #1259 from kawashima-fj/pr/osc-sm-align
osc/sm: Fix a bus error on MPI_WIN_{POST,START}.
2016-03-15 09:13:38 -06:00
George Bosilca
7c574a3530 Typo. 2016-02-07 07:22:22 +02:00
Nathan Hjelm
5b9c82a964 osc/pt2pt: bug fixes
This commit fixes several bugs identified by @ggouaillardet and MTT:

 - Fix SEGV in long send completion caused by missing update to the
   request callback data.

 - Add an MPI_Barrier to the fence short-cut. This fixes potential
   semantic issues where messages may be received before fence is
   reached.

 - Ensure fragments are flushed when using request-based RMA. This
   allows MPI_Test/MPI_Wait/etc to work as expected.

 - Restore the tag space back to 16-bits. It was intended that the
   space be expanded to 32-bits but the required change to the
   fragment headers was not committed. The tag space may be expanded
   in a later commit.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-04 16:59:39 -07:00
Gilles Gouaillardet
6eac6a8b00 osc/sm: create datafile into the per proc directory in order to make it unique per communicator
Thanks Peter Wind for the report
2016-02-03 10:12:37 +09:00
Nathan Hjelm
519fffb65e osc/pt2pt: eager sends are always active if MPI_MODE_NOCHECK is used
This commit fixes open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:44:17 -07:00
Nathan Hjelm
d7264aa613 osc/pt2pt: various threading fixes
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:

 - The code that signaled the actual start of an access epoch changed
   the eager_send_active flag on a synchronization object without
   holding the object's lock. This could cause another thread waiting
   on eager sends to block indefinitely because the entirety of
   ompi_osc_pt2pt_sync_expected could execute between the check of
   eager_send_active and the condition wait of
   ompi_osc_pt2pt_sync_wait.

 - The bookkeeping of fragments could get screwed up when performing
   long put/accumulate operations from different threads. This was
   caused by the fragment flush code at the end of both put and
   accumulate. This code was put in place to avoid sending a large
   number of unexpected messages to a peer. To fix the bookkeeping
   issue we now 1) wait for eager sends to be active before starting
   any large isend's, and 2) keep track of the number of large isends
   associated with a fragment. If the number of large isends reaches
   32 the active fragment is flushed.

 - Use atomics to update the large receive/send tag counters. This
   prevents duplicate tags from being used. The tag space has also
   been updated to use the entire 16-bits of the tag space.

These changes should also fix open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:33:33 -07:00
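The first bug in d7264aa613 is a classic lost wakeup. A minimal sketch of the corrected pattern, using POSIX threads and a hypothetical `struct sync_obj` in place of the real synchronization object, might look like this: the flag is changed and tested only while the lock is held, so the signal cannot slip in between the check and the wait.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an osc/pt2pt synchronization object. */
struct sync_obj {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            eager_send_active;
};

/* Signal the start of the access epoch: the flag changes under the lock. */
static void sync_expected(struct sync_obj *s)
{
    pthread_mutex_lock(&s->lock);
    s->eager_send_active = true;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Wait for eager sends: the check and the wait share the same lock, and
 * the loop re-tests the flag after every wakeup. */
static void sync_wait(struct sync_obj *s)
{
    pthread_mutex_lock(&s->lock);
    while (!s->eager_send_active) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
}
```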
Nathan Hjelm
a19c265ab5 osc/rdma: fix typo in ompi_osc_rdma_complete_atomic
The typo caused SEGVs on systems with only fetching atomic
support.

Fixes open-mpi/ompi#1329

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-26 15:44:07 -07:00
Nathan Hjelm
45da311473 osc/rdma: fix hang when performing large unaligned gets
This commit adds code to handle large unaligned gets. There are two
possible code paths for these transactions:

 1) The remote region and local region have the same alignment. In
 this case the get will be broken down into at most three get
 transactions: 1 transaction to get the unaligned start of the region
 (buffered), 1 transaction to get the aligned portion of the region,
 and 1 transaction to get the end of the region.

 2) The remote and local regions do not have the same alignment. This
 should be an uncommon case and is not optimized. In this case a
 buffer is allocated and registered locally to hold the aligned data
 from the remote region. There may be cases where this fails (low
 memory, can't register memory). Those conditions are unlikely and
 will be handled later.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-22 21:06:46 -07:00
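Case 1 of 45da311473 (same alignment on both sides) amounts to cutting a region at alignment boundaries. The sketch below shows one way to compute the three pieces. The structure, the function name, and the choice to treat a region with no aligned portion as a single buffered piece are assumptions for illustration, not the osc/rdma implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the pieces of a split get. */
struct get_split {
    size_t head_len;   /* unaligned start, fetched through a buffer */
    size_t body_len;   /* aligned middle, fetched directly */
    size_t tail_len;   /* remaining bytes at the end of the region */
};

/* Split [addr, addr + len) at multiples of `align` (a power of two). */
static struct get_split split_get(uint64_t addr, size_t len, uint64_t align)
{
    struct get_split s = {0, 0, 0};
    uint64_t end = addr + len;
    uint64_t aligned_start = (addr + align - 1) & ~(align - 1);
    uint64_t aligned_end = end & ~(align - 1);

    if (aligned_start >= aligned_end) {
        /* No aligned portion: treat the whole region as one head piece. */
        s.head_len = len;
        return s;
    }

    s.head_len = (size_t) (aligned_start - addr);
    s.body_len = (size_t) (aligned_end - aligned_start);
    s.tail_len = (size_t) (end - aligned_end);
    return s;
}
```

For example, a 100-byte get starting at address 0x1003 with 8-byte alignment splits into a 5-byte head, an 88-byte aligned body, and a 7-byte tail.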
Nathan Hjelm
49d2f44b97 osc/rdma: use correct endpoint for local state
If atomics are not globally visible (cpu and nic atomics do not mix)
then a btl endpoint must be used to access local ranks. To avoid
issues that are caused by having the same region registered with
multiple handles osc/rdma was updated to always use the handle for
rank 0. There was a bug in the update that caused osc/rdma to continue
using the local endpoint for accessing the state even though the
pointer/handle are not valid for that endpoint. This commit fixes the
bug.

Fixes open-mpi/ompi#1241.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-22 10:41:27 -07:00
Nathan Hjelm
6180386bea osc/rdma: disable put aggregation when using threads
Optimizing put aggregation in the presence of threads will require a
redesign of the code. For now just ensure that put aggregation is
turned off when MPI_THREAD_MULTIPLE is enabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-21 15:50:35 -07:00
Francois WELLENREITER
411b7301c3 OSC portals4: do not generate an EVENT_SEND, to avoid having to filter it 2016-01-20 11:47:46 +01:00
KAWASHIMA Takahiro
ad26899110 osc/sm: Fix a bus error on MPI_WIN_{POST,START}.
A bus error occurs in sm OSC under the following conditions.

- sparc64 or any other architecture that needs strict alignment.
- `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created
  by sm OSC.
- The communicator size is odd and greater than 3.

Lines 283-285 in the current `ompi/mca/osc/sm/osc_sm_component.c` have
the following code.

```c
module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
module->posts[0] = (uint64_t *) (module->node_states + comm_size);
```

The size of `ompi_osc_sm_node_state_t` is a multiple of 4 but not a
multiple of 8. So if `comm_size` is odd, `module->posts[0]` is
not aligned to 8 bytes. This causes a bus error when accessing
`module->posts[i][j]`.

This patch fixes the alignment of `module->posts[0]` by setting
`module->posts[0]` first.
2016-01-05 19:04:53 +09:00
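The arithmetic behind ad26899110 can be checked with a small sketch. The two structures below are hypothetical stand-ins sized at 16 and 12 bytes; the real `ompi_osc_sm_global_state_t` and `ompi_osc_sm_node_state_t` layouts are not shown in the log. Placing the posts array at the start of the segment is one reading of "setting `module->posts[0]` first".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the node state is 12 bytes, a multiple of 4
 * but not of 8, as the commit message describes. */
typedef struct { uint64_t x, y; } global_state_t;    /* 16 bytes */
typedef struct { uint32_t a, b, c; } node_state_t;   /* 12 bytes */

_Static_assert(sizeof(node_state_t) == 12, "sketch assumes a 12-byte node state");

/* Old layout: global state, then node states, then the posts array. */
static size_t posts_offset_old(size_t comm_size)
{
    return sizeof(global_state_t) + comm_size * sizeof(node_state_t);
}

/* New layout (assumed): the posts array sits at the aligned segment base. */
static size_t posts_offset_new(size_t comm_size)
{
    (void) comm_size;   /* the offset no longer depends on comm_size */
    return 0;
}
```

With five processes the old offset is 16 + 5 * 12 = 76, which is not a multiple of 8, so a `uint64_t` access there faults on strict-alignment hardware.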
Gilles Gouaillardet
06ecdb6aa7 osc/pt2pt: use two distinct "namespaces" for tags 2016-01-05 16:57:37 +09:00
Gilles Gouaillardet
071ae39a44 osc/rdma: add missing #include <alloca.h> 2015-12-24 14:33:58 +09:00
Ralph Castain
ac6289dca6 Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX
Cleanup per George's comments
2015-12-17 17:39:15 -08:00
Ralph Castain
3a56f0d34b Create the pmix external component. Fix a few places where opal/util/argv.h was required when building with an external pmix (go figure).
NOTE: Building with external pmix *requires* that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated.

Closes #1204  (replaces it)
Fixes #1064
2015-12-15 15:26:13 -08:00
Nathan Hjelm
0de9445fc7 osc/rdma: fix bugs when running more than one process per node
A previous commit updated the one-sided code to register the state
region only once. This created an issue when using the scratch lock
with fetching atomics. In this case on any rank that isn't local rank
0 the module->state_handle is NULL. This commit fixes the issue by
removing the scratch lock and using a fragment pointer instead.

Fixes open-mpi/ompi#1290

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-15 11:25:25 -07:00
Nathan Hjelm
b7ba301310 Merge pull request #1165 from hjelmn/add_procs_group
ompi/group: release ompi_proc_t's at group destruction
2015-12-14 13:53:42 -08:00
KAWASHIMA Takahiro
9c7b6a4352 osc/sm: Fix a bug that MPI_WIN_TEST does not update flag to 0.
`MPI_WIN_TEST` must update the `flag` parameter to 0 when not all
origin processes have called `MPI_WIN_COMPLETE`, but sm OSC doesn't.
If the caller initializes the `flag` argument to a non-0 value,
the caller will get that non-0 `flag` value back.
2015-12-08 19:23:21 +09:00
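The rule behind 9c7b6a4352 is that an output parameter must be written on every path. A minimal sketch, with hypothetical completion counters standing in for whatever osc/sm actually tracks:

```c
/* Hypothetical test routine: `completed` and `expected` stand in for the
 * component's real bookkeeping of MPI_WIN_COMPLETE calls. */
static int win_test(int completed, int expected, int *flag)
{
    if (completed < expected) {
        *flag = 0;   /* must be written even when the epoch is not done */
        return 0;    /* stands in for MPI_SUCCESS */
    }

    *flag = 1;
    return 0;
}
```

A caller that passes in `flag = 42` now gets 0 back while origins are still outstanding, instead of seeing its own stale value.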
Nathan Hjelm
5334d22a37 ompi/group: release ompi_proc_t's at group destruction
This commit changes the way ompi_proc_t's are retained/released by
ompi_group_t's. Before this change ompi_proc_t's were retained once
for the group and then once for each retain of a group. This method
adds unnecessary overhead (need to traverse the group list each time
the group is retained) and causes problems when using an async
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 23:03:47 -07:00
Gilles Gouaillardet
025fd8a9fc osc: use PMPI_* instead of MPI_* 2015-11-20 13:46:19 +09:00
Nathan Hjelm
9ef0821856 osc/rdma: fix some threading bugs
There were two bugs in osc/rdma when using threads:

 - Deadlock in ompi_osc_rdma_start_atomic. This occurs because
   ompi_osc_rdma_frag_alloc is called with the module lock. To fix the
   issue the module lock is now recursive. In the future I will add a
   new lock to protect just the current rdma fragment.

 - Do not drop the lock in ompi_osc_rdma_frag_alloc when calling
   ompi_osc_rdma_frag_complete. Not only is it not needed but dropping
   the lock at this point can cause a competing thread to mess up the
   state.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-12 20:25:57 -07:00
Rolf vandeVaart
87a4cc6118 Disable the use of osc rdma when we detect a GPU buffer as it is not supported in that component.
This forces a failover to the osc pt2pt component. Fixes #1012
2015-10-28 14:47:45 -04:00
Jeff Squyres
140cf90e3e osc_rdma: minor compiler warning stomp 2015-10-23 06:21:56 -07:00
Nathan Hjelm
63e744ffc6 osc/rdma: use only a single btl registration for local state
This commit fixes a bug that can occur on Cray Gemini networks. If
multiple registrations are used for the local state then we lose the
atomicity guarantees. To avoid issues like this use only a single
registration handle for all local state on a node.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 15:51:19 -06:00
Nathan Hjelm
f690fc8fd5 osc/pt2pt: fix warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 15:50:40 -06:00
Nathan Hjelm
97c9732bad osc/rdma: bug fixes
This commit fixes the following:

 - CIDs 1328491, 1328492: Dead code caused by typos in a prior
   commit.

 - Fix the calculation of dynamic memory regions. This was causing
   incorrect RMA range errors when accessing the last partial page of
   an attachment.

 - Fix a SEGV when using dynamic memory windows with local state (all
   processes on the same node).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 09:49:38 -06:00
Nathan Hjelm
b2fa2a9bef Merge pull request #1056 from hjelmn/osc_fixes
osc/pt2pt: reset all_sync sync object before sending complete messages
2015-10-21 19:40:28 -06:00
Nathan Hjelm
864f88a2a3 osc/pt2pt: reset all_sync sync object before sending complete messages
This commit fixes a bug that occurs when a post message comes in when
sending complete messages or while waiting for all outgoing messages
to flush. In that case the post message might get incorrectly
associated with the ending sync object.

References open-mpi/ompi#1012

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 18:30:08 -06:00
Nathan Hjelm
9476c7bbca osc/rdma: use standard verbosity levels
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 12:31:41 -06:00
Nathan Hjelm
b8ee05d352 osc/rdma: bug fixes
This commit fixes several bugs in the osc/rdma component:

 - Complete aggregated requests immediately. Completion of RMA
   requests indicates local completion anyway. This fixes a hang in
   the c_reqops test.

 - Correctly mark Rget_accumulate requests.

 - Set the local base flag correctly on the local peer.

 - Clear or set the no locks flag on the window if the value is
   changed by MPI_Win_set_info.

 - Actually update the target when using MPI_OP_REPLACE.

Fixes open-mpi/ompi#1010

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-20 15:27:15 -06:00
Nathan Hjelm
e11f014c6e osc/rdma: fix segmentation fault when running 1 ppn
This commit fixes an issue identified by @rolfv. The local peer was
not being correctly initialized when running with a single process on
a node.

This fixes open-mpi/ompi#1010

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-14 12:40:52 -06:00
Todd Kordenbrock
141b20d991 osc-portals4: Initialize datatype in MPI_Get_accumulate and MPI_Rget_accumulate
Fix code paths that didn't convert the MPI datatype to the
corresponding Portals4 datatype.

Thanks to Nicolas Chevalier (@shawone) for finding this bug and
submitting a patch.
2015-10-08 12:17:19 -05:00
Nathan Hjelm
5fd9c35957 osc/rdma: fix incorrect assert
This commit fixes MTT failures in debug builds.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-29 15:37:40 -06:00
Nathan Hjelm
7b8ec48c68 osc/rdma: fix typos in arguments to btl_atomic_[f]op
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-29 08:09:00 -06:00
Nathan Hjelm
12bd300c40 Merge pull request #929 from hjelmn/add_procs
Update add_procs support
2015-09-28 17:29:13 -06:00
Nathan Hjelm
552e1b59a5 osc/rdma: fix coverity issues
Fixes CID 1324730, 1327429, 1324728, 1196633, 1324731, 1324727, and
1196632: Logically dead code

OMPI_OSC_RDMA_REQUEST_ALLOC can never return a NULL request. Removed
unnecessary NULL checks.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-26 12:45:14 -06:00
Nathan Hjelm
ebf19ac5eb osc/pt2pt: fix coverity issues
Fixed CID 1269712, 1269709, 1269706, 1269703, 1269694: Logically dead code

Remove extra NULL check as OMPI_OSC_PT2PT_REQUEST_ALLOC can never set the
request to NULL.

Fixes CID 1269668: Unchecked return value

False positive. Add (void) to indicate we do not care about the return code
from opal_hash_table_get_uint32.

Fixes CID 1324726: Free of address-of expression

Do not free lock if it was not allocated.

Fixes CID 1269658: Free of address-of expression

Never will happen but because op is always a built-in op there is no
reason to retain/release it anyway.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-26 11:18:22 -06:00
Nathan Hjelm
f84716fcd0 Merge pull request #941 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix heterogeneous build
2015-09-25 08:07:09 -06:00
Nathan Hjelm
ae7f47e04d osc/pt2pt: fix heterogeneous build
Fixes #940

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-25 00:15:02 -06:00
Todd Kordenbrock
3e63a3458c portals4: add support for dynamic add_procs() to all Portals4 components
In the default mode of operation, the Portals4 components support
dynamic add_procs().

The Portals4 components have two alternate modes (flow control and
logical-to-physical) that require knowledge of all procs at startup.
In these modes, mtl-portals4 sets the MCA_MTL_BASE_FLAG_REQUIRE_WORLD
flag and btl-portals4 sets the MCA_BTL_FLAGS_SINGLE_ADD_PROCS flag
to tell the PML that we need all the procs in one add_procs() call.
2015-09-24 22:12:57 -05:00
Nathan Hjelm
248212276d osc/sm: fix remaining coverity issues
Fixes CID 1324870: Memory - illegal accesses (USE_AFTER_FREE)

Free osc module after calling destruct on the lock.

Fixes CID 1324868: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
Fixes CID 1324867: Integer handling issues (OVERFLOW_BEFORE_WIDEN)

Explicitly cast to uint64_t to ensure the widen happens before an overflow
can occur.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-24 15:55:01 -06:00
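The OVERFLOW_BEFORE_WIDEN fix in 248212276d follows a general C pattern. The sketch below uses hypothetical function and parameter names, since the affected osc/sm expressions are not shown in the log; it contrasts a 32-bit multiply that wraps before being widened with one that is widened first.

```c
#include <stdint.h>

/* Problem form: the multiply is done in 32 bits and wraps before the
 * result is converted to 64 bits. */
static uint64_t offset_narrow(uint32_t index, uint32_t stride)
{
    return index * stride;
}

/* Fixed form: casting one operand first makes the multiply 64-bit. */
static uint64_t offset_wide(uint32_t index, uint32_t stride)
{
    return (uint64_t) index * stride;
}
```

With `index = stride = 0x10000`, the narrow version yields 0 while the widened version yields 0x100000000.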
Nathan Hjelm
ee5810813b osc/pt2pt: fix regression in pscw sync on 0 size groups
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 17:09:00 -06:00
Nathan Hjelm
f6920aa916 osc/rdma: check for usable btls during query
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 17:08:28 -06:00
Nathan Hjelm
903762e194 osc/sm: fix pscw synchronization
The osc/sm component was using a simple counter to determine if all
expected posts had arrived to start a PSCW access epoch. This is
incorrect as a post may arrive from a peer that isn't part of the
current start group. There are many ways this could have been fixed.
This commit adds an n^2 bitmap. When a process posts it sets a bit in
the bitmap associated with the access rank to indicate the post is
complete. The access rank checks for and clears the bits associated
with all the processes in the start group.

The bitmap requires comm_size ^ 2 bits of space. This should be
manageable as most nodes have relatively small numbers of processes. If
this changes another algorithm can be implemented.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 16:00:27 -06:00
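The n^2 bitmap from 903762e194 can be sketched as one row of `comm_size` bits per access rank. All names below are hypothetical, and the sketch uses plain loads and stores; a real shared-memory implementation would need atomic bit updates, which are omitted here for brevity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bitmap: row r holds the posts addressed to access rank r. */
struct post_bitmap {
    int       comm_size;
    int       words_per_row;
    uint64_t *bits;
};

static int post_bitmap_init(struct post_bitmap *bm, int comm_size)
{
    bm->comm_size = comm_size;
    bm->words_per_row = (comm_size + 63) / 64;
    bm->bits = calloc((size_t) comm_size * (size_t) bm->words_per_row,
                      sizeof(uint64_t));
    return bm->bits ? 0 : -1;
}

/* Rank `poster` posts: set its bit in the row owned by `access_rank`. */
static void post_set(struct post_bitmap *bm, int access_rank, int poster)
{
    uint64_t *row = bm->bits + (size_t) access_rank * bm->words_per_row;
    row[poster / 64] |= UINT64_C(1) << (poster % 64);
}

/* The access rank checks one member of its start group and clears the
 * bit if it was set. Posts from ranks outside the group stay untouched. */
static bool post_test_and_clear(struct post_bitmap *bm, int access_rank, int poster)
{
    uint64_t *row = bm->bits + (size_t) access_rank * bm->words_per_row;
    uint64_t mask = UINT64_C(1) << (poster % 64);

    if (!(row[poster / 64] & mask)) {
        return false;
    }
    row[poster / 64] &= ~mask;
    return true;
}
```

Unlike a simple counter, a post from a rank that is not in the current start group only sets its own bit and cannot be mistaken for an expected post.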
Nathan Hjelm
036395dc0f osc/pt2pt: fix typos
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 10:30:01 -06:00
Nathan Hjelm
974061c38f osc: fixed issues identified by coverity
Fix CID 1324733: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324734: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324735: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324736: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324737: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324751: Memory - illegal accesses  (USE_AFTER_FREE)
Fix CID 1324750: (USE_AFTER_FREE)
Fix CID 1324749: Memory - corruptions  (USE_AFTER_FREE)
Fix CID 1324748: Memory - illegal accesses  (USE_AFTER_FREE)
Fix CID 1324747: (USE_AFTER_FREE)
Fix CID 1324746: Memory - corruptions  (USE_AFTER_FREE)

Add missing return on an error path.

Fix CID 1324745: Code maintainability issues  (UNUSED_VALUE)

Ignore return code from barrier. It was not being used anyway.

Fix CID 1324738: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324741: Null pointer dereferences  (REVERSE_INULL)

module->selected_btl can not be NULL in osc/rdma during normal
operation. Removed the unnecessary NULL check.

Fix CID 1324752: Memory - illegal accesses  (USE_AFTER_FREE)

Move ompi_osc_pt2pt_module_lock_remove to before the lock is freed.

Fix CID 1324744: Uninitialized variables  (UNINIT)
Fix CID 1324743: Uninitialized variables  (UNINIT)

This array is not used uninitialized but there is no reason not to use
calloc here to silence the warning.

The following CID is a false positive: 1324742. I will mark it such in
coverity.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 09:23:39 -06:00
Nathan Hjelm
60c2b0df48 Merge pull request #903 from hjelmn/new_osc_rdma
osc/rdma: add true RDMA one-sided component
2015-09-21 10:29:11 -06:00
Nathan Hjelm
d8df9d414d osc/rdma: add true RDMA one-sided component
This commit adds support for performing one-sided operations over
supported hardware (currently Infiniband and Cray Gemini/Aries). This
component is still undergoing active development.

Current features:

 - Use network atomic operations (fadd, cswap) for implementing
   locking and PSCW synchronization.

 - Aggregate small contiguous puts.

 - Reduced memory footprint by storing window data (pointer, keys,
   etc) at the lowest rank on each node. The data is fetched as each
   process needs to communicate with a new peer. This is a trade-off
   between the performance of the first operation on a peer and the
   memory utilization of a window.

TODO:

 - Add support for the accumulate_ops info key. If it is known that
   the same op or same op/no op is used it may be possible to use
   hardware atomics for fetch-and-op and compare-and-swap.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 15:01:33 -06:00
Nathan Hjelm
fd42343ff0 osc/pt2pt: reduce memory footprint of window
This commit updates osc/pt2pt to allocate peer objects as they are
needed rather than all at once. Additionally, to help improve the
memory footprint a new synchronization structure has been added.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 13:01:56 -06:00
Nathan Hjelm
ad3a2ef6cc silence warnings introduced by add_procs merge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 16:33:52 -06:00
Nathan Hjelm
5b7943db78 ompi/group: do not allocate ompi_proc_t's on group union/difference
This commit modifies the ompi_group_t union/difference code to compare/copy the
raw group values. This will either be an ompi_proc_t or a sentinel value. This
commit also adds helper functions to convert between opal process names and
sentinel values.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
d8b0a6efda Remove use of ompi_comm_peer_lookup in osc/sm
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
2a8cc5e637 osc/pt2pt: remove outstanding lock only after lock/flush ack received
fixes #840

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-01 10:54:47 -06:00
Gilles Gouaillardet
21642a2407 osc: do not cast away the const modifier when this is not necessary
update the osc framework and mpi c bindings
2015-08-31 10:34:05 +09:00
Gilles Gouaillardet
21b1e7f8c5 mpi conformance: fix prototypes
- MPI_Compare_and_swap
- MPI_Fetch_and_op
- MPI_Raccumulate
- MPI_Win_detach

Thanks to Michael Knobloch and Takahiro Kawashima for bringing this
to our attention
2015-08-31 10:34:05 +09:00
Todd Kordenbrock
10cf64373a osc-portals4: allow atomic ops on datatypes that are max_fetch_atomic_size bytes in length
Portals4 supports atomic ops on datatypes less than or equal to
max_fetch_atomic_size bytes.  This commit fixes a bug that required
the datatype to be less than max_fetch_atomic_size bytes.
2015-08-18 11:51:16 -05:00
Nathan Hjelm
ee36d813dc Merge pull request #657 from hjelmn/c99
more c99 updates
2015-06-25 11:21:09 -06:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Howard Pritchard
e49a37c034 ownership: update ownership files
per discussions at OMPI devel workshop

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-06-25 10:04:42 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Gilles Gouaillardet
1488e82efd osc/pt2pt: enable heterogeneous support 2015-05-14 16:42:48 +09:00
Todd Kordenbrock
9df163f116 portals4: use a single Memory Descriptor to cover all of memory
In days past, some implementations of Portals4 could not cover all
of memory with a single Memory Descriptor so multiple large
overlapping Memory Descriptors were created.  Because none of the
current implementations have this limitation (and no future
implementations should either), this commit removes the overlapping
Memory Descriptors code.
2015-05-11 11:49:41 -05:00
Ralph Castain
6e95bcd583 Fix typo in oob_tcp.c when IPV6 enabled. Cleanup a few other warnings, including a typo in coll_sm that prevented that component from registering its MCA params! 2015-05-07 21:05:08 -07:00
Gilles Gouaillardet
9d56b85b55 initialize common symbols from ompi 2015-05-08 10:11:58 +09:00
Nathan Hjelm
2716b8b1da osc/pt2pt: correct flush expected counts
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-04-24 13:34:21 -06:00
Nathan Hjelm
f1d09e55ec osc/pt2pt: silence warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-23 15:35:47 -06:00
Nathan Hjelm
29b435a5a4 osc/pt2pt: fix bugs that caused incorrect fragment counting
This commit fixes a bug identified by MTT that occurred when mixing
passive and active target synchronization. The bugs fixed in this
commit are:

 - Do not update incoming fragment counts for any type of unbuffered
   control message. These messages are out-of-band and should not be
   considered towards the signal counts.

 - Complete a change from using received counts to expected counts for
   lock, unlock, and flush acks. Part of the change made it into
   master before the rest was ready. This was preventing wakeups in
   some cases.

 - Turn the passive_target_access_epoch module member into a
   counter. As long as at least one peer is locked we are in a
   passive-target epoch and not an active target one. This fix will
   ensure that fragment flags are set appropriately.

fixes #538

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-23 13:22:24 -06:00
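The third fix in 29b435a5a4 replaces a boolean with a count of locked peers. A minimal sketch, assuming a hypothetical `struct module_state` that holds only this one member:

```c
#include <stdbool.h>

/* Hypothetical module state: a count of locked peers instead of a flag. */
struct module_state {
    int passive_target_access_epoch;
};

static void peer_locked(struct module_state *m)
{
    m->passive_target_access_epoch++;
}

static void peer_unlocked(struct module_state *m)
{
    m->passive_target_access_epoch--;
}

/* A passive-target epoch is active while at least one peer is locked. */
static bool in_passive_epoch(const struct module_state *m)
{
    return m->passive_target_access_epoch > 0;
}
```

With a plain flag, unlocking the first of two locked peers would clear the epoch too early; the counter keeps it active until the last unlock.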
Nathan Hjelm
df75d0382f ompi: use C99 subobject naming for component initialization
This commit helps future-proof ompi components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
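The "subobject naming" in df75d0382f refers to C99 designated initializers. The sketch below uses a hypothetical component structure whose field names are illustrative only and are not the MCA component layout.

```c
/* Hypothetical component structure; field names are illustrative only. */
struct component {
    const char *name;
    int         major_version;
    int         minor_version;
    int       (*open)(void);
};

static int example_open(void)
{
    return 0;
}

/* Each member is initialized by name, so adding or reordering fields in
 * the structure cannot silently shift values into the wrong member. */
static const struct component example_component = {
    .name          = "example",
    .major_version = 2,
    .minor_version = 0,
    .open          = example_open,
};
```

Members that are not named are zero-initialized, which is what makes this style robust when a structure gains new fields.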
Nathan Hjelm
3436f2917d Merge pull request #449 from hjelmn/mca_base_update
mca/base update
2015-04-16 08:41:48 -06:00
Jeff Squyres
49f52a5356 osc_sm_passive_target.c: update the check for lock types
Based on some on-list and IM discussion with @hjelmn about
open-mpi/ompi@40b7643119, change the testing to a switch/case.  If we
fall into the default case, assert() error (because it's an OMPI
developer programming error).
2015-04-13 12:02:15 -04:00
Jeff Squyres
40b7643119 osc_sm_passive_target.c: ensure ret is always defined
Fixes a compiler warning
2015-04-13 11:31:43 -04:00
Nathan Hjelm
80ed805a16 osc/pt2pt: fix synchronization bugs
The fragment flush code tries to send the active fragment before
sending any queued fragments. This could cause osc messages to arrive
out-of-order at the target (bad). Ensure ordering by always sending
the active fragment after sending queued fragments.

This commit also fixes a bug when a synchronization message (unlock,
flush, complete) can not be packed at the end of an existing active
fragment. In this case the source process will end up sending 1 more
fragment than claimed in the synchronization message. To fix the issue
a check has been added that fixes the fragment count if this situation
is detected.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-06 08:39:19 -06:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Ralph Castain
0cfb4f29aa Silence compiler warning 2015-03-16 09:59:21 -07:00
Nathan Hjelm
d929137768 osc/pt2pt: need to unlock self before waiting for unlock acks
This commit fixes a bug in osc/pt2pt which causes MPI_Win_unlock_all
to hang. The problem was caused by code refactoring that moved the
unlock of the local process to after the loop that waits for unlock
acks. This will cause the code to loop forever waiting on the self
ack.

Fixes #444
2015-03-10 14:10:37 -06:00
Todd Kordenbrock
0cf45df1a0 osc-portals4: fix incomplete free list conversion 2015-02-26 10:53:45 -06:00
Nathan Hjelm
5f1254d710 Update code base to use the new opal_free_list_t
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.

This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.

Notes:

OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-24 10:05:45 -07:00
Jeff Squyres
b70fa3e2cb osc_sm: Fix valgrind warning
Many thanks to Lisandro Dalcin for contributing this patch.

Fixes open-mpi/ompi#202.
2015-02-24 03:36:17 -08:00
Howard Pritchard
bf89131f9e add owner files to opal/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Jeff Squyres
0bb1dfeca9 osc_base_obj_convert: remove unnecessary MEMCHECKER line
Commit open-mpi/ompi@1a3597aam changed the type of the `convertor`
variable from `ompi_osc_base_convertor_t` (which contained an
`opal_convertor_t`) to an `opal_convertor_t`.  Hence, using memchecker
to ensure that the inner convertor of the `ompi_osc_base_convertor_t`
is considered initialized is now unnecessary.
2015-02-16 07:27:44 -08:00
Gilles Gouaillardet
0d560ddf77 osc: fix typo
this typo caused build failure when configure'd with --enable-memchecker
see http://mtt.open-mpi.org/index.php?do_redir=2234
2015-02-16 10:09:08 +09:00
George Bosilca
a7a4d6335e Various cleanups. 2015-02-15 11:39:09 -05:00
Nathan Hjelm
0e822e03f7 osc/sm: always release the lock on MPI_Unlock
When a lock was obtained with MPI_MODE_NOCHECK it was not correctly
released on unlock. This is an error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-12 18:54:22 -07:00
Nathan Hjelm
1a3597aa93 osc/base: fix accumulate on derived datatypes
With certain datatypes the opal_datatype_unpack method for performing
the accumulate operation does not work. This commit modifies the
accumulate code in the osc base to use opal_convertor_raw instead.

Fixes #385
2015-02-11 12:36:30 -07:00
Nathan Hjelm
a2bdfd99a2 osc/pt2pt: do not set active_incoming_frag_signal_count to 0 on fence completion 2015-02-11 12:34:04 -07:00
Todd Kordenbrock
b5a0f3d347 osc-portals4: rename OPAL_ASSEMBLY_ARCH values from OMPI_* to OPAL_* 2015-02-04 16:08:55 -06:00
Gilles Gouaillardet
9be4dfb152 osc/pt2pt: invoke ompi_osc_signal_outgoing only once per fragment 2015-01-22 13:43:44 +09:00
Gilles Gouaillardet
661c35ca67 cleanup dead code caused by the removal of the --with-threads configure option 2015-01-16 19:13:59 +09:00
Ralph Castain
4e592ac434 Fix the tarball by providing the correct list of headers in the Makefile.am 2015-01-07 18:37:26 -08:00
Nathan Hjelm
e68ed2876c osc/pt2pt: threading fixes and code cleanup 2015-01-06 13:39:16 -07:00