1
1
Граф коммитов

8783 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
d7264aa613 osc/pt2pt: various threading fixes
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:

 - The code that signaled the actual start of an access epoch changed
   the eager_send_active flag on a synchronization object without
   holding the object's lock. This could cause another thread waiting
   on eager sends to block indefinitely because the entirety of
   ompi_osc_pt2pt_sync_expected could exectute between the check of
   eager_send_active and the conditon wait of
   ompi_osc_pt2pt_sync_wait.

 - The bookkeeping of fragments could get screwed up when performing
   long put/accumulate operations from different threads. This was
   caused by the fragment flush code at the end of both put and
   accumulate. This code was put in place to avoid sending a large
   number of unexpected messages to a peer. To fix the bookkeeping
   issue we now 1) wait for eager sends to be active before stating
   any large isend's, and 2) keep track of the number of large isends
   associated with a fragment. If the number of large isends reaches
   32 the active fragment is flushed.

 - Use atomics to update the large receive/send tag counters. This
   prevents duplicate tags from being used. The tag space has also
   been updated to use the entire 16-bits of the tag space.

These changes should also fix open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:33:33 -07:00
Gilles Gouaillardet
728a97c558 use-mpi-f08: remove duplicates from Makefile.am 2016-02-02 13:33:07 +09:00
Jeff Squyres
910eca751f Merge pull request #1327 from ggouaillardet/poc/mpi_xxx_dup_yyy_no_bind
f08: do not BIND(C) to subroutines with LOGICAL parameters
2016-02-01 17:56:27 -05:00
Edgar Gabriel
3f7fff5780 Merge pull request #1331 from edgargabriel/solaris-statfs-fix
Solaris statfs fix
2016-01-28 20:16:33 -06:00
Gilles Gouaillardet
69ba2a9b6b ddt: fix support of MPI_COMBINER_RESIZED in __ompi_datatype_create_from_args
Thanks James Ramsey for the report
2016-01-28 11:32:29 +09:00
Nathan Hjelm
a19c265ab5 osc/rdma: fix typo in ompi_osc_rdma_complete_atomic
The typo caused SEGVs on systems with only fetching atomic
support.

Fixes open-mpi/ompi#1329

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-26 15:44:07 -07:00
Edgar Gabriel
b4a725c26a need to check for the parent dir as well, since the file might not exist yet. 2016-01-26 13:49:21 -06:00
Edgar Gabriel
722aab92e6 - extend opal_path_nfs to retrieve the file system type
- use opal_path_nfs in the fs_base function to avoid code duplication.
2016-01-26 13:36:21 -06:00
Gilles Gouaillardet
704f14f91e f08: do not BIND(C) to subroutines with LOGICAL parameters
Thanks Paul Romano for reporting this issue.
2016-01-26 13:56:24 +09:00
Joshua Ladd
69e3c6f289 Merge pull request #1321 from jladd-mlnx/topic/add-allgatherv-reduce
Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.
2016-01-25 20:46:52 -05:00
Nathan Hjelm
500e90422d Merge pull request #1320 from hjelmn/osc_rdma_fix
osc/rdma: fix hang when performing large unaligned gets
2016-01-25 09:36:13 -07:00
Nathan Hjelm
45da311473 osc/rdma: fix hang when performing large unaligned gets
This commit adds code to handle large unaligned gets. There are two
possible code paths for these transactions:

 1) The remote region and local region have the same alignment. In
 this case the get will be broken down into at most three get
 transactions: 1 transaction to get the unaligned start of the region
 (buffered), 1 transaction to get the aligned portion of the region,
 and 1 transaction to get the end of the region.

 2) The remote and local regions do not have the same alignment. This
 should be an uncommon case and is not optimized. In this case a
 buffer is allocated and registered locally to hold the aligned data
 from the remote region. There may be cases where this fails (low
 memory, can't register memory). Those conditions are unlikely and
 will be handled later.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-22 21:06:46 -07:00
Valentin Petrov
5e2a2c0755 BufFix for coll/hcoll: coll_request must be set to ACTIVE when alloced
If the state of the request is not set to OMPI_REQUEST_ACTIVE
       then MPI_Test would immediately signal such request completed
       while hcoll may still be working on it.

Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-23 03:23:59 +02:00
Joshua Ladd
e398bf6f3a Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-23 03:09:29 +02:00
Nathan Hjelm
49d2f44b97 osc/rdma: use correct endpoint for local state
If atomics are not globally visible (cpu and nic atomics do not mix)
then a btl endpoint must be used to access local ranks. To avoid
issues that are caused by having the same region registered with
multiple handles osc/rdma was updated to always use the handle for
rank 0. There was a bug in the update that caused osc/rdma to continue
using the local endpoint for accessing the state even though the
pointer/handle are not valid for that endpoint. This commit fixes the
bug.

Fixes open-mpi/ompi#1241.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-22 10:41:27 -07:00
Nathan Hjelm
243d973cfe Merge pull request #1316 from hjelmn/datatype_pack_threads
ompi/datatype: make datatype pack thread safe
2016-01-21 20:14:10 -07:00
Nathan Hjelm
b921831f2b ompi/datatype: make datatype pack thread safe
This commit makes ompi_datatype_get_pack_description thread safe. The
call is used by osc/pt2pt to send the packed description to remote
peers. Before this commit if MPI_THREAD_MULTIPLE is enabled and the
user uses MPI_Put, MPI_Get, etc we could hit a race where multiple
threads attempt to store the packed description on the datatype. Since
the code in question is not performance-critical the threading fix
uses opal_atomic_* calls instead of bothering with OPAL_THREAD_*.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-21 17:53:28 -07:00
Nathan Hjelm
6180386bea osc/rdma: disable put aggregation when using threads
Optimizing put aggregation in the presence of threads will require a
redesign of the code. For now just ensure that put aggregation is
turned off when MPI_THREAD_MULTIPLE is enabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-21 15:50:35 -07:00
Edgar Gabriel
b253d4e887 fix CID 1349739, CID 1349738, CID 1349736 and (probably) CID 1349740 (not entirely sure about the last one, since I don't understand why block[i] is a problem but max_len[i] allocated and treated exactly the same way 1 line later is not). 2016-01-21 08:32:23 -06:00
Edgar Gabriel
9b8d769e41 will rivist the addproc component later in spring, right now it is constantly in the way of doing my tests. 2016-01-20 15:05:51 -06:00
Edgar Gabriel
1671604dbc Merge pull request #1307 from edgargabriel/fcoll-dynamic_gen2
Fcoll dynamic gen2
2016-01-20 10:19:56 -06:00
Gilles Gouaillardet
2adbe273d6 mpi: have MPI_Wtick() return the period (and not the frequency) if OPAL_TIMER_CYCLE_NATIVE 2016-01-20 14:14:47 +09:00
Gilles Gouaillardet
c0f8f2ce32 ompi/dpm: correctly handle sentinels in construct peers
This fix is similar to open-mpi/ompi@4c1ea4a171
and open-mpi/ompi@213b2abde4
2016-01-18 09:57:38 +09:00
Edgar Gabriel
a9ca37059a improve the communicaton abstraction. This commit also allows all aggregators to work simultaniously, instead of the slightly staggered way of the previous version. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
56e11bfc97 initialize the stripe_size variable as well. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
26c57ef374 separate the size of the buffer used for the shuffle step and the size of the buffer used for a pwritev operation. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
39d5c8c281 further bug fixes silencing a compiler warning and fixing a memory overrun 2016-01-17 09:48:49 -06:00
Edgar Gabriel
2bcae84e11 further debugging 2016-01-17 09:48:49 -06:00
Edgar Gabriel
2bdd6ba17a correctly free some buffers, and ensure that lustre_stripe_size and stripe_count are always read from the file system. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
4bbb22bd0b add a new field to the ompio data structure (stripe_count) and set it correctly on pvfs2 and lustre. 2016-01-17 09:48:49 -06:00
Edgar Gabriel
d282e94b67 add the new dynamic_gen2 component, designed to coexist for now with the original dynamic component 2016-01-17 09:48:49 -06:00
Jeff Squyres
60ffe713b8 common syms: whitelist bison-generated common symbols
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-01-16 03:53:14 -08:00
Jeff Squyres
96f94f8228 fortran: whitelist deliberate common symbols
The Fortran library has a number of common symbols that are
deliberate, so whitelist them.
2016-01-16 03:53:14 -08:00
Joshua Ladd
18c5a21562 Fix typo in error handling flow. 2016-01-14 22:28:54 +02:00
Joshua Ladd
afa62d8ca1 Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891 2016-01-14 19:22:27 +02:00
Tomislav Janjusic
3858bc8e62 Adding support for dynamic endpoint creation
Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx>
Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com>
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-12 22:17:03 +02:00
Nathan Hjelm
dd4d49cbbb Merge pull request #1278 from ggouaillardet/poc/osc_pt2pt
osc/pt2pt: use two distinct "namespaces" for tags
2016-01-12 09:49:31 -07:00
Nathan Hjelm
d26cc3fece ompi/group: do no decrement parent group proc pointers in destruct
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-01-11 12:56:11 -07:00
Edgar Gabriel
0a1b735eed use the actual preadv and pwritev functions if available. That's what the fbtl interfaces have been designed for. 2016-01-07 08:29:17 -06:00
Gilles Gouaillardet
4c1ea4a171 dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept()
this commit includes missing bits from open-mpi/ompi@213b2abde4
2016-01-07 09:11:03 +09:00
Gilles Gouaillardet
213b2abde4 dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept() 2016-01-06 16:21:13 +09:00
Edgar Gabriel
1b0b849994 remove the MCA parameter setting the number of hosts in PLFS, since the plfs_setxattr function used is causing linking problems with PLFS 2.5
remove unused variables.
2016-01-05 11:13:23 -06:00
Edgar Gabriel
7861a8c357 revise the logic in the fbtl plfs avoiding the memcpy operation 2016-01-05 10:04:46 -06:00
Edgar Gabriel
da309ac962 - use a unique pid for each process as requested by the API
- sync the file before closing it
- use plfs_access() instead of access() before closing the file
2016-01-05 10:04:12 -06:00
Gilles Gouaillardet
06ecdb6aa7 osc/pt2pt: use two distinct "namespaces" for tags 2016-01-05 16:57:37 +09:00
Gilles Gouaillardet
14fdf75944 fs/pvfs2: fix typo
Thanks Dave Love for reporting this issue.

Fixes #1272
2016-01-03 23:28:35 +09:00
Artem Polyakov
2abb2972ac Fix Mellanox copyrights with respect to the following PRs:
* https://github.com/open-mpi/ompi/pull/1184
* https://github.com/open-mpi/ompi/pull/1188
* https://github.com/open-mpi/ompi/pull/1197
* https://github.com/open-mpi/ompi/pull/1202
* https://github.com/open-mpi/ompi/pull/1210
* https://github.com/open-mpi/ompi/pull/1216
* https://github.com/open-mpi/ompi/pull/1236
* https://github.com/open-mpi/ompi/pull/1237
* https://github.com/open-mpi/ompi/pull/1248
* https://github.com/open-mpi/ompi/pull/1260
* https://github.com/open-mpi/ompi/pull/1264
2015-12-30 00:12:19 +06:00
Ralph Castain
810f2446b7 Add pmix120 component, update the error handling functions in the PMIx API.
Update the configure logic for the new pmix120 component

ckpt

Get the pmix120 component to work - still not really registering or handling notifications, but infrastructure now operates

Cleanup some of the symbol scopes, and provide a more comprehensive rename.h file. Will pretty it up later - let's see how this works

Cleanup the rename files to use the pretty macros
2015-12-28 23:15:44 +09:00
Gilles Gouaillardet
fec973efda configury: test portability
replace test ... -o ... with test ... || test ...
and test ... -a ... with test ... && test ...
2015-12-28 13:58:45 +09:00
Gilles Gouaillardet
47ab2fcb89 man: fix MPI_Neighbor_alltoall{v,w} prototypes
Thanks Willem Vermin for bringing this to our attention
2015-12-28 09:39:33 +09:00