1
1

28547 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
c3adcb05eb Miscellaneous compiler warnings fixes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-23 11:45:30 -07:00
Artem Polyakov
714c8c7381
Merge pull request #4957 from open-mpi/host_filtering
plm/base: fixed the hosts filtering
2018-03-23 10:27:04 -07:00
Jeff Squyres
f66ac43fbc opal/util: fix CID 1430381
Fix minor resource leak.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-23 08:48:11 -07:00
Nathan Hjelm
5f7ff5307e fcoll/two_phase: do not use removed function (MPI_Address)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-03-23 08:43:24 -06:00
Boris Karasev
6afc7099a0 plm/base: fixed the hosts filtering
Reseting the `ORTE_NODE_FLAG_MAPPED` flag after hosts filtering, this
flag is used subsequently and can be affect to the node mapping logic

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-03-23 09:41:16 +03:00
Jeff Squyres
1e56023ea4
Merge pull request #4951 from jsquyres/pr/contribution-consolidation
CONTRIBUTING: Consolidate the 2 files
2018-03-22 10:53:08 -05:00
Jeff Squyres
8ada4e48a5 CONTRIBUTING: Consolidate the 2 files
We accidentally had 2 CONTRIBUTING.md files.  Consolidate the content
of both of them.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-22 08:56:35 -05:00
Jeff Squyres
023a4a82d3
Merge pull request #4942 from jsquyres/pr/tcp-btl-help-message-updates
TCP help message updates
2018-03-22 08:53:04 -05:00
Jeff Squyres
0f8077ace6 oob/tcp: add show_help message about version mismatch
Be more explicit about version mismatch between ORTE processes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-21 20:18:28 -07:00
Jeff Squyres
a15d8233c9
Merge pull request #3434 from dsharma283/pr-3431
ompi/opal: add support for HDR link speeds
2018-03-21 21:57:20 -05:00
Jeff Squyres
40afd525f8 btl/tcp: make error messages more specific
Convert some verbose messages to opal_show_help() messages.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-21 19:34:03 -07:00
Jeff Squyres
e0d86b1c72 opal/util/fd: add opal_fd_get_peer_name(()
Returns a string name (either a resolved name or IPv4/IPv6 name in a
string if unresolvable.  The caller is responsible for freeing the
string.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-21 19:34:03 -07:00
Devesh Sharma
90e9b22196 ompi/opal: add support for HDR link speeds
This patch enables to use adapters with HDR speeds.
issue id 3431

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
2018-03-21 19:15:41 -07:00
Edgar Gabriel
c23dff24bc
Merge pull request #4940 from edgargabriel/topic/ompi-cleanup-march-2018
Topic/ompio cleanup march 2018
2018-03-21 13:47:41 -05:00
Edgar Gabriel
36747cca67 io/ompio: disable the fcoll timing by default
somehow the flag indicating to gather performance data
on collective io operations has changed to 1 accidentally.
Should be 0 ( false) by default.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-03-21 11:34:35 -05:00
Edgar Gabriel
aae8c6c6ad remove addproc sharedfp component
never got to move this sharedfp component into anything
usable. Can easily be restored if necessary.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-03-21 11:27:01 -05:00
Edgar Gabriel
e703ac2da8 remove plfs components
plfs components are at this point not utilized by anybody as far as I know.
Easy to bring back if we want to.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-03-21 11:27:01 -05:00
Howard Pritchard
ade280eb7c
Merge pull request #3292 from markalle/pr/ibv_reg_mr__fork
IB fork
2018-03-21 09:39:08 -06:00
Boris Karasev
85ce76fa36
Merge pull request #4926 from karasevb/pmix_dstore_enable
pmix: dstore returned for direct modex
2018-03-21 14:24:51 +07:00
Boris Karasev
3796307a57 timings: added new timing points
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-03-21 05:16:25 +02:00
Jeff Squyres
5ad796259c
Merge pull request #4931 from jsquyres/pr/contributing-guideliens
CONTRIBUTING.md: add Github contribution guidelines
2018-03-20 16:49:25 -05:00
Jeff Squyres
ed3aa63a96 CONTRIBUTING.md: add Github contribution guidelines
We have a small number of requirements for contributions (e.g.,
"Signed-off-by"), so let's make sure that people have an easy way of
knowing these things.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-20 07:10:38 -07:00
Jeff Squyres
dd620049cd
Merge pull request #4925 from jsquyres/pr/mpool-memkind-typo-fix
mpool/memkind: fix typo in partition page sizes
2018-03-19 22:07:00 -05:00
Boris Karasev
dca3dd2ea4 pmix: dstore returned for direct modex
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-03-20 04:56:48 +02:00
Jeff Squyres
cc4bb433bc mpool/memkind: fix typo in partition page sizes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-19 16:33:08 -05:00
Ralph Castain
9eb426e288
Merge pull request #4924 from karasevb/pmix_fix_dmdx
Sync to PMIx master PR pmix/pmix#697
2018-03-19 06:24:06 -07:00
Boris Karasev
36a0c6a794 pmix: fixed the direct modex request
This commit fixes the case when local client asks for the key from the
process on the remote node. The local server don't have commit count for
remote ranks, it is maintained by another PMIx server, so commit count
should be ignored for remote requests.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-03-19 11:51:03 +02:00
Brian Barrett
a2d1419185
Merge pull request #4921 from bwbarrett/master-NEWS
dist: Sync 2.1.3 NEWS items into master
2018-03-16 21:01:17 -07:00
Brian Barrett
ab19602752 dist: Sync 2.1.3 NEWS items into master
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-03-16 12:28:41 -07:00
bosilca
bf3dd8af19
Merge pull request #4884 from bosilca/topic/fix_wtime
Improve the range and accuracy of MPI_Wtime.
2018-03-16 14:09:33 +09:00
Aurelien Bouteiller
e08e580e27
Merge pull request #4916 from abouteiller/topic/scaling.pl-m
Scaling.pl: Fix Srun options and wait for DVM launch
2018-03-15 22:06:01 -04:00
Nathan Hjelm
7f4872d483 osc/rdma: performance improvments and bug fixes
This commit is a large update to the osc/rdma component. Included in
this commit:

 - Add support for using hardware atomics for fetch-and-op and single
   count accumulate  when using the accumulate lock. This will improve
   the performance of these operations even when not setting the
   single intrinsic info key.

 - Rework how large accumulates are done. They now block on the get
   operation to fix some bugs discovered by an IBM one-sided test. I
   may roll back some of the changes if the underlying bug in the
   original design is discovered. There appear to be no real
   difference (on the hardware this was tested with) in performance so
   its probably a non-issue. References #2530.

 - Add support for an additional lock-all algorithm: on-demand. The
   on-demand algorithm will attempt to acquire the peer lock when
   starting an RMA operation. The lock algorithm default has not
   changed. The algorithm can be selected by setting the
   osc_rdma_locking_mode MCA variable. The valid values are two_level
   and on_demand.

 - Make use of the btl_flush function if available. This can improve
   performance with some btls.

 - When using btl_flush do not keep track of the number of put
   operations. This reduces the number of atomic operations in the
   critical path.

 - Make the window buffers more friendly to multi-threaded
   applications. This was done by dropping support for multiple
   buffers per MPI window. I intend to re-add that support once the
   underlying performance bug under the old buffering scheme is
   fixed.

 - Fix a bug in request completion in the accumulate, get, and put
   paths. This also helps with #2530.

 - General code cleanup and fixes.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-03-15 14:53:53 -06:00
Aurélien Bouteiller
9e23d24bb4
Scaling.pl: Fix Srun options and wait for DVM launch
Flush out the DVM ready notice on stdout

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2018-03-15 00:00:49 -04:00
Jeff Squyres
5f58e7b961
Merge pull request #4910 from jsquyres/pr/reset-opal-cuda-verbose-value
opal_datatype_module.c: reset opal_cuda_verbose
2018-03-13 14:01:34 -04:00
Jeff Squyres
2713a24009 opal_datatype_module.c: reset opal_cuda_verbose
999de137ce6 accidentally reset opal_cuda_verbose's default value.
This commit puts it back.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-03-13 10:10:15 -07:00
Jeff Squyres
695b92ec7b
Merge pull request #4906 from blegat/doctypo
Fix typo in MPI_Cart_shift doc
2018-03-13 11:29:44 -04:00
Benoît Legat
00600c7cbb Fix typo in MPI_Cart_shift doc
Signed-off-by: Benoît Legat <benoit.legat@gmail.com>
2018-03-13 15:25:42 +01:00
Josh Hursey
ae1d3183f9
Merge pull request #4891 from jjhursey/fix/mpir-symbol-vis
Fix MPIR_proctable structure visibility
2018-03-13 08:09:15 -05:00
Edgar Gabriel
50d07e9622
Merge pull request #4900 from edgargabriel/topic/two_phase_data_sieving_fix
fcoll/two_phase: data sieving has to occur at offset 0 as well
2018-03-10 12:18:15 -06:00
Edgar Gabriel
da640f98df fcoll/two_phase: data sieving has to occur at offset 0 as well
data sieving has to occur for any offset provided that is larger
or equal zero for this implementation to work correctly.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-03-10 11:23:09 -06:00
Joshua Hursey
ccb4f43c9b Fix MPIR_proctable structure visibility
* The `MPIR_PROCDESC` structure needs to be visible even in optimized
   builds so that debuggers can attach to `mpirun` and properly read the
   `MPIR_proctable`.
 * In the v2.0.x and v2.x series this structure resided in the `orterun`
   directory and included the `CFLAGS` fix included here. This code
   moved in the v3.x series and the `CFLAGS` did not move causing this
   issue.
   - Instead of applying the debug `CFLAGS` globally to libopen-rte,
     only apply them to the `orted_submit.c` compile which contains the
     MPIR symbols.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2018-03-09 21:15:28 -05:00
George Bosilca
0f0c27a184
Allow MPI_PROC_NULL as neighbor.
Allowing MPI_PROC_NULL as a neighbor in any topology allows us to add
gaps on the send and recv buffers. This does make the traditional
neighbor collective have a similar behavior as the V version, but in
same time it allows the users to skip the step where they prepare the
counts and the displacement array.

For more info please take a look at issue #4675.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-03-09 12:20:26 +09:00
Edgar Gabriel
0f345c068a
Merge pull request #4888 from edgargabriel/topic/romio_size0_contiguous_flag
io/romio314: mark datatypes of size 0 as contiguous
2018-03-08 13:28:13 -06:00
Jeff Squyres
70c59f78b9
Merge pull request #4883 from bosilca/topic/get_element_fix
Topic/get element fix
2018-03-08 10:31:47 -05:00
Edgar Gabriel
c83b47c266 io/romio314: mark datatypes of size 0 as contiguous
this commit fixes an issue observed with romio314 and the hdf5 1.10.x testsuite.
The ADIOI_Datatype_iscontig() routine in romio314/src/io_romio314_module.c
will now return for a datatype of size 0 that it is contiguous, even if the extent
of the datatype is non-zero. This avoids a segmentation fault observed in the
ADIOI_Flatten routine, and fixes this particular with the hdf5 1.10.x testsuite in
OpenMPI with romio314.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-03-08 09:10:09 -06:00
George Bosilca
9bced03213
Improve the range and accuracy of MPI_Wtime.
As discussed on https://github.com/mpi-forum/mpi-issues/issues/77#issuecomment-369663119
the conversion to double in the MPI_Wtime decrease the range
and accuracy of the resulting timer. By setting the timer to
0 at the first usage we basically maintain the accuracy for
194 days even for gettimeofday.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-03-08 14:26:02 +09:00
George Bosilca
999de137ce
Fix the datatype debug.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-03-08 03:40:08 +09:00
George Bosilca
7848035195
Update the loop stats.
The loop should be updated on each internal iteration.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-03-08 03:18:39 +09:00
Jordan Cherry
2f0e8153a5
Merge pull request #4247 from jocherry/btlTcpLinksBugFix
tcp btl: Fix multiple-link connection establishment.
2018-03-07 08:40:37 -08:00
Alex Mikheev
04ec013da9
Merge pull request #4847 from alex-mikheev/topic/oshmem_group_cache_refactor
oshmem: refactor group cache
2018-03-04 14:36:32 +02:00