1
1
Граф коммитов

30330 Коммитов

Автор SHA1 Сообщение Дата
Austen Lauria
edcd6d8aeb
Merge pull request #7146 from bgoglin/master
fix typos hlwoc->hwloc
2019-11-13 15:38:00 -05:00
Ralph Castain
b0a487a3c7
Update IOF redirection options
Provide both "--output-directory" and "--output-filename" options but do
not allow both to be given at the same time. Output-directory allows
specification of a directory, with output redirected into files of form
"<directory>/<jobid>/rank.<vpid>/stdout[err]". This option also supports two
directives: nojobid (removes the jobid directory layer) and nocopy (do
not copy the output to the terminal).

Output-filename is the "old" behavior that names the output files as
"<filename>.rank" with both stdout and stderr redirected into it. This
option only supports one directive: nocopy (do not copy the output to
the terminal).

Fix both the --help and man documentation.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-11-13 12:13:06 -08:00
Nathan Hjelm
09dd383f8b
Merge pull request #7108 from devreal/btl-ugni-deadlock
uGNI: Fix potential deadlock when processing outstanding transfers
2019-11-11 10:56:56 -08:00
George Bosilca
3de636dc6f Swap the 2 fields to maintain the size of the struct.
Thanks @devreal for catching this.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-11-07 11:25:03 -05:00
George Bosilca
59fb02618e Prevent overflow when dealing with datatype count.
This patch fixes #7147 by preventing overflow when multiplying
the count and the blocklen. The count reflects MPI count and is
therefore bound to the size of an int (it is an uint32_t) while the
blocklen can be merged together to represent the largest contiguous
memory layout and it is therefore promoted to a size_t.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-11-07 11:25:03 -05:00
Yossi Itigin
40e2fbb093
Merge pull request #7114 from brminich/topic/mlx_scat_tuning
COLL/TUNED: Add linear scatter using isend for mlnx platform
2019-11-07 13:38:21 +02:00
Mikhail Brinskii
f2cbd4806e COLL/TUNED: Add linear scatter using isend for mlnx platform
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-11-07 11:04:39 +02:00
Howard Pritchard
6fc5a4e033
Merge pull request #7142 from hppritcha/topic/swat_issue_7128
btl/uct: restrict to using UCT 1.7 or older API for now
2019-11-06 16:07:54 -07:00
Howard Pritchard
9d345d9aa0 btl/uct: add UCT API version check to configury
related to #7128

The UCX crew is no longer guaranteeing that the UCT API is going to be frozen,
so this is kind of a whack-a-mole problem trying to keep the BTL UCT working
with various changing UCT APIs.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-11-06 14:27:58 -07:00
Brice Goglin
5c6bd7ea4e fix typos hlwoc->hwloc
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2019-11-06 10:42:36 +01:00
Austen Lauria
1c33d5aecd
Merge pull request #7141 from devreal/shmem_memheap_alloc_band
Shmem: use bitwise and instead of logical and to check for allocator capabilities
2019-11-05 18:21:23 -05:00
Nathan Hjelm
30171cbd27
Merge pull request #7144 from hjelmn/btl_uct_fix_compilation_issue_for_ucx_1_7_because_the_api_break_got_into_this_release
btl/uct: fix compilation for UCX 1.7.0
2019-11-05 14:54:30 -08:00
Nathan Hjelm
a3026c016a btl/uct: fix compilation for UCX 1.7.0
Ref #7128

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2019-11-05 12:53:26 -08:00
Joseph Schuchart
9f2c6a42c3 Shmem: use bitwise and instead of logical and to check for allocator capabilities
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-11-05 15:44:59 +01:00
bosilca
6e3ff5e9b4
Merge pull request #7131 from bosilca/fix/tcp_errors
Correctly report TCP connect errors.
2019-10-31 22:12:27 -04:00
George Bosilca
476562752f
Correctly report TCP connect errors.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-10-31 18:33:15 -04:00
Mark Allen
6855ebb84b Adding -mca comm_method to print table of communication methods
This is closely related to Platform-MPI's old -prot feature.

The long-format of the tables it prints could look like this:
>   Host 0 [myhost001] ranks 0 - 1
>   Host 1 [myhost002] ranks 2 - 3
>   Host 2 [myhost003] ranks 4
>   Host 3 [myhost004] ranks 5
>   Host 4 [myhost005] ranks 6
>   Host 5 [myhost006] ranks 7
>   Host 6 [myhost007] ranks 8
>   Host 7 [myhost008] ranks 9
>   Host 8 [myhost009] ranks 10
>
>    host | 0    1    2    3    4    5    6    7    8
>   ======|==============================================
>       0 : sm   tcp  tcp  tcp  tcp  tcp  tcp  tcp  tcp
>       1 : tcp  sm   tcp  tcp  tcp  tcp  tcp  tcp  tcp
>       2 : tcp  tcp  self tcp  tcp  tcp  tcp  tcp  tcp
>       3 : tcp  tcp  tcp  self tcp  tcp  tcp  tcp  tcp
>       4 : tcp  tcp  tcp  tcp  self tcp  tcp  tcp  tcp
>       5 : tcp  tcp  tcp  tcp  tcp  self tcp  tcp  tcp
>       6 : tcp  tcp  tcp  tcp  tcp  tcp  self tcp  tcp
>       7 : tcp  tcp  tcp  tcp  tcp  tcp  tcp  self tcp
>       8 : tcp  tcp  tcp  tcp  tcp  tcp  tcp  tcp  self
>
>   Connection summary:
>     on-host:  all connections are sm or self
>     off-host: all connections are tcp

In this example hosts 0 and 1 had multiple ranks so "sm" was more
meaningful than "self" to identify how the ranks on the host are
talking to each other. While host 2..8 were one rank per host so
"self" was more meaningful as their btl.

Above a certain number of hosts (12 by default) the above table gets too big
so we shrink to a more abbreviated looking table that has the same data:
>    host | 0 1 2 3 4       8
>   ======|====================
>       0 : A C C C C C C C C
>       1 : C A C C C C C C C
>       2 : C C B C C C C C C
>       3 : C C C B C C C C C
>       4 : C C C C B C C C C
>       5 : C C C C C B C C C
>       6 : C C C C C C B C C
>       7 : C C C C C C C B C
>       8 : C C C C C C C C B
>   key: A == sm
>   key: B == self
>   key: C == tcp

Then above 36 hosts we stop printing the 2d table entirely and just print the
summary:
>   Connection summary:
>     on-host:  all connections are sm or self
>     off-host: all connections are tcp

The options to control it are
    -mca comm_method 1   :   print the above table at the end of MPI_Init
    -mca comm_method 2   :   print the above table at the beginning of MPI_Finalize
    -mca comm_method_max <n> :  number of hosts <n> for which to print a full size 2d
    -mca comm_method_brief 1 :  only print summary output, no 2d table
    -mca comm_method_fakefile <filename> :  for debugging only

* printing at init vs finalize:

The most important difference between these two is that when printing the table
during MPI_Init(), we send extra messages to make sure all hosts are connected to
each other. So the table ends up working against the idea of on-demand connections
(although it's only forcing the n^2 connections in the number of hosts, not the
total ranks).  If printing at MPI_Finalize() we don't create any connections that
aren't already connected, so the table is more likely to have "n/a" entries if
some hosts never connected to each other.

* how many hosts <n> for which to print a full size 2d table

The option -mca comm_method_max <n> can be used to specify a number of hosts <n>
(default 12) that controls at what host-count the unabbreviated / abbreviated
2d tables get printed:
    1 - n      : full size 2d table
    n+1 - 3n   : shortened 2d table
    3n+1 - inf : summary only, no 2d table

* brief

The option -mca comm_method_brief 1 can be used to skip the printing of the 2d
table and only show the short summary

* fakefile

This is a debugging option that allows easeir testing of all the printout
routines by letting all the detected communication methods between the hosts
be overridden by fake data from a file.

The source of the information used in the table is the .mca_component_name

In the case of BTLs, the module always had a .btl_component linking back to the
component. The vars mca_pml_base_selected_component and ompi_mtl_base_selected_component
offer similar functionality for pml/mtl.

So with the ability to identify the component, we can then access
the component name with code like this
    mca_pml_base_selected_component.pmlm_version.mca_component_name
See the three lookup_{pml,mtl,btl}_name() functions in hook_comm_method_fns.c,
and their use in comm_method() to parse the strings and produce an integer
to represent the connection type being used.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-10-31 16:23:57 -04:00
Joseph Schuchart
5471d59af4 Fix OPAL_ALIGN_MIN to work on 32-bit systems
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-10-30 10:33:10 +01:00
Gilles Gouaillardet
631a43581f
Merge pull request #7117 from ggouaillardet/topic/f08_bind_c_constants_revamp_misc_fixes
fortran/use-mpi-f08: misc fixes
2019-10-30 10:42:33 +09:00
Edgar Gabriel
007b773cd7
Merge pull request #7122 from edgargabriel/pr/simple-aggr-mode-fix
common/ompio: fix calculation in simple-grouping option
2019-10-29 13:38:24 -05:00
Josh Hursey
312c55edaa
Merge pull request #7092 from sam6258/smiller_rsh_chdir
plm/rsh: Add chdir option to change directory before orted exec
2019-10-29 13:34:25 -05:00
Edgar Gabriel
ad5d0df4e9 common/ompio: fix calculation in simple-grouping option
This is based on a bug reported on the mailing list using a netcdf testcase.
The problem occurs if processes are using a custom file view, but on some
of them it appears as if the default file view is being used. Because of that,
the simple-grouping option lead to different number of aggregators used on different
processes, and ultimately to a deadlock. This patch fixes the problem by not using
the file_view size anymore for the calculation in the simple-grouping option,
but the contiguous chunk size (which is identical on all processes).

Fixes issue #7109

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-10-29 12:30:41 -05:00
Gilles Gouaillardet
fda4d040da fortran/use-mpi-f08: misc fixes
- fix typos from open-mpi/ompi@b10a60a5a9
 - remove remaining references to OMPI_PROTECTED from open-mpi/ompi@df6d763a53

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-10-29 15:00:51 +09:00
Jeff Squyres
8343a289f2
Merge pull request #7112 from wbailey2/pr/fix-HACKING
Changed the final URL to https://github.com/westes/flex
2019-10-28 09:25:21 -04:00
Jeff Squyres
e59e6f714c
Merge pull request #7105 from ggouaillardet/topic/f08_bind_c_constants_revamp
fortran/use-mpi-f08: revamp mpi_f08 constants
2019-10-28 09:14:23 -04:00
Joseph Schuchart
02dd877d8a RDMA osc: perform CAS in shared memory if possible
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-10-28 11:24:16 +01:00
William Bailey
caf1d9292c Changed the final URL to https://github.com/westes/flex
Signed-off-by: William Bailey <wbailey2@nd.edu>
2019-10-27 22:33:50 -04:00
Gilles Gouaillardet
51e23f8cb6 fortran/use-mpi-f08: remove bind(C) constants.
Remove unused bind(C) constants in ompi/mpi/fortran/use-mpi-f08/constants.{c,h}
(and break ABI compatibility).

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-10-28 10:28:17 +09:00
Gilles Gouaillardet
df6d763a53 configury: remove references to unused OMPI_PROTECTED
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-10-28 10:28:17 +09:00
Gilles Gouaillardet
b10a60a5a9 fortran/use-mpi-f08: revamp constant declarations
In order to work around an issue with flang based compilers,
avoid declaring bind(C) constants and use plain Fortran parameter
instead.

For example,
type(MPI_Comm), bind(C, name="ompi_f08_mpi_comm_world") OMPI_PROTECTED :: MPI_COMM_WORLD
is changed to
type(MPI_Comm), parameter :: MPI_COMM_WORLD = MPI_Comm(OMPI_MPI_COMM_WORLD)

Note that in order to preserve ABI compatibility, ompi/mpi/fortran/use-mpi-f08/constants.{c,h}
have been kept even if its symbols are no more referenced by Open MPI.

Refs. open-mpi/ompi#7091

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-10-28 10:01:17 +09:00
Joseph Schuchart
c09ca039b4 uGNI: Fix potential deadlock when processing outstanding transfers
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-10-26 12:21:17 +02:00
Austen Lauria
aa8be9c12d
Merge pull request #6284 from devreal/ompi-rdma-memalign
Ensure proper alignment of memory provided by MPI
2019-10-25 12:27:58 -04:00
Austen Lauria
ecd990a67c
Merge pull request #6933 from devreal/osc-ucx-excl-lock
UCX osc: properly release exclusive lock to avoid lockup
2019-10-25 09:16:51 -04:00
Austen Lauria
96f55b0b32
Merge pull request #7096 from jsquyres/pr/fix-alps-configure-output
opal_check_alps: fix configure output
2019-10-25 09:02:25 -04:00
Jeff Squyres
d8f17aea69
Merge pull request #7097 from mcoil1/pr/README-fix2
README: Use "--" notation for CLI options
2019-10-18 16:32:29 -04:00
Maxwell Coil
7e07346524 README: Use "--" notation for CLI options
Signed-off-by: Maxwell Coil <mcoil@nd.edu>
2019-10-18 15:44:23 -04:00
Jeff Squyres
26705efad0 opal_check_alps: fix configure output
There was a path where OPAL_CHECK_ALPS would exit its testing but
still leave `opal_check_cray_alps_happy` blank.  Fix that by setting
it to "no".

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-10-18 11:30:00 -07:00
Edgar Gabriel
dce203ffc6
Merge pull request #7057 from edgargabriel/topic/romio321-status-set-elements-fix
MPIR_Status_set_bytes: fix for large counts
2019-10-18 08:16:36 -05:00
Nathan Hjelm
b1ef5a40fa
Merge pull request #7016 from hjelmn/fix_btl_uct_from_yet_another_unannounced_api_break_in_the_openucx_uct_layer
btl/uct: add support for OpenUCX v1.8 API changes
2019-10-17 06:27:18 -07:00
Scott Miller
c1b8599528 plm/rsh: Add chdir option to change directory before orted exec
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
2019-10-15 17:19:30 -04:00
Jeff Squyres
b6c4d5c118
Merge pull request #7060 from jsquyres/pr/usnic-mca-updates
BTL usnic MCA updates
2019-10-15 10:48:10 -04:00
Jeff Squyres
e1e6d8b85e
Merge pull request #7076 from ftab/pr/my-superlative-fix
README: Remove info for plugins that aren't used anymore
2019-10-10 14:52:36 -04:00
Jeff Squyres
65fd12feff
Merge pull request #7081 from msbrowning/pr/fixed-README
Removed text block from line 883 of README.
2019-10-10 14:52:23 -04:00
Mark Browning
77b3ff9d38 Remove the stale cr MPI extension
Also removed text block from line 883 of README.

Signed-off-by: Mark Browning <marksbrowning3@gmail.com>
2019-10-10 13:24:30 -04:00
Jeff Squyres
f7ee4463b3
Merge pull request #7079 from CalebProvost/hacktoberfest
Edit README
2019-10-10 13:18:54 -04:00
Jeff Squyres
896ce76b64
Merge pull request #7082 from kizill/master
Fix ipv6 improper address copy bug
2019-10-10 12:01:44 -04:00
Jeff Squyres
8f3583d3bd
Merge pull request #7073 from Joe-Downs/pr/fix-README
README: edit "dist_graph topologies" to "communicator topologies"
2019-10-10 11:55:43 -04:00
Jeff Squyres
d736253079
Merge pull request #7074 from classicsman/pr/fix-README
Deleted paragraph
2019-10-10 11:50:38 -04:00
Jeff Squyres
836a0766ae
Merge pull request #7072 from bfitzgit23/pr/fix-README
README-fixed-bfitzgit23
2019-10-10 11:39:06 -04:00
CalebProvost
634054fb37 README: minor grammar fixes
Signed-off-by: CalebProvost <DHX664@gmail.com>
2019-10-10 11:23:55 -04:00