Commit graph

28082 commits

Author SHA1 Message Date
KAWASHIMA Takahiro
710080be63
Merge pull request #4667 from kawashima-fj/pr/f08-pmpi
fortran: Fix PMPI interface bugs in mpi_f08 module
2018-01-05 03:45:10 -06:00
Gilles Gouaillardet
56fe714776
Merge pull request #4637 from ggouaillardet/topic/tree_spawn_no_regex
orted: fix tree-spawn when the node regex is too long
2018-01-04 13:03:31 +09:00
Gilles Gouaillardet
03da5218ea orte: remove some dead code related to the new tree_spawn method
Now that the daemon calls remote_spawn itself, there is no longer
a need for the "tree_spawn" command nor the associated command
processing code since the HNP is no longer sending a tree-spawn
message to the orted.

Thanks Ralph for the guidance!

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:35:17 +09:00
Gilles Gouaillardet
4527584840 orted: fix tree-spawn when the node regex is too long
When the node regex is too long to be sent on the command line,
retrieve it from the parent first, and then spawn the remote orted.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:33:46 +09:00
Gilles Gouaillardet
799152e7fb plm/base: add the orte_plm_base_node_regex_threshold MCA parameter
This parameter sets the maximum node regex length that can
be passed on the orted command line.
For testing purposes, it can be set to zero to force orted to
retrieve the node regex from its parent.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:33:46 +09:00
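
The threshold behaviour described in 799152e7fb can be sketched as follows. This is a minimal illustration, not the ORTE implementation: `plan_node_regex_delivery`, the `--hypothetical-node-regex` option, and the strict `<` comparison are assumptions made for the example; only the idea of a length threshold, with zero forcing retrieval from the parent, comes from the commit message.

```python
def plan_node_regex_delivery(node_regex, threshold):
    """Decide how a daemon should obtain the node regex.

    `threshold` plays the role of orte_plm_base_node_regex_threshold.
    The option name below is hypothetical, not a real orted flag.
    """
    if len(node_regex) < threshold:
        # Short enough: pass the regex directly on the command line.
        return {
            "on_command_line": True,
            "args": ["--hypothetical-node-regex", node_regex],
        }
    # Too long (or threshold == 0): the daemon must ask its parent.
    return {"on_command_line": False, "args": []}
```

With a threshold of zero the first branch can never be taken, which matches the testing use case mentioned in the commit message.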
Gilles Gouaillardet
f7e29127bc sstore/stage: fix parameter handling in sstore_stage_local_compress_waitpid_cb()
Since open-mpi/ompi@8f496b01b7,
sstore_stage_local_compress_waitpid_cb is invoked with an orte_wait_tracker_t *,
which must be used to reach the orte_sstore_stage_local_app_snapshot_info_t *.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:33:46 +09:00
Gilles Gouaillardet
c4cd12bc43 plm/rsh: fix parameter handling in rsh_wait_daemon()
Since open-mpi/ompi@8f496b01b7,
rsh_wait_daemon is invoked with an orte_wait_tracker_t *,
which must be used to reach the orte_plm_rsh_caddy_t *.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:33:46 +09:00
bosilca
ef38ca5663
Merge pull request #4644 from bosilca/topic/treematch
Fix treematch topology assert
2018-01-02 21:21:54 -05:00
Nathan Hjelm
8b8aae372d opal/asm: add atomic min/max convenience functions
This commit adds atomic functions for min/max.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-01-02 08:38:36 -07:00
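
Atomic min/max helpers are commonly built as a compare-and-swap loop. The sketch below shows that general pattern only; it is not the code from 8b8aae372d. `AtomicInt`, `atomic_fetch_min`, and `atomic_fetch_max` are hypothetical names, and the lock merely emulates a hardware compare-and-swap so the example can run in plain Python.

```python
import threading


class AtomicInt:
    """Toy atomic integer; the lock only emulates a hardware CAS."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the current value equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_fetch_min(cell, candidate):
    """Lower `cell` to `candidate` if it is smaller; return the old value."""
    while True:
        old = cell.load()
        if old <= candidate:
            return old  # nothing to do, the stored value is already smaller
        if cell.compare_and_swap(old, candidate):
            return old
        # Another thread changed the value in between: reload and retry.


def atomic_fetch_max(cell, candidate):
    """Raise `cell` to `candidate` if it is larger; return the old value."""
    while True:
        old = cell.load()
        if old >= candidate:
            return old
        if cell.compare_and_swap(old, candidate):
            return old
```

The loop only writes when the candidate actually improves on the stored value, so concurrent callers converge on the true minimum or maximum regardless of ordering.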
Gilles Gouaillardet
e2808c9ccc
Merge pull request #4673 from ggouaillardet/topic/ompi_comm_split
ompi/communicator: optimize ompi_comm_split()
2017-12-28 16:23:00 +09:00
Gilles Gouaillardet
2dd345465f ompi/communicator: optimize ompi_comm_split()
Set grp_local_rank to MPI_UNDEFINED before invoking
ompi_comm_nexcid() in order to benefit from the optimizations
introduced in open-mpi/ompi@68167ec879.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-12-28 15:34:04 +09:00
Yossi Itigin
70a2098493
Merge pull request #4655 from yosefe/topic/spml-ucx-fix-rkey-leak
spml_ucx: fix rkey leak
2017-12-27 14:52:31 +02:00
Yossi Itigin
1193e1eb83 spml_ucx: fix rkey leak
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-12-26 20:47:26 +02:00
Yossi Itigin
697a9437e2
Merge pull request #4666 from alex-mikheev/topic/pml_ucx_recv_fix
ompi: pml ucx: improve recv latency
2017-12-26 20:12:18 +02:00
Alex Mikheev
e7bf0617cf
ompi: pml ucx: improve recv latency
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-12-26 16:24:16 +02:00
KAWASHIMA Takahiro
91c4cad343
Merge pull request #4668 from kawashima-fj/pr/f08-set-cancelled
fortran: Call PMPI from PMPI_Status_set_cancelled_f08
2017-12-26 03:12:48 -06:00
KAWASHIMA Takahiro
bd2fe9c324 fortran: Call PMPI from PMPI_Status_set_cancelled_f08
This call was missed when the change in c08f97b030 was made.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-12-26 15:53:12 +09:00
KAWASHIMA Takahiro
00e3c7a973 fortran: Align indentation
This change makes comparison of `mpi-f08-interfaces.F90` and
`pmpi-f08-interfaces.F90` easier.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-12-26 14:39:09 +09:00
KAWASHIMA Takahiro
056eb39b12 fortran: Correct type of info_used
It was incorrectly typed as `MPI_Comm`, in `pmpi` only, in 24f7bd327e.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-12-26 14:39:09 +09:00
KAWASHIMA Takahiro
0c3a534b32 fortran: Use C_PTR for buffer_addr
It was changed to use `C_PTR`, in `mpi` only, in fc69c0be24.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-12-26 14:39:09 +09:00
KAWASHIMA Takahiro
d4fc404dc6 fortran: Change PMPI_Aint_{add,diff} to functions.
They were incorrectly changed to subroutines, in `pmpi` only,
in 258d1aa160.

Strictly speaking, this change involves binary incompatibility.
But nobody used these subroutines and nobody will be affected because
these subroutines were useless (didn't return a calculated value).

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-12-26 14:39:09 +09:00
KAWASHIMA Takahiro
9240967b8f fortran: Remove ASYNCHRONOUS from mpi_f08 pmpi
It was removed from `mpi` only, as a bug fix, in db41d749c1.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-12-26 14:39:09 +09:00
Nathan Hjelm
39d598899b rcache/grdma: fix crash when part of a registration is unmapped
This commit fixes an issue when a registration is created for a large
region and then invalidated while part of it is in use.

References #4509

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-12-22 10:36:35 -07:00
KAWASHIMA Takahiro
4cd917a3fd
Merge pull request #4659 from yanagibashi/pr/fortran2008
Add missing Fortran 2008 binding subroutines
2017-12-22 01:53:36 -06:00
Tsubasa Yanagibashi
3f4b373856 Add missing Fortran 2008 binding subroutines
Add the missing Fortran 2008 binding pmpi_{*} subroutines to Open MPI.

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2017-12-22 13:45:53 +09:00
Matias Cabral
a600525836
Merge pull request #4653 from aravindksg/ofi_regression_fix
MTL OFI: Allow retries in MTL progress for interrupted syscalls
2017-12-21 10:55:40 -08:00
Aravind Gopalakrishnan
fb68726baf MTL OFI: Allow retries in MTL progress for interrupted syscalls
This fixes a regression in the sockets provider, which could return -EINTR
from fi_cq_read() when a syscall is interrupted. The error value is
currently interpreted as a fatal condition. Relax the rule so that we can retry
the fi_cq_read() operation.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2017-12-20 14:58:49 -08:00
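
The retry rule described in fb68726baf can be illustrated with a small sketch. It is not the MTL OFI progress function: `progress_once`, `read_cq`, and `max_retries` are hypothetical, with `read_cq` standing in for fi_cq_read() by returning either a non-negative completion count or a negative errno value. Treating -EAGAIN as "no completions yet" reflects how fi_cq_read() reports an empty queue.

```python
import errno


def progress_once(read_cq, max_retries=8):
    """Poll a completion queue, retrying while the read is interrupted.

    `read_cq` is a hypothetical stand-in for fi_cq_read(): it returns a
    non-negative completion count or a negative errno value.
    """
    for _ in range(max_retries + 1):
        ret = read_cq()
        if ret == -errno.EINTR:
            continue  # interrupted syscall: not fatal, just try again
        if ret == -errno.EAGAIN:
            return 0  # queue is empty: no progress, but no error either
        if ret < 0:
            raise OSError(-ret, "completion queue read failed")
        return ret
    # Still interrupted after all retries: report no progress, do not abort.
    return 0
```

Bounding the number of retries is a design choice of this sketch; it keeps a persistently interrupted read from stalling the caller.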
Ralph Castain
3f7494aec1
Merge pull request #4648 from rhc54/topic/cleanup
Silence warnings in optimized build
2017-12-20 14:00:22 -08:00
Ralph Castain
d5471d7898 Silence warnings in optimized build
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-20 12:00:28 -08:00
Ralph Castain
ad96fa19d4
Merge pull request #4642 from rhc54/topic/validate
Detect/warn of illegal node names
2017-12-20 10:18:43 -08:00
Brian Barrett
465842294f doc: Add README note about ARM/POWER hangs
As documented in #4563 and #3697, there is an issue on ARM and
POWER platforms when the atomic fifo assembly isn't inlined,
which manifests as a hang.  Document the issue and the
work-around until a proper fix is committed.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-12-20 09:47:27 -08:00
George Bosilca
38455845db
Fix asserts.
In both cases we were comparing with the wrong size: it should be either
the number of local processes or the number of nodes, not the size
of the communicator.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-12-20 11:51:35 -05:00
George Bosilca
808f865e9d
Force all output to use OMPI infrastructure.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-12-20 11:50:51 -05:00
Ralph Castain
1687b04c9e
Merge pull request #4645 from rhc54/topic/debug
Remove debug from rmaps base
2017-12-20 07:53:51 -08:00
Ralph Castain
8a7a57d4e2 Remove debug from rmaps base
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-20 00:22:51 -08:00
Ralph Castain
3269f2de66 Detect/warn of illegal node names
If we detect that someone has given us an incorrect node name, provide a helpful message telling them so, as it is almost certainly a typo.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-19 12:55:04 -08:00
Ralph Castain
b37315658b
Merge pull request #4636 from rhc54/topic/attrs
Fix the optnone attribute, add extension attribute
2017-12-19 10:18:59 -08:00
Ralph Castain
ccc2fcdfdf
Merge pull request #4627 from ggouaillardet/topic/nidmap
orte/nidmap: correctly handle '-' as a valid hostname character
2017-12-19 09:09:58 -08:00
Ralph Castain
db8ebd33ad Fix the optnone attribute, add extension attribute
See how the various compilers handle these

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-18 19:18:53 -08:00
Ralph Castain
e9f4e93800
Merge pull request #4606 from rhc54/topic/register
Update to PMIx v3.0 PR for cleanup registration
2017-12-18 07:57:07 -08:00
Nathan Hjelm
47fd2313ab btl/vader: move backing files into /dev/shm on Linux
This commit moves the backing files to /dev/shm to avoid limitations
that may be set on /tmp. The files are registered with pmix to ensure
they are cleaned up after an erroneous exit.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 48101278160672317ade352365592f56ef3b8977)
2017-12-18 07:09:18 -08:00
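
The directory choice described in 47fd2313ab can be sketched as below. This is an assumption-laden illustration rather than the btl/vader code: `pick_backing_dir`, `create_backing_file`, and the `backing_file.` prefix are hypothetical, and the fallback to the session directory when the shared-memory directory is unusable is an assumption of the sketch. Registering the path for cleanup (done through PMIx in the commit) is left to the caller.

```python
import os
import tempfile


def pick_backing_dir(session_dir, shm_dir="/dev/shm"):
    """Prefer `shm_dir` when it is a writable directory, else fall back."""
    if os.path.isdir(shm_dir) and os.access(shm_dir, os.W_OK | os.X_OK):
        return shm_dir
    return session_dir


def create_backing_file(session_dir, size, shm_dir="/dev/shm"):
    """Create a backing file of `size` bytes and return its path.

    The caller is expected to register the returned path with whatever
    cleanup mechanism it uses, so the file is removed after a crash.
    """
    directory = pick_backing_dir(session_dir, shm_dir)
    fd, path = tempfile.mkstemp(prefix="backing_file.", dir=directory)
    try:
        os.ftruncate(fd, size)  # reserve the segment size
    finally:
        os.close(fd)
    return path
```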
Ralph Castain
07427c6d89 Update to PMIx v3.0 PR for cleanup registration
If available, have apps use the registration capability to clean up their session directories. Set up the capability for vader to register its shared memory file location, and let someone familiar with that code do so.

Final cleanup to track uid/gid, update the opal/pmix API to pass flags for ignore and leave top directory alone

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-18 06:53:11 -08:00
Ralph Castain
a863c26d6f
Merge pull request #4628 from rhc54/topic/treespawn
Fix the tree-spawn-with-rollup
2017-12-18 06:48:10 -08:00
Jeff Squyres
3d80794df2
Merge pull request #4631 from jedbrown/jed/doc-fix-mpi-attr-get
MPI_Attr_get: doc fix: MPI_Comm_create_attr -> MPI_Comm_get_attr
2017-12-17 11:23:03 -05:00
Jed Brown
533800070e MPI_Attr_get: doc fix: MPI_Comm_create_attr -> MPI_Comm_get_attr
MPI_Comm_create_attr does not exist.

Signed-off-by: Jed Brown <jed@jedbrown.org>
2017-12-17 07:44:22 -07:00
Ralph Castain
7a58f91ab9 Fix the tree-spawn-with-rollup
Somehow, the code for passing a daemon's parent was accidentally removed, thus breaking the tree-spawn callback sequence and causing all daemons to phone home directly. Note that this is noticeably slower than no-tree-spawn for small clusters where direct ssh launch of the child daemons from the HNP doesn't overload the available file descriptors.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-15 16:03:43 -08:00
Gilles Gouaillardet
f3e2a313af orte/nidmap: correctly handle '-' as a valid hostname character
'-' is neither an alpha character nor a digit, but it is a valid hostname
character and should be handled as an alpha character; otherwise, nodes
such as node-001 do not get "compressed" in the regex.

Refs open-mpi/ompi#4621

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-12-15 15:28:50 +09:00
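
The effect described in f3e2a313af can be illustrated with a simplified compressor. This is a sketch under stated assumptions, not the orte/nidmap code: `split_name`, `compress`, the `extra_prefix_chars` parameter, and the `prefix[ranges]` output format are all hypothetical. It only shows why '-' has to be accepted as a prefix character for names such as node-001 to be grouped.

```python
def split_name(host, extra_prefix_chars="-"):
    """Split 'node-001' into ('node-', '001').

    Characters are scanned left to right; letters and anything in
    `extra_prefix_chars` belong to the prefix. If the remainder is not
    purely numeric, the name is returned whole and is not compressed.
    """
    i = 0
    while i < len(host) and (host[i].isalpha() or host[i] in extra_prefix_chars):
        i += 1
    prefix, digits = host[:i], host[i:]
    if digits and not digits.isdecimal():
        return host, ""
    return prefix, digits


def compress(hosts, extra_prefix_chars="-"):
    """Group hosts by (prefix, digit width) and collapse consecutive numbers."""
    groups = {}
    for host in hosts:
        prefix, digits = split_name(host, extra_prefix_chars)
        groups.setdefault((prefix, len(digits)), []).append(digits)
    out = []
    for (prefix, width), nums in groups.items():
        if width == 0:
            out.append(prefix)  # no numeric suffix: keep the name as is
            continue
        values = sorted(set(int(n) for n in nums))
        ranges = []
        start = prev = values[0]
        for v in values[1:]:
            if v != prev + 1:
                ranges.append((start, prev))
                start = v
            prev = v
        ranges.append((start, prev))
        body = ",".join(
            f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
            for a, b in ranges
        )
        out.append(f"{prefix}[{body}]")
    return out
```

Calling `compress(["node-001", "node-002", "node-003", "node-007", "login"])` yields `['node-[001-003,007]', 'login']`, whereas passing `extra_prefix_chars=""` leaves every `node-` name uncompressed, which mirrors the behaviour the commit fixes.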
Brian Barrett
ea35820246 dist: Update NEWS to match release branches
Pull in changes from the v2.0.x, v2.x, and v3.0.x release branches
so that master includes all items from published releases.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-12-14 15:11:11 -08:00
Ralph Castain
fc1acea533
Merge pull request #4626 from rhc54/topic/optnone
Add the __optnone__ attribute
2017-12-14 15:01:56 -08:00
Ralph Castain
5c4185abd8 Add the __optnone__ attribute to help avoid optimizing out MPIR_Breakpoint
Thanks to @kiranchandramohan for the suggestion

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-14 13:14:21 -08:00