
28168 Commits

Author SHA1 Message Date
Artem Polyakov
4add7cd5f5
Merge pull request #4781 from karasevb/fix_rmaps_nodelist
rmaps: fixed the ordering of `mpirun` target nodes
2018-02-02 12:29:15 -08:00
Boris Karasev
52e81ee4b1 rmaps: fixed the ordering of mpirun target nodes
Fixed the desync of job nodelists between mpirun and the orted
daemons. The issue was observed with RSH launching because the user
can provide the nodes in an arbitrary order relative to the HNP placement.
The mpirun process propagates the daemon nodelist order to the nodes.
The problem was that the HNP itself assembled the nodelist based on
the user-provided order. As a result, rank assignment was calculated
differently on orted and mpirun.

Consider the following example:
* User launches mpirun on node cn2.
* Hostlist is cn1,cn2,cn3,cn4; ppn=1
* mpirun passes hostlist cn[2:2,1,3-4]@0(4) to the orteds
As a result, mpirun will assign rank 0 to cn1 while orted will assign
rank 0 to cn2 (because orted sees cn2 as the first element in the node
list).

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-02-01 17:16:05 +02:00
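The mismatch described in the commit above can be reproduced with a small model. This is only an illustrative sketch, not the ORTE rmaps code: `assign_ranks` is a hypothetical mapper that hands out ranks by walking a node list in order, and the daemon-side order cn2, cn1, cn3, cn4 is one reading of the hostlist shown in the commit message.

```python
# Illustrative model only; assign_ranks is a hypothetical helper,
# not a function from Open MPI.

def assign_ranks(nodelist, ppn=1):
    """Assign consecutive ranks by walking the node list in order."""
    ranks = {}
    rank = 0
    for node in nodelist:
        for _ in range(ppn):
            ranks[rank] = node
            rank += 1
    return ranks

# Order given by the user on the command line.
user_order = ["cn1", "cn2", "cn3", "cn4"]
# Assumed order seen by the orteds: the mpirun node (cn2) comes first.
daemon_order = ["cn2", "cn1", "cn3", "cn4"]

mpirun_view = assign_ranks(user_order)
orted_view = assign_ranks(daemon_order)

# The two sides disagree about where rank 0 runs.
print(mpirun_view[0], orted_view[0])

# If both sides walk the same list, the assignments agree.
mpirun_fixed = assign_ranks(daemon_order)
print(mpirun_fixed == orted_view)
```

The sketch only shows why both sides need to build the mapping from the same ordered list; which order the actual fix settles on is not stated in the commit message.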
Ralph Castain
bc1d7ff2cc
Merge pull request #4780 from ggouaillardet/topic/ext3x
pmix/ext3x: remove autogenerated ext3x.h header file
2018-01-31 08:18:34 -08:00
Gilles Gouaillardet
43700faba1 pmix/ext3x: remove autogenerated ext3x.h header file
This header file was meant to be autogenerated and, for
some reason, was never removed from the repository.
Update .gitignore as well.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 23:45:42 +09:00
Ralph Castain
7ddffc627d
Merge pull request #4776 from rhc54/topic/rte
Correct abstraction break and update ignores
2018-01-31 04:34:05 -08:00
Gilles Gouaillardet
9dcb7ab317
Merge pull request #4772 from ggouaillardet/topic/osc_sm_free
osc/sm: fix the osc_free callback
2018-01-31 17:06:36 +09:00
Ralph Castain
982415749c Update ignores for pmix/ext3x component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-30 21:41:55 -08:00
Ralph Castain
8e8a9aecc5 Correct abstraction break - direct reference to ORTE
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-30 21:19:14 -08:00
Ralph Castain
0c5bb999ed
Merge pull request #4775 from ggouaillardet/topic/ext3x
pmix/ext3x: bring external component up-to-date with the embedded pmix3x
2018-01-30 21:18:48 -08:00
Gilles Gouaillardet
8209fca842 pmix/ext3x: bring external component up-to-date with the embedded pmix3x
add the callback prototype for the upcoming PMIx_IOF_push() API

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:35:34 +09:00
Gilles Gouaillardet
0481277e93 pmix/ext3x: bring external component up-to-date with the embedded pmix3x
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:33:33 +09:00
Nathan Hjelm
bb212e0c94
Merge pull request #4767 from ggouaillardet/topic/vader_backing_file
btl/vader: make the backing file job specific
2018-01-30 21:27:02 -07:00
Gilles Gouaillardet
0285c63348 pmix/ext3x: generate component source when only static libraries are built
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:21:14 +09:00
Gilles Gouaillardet
34b45cc879 osc/sm: fix the osc_free callback
If component selection fails, then module->bases might be unallocated
when ompi_osc_sm_free() is invoked, so test it before trying to free()
module->bases[0].

Thanks Martin Binder for the report.

Refs open-mpi/ompi#4770

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 11:23:21 +09:00
Yossi Itigin
46967cfa63
Merge pull request #4764 from hoopoepg/topic/set-recv-status-canceled
request/state: update state for canceled request
2018-01-30 12:22:59 +02:00
Gilles Gouaillardet
611d7c2d27 btl/vader: make the backing file job specific
Since open-mpi/ompi@47fd2313ab
the backing file is now in /dev/shm by default. As a consequence,
the backing file name has to include the jobid so more than one job
can run at a time.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-30 16:52:51 +09:00
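A minimal sketch of why the jobid matters once the backing files of all jobs share /dev/shm. The naming scheme here (`vader_segment.<jobid>.<local_rank>`) and the helper `backing_file_name` are assumptions made for illustration; the real file name format used by btl/vader is not given in the commit message.

```python
import posixpath

def backing_file_name(backing_dir, local_rank, jobid=None):
    """Build a backing file path (hypothetical naming scheme)."""
    parts = ["vader_segment"]
    if jobid is not None:
        parts.append(str(jobid))  # job-specific component
    parts.append(str(local_rank))
    return posixpath.join(backing_dir, ".".join(parts))

# Without the jobid, local rank 0 of two different jobs maps to the same file.
old_job_a = backing_file_name("/dev/shm", 0)
old_job_b = backing_file_name("/dev/shm", 0)

# With the jobid, each job gets its own file (1001 and 1002 are made-up jobids).
new_job_a = backing_file_name("/dev/shm", 0, jobid=1001)
new_job_b = backing_file_name("/dev/shm", 0, jobid=1002)

print(old_job_a == old_job_b, new_job_a == new_job_b)
```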
Sergey Oblomov
7a5811d0a8 request/state: update state for canceled request
- fixed an issue in setting the state for a canceled request

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-01-29 18:26:20 +02:00
Ralph Castain
f3fbc1172d
Merge pull request #4757 from ggouaillardet/topic/iof_hnp
iof: do not release a sink before all read data is written.
2018-01-26 14:43:01 -08:00
Ralph Castain
e284a3e98b
Merge branch 'master' into topic/iof_hnp
2018-01-26 13:55:49 -08:00
Ralph Castain
5190f4e4ec
Merge pull request #4759 from rhc54/topic/missing
Properly terminate the job when executable not found
2018-01-26 13:55:32 -08:00
Ralph Castain
b643852d8a Properly terminate the job when executable not found
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-26 12:09:24 -08:00
Ralph Castain
c166e26265
Merge branch 'master' into topic/iof_hnp
2018-01-26 06:15:58 -08:00
Ralph Castain
d83d2be9ea
Merge pull request #4758 from rhc54/topic/sync
Refresh ORTE PMIx support
2018-01-25 15:00:37 -08:00
Ralph Castain
a17df810ed Sync with PMIx iof rfc
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 10:51:38 -08:00
Ralph Castain
e9cd7fd7e6 Update orte
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:53:43 -08:00
Ralph Castain
d1071397ac Update the orte/ess framework
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:43:44 -08:00
Ralph Castain
9fb80bd239 Update the opal/pmix base framework elements
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:37:52 -08:00
Ralph Castain
187352eb3d Update the PMIx external components
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:35:57 -08:00
Ralph Castain
a5679ef000 Update the PMIx 3.x component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:34:44 -08:00
Gilles Gouaillardet
54fb8ac5d5 iof: do not release a sink before all read data is written.
When too much data is available on stdin, it might not be
forwarded immediately to the task (write() might fail with -EAGAIN),
so when stdin is terminated, there might be some remaining data
to be pushed to the task. In this case, delay the release of the sink
so no data is discarded.

Refs open-mpi/ompi#4744

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-25 16:29:22 +09:00
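The behaviour described in the commit above (keep the sink alive until queued stdin data has been written) can be sketched with a toy model. Everything here is hypothetical: `Sink` and `SlowTask` are illustration-only classes, not the ORTE iof data structures, and the "would block" condition is modelled with Python's `BlockingIOError`, which is what a non-blocking write raises on EAGAIN.

```python
import errno
from collections import deque

class SlowTask:
    """Toy consumer that accepts only `capacity` more bytes before blocking."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.received = b""

    def write(self, data):
        if self.capacity == 0:
            raise BlockingIOError(errno.EAGAIN, "task not ready")
        n = min(self.capacity, len(data))
        self.received += data[:n]
        self.capacity -= n
        return n

class Sink:
    """Toy sink that defers its release until all queued data is written."""
    def __init__(self, write_fn):
        self.write_fn = write_fn
        self.pending = deque()
        self.eof_seen = False
        self.released = False

    def push(self, data):
        self.pending.append(data)
        self.flush()

    def close(self):
        # stdin ended: remember it, but do not release while data is queued.
        self.eof_seen = True
        self.flush()

    def flush(self):
        while self.pending:
            chunk = self.pending[0]
            try:
                n = self.write_fn(chunk)
            except BlockingIOError:
                return  # would block; try again later
            if n == 0:
                return  # nothing accepted; treat like would-block
            if n < len(chunk):
                self.pending[0] = chunk[n:]
            else:
                self.pending.popleft()
        if self.eof_seen:
            self.released = True

task = SlowTask(capacity=4)
sink = Sink(task.write)
sink.push(b"hello world")   # only part of the data fits
sink.close()                # stdin terminated while data is still queued
released_at_close = sink.released

task.capacity = 100         # the task becomes ready again
sink.flush()                # the remaining data is written, then the sink is released
```

Releasing the sink inside `close()` regardless of the queue would drop the unwritten tail of the input, which is the data loss the commit message describes.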
Gilles Gouaillardet
ebffaded5d iof/base: remove the unused iof_base_input_files MCA parameter
This option was only used by the iof/mr_hnp (aka Map/Reduce)
component, which is no longer part of the master or v3 branches.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-25 11:29:14 +09:00
Edgar Gabriel
57f946a798
Merge pull request #4749 from edgargabriel/pr/fs_ufs_bad_return_statement
fs/ufs and fs/lustre: remove erroneous return statement
2018-01-24 16:28:39 -06:00
Edgar Gabriel
bcf26d419f fs/ufs and fs/lustre: remove erroneous return statement
An erroneous return statement crept in with commit 1885d99,
which leads to some processes not resetting stripe_size
and stripe_count correctly. In 3.0.x this can lead to different
fcoll modules being selected. The impact is not as dramatic on
master and 3.1.x, but it could lead to problems there as well.

Fixes #4745

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-01-24 14:07:21 -06:00
Jeff Squyres
5b0df815d2
Merge pull request #4743 from jsquyres/pr/extension-attribute-config-test-fix
opal_check_attributes: fix __extension__ test
2018-01-24 13:52:23 -05:00
Jeff Squyres
ff31da6f74 opal_check_attributes: fix __extension__ test
Per
https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html#index-_005f_005fextension_005f_005f,
use __extension__ in a C statement that will actually verify if the
compiler supports it or not.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-01-23 13:44:43 -08:00
Gilles Gouaillardet
88e26c63e0 spml/ucx: fix a double free() issue
in mca_spml_ucx_add_procs() error path

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-22 13:42:16 +09:00
Joshua Ladd
13bbc394cc
Merge pull request #4728 from xinzhao3/topic/osc-ucx-fetch-op-fix
OMPI/OSC/UCX: adding atomic lock for fetch_and_op and compare_and_swap
2018-01-19 15:32:51 -05:00
Ralph Castain
f92c9f35e6
Merge pull request #4729 from rhc54/topic/revert
Revert changes to OPAL_CHECK_PACKAGE
2018-01-17 17:01:17 -08:00
Ralph Castain
01e6539127 Revert "Filter /usr[/local]/include from opal CPPFLAGS when used explictly --with-package=DIR"
This reverts commit c4fe4ecfb918eef88bcc8dc10fdd743e3dc7fa38.

Revert "Fix  DIR, DIR/include search for --with-pmix"

This reverts commit 2e3f4017639e0b248c2f0d1eb14e7bb31f6287be.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-17 16:02:19 -08:00
Xin Zhao
72ff2b1135 OMPI/OSC/UCX: adding atomic lock for fetch_and_op and compare_and_swap
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-01-18 00:36:22 +02:00
Joshua Ladd
dbefb35aad
Merge pull request #4635 from karasevb/oshmem/spec_1.3/broadcast
oshmem: remove "shmem_broadcast" in accordance with the spec v1.3
2018-01-17 12:11:09 -05:00
Yossi Itigin
f2851fd502
Merge pull request #4724 from alex-mikheev/topic/ucx_as_default
ompi/oshmem: ucx is selected over yalla/ikrit by default
2018-01-17 17:41:49 +02:00
Yossi Itigin
df1136dc63
Merge pull request #4719 from alex-mikheev/topic/pml_ucx_send_nbr
ompi: pml/ucx: blocking send using ucp_tag_send_nbr
2018-01-17 17:01:30 +02:00
Alex Mikheev
640e945b9c ompi: pml/ucx: blocking send using ucp_tag_send_nbr
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:54:18 +02:00
Alex Mikheev
ae326546f4
ompi/oshmem: ucx is selected over yalla/ikrit by default
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:08:04 +02:00
Yossi Itigin
79ca1c4f18
Merge pull request #4697 from yosefe/topic/opal-progress-avoid-checking-timer
opal_progress: check timer only once per 8 calls
2018-01-17 10:48:34 +02:00
Ralph Castain
6b3cf6fcf1
Merge pull request #4722 from pkovacs/master-opal-check
opal_check_package: filter /usr[/local]/include from CPPFLAGS
2018-01-16 16:23:49 -08:00
Philip Kovacs
c4fe4ecfb9 Filter /usr[/local]/include from opal CPPFLAGS when used explictly --with-package=DIR
Signed-off-by: Philip Kovacs <pkdevel@yahoo.com>
2018-01-16 16:45:20 -05:00
Ralph Castain
8eea942b80
Merge pull request #4721 from rhc54/topic/nidmap
Remove the orte_nidmap test
2018-01-16 12:35:43 -08:00
Ralph Castain
345916f2f3 Remove the orte_nidmap test
Moved to the ompi-tests repo

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-16 11:47:44 -08:00