Artem Polyakov
4add7cd5f5
Merge pull request #4781 from karasevb/fix_rmaps_nodelist
...
rmaps: fixed the ordering of `mpirun` target nodes
2018-02-02 12:29:15 -08:00
Boris Karasev
52e81ee4b1
rmaps: fixed the ordering of mpirun target nodes
...
Fixed the desync of job nodelists between mpirun and the orted
daemons. The issue was observed with RSH launching, because the user
can list the nodes in an arbitrary order relative to the HNP placement.
The mpirun process propagates the daemons' nodelist order to the nodes.
The problem was that the HNP itself assembled the nodelist based on
the user-provided order. As a result, the rank assignment was calculated
differently on orted and mpirun.
Consider the following example:
* User launches mpirun on node cn2.
* Hostlist is cn1,cn2,cn3,cn4; ppn=1
* mpirun is passing hostlist cn[2:2,1,3-4]@0(4) to orteds
As a result, mpirun will assign rank 0 to cn1, while orted will assign
rank 0 to cn2 (because orted sees cn2 as the first element in the node
list).
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-02-01 17:16:05 +02:00
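The mismatch described in the commit above can be reproduced with a small sketch. It assumes ppn=1, so rank r lands on the r-th entry of whichever node list a process holds. The helper `rank_on_node` and both arrays are hypothetical illustrations, not ORTE code, and the daemon order (cn2 first) is taken from the example in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: with ppn=1, rank r is placed on the r-th entry of
 * the node list the caller holds. Returns the rank assigned to `node`,
 * or -1 if the node is not in the list. */
static int rank_on_node(const char *const nodes[], size_t n, const char *node)
{
    assert(nodes != NULL && node != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i], node) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Order given by the user (what mpirun used before the fix). */
static const char *const user_order[] = { "cn1", "cn2", "cn3", "cn4" };

/* Order propagated to the daemons: the HNP node (cn2) comes first. */
static const char *const daemon_order[] = { "cn2", "cn1", "cn3", "cn4" };
```

Looking up cn2 in `user_order` yields rank 1, while the same lookup in `daemon_order` yields rank 0, which is the disagreement the fix removes by making both sides use one ordering.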
Ralph Castain
bc1d7ff2cc
Merge pull request #4780 from ggouaillardet/topic/ext3x
...
pmix/ext3x: remove autogenerated ext3x.h header file
2018-01-31 08:18:34 -08:00
Gilles Gouaillardet
43700faba1
pmix/ext3x: remove autogenerated ext3x.h header file
...
This header file was meant to be autogenerated but, for
some reason, was never removed from the repository.
Update .gitignore as well.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 23:45:42 +09:00
Ralph Castain
7ddffc627d
Merge pull request #4776 from rhc54/topic/rte
...
Correct abstraction break and update ignores
2018-01-31 04:34:05 -08:00
Gilles Gouaillardet
9dcb7ab317
Merge pull request #4772 from ggouaillardet/topic/osc_sm_free
...
osc/sm: fix the osc_free callback
2018-01-31 17:06:36 +09:00
Ralph Castain
982415749c
Update ignores for pmix/ext3x component
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-30 21:41:55 -08:00
Ralph Castain
8e8a9aecc5
Correct abstraction break - direct reference to ORTE
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-30 21:19:14 -08:00
Ralph Castain
0c5bb999ed
Merge pull request #4775 from ggouaillardet/topic/ext3x
...
pmix/ext3x: bring external component up-to-date with the embedded pmix3x
2018-01-30 21:18:48 -08:00
Gilles Gouaillardet
8209fca842
pmix/ext3x: bring external component up-to-date with the embedded pmix3x
...
add the callback prototype for the upcoming PMIx_IOF_push() API
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:35:34 +09:00
Gilles Gouaillardet
0481277e93
pmix/ext3x: bring external component up-to-date with the embedded pmix3x
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:33:33 +09:00
Nathan Hjelm
bb212e0c94
Merge pull request #4767 from ggouaillardet/topic/vader_backing_file
...
btl/vader: make the backing file job specific
2018-01-30 21:27:02 -07:00
Gilles Gouaillardet
0285c63348
pmix/ext3x: generate component source when only static libraries are built
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:21:14 +09:00
Gilles Gouaillardet
34b45cc879
osc/sm: fix the osc_free callback
...
If component selection fails, then module->bases might be unallocated
when ompi_osc_sm_free() is invoked, so test it before trying to free()
module->bases[0].
Thanks Martin Binder for the report.
Refs open-mpi/ompi#4770
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 11:23:21 +09:00
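The guard described in the osc/sm commit above follows a common pattern: a free callback has to tolerate a module that was only partially set up. The sketch below is a minimal illustration, assuming a simplified, hypothetical `struct sm_module` and `sm_module_free()`; it is not the actual ompi_osc_sm_free() code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the real module structure. */
struct sm_module {
    void **bases;   /* per-rank base pointers; NULL if setup stopped early */
};

/* Release module resources. Safe to call on a partially initialized
 * module. Returns 0 on success. */
static int sm_module_free(struct sm_module *module)
{
    assert(module != NULL);
    if (module->bases != NULL) {   /* test before touching bases[0] */
        free(module->bases[0]);
        free(module->bases);
        module->bases = NULL;
    }
    return 0;
}
```

Without the NULL test, the first `free(module->bases[0])` would dereference a NULL pointer whenever component selection failed before the array was allocated.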
Yossi Itigin
46967cfa63
Merge pull request #4764 from hoopoepg/topic/set-recv-status-canceled
...
request/state: update state for canceled request
2018-01-30 12:22:59 +02:00
Gilles Gouaillardet
611d7c2d27
btl/vader: make the backing file job specific
...
Since open-mpi/ompi@47fd2313ab
the backing file is now in /dev/shm by default. As a consequence,
the backing file name has to include the jobid so more than one job
can run at a time.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-30 16:52:51 +09:00
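To illustrate why the jobid matters for the btl/vader backing file, the sketch below builds a path that embeds it. The function name, the `vader_segment` prefix, and the path layout are assumptions made for illustration only; the component's real naming scheme may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Build a job-specific backing file path in `buf`. The layout
 * "<dir>/vader_segment.<jobid>.<local_rank>" is an illustrative
 * assumption. Returns 0 on success, -1 if the path was truncated. */
static int backing_file_name(char *buf, size_t len, const char *dir,
                             unsigned int jobid, int local_rank)
{
    assert(buf != NULL && dir != NULL);
    int n = snprintf(buf, len, "%s/vader_segment.%x.%d",
                     dir, jobid, local_rank);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Two jobs with different jobids then get different file names in the shared /dev/shm directory even when their local ranks coincide, so they no longer collide on one backing file.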
Sergey Oblomov
7a5811d0a8
request/state: update state for canceled request
...
- fixed an issue in setting the state of a canceled request
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-01-29 18:26:20 +02:00
Ralph Castain
f3fbc1172d
Merge pull request #4757 from ggouaillardet/topic/iof_hnp
...
iof: do not release a sink before all read data is written.
2018-01-26 14:43:01 -08:00
Ralph Castain
e284a3e98b
Merge branch 'master' into topic/iof_hnp
2018-01-26 13:55:49 -08:00
Ralph Castain
5190f4e4ec
Merge pull request #4759 from rhc54/topic/missing
...
Properly terminate the job when executable not found
2018-01-26 13:55:32 -08:00
Ralph Castain
b643852d8a
Properly terminate the job when executable not found
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-26 12:09:24 -08:00
Ralph Castain
c166e26265
Merge branch 'master' into topic/iof_hnp
2018-01-26 06:15:58 -08:00
Ralph Castain
d83d2be9ea
Merge pull request #4758 from rhc54/topic/sync
...
Refresh ORTE PMIx support
2018-01-25 15:00:37 -08:00
Ralph Castain
a17df810ed
Sync with PMIx iof rfc
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 10:51:38 -08:00
Ralph Castain
e9cd7fd7e6
Update orte
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:53:43 -08:00
Ralph Castain
d1071397ac
Update the orte/ess framework
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:43:44 -08:00
Ralph Castain
9fb80bd239
Update the opal/pmix base framework elements
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:37:52 -08:00
Ralph Castain
187352eb3d
Update the PMIx external components
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:35:57 -08:00
Ralph Castain
a5679ef000
Update the PMIx 3.x component
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:34:44 -08:00
Gilles Gouaillardet
54fb8ac5d5
iof: do not release a sink before all read data is written.
...
When too much data is available on stdin, it might not be
forwarded immediately to the task (write() might fail with EAGAIN),
so when stdin is terminated, there might be some remaining data
to be pushed to the task. In this case, delay the release of the sink
so no data is discarded.
Refs open-mpi/ompi#4744
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-25 16:29:22 +09:00
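The idea in the iof commit above, releasing a sink only once its pending data has been written, can be sketched as follows. `struct sink` and `sink_flush()` are hypothetical names for this POSIX-only illustration and do not mirror the actual ORTE iof data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sink: a file descriptor plus data not yet written. */
struct sink {
    int fd;
    const char *pending;   /* bytes still to be written */
    size_t pending_len;
    bool input_closed;     /* stdin has reached end of file */
};

/* Try to write the pending data. Returns true only when the sink may be
 * released: the input is closed AND nothing remains to be written. */
static bool sink_flush(struct sink *s)
{
    assert(s != NULL);
    while (s->pending_len > 0) {
        ssize_t n = write(s->fd, s->pending, s->pending_len);
        if (n < 0 && errno == EINTR) {
            continue;              /* interrupted: retry the write */
        }
        if (n <= 0) {
            /* EAGAIN or another failure: keep the sink and its data so
             * the caller can retry later instead of discarding it. */
            return false;
        }
        s->pending += n;
        s->pending_len -= (size_t)n;
    }
    return s->input_closed;
}
```

A caller would invoke `sink_flush()` again when the descriptor becomes writable and release the sink only after it returns true.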
Gilles Gouaillardet
ebffaded5d
iof/base: remove the unused iof_base_input_files MCA parameter
...
This option was only used by the iof/mr_hnp (aka Map/Reduce)
component, which is no longer part of the master or v3 branches.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-25 11:29:14 +09:00
Edgar Gabriel
57f946a798
Merge pull request #4749 from edgargabriel/pr/fs_ufs_bad_return_statement
...
fs/ufs and fs/lustre: remove erroneous return statement
2018-01-24 16:28:39 -06:00
Edgar Gabriel
bcf26d419f
fs/ufs and fs/lustre: remove erroneous return statement
...
An erroneous return statement crept in with commit 1885d99,
which leads to some processes not resetting stripe_size
and stripe_count correctly. In 3.0.x, this can lead to different
fcoll modules being selected. The impact is not as dramatic on
master and 3.1.x, but it could lead to problems there as well.
Fixes #4745
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-01-24 14:07:21 -06:00
Jeff Squyres
5b0df815d2
Merge pull request #4743 from jsquyres/pr/extension-attribute-config-test-fix
...
opal_check_attributes: fix __extension__ test
2018-01-24 13:52:23 -05:00
Jeff Squyres
ff31da6f74
opal_check_attributes: fix __extension__ test
...
Per
https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html#index-_005f_005fextension_005f_005f,
use __extension__ in a C statement that actually verifies whether the
compiler supports it.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-01-23 13:44:43 -08:00
Gilles Gouaillardet
88e26c63e0
spml/ucx: fix a double free() issue
...
in mca_spml_ucx_add_procs() error path
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-22 13:42:16 +09:00
Joshua Ladd
13bbc394cc
Merge pull request #4728 from xinzhao3/topic/osc-ucx-fetch-op-fix
...
OMPI/OSC/UCX: adding atomic lock for fetch_and_op and compare_and_swap
2018-01-19 15:32:51 -05:00
Ralph Castain
f92c9f35e6
Merge pull request #4729 from rhc54/topic/revert
...
Revert changes to OPAL_CHECK_PACKAGE
2018-01-17 17:01:17 -08:00
Ralph Castain
01e6539127
Revert "Filter /usr[/local]/include from opal CPPFLAGS when used explictly --with-package=DIR"
...
This reverts commit c4fe4ecfb918eef88bcc8dc10fdd743e3dc7fa38.
Revert "Fix DIR, DIR/include search for --with-pmix"
This reverts commit 2e3f4017639e0b248c2f0d1eb14e7bb31f6287be.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-17 16:02:19 -08:00
Xin Zhao
72ff2b1135
OMPI/OSC/UCX: adding atomic lock for fetch_and_op and compare_and_swap
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-01-18 00:36:22 +02:00
Joshua Ladd
dbefb35aad
Merge pull request #4635 from karasevb/oshmem/spec_1.3/broadcast
...
oshmem: remove "shmem_broadcast" in accordance with the spec v1.3
2018-01-17 12:11:09 -05:00
Yossi Itigin
f2851fd502
Merge pull request #4724 from alex-mikheev/topic/ucx_as_default
...
ompi/oshmem: ucx is selected over yalla/ikrit by default
2018-01-17 17:41:49 +02:00
Yossi Itigin
df1136dc63
Merge pull request #4719 from alex-mikheev/topic/pml_ucx_send_nbr
...
ompi: pml/ucx: blocking send using ucp_tag_send_nbr
2018-01-17 17:01:30 +02:00
Alex Mikheev
640e945b9c
ompi: pml/ucx: blocking send using ucp_tag_send_nbr
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:54:18 +02:00
Alex Mikheev
ae326546f4
ompi/oshmem: ucx is selected over yalla/ikrit by default
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:08:04 +02:00
Yossi Itigin
79ca1c4f18
Merge pull request #4697 from yosefe/topic/opal-progress-avoid-checking-timer
...
opal_progress: check timer only once per 8 calls
2018-01-17 10:48:34 +02:00
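The commit title above says only that the timer is checked once per 8 calls. One common way to do that is a call counter with a power-of-two mask, sketched below. The names `progress_calls`, `TIMER_CHECK_INTERVAL`, and `should_check_timer()` are hypothetical, and the actual opal_progress implementation may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Check the timer on one call out of every TIMER_CHECK_INTERVAL.
 * The mask trick below requires a power of two. */
#define TIMER_CHECK_INTERVAL 8u

static unsigned int progress_calls = 0;

/* Returns true on the 1st, 9th, 17th, ... call. */
static bool should_check_timer(void)
{
    assert((TIMER_CHECK_INTERVAL & (TIMER_CHECK_INTERVAL - 1u)) == 0);
    return (progress_calls++ & (TIMER_CHECK_INTERVAL - 1u)) == 0;
}
```

The point of such a scheme is to keep a comparatively expensive timer read off the hot path of a function that is called very frequently.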
Ralph Castain
6b3cf6fcf1
Merge pull request #4722 from pkovacs/master-opal-check
...
opal_check_package: filter /usr[/local]/include from CPPFLAGS
2018-01-16 16:23:49 -08:00
Philip Kovacs
c4fe4ecfb9
Filter /usr[/local]/include from opal CPPFLAGS when used explictly --with-package=DIR
...
Signed-off-by: Philip Kovacs <pkdevel@yahoo.com>
2018-01-16 16:45:20 -05:00
Ralph Castain
8eea942b80
Merge pull request #4721 from rhc54/topic/nidmap
...
Remove the orte_nidmap test
2018-01-16 12:35:43 -08:00
Ralph Castain
345916f2f3
Remove the orte_nidmap test
...
Moved to the ompi-tests repo
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-16 11:47:44 -08:00