Ralph Castain
a9005d6f72
Merge pull request #3679 from rhc54/topic/spawn
Fix the backend mapper algorithm for comm_spawn. The front and back e…
2017-06-08 10:23:07 -07:00
Geoff Paulsen
bdc7206230
Merge pull request #3672 from markalle/pr/darray_fix
Type_create_darray with mix of BLOCK/CYCLIC
2017-06-08 10:52:50 -05:00
Ralph Castain
7b39f19f60
Fix the backend mapper algorithm for comm_spawn. The front and back ends need to get the nodes into the job map in the same order so that the ranking algorithms will reach the same results
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-08 08:00:52 -07:00
Ralph Castain
20166460c7
Merge pull request #3676 from rhc54/topic/orted
Ensure the orted doesn't go into an infinite loop during force-terminate
2017-06-08 05:51:20 -07:00
Ralph Castain
81ab79f311
Ensure the orted doesn't go into an infinite loop during force-terminate
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-07 21:44:49 -07:00
Ralph Castain
7002535059
Merge pull request #3671 from rhc54/topic/ofi
We cannot use OFI to determine when daemons can finalize as we don't …
2017-06-07 15:08:56 -07:00
George Bosilca
484004b03d
simple_spawn should be independent of ORTE.
2017-06-07 17:51:46 -04:00
Mark Allen
aeb2c02d2f
Type_create_darray with mix of BLOCK/CYCLIC
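Example (using MPI_ORDER_C, so the below has 6 rows of 4 ints to parcel out):
    int size, rank, ndims;
    int gsizes[2], distribs[2], dargs[2], psizes[2];
    MPI_Datatype mydt;
    size = 4;                              /* 4 ranks in a 2x2 process grid */
    rank = 0;
    ndims = 2;
    gsizes[0] = 6;                         /* 6 rows */
    gsizes[1] = 4;                         /* 4 ints per row */
    distribs[0] = MPI_DISTRIBUTE_CYCLIC;
    distribs[1] = MPI_DISTRIBUTE_BLOCK;
    dargs[0] = 2;
    dargs[1] = 2;
    psizes[0] = 2;
    psizes[1] = 2;
    MPI_Type_create_darray(size, rank, ndims,
                           gsizes, distribs, dargs, psizes,
                           MPI_ORDER_C, MPI_INT, &mydt);
    MPI_Type_commit(&mydt);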
Expectation for the layout:
inner dimension (1) is:
4 items (ints) distributed block over 2 ranks with 2 items each
e.g. for rank 0: [ x x . . ]
outer dimension (0) is:
6 items (the above [ x x . . ]) cyclic over 2 ranks with 2 items each
e.g. for rank 0:
[ x x . . ] : offset=0 bytes=8
[ x x . . ] : offset=16 bytes=8
[ . . . . ]
[ . . . . ]
[ x x . . ] : offset=64 bytes=8
[ x x . . ] : offset=80 bytes=8
Or, more specifically, a stream of ints 0,1,2,3,4,5,6,7 sent into that
type should be laid out as:
[ 0 1 . . ]
[ 2 3 . . ]
[ . . . . ]
[ . . . . ]
[ 4 5 . . ]
[ 6 7 . . ]
The data was being laid out, though, as:
[ 0 1 2 3 ]
[ . . . . ]
[ . . . . ]
[ . . . . ]
[ 4 5 6 7 ]
[ . . . . ]
because the recursive construction inside the block() function (which
creates the smaller row datatype [ x x . . ]) wasn't setting the extent
of that type.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-06-07 16:53:03 -04:00
Ralph Castain
919d7fcf49
We cannot use OFI to determine when daemons can finalize as we don't see the "sockets" go away. So always use the OOB for the mgmt conduit - this provides the necessary termination signal AND ensures that IOF and other mgmt messages go solely across TCP.
Clean up the way we look for matching OFI addresses by using the opal_net_samenetwork helper function. This now works for multi-network environments, but only with the socket provider.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-07 13:51:30 -07:00
Nathan Hjelm
f038fe6427
Merge pull request #3661 from jjhursey/fix/ppc-wmb
atomics/powerpc: Fix WMB instruction
2017-06-07 12:14:20 -06:00
Ralph Castain
c9a0fd3d3f
Merge pull request #3666 from rhc54/topic/extpmix
Correct the external pmix configury
2017-06-07 06:21:36 -07:00
Ralph Castain
2d65908184
Correct the external pmix configury
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-07 00:33:29 -07:00
Ralph Castain
ea5649d381
Merge pull request #3665 from rhc54/topic/trivial
Add missing constant to error-strings
2017-06-07 00:09:30 -07:00
Ralph Castain
88b5ec3597
Merge pull request #3664 from rhc54/topic/ext2
Get the pmix/ext2x component to work. Fix a minor problem in the libevent external component.
2017-06-06 21:11:47 -07:00
Ralph Castain
bd1793ad17
Get the pmix/ext2x component to work. Fix a minor problem in the libevent external component.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 20:06:28 -07:00
Ralph Castain
17484409a3
Merge pull request #3662 from rhc54/topic/pmixupagain
Update to pmix v2.0.0rc1, including thread safety fixes
2017-06-06 16:12:24 -07:00
Ralph Castain
acd60a2cc4
Add missing constant to error-strings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 16:10:52 -07:00
Ralph Castain
c3e6dc2022
Update to pmix v2.0.0rc1, including thread safety fixes
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 15:16:34 -07:00
Ralph Castain
21fba8b7f3
Merge pull request #3659 from rhc54/topic/threads
Update OPAL and ORTE for thread safety
2017-06-06 14:52:40 -07:00
Joshua Hursey
4796193cdb
atomics/powerpc: Fix WMB instruction
* `lwsync` is a write memory barrier.
- `eieio` is really not meant for this type of operation.
* `lwsync` can also be used for the read memory barrier according to
my reading of the Power 8 ISA docs (v2.07)
- https://www-01.ibm.com/marketing/iwm/iwm/web/reg/download.do?source=swg-opower&S_PKG=dl&lang=en_US&cp=UTF-8
* References https://github.com/pmix/pmix/pull/391
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-06-06 16:41:37 -05:00
Ralph Castain
93cf3c7203
Update OPAL and ORTE for thread safety
(I swear, if I look this over one more time, I'll puke)
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 12:30:57 -07:00
Ralph Castain
7be09f8143
Merge pull request #3658 from rhc54/topic/pmixup
Update to PMIx master
2017-06-06 11:23:20 -07:00
Ralph Castain
2f85d10600
Update to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 08:19:25 -07:00
George Bosilca
ba46b35515
Don't assume a size for constants with UL and ULL.
According to Section 6.4.4.1 of the C standard, we do not need to append
a suffix to a constant to get the right size. The compiler infers the
type from the value: it picks the first type in the list for that
constant's form (decimal, octal or hexadecimal) that can represent it.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-05 22:07:53 -04:00
Ralph Castain
29411472f2
Merge pull request #3656 from rhc54/topic/silence
Silence warnings when terminating
2017-06-05 15:22:02 -07:00
Ralph Castain
a28eaf914a
Silence warnings when terminating
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-05 13:53:07 -07:00
Jeff Squyres
44aef39b24
Merge pull request #3641 from ggouaillardet/topic/fortran_strings
fortran/base: rename strings.h into fortran_base_strings.h
2017-06-05 15:31:08 -04:00
Ralph Castain
8a377beb25
Merge pull request #3651 from rhc54/topic/stuff
Do not hang if we cannot relay messages. Eliminate extra error log message
2017-06-05 09:36:29 -07:00
Ralph Castain
594c0e2876
Retain the max terminal length of 78 characters, replace the word "disabled" with a simple "-" and hope people know what that means
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-05 07:10:05 -07:00
Ralph Castain
8f526968c2
Do not hang if we cannot relay messages. Eliminate extra error log message
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-05 06:35:19 -07:00
Ralph Castain
dea9ef2020
Merge pull request #3637 from hjelmn/osc_sm_info_fix
osc/sm: fix SEGV in new info usage
2017-06-05 05:45:21 -07:00
Ralph Castain
6d68d2ee0b
Merge pull request #3650 from rhc54/topic/info
Change the default sizes for opal_info output
2017-06-05 05:21:59 -07:00
Ralph Castain
e25a051f41
Change the default sizes for opal_info output
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-04 20:30:53 -07:00
Ralph Castain
51b4078b70
Merge pull request #3648 from rhc54/topic/ofi
Clean up the conduit open code so we return detectable errors when co…
2017-06-02 18:08:55 -07:00
Ralph Castain
e884cbf5f5
Even though the ofi component doesn't do any routing itself, the rest of the code base (e.g., grpcomm) needs to know what routing module this component is using. So set it to the "direct" module, and don't allow ofi to be used if that module isn't available.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 15:47:25 -07:00
Jeff Squyres
68a22689c4
Merge pull request #3649 from jsquyres/pr/fix-signal-include
ess: add missing <signal.h> header
2017-06-02 18:41:48 -04:00
Ralph Castain
ba9a6078c2
Add the ability to select a transport, and only compare the first one in the conduit list for a match. This lets you select which conduit to use for OFI: if you set "-mca rml_ofi_transports ethernet" you'll pick up the mgmt conduit; if you set "-mca rml_ofi_transports fabric", you'll get the coll conduit.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 14:31:23 -07:00
Jeff Squyres
af9565ec25
ess: add missing <signal.h> header
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-06-02 14:11:40 -07:00
Ralph Castain
b0b985bb06
Merge pull request #3644 from rhc54/topic/signals
Shift the signal forwarding code to ess/base...
2017-06-02 13:45:13 -07:00
Ralph Castain
066d5eedce
Shift the signal forwarding code to ess/base so it can be available to more than just the hnp component. Extend the slurm component to use it so that any signals given directly to the daemons by their slurmstepd get forwarded to their local clients
Check for NULL
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 10:59:14 -07:00
Ralph Castain
6b3bbd30c5
Clean up the conduit open code so we return detectable errors when conduit not opened.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 10:40:51 -07:00
Ralph Castain
e45a358bf0
Merge pull request #3647 from rhc54/topic/forced
Provide better help when forced_terminate is invoked
2017-06-02 10:27:41 -07:00
Ralph Castain
2ab4f93f6a
Instead of "forced_terminate" just quietly causing the daemon to disappear, let's at least attempt to let the user know where the problem occurred.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 08:28:16 -07:00
Ralph Castain
cde80bbf47
Merge pull request #3638 from anandhis/ofi-coll-conduit-fail-dbg
Minor clean up to rml-ofi send message
2017-06-01 20:51:22 -07:00
KAWASHIMA Takahiro
c8d38d31c6
Merge pull request #3618 from kawashima-fj/pr/java-doc-man
java: Detect `javadoc` path and improve `mpijavac` man page
2017-06-02 10:24:05 +09:00
anandhi
6ddb487744
Cleaned up the send_msg(), moved checking for send to self into the send_nb() and send_buffer_nb()
modified: orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi Jayakumar <anandhi.s.jayakumar@intel.com>
2017-06-01 17:50:54 -07:00
Gilles Gouaillardet
08526e8adc
fortran/base: rename strings.h into fortran_base_strings.h
rename ompi/mpi/fortran/base/strings.h so it does not get pulled
when /usr/include/strings.h is expected.
Refs open-mpi/ompi#3639
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-02 09:46:20 +09:00
George Bosilca
037a85a782
Fix the OSHMEM request padding.
This patch fixes a case missed by 5b670a2 (PR #3634).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-01 18:30:02 -04:00
Josh Hursey
1665d771a6
Merge pull request #3635 from wlepera/fix/ibm/155305
MPI_Sendrecv_replace data error with > 2k msg
2017-06-01 14:38:01 -05:00
Jeff Squyres
d520c24f3a
predefined MPI object padding: set to fixed number of bytes (#3634)
Convert the predefined MPI object padding to a fixed number of bytes
(vs. a multiple of sizeof(void*)) so that the padding is the same size
between 32 and 64 bit builds. I.e., we won't have a situation where
we've run out of padding in 32 bit builds but still have more space
available in 64 bit builds.
Fixes #3610
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-06-01 15:28:23 -04:00