Commit graph

27329 commits

Author SHA1 Message Date
Nathan Hjelm
c18007d095 btl/vader: work around ob1 pending fragment bug
This commit ensures that the pml callback is always made when
sending fragments. This is needed to avoid #3845. Once that is
fixed the #if 0'd code can be restored.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-11 15:59:56 -06:00
Nathan Hjelm
e73ab93ebf pml/ob1: do not access fragment after calling btl rget
This commit fixes a bug that occurs when the btl callback happens before
the rget returns. In this case the fragment has been returned and is no
longer valid. This commit saves the size before calling rget. This is
valid since the BTL is not allowed to change the read size.

Fixes #3821

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-07-11 15:59:40 -06:00
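
The ordering problem described in e73ab93ebf can be modelled in a few lines. This is a minimal sketch, not ob1 or BTL code: `Fragment`, `rget`, and `start_rget` are hypothetical names, and the transport is assumed to complete immediately so that the callback runs before the call returns.

```python
class Fragment:
    """Hypothetical stand-in for a receive fragment; not an ob1 structure."""

    def __init__(self, size):
        self.size = size
        self.valid = True

    def release(self):
        # After release the fragment must not be touched again.
        self.valid = False
        self.size = None


def rget(frag, callback):
    """Hypothetical transport call that completes immediately, so the
    callback runs before rget() returns."""
    callback(frag)
    return 0  # success


def start_rget(frag):
    # Save the size first: the completion callback may release the
    # fragment before rget() returns.
    size = frag.size
    rc = rget(frag, lambda f: f.release())
    # From here on use the saved copy, never frag.size.
    return rc, size
```

Reading `frag.size` after `rget()` in this model would return `None`; in C the equivalent read would touch memory that has already been returned.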
Geoff Paulsen
6570374238 Merge pull request #3843 from jjhursey/revert/gfortran-sizeof
Revert MPI_SIZEOF fix for gfortran 4.8
2017-07-11 14:10:19 -05:00
Joshua Hursey
20ac03c063 config/fortran: Add note about why we reverted PR #3822
* This should be enough of a breadcrumb for when we get to fixing the
   `INTERFACE` check to be strong enough to kick out gfortran 4.8

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-07-11 11:09:27 -05:00
Joshua Hursey
c81795cbda Revert "Fix MPI_SIZEOF for gfortran 4.8"
This reverts commit 5de3d5dde6.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-07-11 11:09:17 -05:00
Joshua Hursey
23ee6024e4 Revert "Merge pull request #1 from jsquyres/tjcw-tjcw-fix-mpi-sizeof"
This reverts commit 3e6a196714, reversing
changes made to 5de3d5dde6.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-07-11 11:09:01 -05:00
Howard Pritchard
550e8c4afe Merge pull request #3842 from hppritcha/topic/fix_cray_pmix_problem
pmix/cray: add a bit of debug output
2017-07-11 08:29:56 -06:00
Howard Pritchard
26a8142c97 pmix/cray: add a bit of debug output
add a bit of debug output to help with pmix finalize issues

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-11 05:45:49 -05:00
Gilles Gouaillardet
ff2dd69533 opal/util: silence warning in opal_info_dup_mode()
as reported by coverity with CID 1414729

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-11 14:40:37 +09:00
Gilles Gouaillardet
85ff3ebad1 opal: fix return status of opal_info_set()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-11 13:58:15 +09:00
Gilles Gouaillardet
1ac931a431 Merge pull request #3838 from ggouaillardet/topic/opal_info_dup_mode
opal/info: fix recursive deadlock in opal_info_dup_mode()
2017-07-10 17:09:45 +09:00
Gilles Gouaillardet
92441accc9 opal/info: fix recursive deadlock in opal_info_dup_mode()
use opal_info_{get,set}_nolock() instead of opal_info_{get,set}()
since the former can be invoked when the info lock is being held.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-10 14:51:46 +09:00
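
The locking pattern behind 92441accc9 can be sketched as follows. The `Info` class and its method names are hypothetical and only mirror the `_nolock` naming from the commit message; the sketch assumes a non-recursive lock, which is what makes the nested call a deadlock.

```python
import threading


class Info:
    """Hypothetical key/value store guarded by a non-recursive lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get_nolock(self, key):
        # Caller must already hold self._lock.
        return self._data.get(key)

    def set_nolock(self, key, value):
        # Caller must already hold self._lock.
        self._data[key] = value

    def get(self, key):
        with self._lock:
            return self.get_nolock(key)

    def set(self, key, value):
        with self._lock:
            self.set_nolock(key, value)

    def dup_into(self, other):
        # Hold the source lock for the whole copy. Calling self.get() here
        # would try to take the same non-recursive lock again and block
        # forever, so only the *_nolock helper is used on self.
        with self._lock:
            for key in list(self._data):
                other.set(key, self.get_nolock(key))
```

The locking variants remain the public entry points; the `_nolock` variants exist only for code paths that already hold the lock.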
Ralph Castain
c632784ca3 Merge pull request #3835 from rhc54/topic/hetero
Remove --enable-heterogeneous until fix is ready
2017-07-07 10:57:12 -07:00
Jeff Squyres
83746fba71 Merge pull request #3822 from tjcw/tjcw-fix-mpi-sizeof
Fix MPI_SIZEOF for gfortran 4.8
2017-07-07 13:49:52 -04:00
Ralph Castain
8e25733760 Remove --enable-heterogeneous until fix is ready
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-07 10:09:30 -07:00
Ralph Castain
b2f90e5d1b Merge pull request #3831 from rhc54/topic/fix
Prefix the MB macro in one more place
2017-07-07 08:04:30 -07:00
Chris Ward
3e6a196714 Merge pull request #1 from jsquyres/tjcw-tjcw-fix-mpi-sizeof
README: minor tweak to specifically mention GNU Fortran
2017-07-07 15:42:06 +01:00
Jeff Squyres
75ec541610 README: minor tweak to specifically mention GNU Fortran
Lots of people still use GFortran, and lots of people still use
somewhat old versions of it (e.g., if it's bundled in their
older-but-still-installed Linux distros).  So let's specifically
mention it.  This may be a bit overkill, but more specific docs are
usually a Good Thing (i.e., they can prevent questions from being sent
to the mailing list).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-07-07 07:30:03 -07:00
Ralph Castain
a190b4b89f Prefix the MB macro in one more place
Fixes #3830

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-07 06:07:47 -07:00
Chris Ward
5de3d5dde6 Fix MPI_SIZEOF for gfortran 4.8
Add copyrights.

Revise the README to take out the 'most notably' statement about GNU Fortran 4.8

Signed-off-by: Chris Ward <tjcw@uk.ibm.com>
2017-07-07 13:47:35 +01:00
Gilles Gouaillardet
823382f5d7 plm/base: do not abort when configure'd with --enable-heterogeneous
and a mix of BE/LE is detected

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-07 10:43:54 +09:00
Ralph Castain
2a580fa71e Merge pull request #3801 from rhc54/topic/hetero
Detect that we have a mix of BE/LE in the system
2017-07-06 15:29:06 -07:00
Josh Hursey
753e3b0156 Merge pull request #3824 from jjhursey/doc/xl-f08-readme
README: Update F08 language about IBM XL compiler
2017-07-06 16:26:23 -05:00
Joshua Hursey
bf5a58dcca README: Update F08 language about IBM XL compiler
- MPI bindings build/link correctly, so remove note about that.
- OpenSHMEM bindings do not build/link correctly by default.
  - Note the workaround and the issue on GitHub for users.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-07-06 15:52:48 -05:00
Ralph Castain
1bc366b374 Merge pull request #3820 from rhc54/topic/cov
Silence Coverity warnings
2017-07-06 06:53:43 -07:00
Ralph Castain
9c9e0a9773 Merge pull request #3819 from rhc54/topic/esh
Not really necessary, but technically correct
2017-07-06 06:49:44 -07:00
Ralph Castain
8979bfe71e Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-06 06:07:28 -07:00
Ralph Castain
ed43492867 Not really necessary, but technically correct
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-06 06:00:03 -07:00
Gilles Gouaillardet
fc11c37223 Merge pull request #3646 from ggouaillardet/spacc-fix-coverity-warnings
coll/spacc: misc fixes
2017-07-06 11:39:14 +09:00
Ralph Castain
7bea824194 Merge pull request #3813 from rhc54/topic/esh
Replace syntax with something less strictly C99
2017-07-05 19:14:38 -07:00
Mikhail Kurnosov
44acc92104 Fix buffer overflow
Add check for bounds of sindex[] and rindex[].

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2017-07-06 10:49:08 +09:00
Gilles Gouaillardet
5fceca235b coll/spacc: silence more coverity warnings in mca_coll_spacc_allreduce_intra_redscat_allgather()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-06 10:49:08 +09:00
Mikhail Kurnosov
2f0f476642 Silence spacc coverity warnings
1. Add assert for opal_hibit return value: comm_size is always > 1.
2. Modified verbose output (dead-code warning).

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2017-07-06 10:49:08 +09:00
Ralph Castain
e7a44a1483 Merge pull request #3814 from anandhis/ofi-choose-provider-at-send
Choosing the ofi provider when opening conduit and sending message to peer
2017-07-05 16:55:06 -07:00
Ralph Castain
31130a4bee Replace syntax with something less strictly C99
Fixes #3809

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-05 16:54:36 -07:00
anandhi
793ebc272e When opening a conduit, check the transport preference in the following order:
(1) The rml_ofi_transports MCA parameter. This parameter should hold the list of transports (currently ethernet and fabric are valid);
    fabric has higher priority if provided.
(2) The ORTE_RML_TRANSPORT_TYPE key with the value "ethernet" or "fabric"; "fabric" has higher priority.
If a specific provider is required, use the ORTE_RML_OFI_PROV_NAME key with the value "socket", "OPA", or any other provider supported on the system.

	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

On send_msg, choose the provider on the local node and on the peer according to the following rules:
1. If the user specified the transport for this conduit (even as a prioritized list of candidates), the one we selected is the _only_ one we will use. If the remote peer has a matching endpoint, we use it; otherwise, we error out.

2. If the user didn't specify a transport, we look for matches against _all_ of our available transports, starting with fabric and then Ethernet, and take the first one that matches.

3. If we can't find any match, we error out.

	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_component.c
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

send_msg(): fixed the case where the local provider chosen when the conduit was opened
is not present on the peer (destination) node
	modified:   ../orte/mca/rml/ofi/rml_ofi.h
	modified:   ../orte/mca/rml/ofi/rml_ofi_send.c

Signed-off-by: Anandhi Jayakumar <anandhi.s.jayakumar@intel.com>
2017-07-05 15:40:14 -07:00
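
The three send-time rules in 793ebc272e can be written down as a small selection function. This is a sketch under stated assumptions, not the rml/ofi implementation: `choose_transport` and its arguments are hypothetical, transports are plain strings, and "fabric before ethernet" is applied both when opening the conduit and when searching for a match.

```python
# Transport names come from the commit message; fabric is tried first.
PRIORITY = ("fabric", "ethernet")


def choose_transport(requested, local, peer):
    """Return the transport to use for a send, or raise LookupError.

    requested: transports named by the user (empty if none were given)
    local:     transports available on this node
    peer:      transports the remote peer exposes
    """
    if requested:
        # Rule 1: the transport selected when the conduit was opened is
        # the only one we may use; the peer must have a matching endpoint.
        candidates = [t for t in PRIORITY if t in requested and t in local]
        if not candidates:
            raise LookupError("no requested transport is available locally")
        selected = candidates[0]
        if selected not in peer:
            raise LookupError(f"peer has no {selected} endpoint")
        return selected

    # Rule 2: nothing requested, so take the first transport, in priority
    # order, that both sides have.
    for t in PRIORITY:
        if t in local and t in peer:
            return t

    # Rule 3: no match at all.
    raise LookupError("no transport in common with peer")
```

Note the asymmetry: with an explicit request the function never falls back to another transport, even when one is available on both sides.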
Gilles Gouaillardet
fbeb7b94f4 Merge pull request #3802 from ggouaillardet/topic/gcc_builtin_atomics
configury: fix gcc builtin atomic detection
2017-07-04 10:33:43 +09:00
Gilles Gouaillardet
e77874bbaf configury: fix gcc builtin atomic detection
Test for both 32 and 64 bits.
clang only supports 32-bit builtin atomics when -m32 is used

Thanks Paul Hargrove for reporting this.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-04 09:47:45 +09:00
Howard Pritchard
f2c6e70ef0 Merge pull request #3800 from hppritcha/topic/fix_cray_pmix_problem
pmix/cray: fix handling of multiple finis
2017-07-03 17:07:32 -06:00
Ralph Castain
2753f53e6d Detect that we have a mix of BE/LE in the system, provide a warning that OMPI doesn't currently support this environment, and error out
Fixes #2817

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-03 15:47:05 -07:00
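
The check described in 2753f53e6d reduces to comparing the byte order reported by each node. A minimal sketch, assuming every node reports "little" or "big" as a string; `check_byte_orders` is a hypothetical helper, and how the reports are collected is outside its scope.

```python
import sys


def check_byte_orders(orders):
    """Raise RuntimeError if the reported byte orders are not all the same.

    orders: iterable of "little"/"big" strings, one per node (hypothetical
    input format). Returns the common byte order, or the local one if the
    iterable is empty.
    """
    seen = set(orders)
    if len(seen) > 1:
        raise RuntimeError(
            "mix of big- and little-endian nodes detected; "
            "this environment is not supported"
        )
    return seen.pop() if seen else sys.byteorder
```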
Howard Pritchard
1f2f3db553 pmix/cray: fix handling of multiple finis
The fini code for cray pmix wasn't correct.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-03 14:30:34 -05:00
Gilles Gouaillardet
d1c5955b73 coll/base: optimize handling of zero-byte datatypes in mca_coll_base_alltoallv_intra_basic_inplace()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-06-30 09:47:08 +09:00
Ralph Castain
7cbea77238 Merge pull request #3778 from rhc54/topic/warn
Attempt to detect when we are direct-launched without the necessary P…
2017-06-29 16:53:12 -07:00
Ralph Castain
cb19296b71 Merge pull request #3794 from rhc54/topic/shutdown
Stop all progress threads prior to releasing the peer objects
2017-06-29 16:52:57 -07:00
Ralph Castain
85f8eb4c6b Stop all progress threads prior to releasing the peer objects to avoid a race condition whereby a lost connection could be reported after a peer object was freed and before the threads were stopped.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-29 15:48:18 -07:00
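
The shutdown ordering from 85f8eb4c6b can be illustrated with ordinary threads. This is a sketch only: `Peer`, `progress_loop`, and `shutdown` are hypothetical names, and a `freed` flag stands in for memory that would actually be released in C.

```python
import threading


class Peer:
    """Hypothetical peer object; 'freed' marks it as released."""

    def __init__(self, name):
        self.name = name
        self.freed = False


def progress_loop(stop_event, peers, errors):
    # Poll the peers until asked to stop. Seeing a freed peer here
    # corresponds to reporting a lost connection on a released object.
    while not stop_event.wait(0.001):
        for p in list(peers):
            if p.freed:
                errors.append(p.name)


def shutdown(stop_event, threads, peers):
    # 1. Ask every progress thread to stop and wait for it to finish.
    stop_event.set()
    for t in threads:
        t.join()
    # 2. Only now release the peers: no thread can still observe them.
    for p in peers:
        p.freed = True
    peers.clear()
```

Swapping the two steps in `shutdown` reintroduces the window in which a running thread can see a peer that has already been released.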
Ralph Castain
bd4a6fee22 Attempt to detect when we are direct-launched without the necessary PMI support, and thus are incorrectly identified as being "singleton". Advise the user on the required PMI(x) support and error out.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-29 15:26:53 -07:00
Gilles Gouaillardet
7e5e5fe887 Merge pull request #3719 from ggouaillardet/topic/libnbc_revamp
coll/libnbc: revisit NBC_Handle usage
2017-06-29 11:13:58 +09:00
Ralph Castain
eee4579f5a Merge pull request #3788 from rhc54/topic/fix
Deregister event handlers only on final call to finalize. Ensure we pass PMIx mca params
2017-06-28 16:36:07 -07:00
Ralph Castain
9178219e6b Deregister event handlers only on final call to finalize. Ensure we pass PMIx mca params
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-28 15:00:43 -07:00
Ralph Castain
e07ed6dccd Merge pull request #3785 from rhc54/topic/lock
Fix a threadlock when notifying clients of failures
2017-06-28 10:13:19 -07:00