Jeff Squyres
87a5ccc060
usnic: show the local UDP ports
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-26 12:25:18 -07:00
Jeff Squyres
e03a40a0e9
pmix3x: remove generated file
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-26 10:30:47 -07:00
rhc54
03838f275a
Merge pull request #2019 from artpol84/fix_schizo
...
orte/schizo: fix binding detection in slurm component
2016-08-26 09:08:43 -05:00
Edgar Gabriel
b5c757e82c
Merge pull request #2014 from edgargabriel/topic/mt-io
...
Topic/mt io
2016-08-26 08:54:45 -05:00
Jeff Squyres
9ae51a09f2
Merge pull request #1989 from jsquyres/pr/update-usnic-to-libfabric-v1.4
...
Update usnic BTL to libfabric v1.4
2016-08-26 09:53:07 -04:00
Artem Polyakov
55ac3b0be3
orte/schizo: fix binding detection in slurm component
...
In SLURM 16.05, SLURM_CPU_BIND_TYPE is equal to "mask_cpu:"
instead of "mask_cpu". Account for that.
2016-08-26 09:55:52 +03:00
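A minimal sketch of the idea behind the SLURM fix in 55ac3b0be3, assuming the check only needs to recognize the "mask_cpu" value with or without a trailing colon. The helper name bind_type_is_mask_cpu is hypothetical and is not taken from the schizo component.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not the actual schizo/slurm code): returns true
 * when a SLURM_CPU_BIND_TYPE value starts with "mask_cpu", so that both
 * "mask_cpu" and the "mask_cpu:" form seen in SLURM 16.05 are accepted. */
static bool bind_type_is_mask_cpu(const char *value)
{
    static const char prefix[] = "mask_cpu";

    if (NULL == value) {
        return false;
    }
    /* Compare only the prefix; sizeof includes the trailing NUL, so
     * subtract one to get the number of characters to match. */
    return 0 == strncmp(value, prefix, sizeof(prefix) - 1);
}
```

A prefix comparison is deliberately loose: it would also accept any other value that begins with "mask_cpu", which may or may not be acceptable in the real component.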
Gilles Gouaillardet
e4bf915e75
pmix3x: remove auto-generated file
...
remove opal/mca/pmix/pmix3x/pmix/src/include/pmix_config.h.in
.gitignore is correct, so it seems this file was added before .gitignore was updated
2016-08-26 15:00:18 +09:00
rhc54
c0fff60e59
Merge pull request #2017 from rhc54/topic/pmixconfig
...
Update configury to support multiple PMIx versions
2016-08-25 21:36:34 -05:00
Ralph Castain
af67f16422
Update configury to support multiple PMIx versions, rename pmix2x component to pmix3x for support of PMIx master
...
Update support for external v1.1.x and v2.x libraries. Minor corrections to the v3.x component
2016-08-25 18:19:05 -07:00
Gilles Gouaillardet
277c319389
opal/util: fix (again and again) incorrect type casting in opal_path_df
...
and silence CID 1371767
this fixes previous commits :
- open-mpi/ompi@2eec8970ff
- open-mpi/ompi@a439afce5b
2016-08-26 09:42:45 +09:00
Nathan Hjelm
89c2f4974c
Merge pull request #2016 from hjelmn/wait_sync
...
opal/wait_sync: add #if protection on header
2016-08-25 15:13:09 -07:00
Nathan Hjelm
f3d4eaeaf7
Merge pull request #2013 from hjelmn/osc_rdma_fix
...
osc/rdma: fix bug in dynamic memory window tracking code
2016-08-25 13:42:27 -07:00
Nathan Hjelm
de32c779e2
opal/wait_sync: add #if protection on header
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-25 14:31:52 -06:00
rhc54
19b0f4db9f
Merge pull request #1995 from rhc54/topic/pe-per-rank
...
Change the behavior of cpus-per-rank.
2016-08-25 14:38:12 -05:00
Edgar Gabriel
1ba03d38ec
io/ompio: protect remaining functions in multi-threaded scenarios
...
protect the remaining functions where necessary by a mutex lock
to avoid problems in multi-threaded executions. Some functions
do not require that in my opinion, and I provided an explanation
in those cases.
2016-08-25 13:45:51 -05:00
Nathan Hjelm
e53de7ecbe
osc/rdma: fix bug in dynamic memory window tracking code
...
This commit fixes an ordering bug in the code that keeps track of all
attached memory windows. The code is intended to keep the memory
regions sorted but was often inserting at the wrong index. Thanks to
Christoph Niethammer for reporting the issue. The reproducer will be
added to nightly MTT testing.
Fixes open-mpi/ompi#2012
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-25 12:08:46 -06:00
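The ordering bug described in e53de7ecbe is about inserting into a sorted list at the right index. The following is only an illustrative sketch of sorted insertion, assuming regions are ordered by base address; region_t, region_insert_index, and region_insert are hypothetical names and do not reflect the osc/rdma data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical region record; the real component's structures differ. */
typedef struct {
    size_t base;
    size_t len;
} region_t;

/* Binary search for the first index whose base is greater than `base`.
 * Inserting at this index keeps the array sorted by base. */
static size_t region_insert_index(const region_t *regions, size_t count,
                                  size_t base)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (regions[mid].base <= base) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Insert `r` into `regions`, which must have room for count + 1 entries.
 * Returns the new element count. */
static size_t region_insert(region_t *regions, size_t count, region_t r)
{
    size_t idx = region_insert_index(regions, count, r.base);

    /* Shift the tail up by one slot to open a gap at idx. */
    memmove(&regions[idx + 1], &regions[idx],
            (count - idx) * sizeof(region_t));
    regions[idx] = r;
    return count + 1;
}
```

Computing the index and shifting the tail in one place makes it harder for the two steps to disagree about where the new entry belongs.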
Nathan Hjelm
7af138f83b
osc/pt2pt: fix possible race in peer locking
...
It is possible for another thread to process a lock ack before the
peer is set as locked. In this case either setting the locked or the
eager active flag might clobber the other thread. To address this the
flags have been made volatile and are set atomically. Since there is
no opal_atomic_or or opal_atomic_and function, just use cmpset for
now.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-25 09:28:25 -06:00
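The approach in 7af138f83b (setting flag bits atomically with a compare-and-set loop because no atomic OR was available) can be sketched as follows. This is an assumption-laden illustration using C11 <stdatomic.h> rather than the OPAL atomics; the flag names and flags_set are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits; the real osc/pt2pt peer flags may differ. */
#define PEER_FLAG_LOCKED 0x1u
#define PEER_FLAG_EAGER  0x2u

/* Set `bits` in `*flags` without losing bits that another thread sets
 * concurrently. This emulates an atomic OR with compare-and-swap. */
static void flags_set(_Atomic uint32_t *flags, uint32_t bits)
{
    uint32_t old = atomic_load(flags);

    /* On failure, atomic_compare_exchange_weak stores the current value
     * of *flags into `old`, so each retry ORs into fresh data. */
    while (!atomic_compare_exchange_weak(flags, &old, old | bits)) {
        /* retry */
    }
}
```

C11 also provides atomic_fetch_or, which does this in one call; the loop is shown only because it mirrors the cmpset workaround the commit message describes.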
Nathan Hjelm
c082068953
Merge pull request #2006 from hjelmn/osc_pt2pt_fix
...
osc/pt2pt: fix several bugs
2016-08-25 09:19:29 -06:00
rhc54
17a210f7f0
Merge pull request #2008 from rhc54/topic/binding
...
Correct the binding algorithm to decouple it from oversubscribe.
2016-08-25 09:25:33 -05:00
Edgar Gabriel
1cee83cc1b
use the common/ interfaces in file_preallocate instead of the io_ompio_ interfaces.
...
Necessary for avoiding potential deadlock situations in multi-threaded scenarios.
2016-08-25 08:55:12 -05:00
Jeff Squyres
0d19cc4a13
README: fix a bunch of typos
...
Thanks to Paul Hargrove for pointing them out. Really.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-25 09:15:27 -04:00
Jeff Squyres
f56b16f079
usnic: remove unused variable
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-25 03:53:18 -07:00
Jeff Squyres
9717bcb7e6
btl/usnic: remove stale comment
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-25 03:53:18 -07:00
Jeff Squyres
6f5e377fe0
btl/usnic: update for libfabric v1.4
...
With libfabric v1.4, the usnic provider changed the values of its
fabric and domain name strings (compared to libfabric <v1.4). Update
the Open MPI usNIC BTL to handle both pre-v1.4 and v1.4 fabric/domain
names.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-25 03:53:17 -07:00
rhc54
b563c9e303
Merge pull request #2003 from rhc54/topic/sync
...
Set the default value of both barrier counters to zero, thus ensuring the coll/sync component is off by default
2016-08-24 23:18:58 -05:00
Ralph Castain
440eae90ec
Correct the binding algorithm to decouple it from oversubscribe.
...
Oversubscribe stipulates that we allow more procs on the node than assigned slots - it has nothing to do with the number of available pe's. Let overload directives handle the pe situation.
2016-08-24 21:17:22 -07:00
George Bosilca
3adff9d323
Fixes #1793.
...
Reshape the tearing down process (connection close) to prevent race
conditions between the main thread and the progress thread.
Minor cleanups.
2016-08-24 22:45:19 -04:00
Nathan Hjelm
70f8a6e792
osc/pt2pt: fix several bugs
...
This commit fixes some bugs uncovered during thread testing of
2.0.1rc1. With these fixes the component is running cleanly with
threads.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-24 14:35:45 -06:00
Nathan Hjelm
6de64ddbc1
Merge pull request #2005 from hjelmn/ugni_fix
...
btl/ugni: actually make the endpoint lock recursive
2016-08-24 11:05:27 -06:00
Nathan Hjelm
83062db7cb
btl/ugni: actually make the endpoint lock recursive
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-24 10:36:08 -06:00
Ralph Castain
bcf5ac3971
Set the default value of both barrier counters to zero, thus ensuring the coll/sync component is off by default
2016-08-24 07:51:32 -07:00
Gilles Gouaillardet
2eec8970ff
opal/util: fix (again) incorrect type casting in opal_path_df
...
this fixes previous commit open-mpi/ompi@a439afce5b
2016-08-24 12:50:15 +09:00
rhc54
b12e43fc03
Merge pull request #2001 from ggouaillardet/topic/pmix2x_sec_native
...
fix sec/native module under Solaris and other misc issues
2016-08-23 22:47:05 -05:00
Gilles Gouaillardet
02847d9e7b
pmix2x: dstore: add missing <fcntl.h> include file in pmix_esh.c
...
(back-ported from upstream pmix/master@5c66ffe0f0 )
2016-08-24 11:18:46 +09:00
Gilles Gouaillardet
c11e8163f8
pmix2x: sec/native: fix the pmix_native module under solaris by using getpeerucred()
...
and fail with a user friendly message if no method is available:
"sec: native cannot validate_cred on this system"
(back-ported from upstream pmix/master@c474a1fc60 )
2016-08-24 11:18:40 +09:00
Gilles Gouaillardet
e91292aa41
pmix2x: configury: add missing check for <netdb.h> header file
...
(back-ported from upstream pmix/master@e54ce6d423 )
2016-08-24 11:18:32 +09:00
Gilles Gouaillardet
a439afce5b
opal/util: fix incorrect type casting in opal_path_df
2016-08-24 10:26:13 +09:00
Ralph Castain
22844b0dc6
Balance priorities to ensure something is below sync
2016-08-23 17:33:45 -07:00
Ralph Castain
540f23c4dd
Adjust priority of coll/sync downwards
2016-08-23 17:12:48 -07:00
Jeff Squyres
b5d03c6eea
Merge pull request #1996 from bharatpotnuri/patching
...
Add Chelsio T6 adapter device parameters.
2016-08-23 13:04:26 -04:00
Jeff Squyres
a0a1849101
README: restrict OS X and Oracle Studio compiler versions
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-23 09:46:30 -07:00
Edgar Gabriel
41ed4a28d2
add the protective lock around read and write operations in ompio
2016-08-23 11:07:58 -05:00
Jeff Squyres
997431696a
opal_check_cma: make consistent with rest of configury
...
Split the CMA test into two parts so that the back-end test only has
to be run once. Fail if --with-cma is specified and cannot be
provided. Remove a few useless quotes. Change
$ompi_check_cma_need_defs and $ompi_check_cma_happy to be numeric
values. Finally, remove a bunch of tabs.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-23 07:26:47 -07:00
Jeff Squyres
065b93600d
AUTHORS: Fix minor typos
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-23 06:32:57 -07:00
Potnuri Bharat Teja
9b7f9ece20
Add Chelsio T6 adapter device parameters.
...
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2016-08-23 10:38:13 +05:30
Ralph Castain
92102304b6
Minor typo - init the job_data stdin_target field to 0 for default behavior. Add test.
2016-08-22 21:03:45 -07:00
Gilles Gouaillardet
93e73841f9
ess/singleton: push all PMIX_* environment variables, regardless of how many there are
2016-08-23 09:46:55 +09:00
Gilles Gouaillardet
a1e8e58a8a
ess/singleton: expects 4 PMIX_* environment variables or more
2016-08-23 09:34:03 +09:00
Howard Pritchard
696121cc4a
Merge pull request #1988 from hppritcha/topic/another_ofi_fix
...
mtl/ofi: fix a botched assignment of av_type
2016-08-22 17:59:59 -06:00
Ralph Castain
7de4d6922b
Change the behavior of cpus-per-rank. We previously counted each cpu against the #slots. However, IBM has pointed out that "slot" is equated to the number of processes allowed to run on each node, and not the number of cpus on the node. This has been a continuing source of confusion, so make the distinction a "hard" one.
...
Each process occupies a "slot". We automatically set #slots = #cpus if nothing else is told to us. If you want to run more procs than slots, you must tell us to allow oversubscription.
A process can utilize multiple pe's if that option is given. If you try to bind more than one proc to a given pe, then we will error out unless you tell us to allow overloading.
2016-08-22 15:54:41 -07:00
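The slot versus pe distinction drawn in 7de4d6922b can be sketched as two independent checks. This is a simplified model, not the ORTE mapper: it assumes one node, a fixed number of pe's per process, and treats "more requested pe's than available" as the overload condition. All names (map_status_t, check_node, and its parameters) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical result codes for the sketch. */
typedef enum {
    MAP_OK,
    MAP_ERR_OVERSUBSCRIBED, /* more procs than slots */
    MAP_ERR_OVERLOADED      /* more than one proc would share a pe */
} map_status_t;

/* Check whether `nprocs` processes, each using `pes_per_proc` processing
 * elements, fit on a node with `slots` slots and `pes` pe's. */
static map_status_t check_node(int nprocs, int pes_per_proc,
                               int slots, int pes,
                               bool allow_oversubscribe,
                               bool allow_overload)
{
    /* Slots count processes, not cpus. */
    if (nprocs > slots && !allow_oversubscribe) {
        return MAP_ERR_OVERSUBSCRIBED;
    }
    /* If the procs need more pe's than exist, some pe must be shared. */
    if (nprocs * pes_per_proc > pes && !allow_overload) {
        return MAP_ERR_OVERLOADED;
    }
    return MAP_OK;
}
```

Keeping the two checks separate reflects the point of the commit message: oversubscription is about slots, and overloading is about pe's.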