Nathan Hjelm
6fd5041e48
Merge pull request #1264 from artpol84/vader_resource_leak_fix
Fix vader resource leak.
2015-12-27 15:19:27 -07:00
Artem Polyakov
a20826e6b4
Fix vader resource leak.
This nasty bug was well masked. It caused `mca_btl_vader_component.vader_frags_user`
to overflow and, as a result, rare hangs in ompi-test-suite.
2015-12-28 00:41:45 +06:00
Gilles Gouaillardet
2d9aa38e6a
btl/openib: fix heterogeneous support
2015-12-25 16:31:35 +09:00
Ralph Castain
8ab28cdc82
Fix a typo that causes segfaults on multi-node executions
2015-12-24 08:43:47 -08:00
Artem Polyakov
3031affdb7
Fix openib process accounting if procs were dynamically added.
2015-12-24 17:56:35 +06:00
Artem Polyakov
400af6c52d
openib addproc improvements:
1. finer-grained locks;
2. separate srq creation from cq adjustments.
2015-12-24 17:56:35 +06:00
Artem Polyakov
41c325f15a
Shift common code for calculating the port count and btl_rank in openib into a static function
2015-12-24 17:56:35 +06:00
Gilles Gouaillardet
0b3e3c6817
opal/runtime: add missing #include <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:56 +09:00
Gilles Gouaillardet
f0e3e16f49
pmix/base: add missing #include <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:52 +09:00
Gilles Gouaillardet
66d9c2daea
rcache/vma: add missing #include "opal/util/output.h"
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:49 +09:00
Gilles Gouaillardet
5fa63f086a
btl/tcp: add missing #include <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:46 +09:00
Gilles Gouaillardet
15ed7ad9f5
btl/sm: add missing #include <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:41 +09:00
Gilles Gouaillardet
65a081ae6a
mca/base: add missing #include "opal/util/output.h" and <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:33 +09:00
Gilles Gouaillardet
ccc96ad204
fbtl/base: add missing #include "opal/util/output.h"
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:26 +09:00
Gilles Gouaillardet
cebde2a753
coll/tuned: add missing #include "opal/util/output.h"
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:17 +09:00
Gilles Gouaillardet
99d046d060
scoll/fca: add missing #include <alloca.h>
2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
ad9693c604
pml/yalla: add missing #include <alloca.h>
2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
b38c17dbcb
pml/cm: add missing #include <alloca.h>
Thanks Paul Hargrove for reporting this issue
2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
071ae39a44
osc/rdma: add missing #include <alloca.h>
2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
77f199d1d7
coll/fca: add missing #include <alloca.h>
2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
42313acd58
btl/usnic: add missing #include <alloca.h>
2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
38a8826136
opal/datatype: #include <alloca.h> when needed and nowhere else
2015-12-24 14:33:58 +09:00
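The `alloca.h` series above moves the include into the files that actually call `alloca()`. A minimal sketch of that pattern, assuming the usual autoconf convention where configure defines `HAVE_ALLOCA_H` when the header exists; the macro is defined by hand here only so the snippet compiles on its own, and `sum_copy()` is a hypothetical helper.

```c
/* Normally defined by configure; set here only to keep the sketch self-contained. */
#ifndef HAVE_ALLOCA_H
#define HAVE_ALLOCA_H 1
#endif

#include <stddef.h>
#include <string.h>
#ifdef HAVE_ALLOCA_H
#include <alloca.h>   /* included only in the file that calls alloca() */
#endif

/* Hypothetical helper: copy n ints into a stack buffer and sum them. */
static long sum_copy(const int *src, size_t n)
{
    int *tmp = alloca(n * sizeof(*tmp)); /* released automatically on return */
    long total = 0;

    memcpy(tmp, src, n * sizeof(*tmp));
    for (size_t i = 0; i < n; i++) {
        total += tmp[i];
    }
    return total;
}
```

Keeping the include next to its only user means a platform without `alloca.h` breaks only the components that need it, not every file that includes a shared header.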
rhc54
d7199dc75b
Merge pull request #1255 from annu13/fixup
Fixup
2015-12-22 20:54:48 -08:00
Nathan Hjelm
84d890b7e7
Merge pull request #1248 from artpol84/openib_proc_init_race
Openib dynamic add proc race conditions
2015-12-22 21:48:05 -07:00
Howard Pritchard
2362bf0c0c
Merge pull request #1257 from hppritcha/topic/disable_mpirun_for_native_slurm_crayxc
plm/alps: only use srun for Native SLURM
2015-12-22 21:07:41 -07:00
annu13
43f44f31c1
moved code to job setup first before enabling comm
2015-12-22 14:30:59 -08:00
Howard Pritchard
39367ca0bf
plm/alps: only use srun for Native SLURM
It turns out that the way the SLURM plm works
is not compatible with the way MPI processes
on Cray XC obtain RDMA credentials to use
the high-speed network. Unlike with ALPS,
the mpirun process is on the first compute
node in the job. With the current PLM launch
system, mpirun (the HNP daemon) launches the MPI
ranks on that node itself rather than relying on
srun.
Reworking this to support Native SLURM on Cray XC
systems will probably require significant effort.
As a short-term alternative, have the alps plm
(which is again selected by default on Cray systems,
regardless of the launch system) check whether srun
or ALPS is being used on the system. If ALPS is not
being used, print a helpful message for the user and
abort the job launch.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-12-22 11:03:42 -08:00
Ryan Grant
5ec5bd08c1
Merge pull request #1256 from tkordenbrock/topic/mtl.init.nid.pid.in.logical
mtl-portals4: initialize endpoint nid/pid when using logical mapping
2015-12-22 11:43:03 -07:00
Todd Kordenbrock
8a3660138e
mtl-portals4: initialize endpoint nid/pid when using logical mapping
When mtl-portals4 is configured for logical mapping, coll-portals4
must disqualify because it does not yet support logical mapping.
coll-portals4 looks for the endpoint pid to be zero which tells it
that mtl-portals4 is configured for logical mapping. This commit
initializes the endpoint nid/pid to zero for logical mapping.
2015-12-22 11:20:18 -06:00
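The convention described above (a zero endpoint pid tells coll-portals4 that mtl-portals4 uses logical mapping) can be sketched as follows. All names here (`endpoint_t`, `mtl_init_endpoint()`, `coll_is_usable()`) are hypothetical stand-ins, not the actual mtl-portals4 or coll-portals4 symbols, and the real Portals4 process id type is different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical endpoint record holding a node id and process id. */
typedef struct {
    uint32_t nid;
    uint32_t pid;
} endpoint_t;

/* MTL side: with logical mapping, explicitly zero nid/pid
 * instead of leaving them uninitialized. */
static void mtl_init_endpoint(endpoint_t *ep, bool logical_mapping,
                              uint32_t nid, uint32_t pid)
{
    if (logical_mapping) {
        ep->nid = 0;
        ep->pid = 0;
    } else {
        ep->nid = nid;
        ep->pid = pid;
    }
}

/* coll side: a zero pid is the logical-mapping marker,
 * so the component disqualifies itself (returns false). */
static bool coll_is_usable(const endpoint_t *ep)
{
    return ep->pid != 0;
}
```

Without the explicit zeroing, the coll check would read whatever happened to be in the uninitialized fields.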
Artem Polyakov
08ad8357a8
Fix local process accounting in openib when dynamic add_proc is on.
2015-12-22 22:44:46 +06:00
Artem Polyakov
3c2f6d5560
Protect openib_btl->device data with explicit opal_mutex locks.
2015-12-22 18:33:26 +06:00
Gilles Gouaillardet
607d7c7545
btl/sm: rename file after file descriptor has been closed.
Thanks George for spotting this.
2015-12-22 13:56:53 +09:00
Ralph Castain
079ea14dab
Update symbol-hiding script
2015-12-21 20:49:14 -08:00
Gilles Gouaillardet
e918d75fae
java: try to dlopen libmpi with the full path
Since OS X 10.11 (aka El Capitan), DYLD_LIBRARY_PATH is no longer
propagated to child processes, so try to dlopen libmpi with its full path,
using the directory of libmpi_java
Fixes open-mpi/ompi#1220
Thanks Alexander Daryin for reporting this
2015-12-22 11:09:46 +09:00
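The approach above, loading libmpi from the directory that contains libmpi_java instead of relying on `DYLD_LIBRARY_PATH`, might look roughly like this. `build_sibling_path()` and `open_libmpi()` are hypothetical helpers; how the path of libmpi_java is discovered is left out, and the libmpi file name is passed in rather than assumed.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replace the file name in `self_path` with `name`.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
static int build_sibling_path(const char *self_path, const char *name,
                              char *out, size_t outlen)
{
    const char *slash = strrchr(self_path, '/');
    size_t dirlen = slash ? (size_t)(slash - self_path) + 1 : 0;
    int n = snprintf(out, outlen, "%.*s%s", (int)dirlen, self_path, name);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Try the full path next to libmpi_java first, then fall back to the
 * bare name, which relies on the dynamic loader's search path. */
static void *open_libmpi(const char *java_lib_path, const char *libmpi_name)
{
    char full[4096];
    void *handle = NULL;

    if (build_sibling_path(java_lib_path, libmpi_name, full, sizeof(full)) == 0) {
        handle = dlopen(full, RTLD_NOW | RTLD_GLOBAL);
    }
    if (handle == NULL) {
        handle = dlopen(libmpi_name, RTLD_NOW | RTLD_GLOBAL);
    }
    return handle;
}
```

Keeping the bare-name fallback preserves the old behavior on systems where the library search path is still inherited.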
rhc54
d9cd451a16
Merge pull request #1250 from rhc54/topic/rf
Fix the default slot mapping in rank file mapper
2015-12-21 10:57:52 -08:00
Ralph Castain
7cc5879bdd
Fix the default slot mapping in rank file mapper
2015-12-21 09:47:27 -08:00
rhc54
38830e41b4
Merge pull request #1249 from rhc54/topic/pmixdflt
Do not override any external settings for PMIx component selection
2015-12-21 09:36:16 -08:00
Ralph Castain
94ffe10808
Do not override any external settings for PMIx component selection
2015-12-21 08:36:12 -08:00
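One common way to install a default without clobbering something the user already exported is `setenv()` with its overwrite argument set to zero. This is only a sketch of that general pattern: it assumes the selection is driven by an environment variable, and the name `EXAMPLE_MCA_pmix` and the component strings are purely hypothetical, not taken from this change.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, used only to illustrate the pattern. */
#define PMIX_SELECT_VAR "EXAMPLE_MCA_pmix"

/* Install a default component choice only if nothing is set externally:
 * setenv() with overwrite == 0 leaves an existing value untouched. */
static int set_pmix_default(const char *component)
{
    return setenv(PMIX_SELECT_VAR, component, 0);
}
```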
rhc54
aa17bdf6e8
Merge pull request #1239 from rhc54/topic/cleanup
Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX
2015-12-21 07:23:31 -08:00
Artem Polyakov
e06bffe213
Fix ib_proc locking
2015-12-21 18:52:31 +06:00
Artem Polyakov
3eb4756a17
Force locking regardless of the opal_using_threads() setting.
2015-12-21 18:52:31 +06:00
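The difference between conditional and forced locking can be sketched with plain pthreads. Open MPI's `OPAL_THREAD_LOCK()` macro only takes the mutex when `opal_using_threads()` returns true; the hypothetical `using_threads` flag below stands in for that check, and `proc_count` stands in for shared openib state. An internal thread (such as the udcm connection path mentioned in the race-condition commit in this series) can touch that state even when the application itself is single-threaded, which is presumably why these paths now lock unconditionally.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for opal_using_threads(): true only if the app asked for threads. */
static bool using_threads = false;

static pthread_mutex_t proc_lock = PTHREAD_MUTEX_INITIALIZER;
static int proc_count = 0;   /* hypothetical shared state */

/* Conditional locking: a no-op when using_threads is false,
 * which is unsafe if an internal thread also updates proc_count. */
static void add_proc_conditional(void)
{
    if (using_threads) pthread_mutex_lock(&proc_lock);
    proc_count++;
    if (using_threads) pthread_mutex_unlock(&proc_lock);
}

/* Forced locking: always serialize, regardless of the threading level. */
static void add_proc_forced(void)
{
    pthread_mutex_lock(&proc_lock);
    proc_count++;
    pthread_mutex_unlock(&proc_lock);
}

/* Thread body: call add_proc_forced() *arg times. */
static void *worker(void *arg)
{
    int iters = *(const int *)arg;

    for (int i = 0; i < iters; i++) {
        add_proc_forced();
    }
    return NULL;
}
```

With two threads running `worker`, the forced variant always yields the exact total; the conditional variant would lose updates if `using_threads` were false while another thread was active.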
Artem Polyakov
11b72d9add
Make important fields of ib_proc volatile.
2015-12-21 18:52:31 +06:00
Artem Polyakov
86c0c3ec52
Provide additional information: whether ib_proc was newly created or already existed.
2015-12-21 18:52:31 +06:00
Artem Polyakov
9325bd3d69
Protect device initialization
2015-12-21 18:52:31 +06:00
Artem Polyakov
0f77bc7ea7
Perform endpoint initialization atomically.
2015-12-21 18:52:31 +06:00
Artem Polyakov
afaf9c9ea6
Shift ib_proc initialization to the separate function.
2015-12-21 18:52:31 +06:00
Artem Polyakov
3c9fd567b6
Fix openib race condition when direct modex is used.
The problem was in mca_btl_openib_proc_create(). This function may be called
from several places simultaneously:
* from the main thread when the application calls `MPI_Send()` (for example)
for the first time;
* from udcm, when the counterpart peer is trying to connect and
`mca_btl_openib_get_ep()` is called.
In this case one of the threads may add an uninitialized proc structure
to `mca_btl_openib_component.ib_procs`, and the other may read it and
treat it as initialized.
This commit turns ib_proc initialization into a single atomic operation.
2015-12-21 18:52:30 +06:00
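The fix described above amounts to never exposing a half-built proc: lookup, creation, and full initialization happen under one lock, so a second caller either finds nothing or finds a fully initialized entry. A minimal sketch with a hypothetical `proc_t` list and pthreads; it is not the actual `mca_btl_openib_proc_create()` code. The `created` out-parameter mirrors the "newly created or already existed" information added in the same series.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the openib per-peer proc structure. */
typedef struct proc {
    struct proc *next;
    int peer_id;
    int port_count;          /* filled in during initialization */
    bool initialized;
} proc_t;

static proc_t *procs_head = NULL;   /* stand-in for the ib_procs list */
static pthread_mutex_t procs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up peer_id, creating and fully initializing it if absent.
 * The whole lookup-or-create runs under procs_lock, and a new entry
 * is linked into the list only after it is initialized.
 * *created reports whether this call made the entry; NULL on OOM. */
static proc_t *proc_get_or_create(int peer_id, bool *created)
{
    proc_t *p;

    pthread_mutex_lock(&procs_lock);
    for (p = procs_head; p != NULL; p = p->next) {
        if (p->peer_id == peer_id) {
            *created = false;
            pthread_mutex_unlock(&procs_lock);
            return p;
        }
    }
    p = calloc(1, sizeof(*p));
    if (p != NULL) {
        p->peer_id = peer_id;
        p->port_count = 1;       /* placeholder for real initialization */
        p->initialized = true;
        p->next = procs_head;    /* publish only after initialization */
        procs_head = p;
    }
    *created = (p != NULL);
    pthread_mutex_unlock(&procs_lock);
    return p;
}
```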
Gilles Gouaillardet
db4f483653
btl/sm: fix race condition
Write the file first and then rename it, so that when the file is opened for reading, its content is known to have been written.
Fixes open-mpi/ompi#1230
2015-12-21 16:37:51 +09:00
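The write-then-rename fix, together with the later commit 607d7c7545 that renames only after the file descriptor has been closed, follows a familiar POSIX pattern: `rename()` atomically replaces the target name, so a reader opening the final name sees either no file or the complete content. A minimal sketch; `publish_file()` and the `.tmp` suffix are illustrative choices, not taken from btl/sm.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write `len` bytes so readers of `path` never observe a partial file:
 * write under a temporary name, close the descriptor, then rename(). */
static int publish_file(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int fd, m;
    ssize_t n;

    m = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (m < 0 || (size_t)m >= sizeof(tmp)) return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    n = write(fd, buf, len);
    /* close() runs first so the descriptor is released before rename() */
    if (close(fd) != 0 || n < 0 || (size_t)n != len) {
        unlink(tmp);
        return -1;
    }
    return rename(tmp, path);
}
```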
Gilles Gouaillardet
862b12acf9
configury: fix uninitialized variable in OMPI_FORTRAN_CHECK_USE_ONLY
Thanks Paul Hargrove for pointing out this issue
2015-12-21 09:56:35 +09:00
Edgar Gabriel
46c20a1246
correctly set all variables storing information on the file pointer position to zero when setting the file view
2015-12-21 09:41:39 +09:00
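The MPI standard specifies that `MPI_File_set_view` resets the individual and shared file pointers to zero, so every field that caches a position must be cleared, not just one. The sketch below shows the shape of such a fix with a hypothetical file-handle structure; the field names are illustrative and are not the ones used in Open MPI's io code.

```c
#include <stdint.h>

/* Hypothetical file handle: several fields track file-pointer state. */
typedef struct {
    int64_t disp;                 /* view displacement, in bytes */
    int64_t position;             /* individual file pointer, in etypes */
    int64_t total_bytes_in_view;  /* bytes already consumed in the view */
    int64_t index_in_view;        /* current block within the view */
    int64_t shared_position;      /* shared file pointer, in etypes */
} file_handle_t;

/* Set a new view and reset every piece of file-pointer state to zero;
 * missing any field would carry the old offset into the new view. */
static void set_view(file_handle_t *fh, int64_t disp)
{
    fh->disp = disp;
    fh->position = 0;
    fh->total_bytes_in_view = 0;
    fh->index_in_view = 0;
    fh->shared_position = 0;
}
```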