Mike Dubman
c544620a7c
Merge pull request #1138 from igor-ivanov/pr/yalla-valgrind
...
yalla: fix valgrind error due to uninitialized status field.
2015-11-20 07:19:11 -05:00
Gilles Gouaillardet
002c7b8b3a
fcoll/two_phase: use PMPI_* instead of MPI_*
2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
561e7f6647
vprotocol/pessimist: use internal ompi_* instead of MPI_*
2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
025fd8a9fc
osc: use PMPI_* instead of MPI_*
2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
d816d1c194
coll/libnbc: use PMPI_* and internal ompi_* instead of MPI_*
2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
c61ef30980
autogen.pl: abort if invoked from an Open MPI tarball without the --force option
...
Thanks Jeff for the wording and review
2015-11-20 13:00:55 +09:00
Francois WELLENREITER
251009e0aa
BTL portals4: remove useless PtlMDBind PtlMDRelease calls for RDMA
2015-11-19 14:51:00 +01:00
yosefe
3bb1270715
yalla: fix valgrind error due to uninitialized status field.
2015-11-19 10:59:31 +02:00
Francois WELLENREITER
9126ea5e82
MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet operation
2015-11-19 09:52:53 +01:00
Edgar Gabriel
9e5ade4e8b
argh, a debugging sleep statement got into the source code.
2015-11-16 13:26:57 -06:00
Edgar Gabriel
9828389dd7
Merge pull request #1135 from edgargabriel/pr/aggregatorlogic
...
add a simplified version of the aggregator selection logic which reduces communication costs
2015-11-16 08:58:32 -06:00
Edgar Gabriel
dbfbcdecd5
make adjustments for the default settings of grouping parameters and the default contiguous group size option.
...
minor bug fix in the simple grouping strategy.
2015-11-16 08:17:27 -06:00
Edgar Gabriel
27628774c7
add a new option for a simple aggregator selection which has zero communication costs.
2015-11-16 08:17:26 -06:00
Edgar Gabriel
66c1ea5fcb
change the default value of the grouping option. Also add new grouping option which skips the refinement step in the aggregator selection.
2015-11-16 08:17:23 -06:00
Edgar Gabriel
e8e117503d
reduce the communication volume during MPI_File_set_view
2015-11-16 08:17:22 -06:00
Mike Dubman
3e93ef49da
Merge pull request #1134 from alex-mikheev/topic/ikrit_err_fix_fix
...
SPML/IKRIT: opal_progress and ud_only fixes
2015-11-15 19:20:55 -06:00
Mike Dubman
b5f83d9ec6
Merge pull request #1133 from miked-mellanox/topic/master_vg_fix
...
OSHMEM/ikrit: fix valgrind error
2015-11-15 18:45:22 -06:00
Mike Dubman
a7128af8c4
OSHMEM/ikrit: fix valgrind error
2015-11-15 14:51:41 +02:00
Alex Mikheev
0755a59091
SPML/IKRIT: opal_progress and ud_only fixes
...
Some MXM tls, such as self and shm, can complete requests immediately.
Make sure that opal_progress() is called before the request
is completed.
fix ud_only logic when hw rdma channel is using ud and main
transport is rc or dc.
2015-11-15 12:13:24 +02:00
rhc54
5b75af5182
Merge pull request #1130 from edgargabriel/pr/oldlustreconfigurelogic
...
add a verification step looking for the structures that we use in the…
2015-11-14 12:45:58 -08:00
yohann
005400a937
mtl/ofi: Make sure the resources are managed by the provider.
2015-11-13 16:16:58 -08:00
Howard Pritchard
60be91b321
Merge pull request #1105 from marksantcroos/fix/alpsinfov3
...
Support ALPS_APPINFO_VERSION 3.
2015-11-13 14:57:45 -07:00
Nathan Hjelm
1a0882ffb2
Merge pull request #1131 from hjelmn/osc_fixes
...
osc/rdma: fix some threading bugs
2015-11-13 08:46:44 -07:00
Mark Santcroos
3119bc14b2
Merge branch 'master' into fix/alpsinfov3
2015-11-13 08:53:06 -05:00
Nathan Hjelm
9ef0821856
osc/rdma: fix some threading bugs
...
There were two bugs in osc/rdma when using threads:
- Deadlock in ompi_osc_rdma_start_atomic. This occurs because
ompi_osc_rdma_frag_alloc is called with the module lock. To fix the
issue the module lock is now recursive. In the future I will add a
new lock to protect just the current rdma fragment.
- Do not drop the lock in ompi_osc_rdma_frag_alloc when calling
ompi_osc_rdma_frag_complete. Not only is it not needed but dropping
the lock at this point can cause a competing thread to mess up the
state.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-12 20:25:57 -07:00
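The first osc/rdma bug in commit 9ef0821856 above is a self-deadlock: a function that takes the module lock is called by a caller that already holds it. The sketch below illustrates that pattern and the recursive-lock fix using plain POSIX threads. It is only an illustration under stated assumptions: `demo_module_t`, `demo_module_init`, `demo_frag_alloc`, and `demo_start_atomic` are hypothetical names, and the code does not use Open MPI's own lock types or reproduce the actual osc/rdma sources.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_RECURSIVE in strict modes */
#endif
#include <pthread.h>

/* Hypothetical stand-in for a module guarded by a single lock. */
typedef struct {
    pthread_mutex_t lock;
    int frags_allocated;
} demo_module_t;

/* Initialize the module lock as a recursive mutex so that the thread
 * that owns it may lock it again without blocking on itself. */
int demo_module_init(demo_module_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(&m->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    m->frags_allocated = 0;
    return rc;
}

/* Takes the module lock on its own (hypothetical allocation routine). */
int demo_frag_alloc(demo_module_t *m)
{
    pthread_mutex_lock(&m->lock);
    int id = m->frags_allocated++;
    pthread_mutex_unlock(&m->lock);
    return id;
}

/* Calls the allocator while already holding the module lock. With a
 * default (non-recursive) mutex this nested lock would hang the thread;
 * with a recursive mutex it succeeds. */
int demo_start_atomic(demo_module_t *m)
{
    pthread_mutex_lock(&m->lock);
    int id = demo_frag_alloc(m);
    pthread_mutex_unlock(&m->lock);
    return id;
}
```

After `demo_module_init(&m)` succeeds, each call to `demo_start_atomic(&m)` returns the next fragment index without hanging. A recursive lock is the coarse remedy; the commit message itself notes that a separate lock protecting only the current fragment is planned as a later refinement.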
Ralph Castain
84eb21d6bf
Update the script to properly run on the Cray. Add rawout option to retain the raw timing output in case the formats don't match
2015-11-12 12:11:17 -08:00
Edgar Gabriel
497bd6f355
add a verification step looking for the structures that we use in the lustre component. Disable the component if not found.
2015-11-12 10:35:11 -06:00
Yossi
b750b72a81
Merge pull request #1127 from yosefe/topic/pml-ucx-implement-cancel
...
pml_ucx: implement cancel, and add small optimizations.
2015-11-12 10:50:48 +02:00
yosefe
7becc54d67
pml_ucx: fix typo.
2015-11-12 09:57:41 +02:00
Gilles Gouaillardet
6ab3289582
rpm: fix openmpi.spec not to include the /usr directory
...
/usr cannot be included on RHEL7 like distros
2015-11-12 16:21:40 +09:00
rhc54
0d175bfca1
Merge pull request #1128 from rhc54/topic/warning
...
If an executable isn't found, it's possible for the state machine to…
2015-11-11 17:50:49 -08:00
Ralph Castain
986a8c1d48
If an executable isn't found, it's possible for the state machine to hit the grpcomm with a zero-node map before we actually terminate with error. Silence the annoying malloc warning about zero-byte requests.
...
In a novm operation that only has the HNP, ensure the #nodes gets set
Clean up the error reporting
2015-11-11 14:24:13 -08:00
Matias Cabral
45c27843e1
Merge pull request #1129 from matcabral/HFI1_openib_params
...
Default values for Intel HFI1 (OmniPath gen1 device) in openib btl
2015-11-11 14:04:24 -08:00
Ralph Castain
1607daeb10
Update the scaling script to output data into a CSV file for easy import into Excel
2015-11-11 13:29:37 -08:00
Ralph Castain
efbea40a8b
Minor typo for slurm scaling test support, add aprun for use on Cray
2015-11-11 13:29:37 -08:00
Matias A Cabral
254a05dbbb
Default values for Intel HFI1 (OmniPath gen1 device) in openib btl
2015-11-11 12:35:35 -08:00
Mike Dubman
8ec5c99412
Merge pull request #1126 from alex-mikheev/topic/ikrit_err_fix
...
Topic/ikrit err fix
2015-11-11 15:31:06 +02:00
Mike Dubman
93847e4ca9
Merge pull request #1125 from igor-ivanov/pr/oshmem_new_mca_vars
...
oshmem: Add new mca variables oshmem_abort_delay and oshmem_abort_pri…
2015-11-11 14:34:12 +02:00
Mike Dubman
ae6b6ba05b
Merge pull request #1124 from igor-ivanov/pr/oshmem_error_output
...
oshmem: Enable force output for error messages
2015-11-11 14:33:40 +02:00
Alex Mikheev
cd8ea438d3
OSHMEM/SPML/ikrit: memcheck support
2015-11-11 13:46:20 +02:00
Alex Mikheev
2a8de45b43
OSHMEM/SPML/IKRIT: check return of mxm_req_send correctly
...
do not force memory registration if main and additional comm
channels are both ud
2015-11-11 13:34:26 +02:00
Igor Ivanov
f288cd7254
oshmem: Add new mca variables oshmem_abort_delay and oshmem_abort_print_stack
...
This commit allows controlling the output during abnormal oshmem application
termination.
2015-11-11 13:33:28 +02:00
Igor Ivanov
c0518c0417
oshmem: Enable force output for error messages
...
This change fixes an issue where oshmem-related error messages are not
visible to the user.
2015-11-11 13:26:10 +02:00
Jeff Squyres
8bd356549a
orte proc_info.h: use symbolic names
...
This fix was actually applied in the v2.x branch first (as commit
open-mpi/ompi-release@a9b22afc1a ).
2015-11-10 13:39:21 -08:00
Mark Santcroos
299fd69c6d
Merge branch 'master' into fix/alpsinfov3
2015-11-10 15:40:19 -05:00
Todd Kordenbrock
b9630f802b
Merge pull request #1120 from francois-wellenreiter/mtl_min_mdbind
...
mtl-portals4 : remove useless PtlMDBind PtlMDRelease calls for rendez-vous messages
2015-11-10 14:34:19 -06:00
rhc54
474a869b8d
Merge pull request #1121 from dmt4/orterun-manpage-typos
...
change -0bind-to and -bind-to to --bind-to in the manpages
2015-11-10 11:24:08 -08:00
rhc54
8af89a9f83
Merge pull request #1119 from rhc54/topic/fixtools
...
Prevent a segfault on tools if a connection attempt fails - tools don…
2015-11-10 10:41:39 -08:00
Dimitar Pashov
9f6e306064
change -0bind-to and -bind-to to --bind-to in the manpages
2015-11-10 17:44:53 +00:00
Ralph Castain
6a607d42a6
Prevent a segfault on tools if a connection attempt fails - tools don't open the opal/pmix framework and thus have no way of looking up a proc hostname
2015-11-10 09:11:34 -08:00