Commit graph

24109 commits

Author SHA1 Message Date
Gilles Gouaillardet
d816d1c194 coll/libnbc: use PMPI_* and internal ompi_* instead of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
c61ef30980 autogen.pl: aborts if autogen.pl is invoked from an Open MPI tarball and without the --force option
Thanks Jeff for the wording and review
2015-11-20 13:00:55 +09:00
Francois WELLENREITER
251009e0aa BTL portals4: remove useless PtlMDBind PtlMDRelease calls for RDMA 2015-11-19 14:51:00 +01:00
yosefe
3bb1270715 yalla: fix valgrind error due to uninitialized status field. 2015-11-19 10:59:31 +02:00
Francois WELLENREITER
9126ea5e82 MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet operation 2015-11-19 09:52:53 +01:00
Edgar Gabriel
9e5ade4e8b argh, a debugging sleep statement got into the source code. 2015-11-16 13:26:57 -06:00
Edgar Gabriel
9828389dd7 Merge pull request #1135 from edgargabriel/pr/aggregatorlogic
add a simplified version of the aggregator selection logic which reduces communication costs
2015-11-16 08:58:32 -06:00
Edgar Gabriel
dbfbcdecd5 make adjustments for the default settings of grouping parameters and the default contiguous group size option.
minor bug fix in the simple grouping strategy.
2015-11-16 08:17:27 -06:00
Edgar Gabriel
27628774c7 add a new option for a simple aggregator selection which has zero communication costs. 2015-11-16 08:17:26 -06:00
Edgar Gabriel
66c1ea5fcb change the default value of the grouping option. Also add new grouping option which skips the refinement step in the aggregator selection. 2015-11-16 08:17:23 -06:00
Edgar Gabriel
e8e117503d reduce the communication volume during MPI_File_set_view 2015-11-16 08:17:22 -06:00
Mike Dubman
3e93ef49da Merge pull request #1134 from alex-mikheev/topic/ikrit_err_fix_fix
SPML/IKRIT: opal_progress and ud_only fixes
2015-11-15 19:20:55 -06:00
Mike Dubman
b5f83d9ec6 Merge pull request #1133 from miked-mellanox/topic/master_vg_fix
OSHMEM/ikrit: fix valgrind error
2015-11-15 18:45:22 -06:00
Mike Dubman
a7128af8c4 OSHMEM/ikrit: fix valgrind error 2015-11-15 14:51:41 +02:00
Alex Mikheev
0755a59091 SPML/IKRIT: opal_progress and ud_only fixes
Some MXM tls such as self, shm can complete requests immediately.
Make sure that opal_progress() is called before the request
is completed.

fix ud_only logic when hw rdma channel is using ud and main
transport is rc or dc.
2015-11-15 12:13:24 +02:00
rhc54
5b75af5182 Merge pull request #1130 from edgargabriel/pr/oldlustreconfigurelogic
add a verification step looking for the structures that we use in the…
2015-11-14 12:45:58 -08:00
yohann
005400a937 mtl/ofi: Make sure the resources are managed by the provider. 2015-11-13 16:16:58 -08:00
Howard Pritchard
60be91b321 Merge pull request #1105 from marksantcroos/fix/alpsinfov3
Support ALPS_APPINFO_VERSION 3.
2015-11-13 14:57:45 -07:00
Nathan Hjelm
1a0882ffb2 Merge pull request #1131 from hjelmn/osc_fixes
osc/rdma: fix some threading bugs
2015-11-13 08:46:44 -07:00
Mark Santcroos
3119bc14b2 Merge branch 'master' into fix/alpsinfov3 2015-11-13 08:53:06 -05:00
Nathan Hjelm
9ef0821856 osc/rdma: fix some threading bugs
There were two bugs in osc/rdma when using threads:

 - Deadlock in ompi_osc_rdma_start_atomic. This occurs because
   ompi_osc_rdma_frag_alloc is called with the module lock. To fix the
   issue the module lock is now recursive. In the future I will add a
   new lock to protect just the current rdma fragment.

 - Do not drop the lock in ompi_osc_rdma_frag_alloc when calling
   ompi_osc_rdma_frag_complete. Not only is it not needed but dropping
   the lock at this point can cause a competing thread to mess up the
   state.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-12 20:25:57 -07:00
Ralph Castain
84eb21d6bf Update the script to properly run on the Cray. Add rawout option to retain the raw timing output in case the formats don't match 2015-11-12 12:11:17 -08:00
Edgar Gabriel
497bd6f355 add a verification step looking for the structures that we use in the lustre component. Disable the component if not found. 2015-11-12 10:35:11 -06:00
Yossi
b750b72a81 Merge pull request #1127 from yosefe/topic/pml-ucx-implement-cancel
pml_ucx: implement cancel, and add small optimizations.
2015-11-12 10:50:48 +02:00
yosefe
7becc54d67 pml_ucx: fix typo. 2015-11-12 09:57:41 +02:00
Gilles Gouaillardet
6ab3289582 rpm: fix openmpi.spec not to include the /usr directory
/usr cannot be included on RHEL7-like distros
2015-11-12 16:21:40 +09:00
rhc54
0d175bfca1 Merge pull request #1128 from rhc54/topic/warning
If an executable isn't found, it's possible for the state machine to…
2015-11-11 17:50:49 -08:00
Ralph Castain
986a8c1d48 If an executable isn't found, it's possible for the state machine to hit the grpcomm with a zero-node map before we actually terminate with error. Silence the annoying malloc warning about zero-byte requests.
In a novm operation that only has the HNP, ensure the #nodes gets set

Clean up the error reporting
2015-11-11 14:24:13 -08:00
Matias Cabral
45c27843e1 Merge pull request #1129 from matcabral/HFI1_openib_params
Default values for Intel HFI1 (OmniPath gen1 device) in openib btl
2015-11-11 14:04:24 -08:00
Ralph Castain
1607daeb10 Update the scaling script to output data into a CSV file for easy import into Excel 2015-11-11 13:29:37 -08:00
Ralph Castain
efbea40a8b Minor typo for slurm scaling test support, add aprun for use on Cray 2015-11-11 13:29:37 -08:00
Matias A Cabral
254a05dbbb Default values for Intel HFI1 (OmniPath gen1 device) in openib btl 2015-11-11 12:35:35 -08:00
Mike Dubman
8ec5c99412 Merge pull request #1126 from alex-mikheev/topic/ikrit_err_fix
Topic/ikrit err fix
2015-11-11 15:31:06 +02:00
Mike Dubman
93847e4ca9 Merge pull request #1125 from igor-ivanov/pr/oshmem_new_mca_vars
oshmem: Add new mca variables oshmem_abort_delay and oshmem_abort_pri…
2015-11-11 14:34:12 +02:00
Mike Dubman
ae6b6ba05b Merge pull request #1124 from igor-ivanov/pr/oshmem_error_output
oshmem: Enable force output for error messages
2015-11-11 14:33:40 +02:00
Alex Mikheev
cd8ea438d3 OSHMEM/SPML/ikrit: memcheck support 2015-11-11 13:46:20 +02:00
Alex Mikheev
2a8de45b43 OSHMEM/SPML/IKRIT: check return of mxm_req_send correctly
do not force memory registration if main and additional comm
channels are both ud
2015-11-11 13:34:26 +02:00
Igor Ivanov
f288cd7254 oshmem: Add new mca variables oshmem_abort_delay and oshmem_abort_print_stack
This commit allows control of output during abnormal oshmem application
termination.
2015-11-11 13:33:28 +02:00
Igor Ivanov
c0518c0417 oshmem: Enable force output for error messages
This change fixes an issue where oshmem-related error messages are not
visible to the user.
2015-11-11 13:26:10 +02:00
Jeff Squyres
8bd356549a orte proc_info.h: use symbolic names
This fix was actually applied in the v2.x branch first (as commit
open-mpi/ompi-release@a9b22afc1a).
2015-11-10 13:39:21 -08:00
Mark Santcroos
299fd69c6d Merge branch 'master' into fix/alpsinfov3 2015-11-10 15:40:19 -05:00
Todd Kordenbrock
b9630f802b Merge pull request #1120 from francois-wellenreiter/mtl_min_mdbind
mtl-portals4 : remove useless PtlMDBind PtlMDRelease calls for rendez-vous messages
2015-11-10 14:34:19 -06:00
rhc54
474a869b8d Merge pull request #1121 from dmt4/orterun-manpage-typos
change -0bind-to and -bind-to to --bind-to in the manpages
2015-11-10 11:24:08 -08:00
rhc54
8af89a9f83 Merge pull request #1119 from rhc54/topic/fixtools
Prevent a segfault on tools if a connection attempt fails - tools don…
2015-11-10 10:41:39 -08:00
Dimitar Pashov
9f6e306064 change -0bind-to and -bind-to to --bind-to in the manpages 2015-11-10 17:44:53 +00:00
Ralph Castain
6a607d42a6 Prevent a segfault on tools if a connection attempt fails - tools don't open the opal/pmix framework and thus have no way of looking up a proc hostname 2015-11-10 09:11:34 -08:00
yosefe
d66b01d380 pml_ucx: implement cancel, and add small optimizations. 2015-11-10 17:40:06 +02:00
Mark Santcroos
8c255452cf Merge branch 'master' into fix/alpsinfov3 2015-11-10 04:17:42 -05:00
Gilles Gouaillardet
d6ff25b9a2 pml/monitoring: initialize common symbols 2015-11-10 13:58:54 +09:00
Nathan Hjelm
6ae82ff090 Merge pull request #1115 from hjelmn/flist_fix
opal_free_list: fix strange size check
2015-11-09 20:55:46 -07:00