Mike Dubman
a7128af8c4
OSHMEM/ikrit: fix valgrind error
2015-11-15 14:51:41 +02:00
rhc54
5b75af5182
Merge pull request #1130 from edgargabriel/pr/oldlustreconfigurelogic
add a verification step looking for the structures that we use in the…
2015-11-14 12:45:58 -08:00
yohann
005400a937
mtl/ofi: Make sure the resources are managed by the provider.
2015-11-13 16:16:58 -08:00
Howard Pritchard
60be91b321
Merge pull request #1105 from marksantcroos/fix/alpsinfov3
Support ALPS_APPINFO_VERSION 3.
2015-11-13 14:57:45 -07:00
Nathan Hjelm
1a0882ffb2
Merge pull request #1131 from hjelmn/osc_fixes
osc/rdma: fix some threading bugs
2015-11-13 08:46:44 -07:00
Mark Santcroos
3119bc14b2
Merge branch 'master' into fix/alpsinfov3
2015-11-13 08:53:06 -05:00
Nathan Hjelm
9ef0821856
osc/rdma: fix some threading bugs
There were two bugs in osc/rdma when using threads:
- Deadlock in ompi_osc_rdma_start_atomic. This occurs because
ompi_osc_rdma_frag_alloc is called with the module lock held. To fix the
issue, the module lock is now recursive. In the future I will add a
new lock to protect just the current rdma fragment.
- Do not drop the lock in ompi_osc_rdma_frag_alloc when calling
ompi_osc_rdma_frag_complete. Not only is it unnecessary, but dropping
the lock at this point can allow a competing thread to corrupt the
state.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-12 20:25:57 -07:00
Ralph Castain
84eb21d6bf
Update the script to properly run on the Cray. Add a rawout option to retain the raw timing output in case the formats don't match
2015-11-12 12:11:17 -08:00
Edgar Gabriel
497bd6f355
add a verification step looking for the structures that we use in the lustre component. Disable the component if not found.
2015-11-12 10:35:11 -06:00
Yossi
b750b72a81
Merge pull request #1127 from yosefe/topic/pml-ucx-implement-cancel
pml_ucx: implement cancel, and add small optimizations.
2015-11-12 10:50:48 +02:00
yosefe
7becc54d67
pml_ucx: fix typo.
2015-11-12 09:57:41 +02:00
Gilles Gouaillardet
6ab3289582
rpm: fix openmpi.spec so that it does not include the /usr directory
/usr cannot be included on RHEL7-like distros
2015-11-12 16:21:40 +09:00
rhc54
0d175bfca1
Merge pull request #1128 from rhc54/topic/warning
If an executable isn't found, it's possible for the state machine to…
2015-11-11 17:50:49 -08:00
Ralph Castain
986a8c1d48
If an executable isn't found, it's possible for the state machine to hit the grpcomm with a zero-node map before we actually terminate with error. Silence the annoying malloc warning about zero-byte requests.
In a novm operation that only has the HNP, ensure the #nodes gets set
Clean up the error reporting
2015-11-11 14:24:13 -08:00
Matias Cabral
45c27843e1
Merge pull request #1129 from matcabral/HFI1_openib_params
Default values for Intel HFI1 (OmniPath gen1 device) in openib btl
2015-11-11 14:04:24 -08:00
Ralph Castain
1607daeb10
Update the scaling script to output data into a CSV file for easy import into Excel
2015-11-11 13:29:37 -08:00
Ralph Castain
efbea40a8b
Fix a minor typo in the slurm scaling test support, add aprun for use on Cray
2015-11-11 13:29:37 -08:00
Matias A Cabral
254a05dbbb
Default values for Intel HFI1 (OmniPath gen1 device) in openib btl
2015-11-11 12:35:35 -08:00
Mike Dubman
8ec5c99412
Merge pull request #1126 from alex-mikheev/topic/ikrit_err_fix
Topic/ikrit err fix
2015-11-11 15:31:06 +02:00
Mike Dubman
93847e4ca9
Merge pull request #1125 from igor-ivanov/pr/oshmem_new_mca_vars
oshmem: Add new mca variables oshmem_abort_delay and oshmem_abort_pri…
2015-11-11 14:34:12 +02:00
Mike Dubman
ae6b6ba05b
Merge pull request #1124 from igor-ivanov/pr/oshmem_error_output
oshmem: Enable force output for error messages
2015-11-11 14:33:40 +02:00
Alex Mikheev
cd8ea438d3
OSHMEM/SPML/ikrit: memcheck support
2015-11-11 13:46:20 +02:00
Alex Mikheev
2a8de45b43
OSHMEM/SPML/IKRIT: check return of mxm_req_send correctly
do not force memory registration if main and additional comm
channels are both ud
2015-11-11 13:34:26 +02:00
Igor Ivanov
f288cd7254
oshmem: Add new mca variables oshmem_abort_delay and oshmem_abort_print_stack
This commit makes it possible to control output during abnormal
termination of an oshmem application.
2015-11-11 13:33:28 +02:00
Igor Ivanov
c0518c0417
oshmem: Enable force output for error messages
This change fixes an issue where oshmem-related error messages were not
visible to the user.
2015-11-11 13:26:10 +02:00
Jeff Squyres
8bd356549a
orte proc_info.h: use symbolic names
This fix was actually applied in the v2.x branch first (as commit
open-mpi/ompi-release@a9b22afc1a ).
2015-11-10 13:39:21 -08:00
Mark Santcroos
299fd69c6d
Merge branch 'master' into fix/alpsinfov3
2015-11-10 15:40:19 -05:00
Todd Kordenbrock
b9630f802b
Merge pull request #1120 from francois-wellenreiter/mtl_min_mdbind
mtl-portals4: remove useless PtlMDBind/PtlMDRelease calls for rendezvous messages
2015-11-10 14:34:19 -06:00
rhc54
474a869b8d
Merge pull request #1121 from dmt4/orterun-manpage-typos
change -0bind-to and -bind-to to --bind-to in the manpages
2015-11-10 11:24:08 -08:00
rhc54
8af89a9f83
Merge pull request #1119 from rhc54/topic/fixtools
Prevent a segfault on tools if a connection attempt fails - tools don…
2015-11-10 10:41:39 -08:00
Dimitar Pashov
9f6e306064
change -0bind-to and -bind-to to --bind-to in the manpages
2015-11-10 17:44:53 +00:00
Ralph Castain
6a607d42a6
Prevent a segfault on tools if a connection attempt fails - tools don't open the opal/pmix framework and thus have no way of looking up a proc hostname
2015-11-10 09:11:34 -08:00
yosefe
d66b01d380
pml_ucx: implement cancel, and add small optimizations.
2015-11-10 17:40:06 +02:00
Mark Santcroos
8c255452cf
Merge branch 'master' into fix/alpsinfov3
2015-11-10 04:17:42 -05:00
Gilles Gouaillardet
d6ff25b9a2
pml/monitoring: initialize common symbols
2015-11-10 13:58:54 +09:00
Nathan Hjelm
6ae82ff090
Merge pull request #1115 from hjelmn/flist_fix
opal_free_list: fix strange size check
2015-11-09 20:55:46 -07:00
Nathan Hjelm
2c02294389
opal_free_list: fix strange size check
OPAL free lists can be initialized with a fragment size that differs
from the size of objects from a class. This allows the free list code
to support OPAL objects that have flexible array members.
Unfortunately the free list code will throw out the desired length in
some cases. The code in question was committed in
open-mpi/ompi@90fb58de . The side effects of this are varied and can
cause segmentation faults, assert failures, hangs, etc. This commit
adds a check to ensure the requested size is at least as large as the
class size and makes opal_free_list allocations always honor the
requested fragment size (as long as it is larger than the class
size).
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-09 19:47:55 -07:00
Gilles Gouaillardet
c415ecf39e
test/monitoring: build monitoring_prof lib only if dynamic libs are built
Thanks to Mark Santcroos for reporting this
2015-11-10 11:33:12 +09:00
Mark Santcroos
5ec2b4d98c
Fix some messages in the process.
2015-11-09 18:03:26 -05:00
Ralph Castain
52ea538bc1
Per fix from Nysal: set the listener_active flag before starting the progress thread, and declare the flag to be volatile
2015-11-09 09:00:59 -08:00
Mark Santcroos
8ec89001b3
Merge branch 'master' into fix/alpsinfov3
2015-11-09 02:45:23 -05:00
Ralph Castain
9b0cdc0de2
Add support for -pernode and -npernode options to orte-submit
2015-11-08 11:34:18 -08:00
Ralph Castain
187fa9b131
Extend the scaling test script to support multiple starters, including mpirun, orterun (if mpirun not present), orte-dvm, and srun. Auto-detect which are present and allow the user to run all of them. Auto-detect the number of nodes in the allocation.
2015-11-08 11:34:06 -08:00
Ralph Castain
f2805fb0f9
Provide a mechanism for renaming symbols in the opposite direction - i.e., #define prefix_foo[suffix] foo.
2015-11-07 18:07:09 -08:00
rhc54
c788d7bf88
Merge pull request #1110 from rhc54/topic/renames
Update the libevent renaming file to ensure that all public symbols a…
2015-11-07 17:22:56 -07:00
Ralph Castain
73c8c30c5d
Update the scaling.pl test script to support orte-dvm and srun
2015-11-07 13:13:36 -08:00
Ralph Castain
ee9aa67483
Update the libevent renaming file to ensure that all public symbols are covered
2015-11-07 12:52:31 -08:00
Ralph Castain
1f44fef4d6
Add ability to provide a suffix to the symbol renames
2015-11-07 12:37:14 -08:00
Ralph Castain
6864a9b68a
Add a new script for creating symbol hiding "rename" files
2015-11-07 12:11:54 -08:00
rhc54
66b1ef24de
Merge pull request #1108 from rhc54/topic/exitcode
Need to delay registration of the waitpid callback until after the fo…
2015-11-07 08:23:13 -07:00