Geoff Paulsen
4917e44a7d
Merge pull request #2832 from jjhursey/topic/ibm/osc-base-dt-abort
osc/base: Detect unsupported data types and abort
2017-02-05 04:26:04 -06:00
Howard Pritchard
f4ad119693
Merge pull request #2914 from hppritcha/topic/nbc_compiler_warning
swat some compiler warnings
2017-02-04 11:56:52 -05:00
Howard Pritchard
acaecb2448
swat some compiler warnings
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-02-03 08:28:15 -07:00
Ralph Castain
ead453ee8e
Merge pull request #2911 from rhc54/topic/retry
For performance, try to send the oob/tcp message a few times before dropping back into the event library
2017-02-02 12:57:18 -08:00
Ralph Castain
b661275dba
For performance, try to send the oob/tcp message a few times before dropping back into the event library
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-02 06:44:15 -08:00
Gilles Gouaillardet
e879d2910a
coll/tuned: make coll_tuned_gather_algorithms MCA settable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-02 11:00:38 +09:00
Ralph Castain
50ca9fb66b
Merge pull request #2893 from rhc54/topic/sim
Cleanup the ras simulator capability, and the relay route thru grpcomm
2017-02-01 16:17:40 -08:00
Ralph Castain
230d15f0d9
Cleanup the ras simulator capability, and the relay route thru grpcomm
direct. Don't resend wireup info if nothing has changed
Fix release of buffer
Correct the unpacking order
Fix the DVM - now minimized data transfer to it
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-01 15:01:58 -08:00
Nathan Hjelm
362ac8b87e
osc/pt2pt: fix threading issues
This commit fixes a number of threading issues discovered in
osc/pt2pt. This includes:
- Lock the synchronization object not the module in osc_pt2pt_start.
This fixes a race between the start function and processing post
messages.
- Always lock before calling cond_broadcast. Fixes a race between
the waiting thread and signaling thread.
- Make all atomically updated values volatile.
- Make the module lock recursive to protect against some deadlock
conditions. Will roll this back once the locks have been
re-designed.
- Mark incoming complete *after* completing an accumulate not
before. This was causing an incorrect answer under certain
conditions.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-01 10:33:01 -07:00
Ralph Castain
a9d836bae3
Merge pull request #2890 from rhc54/topic/alps
Correct the path to the ORTE data dir - allows master to be built with --no-ompi
2017-02-01 07:47:33 -08:00
Ralph Castain
8bf3ac828c
Correct the path to the ORTE data dir - allows master to be built with --no-ompi
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-01 07:30:18 -07:00
Howard Pritchard
e62fca896f
Merge pull request #2889 from hppritcha/topic/fix_ess_alps_makefie
ess/alps: fix problem in makefile
2017-02-01 05:46:51 -05:00
Howard Pritchard
db4039f565
ess/alps: fix problem in makefile
./autogen.pl --no-ompi doesn't work without this
fix when alps can be configured.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-31 21:56:16 -06:00
Gilles Gouaillardet
02558134ef
coll/base: remove unused local variable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:54:17 +09:00
Gilles Gouaillardet
ad44ecb2ba
pml/base: initialize global variables
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:49:47 +09:00
bosilca
c331e6794c
Allow all tuned MCA parameters to be modified programmatically (#2829)
Fix a comment in the MCA header.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-31 21:47:36 -05:00
Ralph Castain
6cb484a3cb
Merge pull request #2887 from rhc54/topic/update
Update to latest PMIx master
2017-01-31 11:05:37 -08:00
Jeff Squyres
45b791542c
Merge pull request #2809 from jjhursey/fix/ibm/opal-verbose
opal/output: Make sure verbose gets updated when id 0 gets updated.
2017-01-31 12:18:38 -05:00
Josh Hursey
5fcd69da52
Merge pull request #2831 from jjhursey/topic/ibm/pml-bsend
pml/base: Expose some bsend variables so PMLs may reference them
2017-01-31 10:31:42 -06:00
Josh Hursey
31faf0a950
Merge pull request #2861 from jjhursey/topic/ibm/master/orted-timeout-improv
orterun: Add parameter to control when we give up on stack traces
2017-01-31 10:25:57 -06:00
Ralph Castain
edcfdf2365
Update to latest PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-31 08:01:37 -08:00
Yossi
0b822522fb
Merge pull request #2883 from alex-mikheev/topic/oshmem_mem_prefetch
oshmem: mem use hook: apply code review fixes
2017-01-30 12:39:04 +02:00
Alex Mikheev
ea3ea4835b
oshmem: mem use hook: apply code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
(cherry picked from commit a422154a141f0be5b92d2b6c26d7b2b4176dfe18)
2017-01-30 11:30:20 +02:00
Gilles Gouaillardet
6c9ba35d2d
Merge pull request #2880 from ggouaillardet/topic/red_sched_chain
coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
2017-01-30 15:08:03 +09:00
Gilles Gouaillardet
9bcadbd51b
coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
...
this fixes a regression introduced in open-mpi/ompi@045d0c5f4c
Fixes open-mpi/ompi#2879
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 14:19:45 +09:00
Gilles Gouaillardet
b12ab2b4f2
Merge pull request #2857 from ggouaillardet/topic/pmix_ext11
pmix/ext11 fixes, plugs and rename
2017-01-30 11:44:07 +09:00
Gilles Gouaillardet
b078e57e73
pmix/ext1x: fix misc memory leaks in namespace registration
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
f51fc293a2
ext1x/pmix1x_client: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
022cca79ea
pmix/ext1x: plug a memory leak in opal_lkupcbfunc()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:36 +09:00
Gilles Gouaillardet
f485d12a82
pmix: rename the ext11 component into ext1x
also use the same naming scheme as pmix/ext2x
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:35 +09:00
Gilles Gouaillardet
dccb1899e6
pmix/ext11: correctly use PMIx_server_register_nspace()
PMIx_server_register_nspace() is an asynchronous operation, so
the pmix glue must wait for it to complete before returning.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:19 +09:00
Gilles Gouaillardet
6955e1e25c
pmix/ext11: fix compilation
the argc field from the opal_pmix_app_t struct was removed,
so adjust the pmix/ext11 glue accordingly.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:18 +09:00
Ralph Castain
7ab26a4946
Merge pull request #2878 from rhc54/topic/scaling
Add new platform files. Modify scaling.pl to support ppn option
2017-01-29 15:57:04 -08:00
Ralph Castain
28abe78f8c
Add new platform files. Modify scaling.pl to support ppn option
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-29 15:55:49 -08:00
Mike Dubman
048d47df48
Merge pull request #2874 from yosefe/topic/pml-yalla-fix-dt-leak
yalla: fix memory leak with blocking non-contig send.
2017-01-29 19:36:37 +02:00
Yossi Itigin
13c3bf0dd7
yalla: fix memory leak with blocking non-contig send.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-01-29 18:51:43 +02:00
Yossi
149ecef289
Merge pull request #2845 from alex-mikheev/topic/oshmem_mem_prefetch
oshmem: spml: add memory allocation hook
2017-01-29 14:15:01 +02:00
Alex Mikheev
9da9e6260d
oshmem: spml ucx: on error print ucx error string
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-29 10:28:24 +02:00
Ralph Castain
2b2ea2fed2
Merge pull request #2869 from rhc54/topic/staticports
Fix static port and partial allocation operations
2017-01-28 11:03:11 -08:00
Ralph Castain
b59ae14a2a
Fix static port and partial allocation operations
Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.
Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-28 10:09:44 -08:00
Howard Pritchard
47450eb282
Merge pull request #2868 from hppritcha/topic/typo_fix
mca help: fix typo found by user
2017-01-28 09:53:47 -07:00
Howard Pritchard
fca45a2742
mca help: fix typo found by user
Fix typo found by @pozdneev
Fixes #2821
bot:notest
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-28 09:37:43 -07:00
Ralph Castain
06ef1aafb1
Merge pull request #2867 from rhc54/topic/spawn
Cleanup a typo that can cause a segfault
2017-01-27 17:45:25 -08:00
Ralph Castain
a865d4060c
Merge pull request #2866 from rhc54/topic/qrsh
Minor change to allow qrsh to tree spawn, if supported
2017-01-27 17:27:15 -08:00
Ralph Castain
3302864a7d
Cleanup a typo that can cause a segfault - use a local variable name different than the one passed into the function
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:49:25 -08:00
Ralph Castain
c803af5d3d
Minor change to allow qrsh to tree spawn, if supported
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:34:08 -08:00
Ralph Castain
410befd255
Merge pull request #2864 from rhc54/topic/rsh
Repair rsh/ssh tree spawn
2017-01-27 16:31:35 -08:00
Ralph Castain
3440b46e5e
Merge pull request #2820 from rhc54/topic/async
Per f2f meeting: if async modex is given, default to no MPI init barr…
2017-01-27 15:43:43 -08:00
Ralph Castain
7c795f4416
If the HNP is going to request topology info, it cannot do so via a routed OOB message as the intervening daemons may not be ready. So disable routing until the VM is ready, and have daemons start routing as they receive the xcast launch msg (which includes the data they need to talk to their peers).
Do a little optimization and minimize recomputation of the routing plan.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 15:37:16 -08:00
Ralph Castain
d672fad849
Repair rsh/ssh tree spawn
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.
Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 11:35:00 -08:00