Nathan Hjelm
3c18f2f1d9
Merge pull request #2924 from hjelmn/ras_slurm
...
ras/slurm: fix compile error due to missing header
2017-02-06 09:33:58 -07:00
Gilles Gouaillardet
d4d4cab5bf
orte/util: fix OPAL_HAVE_ZLIB usage
...
use #if instead of #ifdef
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-05 16:24:10 +01:00
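Note on the `#if` versus `#ifdef` fix above: the following is a minimal sketch of why the distinction matters, assuming the macro is always defined by configure and set to 0 when the feature is absent. `DEMO_HAVE_FEATURE` and both function names are hypothetical stand-ins, not Open MPI code.

```c
#include <stdio.h>

/* Hypothetical stand-in for a configure-generated macro that is always
 * defined: 1 when the feature is available, 0 when it is not. */
#define DEMO_HAVE_FEATURE 0

/* Reports whether an #ifdef test treats the feature as enabled. */
int enabled_by_ifdef(void)
{
#ifdef DEMO_HAVE_FEATURE
    return 1;   /* taken: the macro is defined, even though its value is 0 */
#else
    return 0;
#endif
}

/* Reports whether an #if test treats the feature as enabled. */
int enabled_by_if(void)
{
#if DEMO_HAVE_FEATURE
    return 1;
#else
    return 0;   /* taken: the macro's value, 0, is false */
#endif
}
```

With the macro defined to 0, the `#ifdef` variant still compiles the feature-dependent branch, while the `#if` variant correctly skips it.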
Geoff Paulsen
4917e44a7d
Merge pull request #2832 from jjhursey/topic/ibm/osc-base-dt-abort
...
osc/base: Detect unsupported data types and abort
2017-02-05 04:26:04 -06:00
Carlos Bederián
ccea3de44c
amd64 timers: use lfence instead of cpuid for serialization
...
Signed-off-by: Carlos Bederián <bc@famaf.unc.edu.ar>
2017-02-04 18:50:29 -03:00
Carlos Bederián
4009ba6b94
opal_progress: use usec native timer only when a native cycle counter isn't available
...
Signed-off-by: Carlos Bederián <bc@famaf.unc.edu.ar>
2017-02-04 18:31:14 -03:00
Howard Pritchard
f4ad119693
Merge pull request #2914 from hppritcha/topic/nbc_compiler_warning
...
swat some compiler warnings
2017-02-04 11:56:52 -05:00
Artem Polyakov
9f7e2098ac
orte: Fix MPI_Spawn
...
Register the namespace even if there are no node-local processes that
belong to it. We need this for the MPI_Spawn case.
Addressing https://github.com/open-mpi/ompi/issues/2920 .
Was introduced in be3ef77739.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-02-04 12:07:00 +07:00
Nathan Hjelm
b928a6b9ea
ras/slurm: fix compile error due to missing header
...
On some systems this component fails to build due to the missing
netdb.h header.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-03 15:22:34 -07:00
Nathan Hjelm
1c4b735f5f
oob/tcp: cleanup peers before event bases
...
This commit fixes an error in teardown where the event bases are torn
down before the peer structures are released. This causes us to call
event_del on an invalid event base. At best this makes valgrind
complain and at worst this causes aborts or segvs.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-03 15:18:41 -07:00
Howard Pritchard
acaecb2448
swat some compiler warnings
...
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-02-03 08:28:15 -07:00
Ralph Castain
ead453ee8e
Merge pull request #2911 from rhc54/topic/retry
...
For performance, try to send the oob/tcp message a few times before dropping back into the event library
2017-02-02 12:57:18 -08:00
Ralph Castain
b661275dba
For performance, try to send the oob/tcp message a few times before dropping back into the event library
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-02 06:44:15 -08:00
Gilles Gouaillardet
e879d2910a
coll/tuned: make coll_tuned_gather_algorithms MCA settable
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-02 11:00:38 +09:00
Ralph Castain
50ca9fb66b
Merge pull request #2893 from rhc54/topic/sim
...
Cleanup the ras simulator capability, and the relay route thru grpcomm
2017-02-01 16:17:40 -08:00
Ralph Castain
230d15f0d9
Cleanup the ras simulator capability, and the relay route thru grpcomm
...
direct. Don't resend wireup info if nothing has changed
Fix release of buffer
Correct the unpacking order
Fix the DVM - now minimizes data transfer to it
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-01 15:01:58 -08:00
Nathan Hjelm
362ac8b87e
osc/pt2pt: fix threading issues
...
This commit fixes a number of threading issues discovered in
osc/pt2pt. This includes:
- Lock the synchronization object not the module in osc_pt2pt_start.
This fixes a race between the start function and processing post
messages.
- Always lock before calling cond_broadcast. Fixes a race between
the waiting thread and signaling thread.
- Make all atomically updated values volatile.
- Make the module lock recursive to protect against some deadlock
conditions. Will roll this back once the locks have been
re-designed.
- Mark incoming complete *after* completing an accumulate not
before. This was causing an incorrect answer under certain
conditions.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-01 10:33:01 -07:00
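Note on the "always lock before calling cond_broadcast" item above: the sketch below illustrates the general pattern with plain POSIX threads. It is an assumption-laden illustration, not the osc/pt2pt code; `demo_sync_t`, `demo_waiter`, `demo_signal`, and `demo_run` are hypothetical names. Updating the predicate and broadcasting while holding the mutex means a waiter cannot check the predicate, miss the wakeup, and then block forever.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical synchronization object: a predicate guarded by a mutex. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
} demo_sync_t;

/* Waiter: checks the predicate under the lock and sleeps until it is set. */
static void *demo_waiter(void *arg)
{
    demo_sync_t *s = arg;
    pthread_mutex_lock(&s->lock);
    while (!s->ready) {
        /* Atomically releases the lock while sleeping, reacquires on wakeup. */
        pthread_cond_wait(&s->cond, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Signaler: takes the lock before changing the predicate and broadcasting. */
static void demo_signal(demo_sync_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->ready = true;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Starts one waiter, signals it, and returns 0 once the waiter has exited. */
int demo_run(void)
{
    demo_sync_t s = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false };
    pthread_t t;

    if (pthread_create(&t, NULL, demo_waiter, &s) != 0) {
        return -1;
    }
    demo_signal(&s);
    pthread_join(t, NULL);
    return s.ready ? 0 : -1;
}
```

On older toolchains this may need to be built with `-pthread`.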
Ralph Castain
a9d836bae3
Merge pull request #2890 from rhc54/topic/alps
...
Correct the path to the ORTE data dir - allows master to be built with --no-ompi
2017-02-01 07:47:33 -08:00
Ralph Castain
8bf3ac828c
Correct the path to the ORTE data dir - allows master to be built with --no-ompi
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-01 07:30:18 -07:00
Howard Pritchard
e62fca896f
Merge pull request #2889 from hppritcha/topic/fix_ess_alps_makefie
...
ess/alps: fix problem in makefile
2017-02-01 05:46:51 -05:00
Howard Pritchard
db4039f565
ess/alps: fix problem in makefile
...
./autogen.pl --no-ompi doesn't work without this
fix when alps can be configured.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-31 21:56:16 -06:00
Gilles Gouaillardet
02558134ef
coll/base: remove unused local variable
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:54:17 +09:00
Gilles Gouaillardet
ad44ecb2ba
pml/base: initialize global variables
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:49:47 +09:00
bosilca
c331e6794c
Allow all tuned MCA parameters to be modified programmatically. (#2829)
...
Fix a comment in the MCA header.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-31 21:47:36 -05:00
Ralph Castain
6cb484a3cb
Merge pull request #2887 from rhc54/topic/update
...
Update to latest PMIx master
2017-01-31 11:05:37 -08:00
Jeff Squyres
45b791542c
Merge pull request #2809 from jjhursey/fix/ibm/opal-verbose
...
opal/output: Make sure verbose gets updated when id 0 gets updated.
2017-01-31 12:18:38 -05:00
Josh Hursey
5fcd69da52
Merge pull request #2831 from jjhursey/topic/ibm/pml-bsend
...
pml/base: Expose some bsend variables so PMLs may reference them
2017-01-31 10:31:42 -06:00
Josh Hursey
31faf0a950
Merge pull request #2861 from jjhursey/topic/ibm/master/orted-timeout-improv
...
orterun: Add parameter to control when we give up on stack traces
2017-01-31 10:25:57 -06:00
Ralph Castain
edcfdf2365
Update to latest PMIx master
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-31 08:01:37 -08:00
Yossi
0b822522fb
Merge pull request #2883 from alex-mikheev/topic/oshmem_mem_prefetch
...
oshmem: mem use hook: apply code review fixes
2017-01-30 12:39:04 +02:00
Alex Mikheev
ea3ea4835b
oshmem: mem use hook: apply code review fixes
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
(cherry picked from commit a422154a141f0be5b92d2b6c26d7b2b4176dfe18)
2017-01-30 11:30:20 +02:00
Gilles Gouaillardet
6c9ba35d2d
Merge pull request #2880 from ggouaillardet/topic/red_sched_chain
...
coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
2017-01-30 15:08:03 +09:00
Gilles Gouaillardet
9bcadbd51b
coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
...
this fixes a regression introduced in open-mpi/ompi@045d0c5f4c
Fixes open-mpi/ompi#2879
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 14:19:45 +09:00
Gilles Gouaillardet
b12ab2b4f2
Merge pull request #2857 from ggouaillardet/topic/pmix_ext11
...
pmix/ext11 fixes, plugs and rename
2017-01-30 11:44:07 +09:00
Gilles Gouaillardet
b078e57e73
pmix/ext1x: fix misc memory leaks in namespace registration
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
f51fc293a2
ext1x/pmix1x_client: plug misc memory leaks
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
022cca79ea
pmix/ext1x: plug a memory leak in opal_lkupcbfunc()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:36 +09:00
Gilles Gouaillardet
f485d12a82
pmix: rename the ext11 component into ext1x
...
also use the same naming scheme as pmix/ext2x
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:35 +09:00
Gilles Gouaillardet
dccb1899e6
pmix/ext11: correctly use PMIx_server_register_nspace()
...
PMIx_server_register_nspace() is an asynchronous operation, so
the pmix glue waits for it to complete before returning.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:19 +09:00
Gilles Gouaillardet
6955e1e25c
pmix/ext11: fix compilation
...
the argc field from the opal_pmix_app_t struct was removed,
so adjust the pmix/ext11 glue accordingly.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:18 +09:00
Ralph Castain
7ab26a4946
Merge pull request #2878 from rhc54/topic/scaling
...
Add new platform files. Modify scaling.pl to support ppn option
2017-01-29 15:57:04 -08:00
Ralph Castain
28abe78f8c
Add new platform files. Modify scaling.pl to support ppn option
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-29 15:55:49 -08:00
Mike Dubman
048d47df48
Merge pull request #2874 from yosefe/topic/pml-yalla-fix-dt-leak
...
yalla: fix memory leak with blocking non-contig send.
2017-01-29 19:36:37 +02:00
Yossi Itigin
13c3bf0dd7
yalla: fix memory leak with blocking non-contig send.
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-01-29 18:51:43 +02:00
Yossi
149ecef289
Merge pull request #2845 from alex-mikheev/topic/oshmem_mem_prefetch
...
oshmem: spml: add memory allocation hook
2017-01-29 14:15:01 +02:00
Alex Mikheev
9da9e6260d
oshmem: spml ucx: on error print ucx error string
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-29 10:28:24 +02:00
Ralph Castain
2b2ea2fed2
Merge pull request #2869 from rhc54/topic/staticports
...
Fix static port and partial allocation operations
2017-01-28 11:03:11 -08:00
Ralph Castain
b59ae14a2a
Fix static port and partial allocation operations
...
Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.
Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-28 10:09:44 -08:00
Howard Pritchard
47450eb282
Merge pull request #2868 from hppritcha/topic/typo_fix
...
mca help: fix typo found by user
2017-01-28 09:53:47 -07:00
Howard Pritchard
fca45a2742
mca help: fix typo found by user
...
Fix typo found by @pozdneev
Fixes #2821
bot:notest
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-28 09:37:43 -07:00
Ralph Castain
06ef1aafb1
Merge pull request #2867 from rhc54/topic/spawn
...
Cleanup a typo that can cause a segfault
2017-01-27 17:45:25 -08:00