Howard Pritchard
e62fca896f
Merge pull request #2889 from hppritcha/topic/fix_ess_alps_makefie
...
ess/alps: fix problem in makefile
2017-02-01 05:46:51 -05:00
Howard Pritchard
db4039f565
ess/alps: fix problem in makefile
...
./autogen.pl --no-ompi doesn't work without this
fix when alps can be configured.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-31 21:56:16 -06:00
Gilles Gouaillardet
02558134ef
coll/base: remove unused local variable
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:54:17 +09:00
Gilles Gouaillardet
ad44ecb2ba
pml/base: initialize global variables
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:49:47 +09:00
bosilca
c331e6794c
Allow all tuned MCA parameters to be modified programmatically. ( #2829 )
...
Fix a comment in the MCA header.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-31 21:47:36 -05:00
Ralph Castain
6cb484a3cb
Merge pull request #2887 from rhc54/topic/update
...
Update to latest PMIx master
2017-01-31 11:05:37 -08:00
Jeff Squyres
45b791542c
Merge pull request #2809 from jjhursey/fix/ibm/opal-verbose
...
opal/output: Make sure verbose gets updated when id 0 gets updated.
2017-01-31 12:18:38 -05:00
Josh Hursey
5fcd69da52
Merge pull request #2831 from jjhursey/topic/ibm/pml-bsend
...
pml/base: Expose some bsend variables so PMLs may reference them
2017-01-31 10:31:42 -06:00
Josh Hursey
31faf0a950
Merge pull request #2861 from jjhursey/topic/ibm/master/orted-timeout-improv
...
orterun: Add parameter to control when we give up on stack traces
2017-01-31 10:25:57 -06:00
Ralph Castain
edcfdf2365
Update to latest PMIx master
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-31 08:01:37 -08:00
Yossi
0b822522fb
Merge pull request #2883 from alex-mikheev/topic/oshmem_mem_prefetch
...
oshmem: mem use hook: apply code review fixes
2017-01-30 12:39:04 +02:00
Alex Mikheev
ea3ea4835b
oshmem: mem use hook: apply code review fixes
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
(cherry picked from commit a422154a141f0be5b92d2b6c26d7b2b4176dfe18)
2017-01-30 11:30:20 +02:00
Gilles Gouaillardet
6c9ba35d2d
Merge pull request #2880 from ggouaillardet/topic/red_sched_chain
...
coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
2017-01-30 15:08:03 +09:00
Gilles Gouaillardet
9bcadbd51b
coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
...
this fixes a regression introduced in open-mpi/ompi@045d0c5f4c
Fixes open-mpi/ompi#2879
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 14:19:45 +09:00
Gilles Gouaillardet
b12ab2b4f2
Merge pull request #2857 from ggouaillardet/topic/pmix_ext11
...
pmix/ext11 fixes, plugs and rename
2017-01-30 11:44:07 +09:00
Gilles Gouaillardet
b078e57e73
pmix/ext1x: fix misc memory leaks in namespace registration
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
f51fc293a2
ext1x/pmix1x_client: plug misc memory leaks
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
022cca79ea
pmix/ext1x: plug a memory leak in opal_lkupcbfunc()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:36 +09:00
Gilles Gouaillardet
f485d12a82
pmix: rename the ext11 component into ext1x
...
also use the same naming scheme as pmix/ext2x
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:35 +09:00
Gilles Gouaillardet
dccb1899e6
pmix/ext11: correctly use PMIx_server_register_nspace()
...
PMIx_server_register_nspace() is an asynchronous operation, so
the pmix glue waits for it to complete before returning.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:19 +09:00
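The wait-for-completion pattern described in the commit above can be sketched as follows. This is a minimal, single-threaded illustration and not the actual pmix/ext11 glue: `register_nspace_async`, `progress`, `op_tracker_t`, and `register_nspace_blocking` are hypothetical names, and the "asynchronous" call is simulated by a one-slot queue that a progress function drains.

```c
#include <stdbool.h>
#include <stddef.h>

/* Completion callback signature (hypothetical, for illustration only). */
typedef void (*op_cbfunc_t)(int status, void *cbdata);

/* One-slot queue standing in for work owned by an asynchronous library. */
static op_cbfunc_t pending_cb = NULL;
static void *pending_cbdata = NULL;

/* Hypothetical async call: it only queues the request and returns at once. */
static int register_nspace_async(op_cbfunc_t cb, void *cbdata)
{
    if (pending_cb != NULL) {
        return -1;                 /* a request is already in flight */
    }
    pending_cb = cb;
    pending_cbdata = cbdata;
    return 0;
}

/* Hypothetical progress engine: completes the queued request, if any. */
static void progress(void)
{
    if (pending_cb != NULL) {
        op_cbfunc_t cb = pending_cb;
        void *cbdata = pending_cbdata;
        pending_cb = NULL;
        pending_cbdata = NULL;
        cb(0, cbdata);             /* report success to the caller */
    }
}

/* Tracks one outstanding operation. */
typedef struct {
    volatile bool active;          /* true until the callback fires */
    int status;                    /* status reported by the callback */
} op_tracker_t;

/* Callback: record the status and release the waiter. */
static void op_complete(int status, void *cbdata)
{
    op_tracker_t *t = cbdata;
    t->status = status;
    t->active = false;
}

/* Blocking wrapper: start the operation, then wait until it completes. */
int register_nspace_blocking(void)
{
    op_tracker_t t = { true, -1 };
    int rc = register_nspace_async(op_complete, &t);
    if (rc != 0) {
        return rc;
    }
    while (t.active) {
        progress();
    }
    return t.status;
}
```

Returning before the callback fires would let the caller use a namespace that is not yet registered, and the tracker on the stack would go out of scope while the library still holds a pointer to it.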
Gilles Gouaillardet
6955e1e25c
pmix/ext11: fix compilation
...
the argc field from the opal_pmix_app_t struct was removed,
so adjust the pmix/ext11 glue accordingly.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:18 +09:00
Ralph Castain
7ab26a4946
Merge pull request #2878 from rhc54/topic/scaling
...
Add new platform files. Modify scaling.pl to support ppn option
2017-01-29 15:57:04 -08:00
Ralph Castain
28abe78f8c
Add new platform files. Modify scaling.pl to support ppn option
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-29 15:55:49 -08:00
Mike Dubman
048d47df48
Merge pull request #2874 from yosefe/topic/pml-yalla-fix-dt-leak
...
yalla: fix memory leak with blocking non-contig send.
2017-01-29 19:36:37 +02:00
Yossi Itigin
13c3bf0dd7
yalla: fix memory leak with blocking non-contig send.
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-01-29 18:51:43 +02:00
Yossi
149ecef289
Merge pull request #2845 from alex-mikheev/topic/oshmem_mem_prefetch
...
oshmem: spml: add memory allocation hook
2017-01-29 14:15:01 +02:00
Alex Mikheev
9da9e6260d
oshmem: spml ucx: on error print ucx error string
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-29 10:28:24 +02:00
Ralph Castain
2b2ea2fed2
Merge pull request #2869 from rhc54/topic/staticports
...
Fix static port and partial allocation operations
2017-01-28 11:03:11 -08:00
Ralph Castain
b59ae14a2a
Fix static port and partial allocation operations
...
Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.
Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-28 10:09:44 -08:00
Howard Pritchard
47450eb282
Merge pull request #2868 from hppritcha/topic/typo_fix
...
mca help: fix typo found by user
2017-01-28 09:53:47 -07:00
Howard Pritchard
fca45a2742
mca help: fix typo found by user
...
Fix typo found by @pozdneev
Fixes #2821
bot:notest
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-28 09:37:43 -07:00
Ralph Castain
06ef1aafb1
Merge pull request #2867 from rhc54/topic/spawn
...
Cleanup a typo that can cause a segfault
2017-01-27 17:45:25 -08:00
Ralph Castain
a865d4060c
Merge pull request #2866 from rhc54/topic/qrsh
...
Minor change to allow qrsh to tree spawn, if supported
2017-01-27 17:27:15 -08:00
Ralph Castain
3302864a7d
Cleanup a typo that can cause a segfault - use a local variable name different than the one passed into the function
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:49:25 -08:00
Ralph Castain
c803af5d3d
Minor change to allow qrsh to tree spawn, if supported
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:34:08 -08:00
Ralph Castain
410befd255
Merge pull request #2864 from rhc54/topic/rsh
...
Repair rsh/ssh tree spawn
2017-01-27 16:31:35 -08:00
Ralph Castain
3440b46e5e
Merge pull request #2820 from rhc54/topic/async
...
Per f2f meeting: if async modex is given, default to no MPI init barr…
2017-01-27 15:43:43 -08:00
Ralph Castain
7c795f4416
If the HNP is going to request topology info, it cannot do so via a routed OOB message as the intervening daemons may not be ready. So disable routing until the VM is ready, and have daemons start routing as they receive the xcast launch msg (which includes the data they need to talk to their peers).
...
Do a little optimization and minimize recomputation of the routing plan.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 15:37:16 -08:00
Ralph Castain
d672fad849
Repair rsh/ssh tree spawn
...
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.
Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 11:35:00 -08:00
Joshua Hursey
3c47432e3d
orterun: Add parameter to control when we give up on stack traces
...
* MCA option to control how long we wait for stack traces:
  - orte_timeout_for_stack_trace INTEGER
    Default: 30
    Setting to <= 0 will cause it to wait forever
* Useful when gathering stack traces from large jobs which might take
  a long time.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-27 09:16:35 -06:00
Josh Hursey
f4a86904c4
Merge pull request #2813 from jjhursey/fix/ibm/comm-cleanup
...
communicator: Fix uninitialized variable
2017-01-26 14:35:32 -06:00
Josh Hursey
2e64bf42fb
Merge pull request #2810 from jjhursey/fix/ibm/stdiag-to-stdout
...
Extend options for stddiag routing
2017-01-26 14:29:16 -06:00
Josh Hursey
770c41f493
Merge pull request #2807 from jjhursey/fix/ibm/event-external
...
libevent/external: Add opal_event_include to this component
2017-01-26 14:26:50 -06:00
Josh Hursey
ebc90f926e
Merge pull request #2806 from jjhursey/fix/ibm/aint-diff-type
...
Fix a minor error at MPI_AINT_DIFF.
2017-01-26 14:23:21 -06:00
Josh Hursey
0408c116eb
Merge pull request #2805 from jjhursey/fix/ibm/base-allgatherv
...
coll/base: Allgatherv MPI_IN_PLACE Bug
2017-01-26 14:21:57 -06:00
Jeff Squyres
b9d7b27cfa
Merge pull request #2855 from gpaulsen/fix/ibm/nbc_ireduce_anysrc
...
Fixing comment only in MPI_IN_PLACE case for ireduce in libnbc.
2017-01-26 11:00:50 -08:00
Geoffrey Paulsen
d2527cff46
Fixing comment only in MPI_IN_PLACE case for ireduce in libnbc.
...
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2017-01-26 10:58:51 -08:00
Jeff Squyres
2c277a66fd
Merge pull request #2772 from jjhursey/topic/stacktrace-improv
...
master: opal/stacktrace improvements
2017-01-26 10:48:41 -08:00
Joshua Hursey
6d98559be9
stacktrace: Add flexibility in stacktrace output
...
- New MCA option: opal_stacktrace_output
  - Specifies where the stack trace output stream goes.
  - Accepts: none, stdout, stderr, file[:filename]
    - Default filename 'stacktrace'
    - Filename will be `stacktrace.PID`, or if VPID is available,
      then the filename will be `stacktrace.VPID.PID`
- Update util/stacktrace to allow for different output avenues
  including files. Previously this was hardcoded to 'stderr'.
- Since opal_backtrace_print needs to be signal safe, passing it a
  FILE object that actually represents a file stream is difficult. This
  is because we cannot open the file in the signal handler using
  `fopen` (not safe), but have to use `open` (safe). Additionally, we
  cannot use `fdopen` to convert the `int fd` to a `FILE *fh` since it
  is also not signal safe.
- I did not want to break the backtrace.h API so I introduced a new
  rule (documented in `backtrace.c`) that if the `FILE *file`
  argument is `NULL` then look for the `opal_stacktrace_output_fileno`
  variable to tell you which file descriptor to use for output.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-26 11:55:32 -06:00
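The signal-safety constraint in the commit above can be illustrated with a small sketch. It is not the OPAL implementation: `trace_fileno`, `trace_output_open`, `safe_write`, and `trace_handler_install` are hypothetical names. For simplicity the sketch opens the `stacktrace.PID` file with `open` before any signal arrives, so the handler only calls `write` on a raw file descriptor and never touches a `FILE *`.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Descriptor the handler writes to; hypothetical counterpart of the
 * opal_stacktrace_output_fileno variable named in the commit message. */
static volatile sig_atomic_t trace_fileno = -1;

/* Build "<base>.<PID>" and open it outside of any signal handler, where
 * snprintf() is allowed. The resulting path is returned in `path`. */
int trace_output_open(const char *base, char *path, size_t path_len)
{
    int n = snprintf(path, path_len, "%s.%ld", base, (long)getpid());
    if (n < 0 || (size_t)n >= path_len) {
        return -1;
    }
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }
    trace_fileno = fd;
    return 0;
}

/* Write a string using only write(2); the length is computed by hand. */
static void safe_write(int fd, const char *msg)
{
    size_t len = 0;
    while (msg[len] != '\0') {
        len++;
    }
    while (len > 0) {
        ssize_t n = write(fd, msg, len);
        if (n <= 0) {
            return;                /* nothing safe to do on error */
        }
        msg += n;
        len -= (size_t)n;
    }
}

/* Handler: emits a placeholder line where a real backtrace would go. */
static void trace_handler(int sig)
{
    (void)sig;
    int fd = trace_fileno;
    if (fd >= 0) {
        safe_write(fd, "stack trace would be written here\n");
    }
}

/* Install the handler for one signal. */
int trace_handler_install(int sig)
{
    struct sigaction sa;
    sa.sa_handler = trace_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}
```

A caller would run `trace_output_open("stacktrace", path, sizeof path)` and `trace_handler_install(SIGUSR1)` during startup; everything the handler later does is limited to calls that POSIX lists as async-signal-safe.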
Joshua Hursey
f8918e37a9
opal/stacktrace: Raise the signal after processing
...
- This prevents us from accidentally masking a signal that was meant to
terminate the application.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-26 11:55:28 -06:00
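One common way to re-raise a signal after handling it is sketched below. This is an assumption about the general technique, not the code in opal/stacktrace: `report_then_reraise`, `install_reraising_handler`, and `terminating_signal_of_child` are hypothetical names. The handler does its work, restores the default disposition, and raises the same signal again so the process still terminates with it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler: do the diagnostic work, then let the signal take its
 * default action instead of swallowing it. */
static void report_then_reraise(int sig)
{
    static const char msg[] = "caught signal, re-raising\n";
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing safe to do on error */
    }
    signal(sig, SIG_DFL);   /* restore the default disposition */
    raise(sig);             /* delivered once the handler returns */
}

/* Install the re-raising handler for one signal. */
int install_reraising_handler(int sig)
{
    struct sigaction sa;
    sa.sa_handler = report_then_reraise;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Demo helper: fork a child that installs the handler and raises `sig`.
 * Returns the signal that terminated the child, or -1 if the child was
 * not terminated by a signal. */
int terminating_signal_of_child(int sig)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        if (install_reraising_handler(sig) != 0) {
            _exit(2);
        }
        raise(sig);
        _exit(1);           /* reached only if the signal was masked */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With this pattern the parent (a launcher or a shell, for example) still observes termination by the original signal rather than a normal exit, which is the masking problem the commit message describes.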