1
1
Граф коммитов

26922 Коммитов

Автор SHA1 Сообщение Дата
Howard Pritchard
e62fca896f Merge pull request #2889 from hppritcha/topic/fix_ess_alps_makefie
ess/alps: fix problem in makefile
2017-02-01 05:46:51 -05:00
Howard Pritchard
db4039f565 ess/alps: fix problem in makefile
./autogen.pl --no-ompi doesn't work without this
fix when alps can be configured.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-31 21:56:16 -06:00
Gilles Gouaillardet
02558134ef coll/base: remove unused local variable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:54:17 +09:00
Gilles Gouaillardet
ad44ecb2ba pml/base: initialize global variables
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-01 11:49:47 +09:00
bosilca
c331e6794c Allow all tuned MCA parameters to be modified programatically. (#2829)
Fix a comment in the MCA header.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-01-31 21:47:36 -05:00
Ralph Castain
6cb484a3cb Merge pull request #2887 from rhc54/topic/update
Update to latest PMIx master
2017-01-31 11:05:37 -08:00
Jeff Squyres
45b791542c Merge pull request #2809 from jjhursey/fix/ibm/opal-verbose
opal/output: Make sure verbose gets updated when id 0 gets updated.
2017-01-31 12:18:38 -05:00
Josh Hursey
5fcd69da52 Merge pull request #2831 from jjhursey/topic/ibm/pml-bsend
pml/base: Expose some bsend varaibles so PMLs may reference them
2017-01-31 10:31:42 -06:00
Josh Hursey
31faf0a950 Merge pull request #2861 from jjhursey/topic/ibm/master/orted-timeout-improv
orterun: Add parameter to control when we give up on stack traces
2017-01-31 10:25:57 -06:00
Ralph Castain
edcfdf2365 Update to latest PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-31 08:01:37 -08:00
Yossi
0b822522fb Merge pull request #2883 from alex-mikheev/topic/oshmem_mem_prefetch
oshmem: mem use hook: apply code review fixes
2017-01-30 12:39:04 +02:00
Alex Mikheev
ea3ea4835b oshmem: mem use hook: apply code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
(cherry picked from commit a422154a141f0be5b92d2b6c26d7b2b4176dfe18)
2017-01-30 11:30:20 +02:00
Gilles Gouaillardet
6c9ba35d2d Merge pull request #2880 from ggouaillardet/topic/red_sched_chain
coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
2017-01-30 15:08:03 +09:00
Gilles Gouaillardet
9bcadbd51b coll/libnbc: fix the red_schain algo of ireduce with MPI_IN_PLACE
this fixes a regression introduced in open-mpi/ompi@045d0c5f4c

Fixes open-mpi/ompi#2879

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 14:19:45 +09:00
Gilles Gouaillardet
b12ab2b4f2 Merge pull request #2857 from ggouaillardet/topic/pmix_ext11
pmix/ext11 fixes, plugs and rename
2017-01-30 11:44:07 +09:00
Gilles Gouaillardet
b078e57e73 pmix/ext1x: fix misc memory leaks in namespace registration
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
f51fc293a2 ext1x/pmix1x_client: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:42 +09:00
Gilles Gouaillardet
022cca79ea pmix/ext1x: plug a memory leak in opal_lkupcbfunc()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:36 +09:00
Gilles Gouaillardet
f485d12a82 pmix: rename the ext11 component into ext1x
also use the same naming scheme thann pmix/ext2x

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:35 +09:00
Gilles Gouaillardet
dccb1899e6 pmix/ext11: correctly use PMIx_server_register_nspace()
PMIx_server_register_nspace() is an asynchronous operation, so
the pmix glue wait for it completes before returning.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:19 +09:00
Gilles Gouaillardet
6955e1e25c pmix/ext11: fix compilation
the argc field from the opal_pmix_app_t struct was removed,
so adjust the pmix/ext11 glue accordingly.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:18 +09:00
Ralph Castain
7ab26a4946 Merge pull request #2878 from rhc54/topic/scaling
Add new platform files. Modify scaling.pl to support ppn option
2017-01-29 15:57:04 -08:00
Ralph Castain
28abe78f8c Add new platform files. Modify scaling.pl to support ppn option
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-29 15:55:49 -08:00
Mike Dubman
048d47df48 Merge pull request #2874 from yosefe/topic/pml-yalla-fix-dt-leak
yalla: fix memory leak with blocking non-contig send.
2017-01-29 19:36:37 +02:00
Yossi Itigin
13c3bf0dd7 yalla: fix memory leak with blocking non-contig send.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-01-29 18:51:43 +02:00
Yossi
149ecef289 Merge pull request #2845 from alex-mikheev/topic/oshmem_mem_prefetch
oshmem: spml: add memory allocation hook
2017-01-29 14:15:01 +02:00
Alex Mikheev
9da9e6260d
oshmem: spml ucx: on error print ucx error string
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-29 10:28:24 +02:00
Ralph Castain
2b2ea2fed2 Merge pull request #2869 from rhc54/topic/staticports
Fix static port and partial allocation operations
2017-01-28 11:03:11 -08:00
Ralph Castain
b59ae14a2a Fix static port and partial allocation operations
Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.

Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-28 10:09:44 -08:00
Howard Pritchard
47450eb282 Merge pull request #2868 from hppritcha/topic/typo_fix
mca help: fix typo found by user
2017-01-28 09:53:47 -07:00
Howard Pritchard
fca45a2742 mca help: fix typo found by user
Fix typo found by @pozdneev

Fixes #2821

bot:notest

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-28 09:37:43 -07:00
Ralph Castain
06ef1aafb1 Merge pull request #2867 from rhc54/topic/spawn
Cleanup a typo that can cause a segfault
2017-01-27 17:45:25 -08:00
Ralph Castain
a865d4060c Merge pull request #2866 from rhc54/topic/qrsh
Minor change to allow qrsh to tree spawn, if supported
2017-01-27 17:27:15 -08:00
Ralph Castain
3302864a7d Cleanup a typo that can cause a segfault - use a local variable name different than the one passed into the function
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:49:25 -08:00
Ralph Castain
c803af5d3d Minor change to allow qrsh to tree spawn, if supported
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:34:08 -08:00
Ralph Castain
410befd255 Merge pull request #2864 from rhc54/topic/rsh
Repair rsh/ssh tree spawn
2017-01-27 16:31:35 -08:00
Ralph Castain
3440b46e5e Merge pull request #2820 from rhc54/topic/async
Per f2f meeting: if async modex is given, default to no MPI init barr…
2017-01-27 15:43:43 -08:00
Ralph Castain
7c795f4416 If the HNP is going to request topology info, it cannot do so via a routed OOB message as the intervening daemons may not be ready. So disable routing until the VM is ready, and have daemons start routing as they receive the xcast launch msg (which includes the data they need to talk to their peers).
Do a little optimization and minimize recomputation of the routing plan.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 15:37:16 -08:00
Ralph Castain
d672fad849 Repair rsh/ssh tree spawn
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.

Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 11:35:00 -08:00
Joshua Hursey
3c47432e3d orterun: Add parameter to control when we give up on stack traces
* MCA option to control how long we wait for stack traces:
   - orte_timeout_for_stack_trace INTEGER
     Default: 30
     Setting to <= 0 will cause it to wait forever
 * Useful when gathering stack traces from large jobs which might take
   a long time.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-27 09:16:35 -06:00
Josh Hursey
f4a86904c4 Merge pull request #2813 from jjhursey/fix/ibm/comm-cleanup
communicator: Fix uninitialized variable
2017-01-26 14:35:32 -06:00
Josh Hursey
2e64bf42fb Merge pull request #2810 from jjhursey/fix/ibm/stdiag-to-stdout
Extend options for stddiag routing
2017-01-26 14:29:16 -06:00
Josh Hursey
770c41f493 Merge pull request #2807 from jjhursey/fix/ibm/event-external
libevent/external: Add opal_event_include to this component
2017-01-26 14:26:50 -06:00
Josh Hursey
ebc90f926e Merge pull request #2806 from jjhursey/fix/ibm/aint-diff-type
Fix a minor error at MPI_AINT_DIFF.
2017-01-26 14:23:21 -06:00
Josh Hursey
0408c116eb Merge pull request #2805 from jjhursey/fix/ibm/base-allgatherv
coll/base: Allgatherv MPI_IN_PLACE Bug
2017-01-26 14:21:57 -06:00
Jeff Squyres
b9d7b27cfa Merge pull request #2855 from gpaulsen/fix/ibm/nbc_ireduce_anysrc
Fixing comment only in MPI_IN_PLACE case for ireduce in libnbc.
2017-01-26 11:00:50 -08:00
Geoffrey Paulsen
d2527cff46 Fixing comment only in MPI_IN_PLACE case for ireduce in libnbc.
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2017-01-26 10:58:51 -08:00
Jeff Squyres
2c277a66fd Merge pull request #2772 from jjhursey/topic/stacktrace-improv
master: opal/stacktrace improvements
2017-01-26 10:48:41 -08:00
Joshua Hursey
6d98559be9 stacktrace: Add flexibility in stacktrace ouptut
- New MCA option: opal_stacktrace_output
   - Specifies where the stack trace output stream goes.
   - Accepts: none, stdout, stderr, file[:filename]
   - Default filename 'stacktrace'
     - Filename will be `stacktrace.PID`, or if VPID is available,
       then the filename will be `stacktrace.VPID.PID`
 - Update util/stacktrace to allow for different output avenues
   including files. Previously this was hardcoded to 'stderr'.
 - Since opal_backtrace_print needs to be signal safe, passing it a
   FILE object that actually represents a file stream is difficult. This
   is because we cannot open the file in the signal handler using
   `fopen` (not safe), but have to use `open` (safe). Additionally, we
   cannot use `fdopen` to convert the `int fd` to a `FILE *fh` since it
   is also not signal safe.
   - I did not want to break the backtrace.h API so I introduced a new
     rule (documented in `backtrace.c`) that if the `FILE *file`
     argument is `NULL` then look for the `opal_stacktrace_output_fileno`
     variable to tell you which file descriptor to use for output.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-26 11:55:32 -06:00
Joshua Hursey
f8918e37a9 opal/stacktace: Raise the signal after processing
- This prevents us for accidentally masking a signal that was meant to
   terminate the application.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-26 11:55:28 -06:00