1
1
Граф коммитов

26876 Коммитов

Автор SHA1 Сообщение Дата
Alex Mikheev
c63137e1c0 oshmem: sshmem ucx: minor code cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
132fbd9ae9 oshmem: sshmem: add UCX allocator
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
e038e3f9e0 oshmem: sshmem: code cleaunp
The commit removes unused code and interface function, moves
common code to the base.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:47:59 +02:00
Yossi
fb67c966a8 Merge pull request #2944 from alex-mikheev/topic/pml_ucx_bsend
ompi: pml ucx: add support for the buffered send
2017-02-22 12:21:03 +02:00
Artem Polyakov
717f3fef62 ompi: Avoid unnecessary PMIx lookups when adding procs.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-02-22 16:09:30 +07:00
Nathan Hjelm
60ad9d1817 rcache/base: do not free memory with the vma lock held
This commit makes the vma tree garbage collection list a lifo. This
way we can avoid having to hold any lock when releasing vmas. In
theory this should finally fix the hold-and-wait deadlock detailed
in #1654.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-21 21:04:46 -07:00
Ralph Castain
4ef6563722 Merge pull request #3010 from rhc54/topic/pmixheaders
Ensure that the pmix headers and lib get installed when --with-devel-…
2017-02-21 15:44:12 -08:00
Ralph Castain
8cffdcf127 Ensure that the pmix headers and lib get installed when --with-devel-headers is given so that PMIx applications can be built and executed against the "embedded" PMIx version
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-21 13:46:46 -08:00
Alex Mikheev
b015c8bb48 ompi: pml ucx: add support for the buffered send
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-21 17:19:22 +02:00
Gilles Gouaillardet
4184c01be5 Merge pull request #2393 from bosilca/topic/no_predefined_ddt_refcount
Don't refcount the predefined datatypes.
2017-02-21 09:38:11 +09:00
Mark Santcroos
3895c106a7 Merge pull request #3007 from rhc54/topic/correction
Fix launch_id matching of -hosts
2017-02-20 17:32:22 +01:00
Ralph Castain
22c88f5ab5 Fix launch_id matching of -hosts
Need to check the entire value instead of just the last N digits. Otherwise, "-host 15" will match nid0015, nid0115, and any other launch id ending in 15

It appears strtol can return either a NULL or a zero-length string, so check for both cases

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-20 07:03:53 -08:00
Gilles Gouaillardet
5c64c0bc3b Merge pull request #3008 from ggouaillardet/topic/pmix_f57d9b2953b3da09a892cd69e9e607f15298935a
pmix2x: synchronize to the latest PMIx master
2017-02-20 11:29:35 +09:00
Gilles Gouaillardet
bb2481a84b pmix2x: synchronize to the latest PMIx master
pmix/master@f57d9b2953

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-20 10:45:17 +09:00
Ralph Castain
af7e2cc33b Merge pull request #3004 from jjhursey/topic/oob-tcp-timeout
oob/tcp: Adjust TCP keepalive default values
2017-02-19 14:28:01 -08:00
Ralph Castain
26c366a7c0 Merge pull request #2964 from rhc54/topic/copyright
Protect the embedded libraries when updating copyrights
2017-02-18 08:07:55 -08:00
Ralph Castain
665850ed69 Use regex to define the protected files
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-18 06:45:13 -08:00
Ralph Castain
2f0aec709a Protect the embedded libraries when updating copyrights - we shouldn't be overwriting their copyrights with our own
bot:notest

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-18 06:45:13 -08:00
Mark Santcroos
7762c21c23 Merge pull request #3006 from rhc54/topic/lid
Support -host launch_id
2017-02-18 09:46:24 +01:00
Ralph Castain
bf0f274f06 Allow -host to look for the number of a host when running in a managed environment that supports launch id's. For example, this will allow someone who has been allocated a node of "nid0015" to refer to it with "-host 15".
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-17 18:35:54 -08:00
Ralph Castain
95bfc7b7c6 Merge pull request #2991 from jjhursey/fix/ibm/errmgr-help-msg
orte/errmgr: Improve help message on connection lost
2017-02-17 11:34:18 -08:00
Nathaniel Graham
91810173b3 Merge pull request #2993 from nrgraham23/man_page_update
Update the mpirun man page
2017-02-17 11:49:17 -07:00
Joshua Hursey
df0f8e95cd oob/tcp: Adjust TCP keepalive default values
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-17 11:02:25 -06:00
Ralph Castain
5eb3ebdf6d Merge pull request #3000 from rhc54/topic/configclean
Fix some pmix configuration code
2017-02-16 12:30:12 -08:00
Ralph Castain
f49118eaab Fix some pmix configuration code
Remove stale file reference that caused a check to always fail. Update psm2 function check to new libs

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-16 10:54:47 -08:00
Howard Pritchard
b272f87926 Merge pull request #2968 from hjelmn/pmix_cray
pmix/cray: performance improvements and cleanup
2017-02-16 11:41:59 -07:00
Todd Kordenbrock
ac3c2c5030 Merge pull request #2986 from tkordenbrock/topic/master/implement.osc.noncontig
osc-portals4: add support for noncontiguous datatypes
2017-02-16 07:32:23 -06:00
Ralph Castain
ba420758fe Merge pull request #2994 from rhc54/topic/queue
Take a shot at fixing a segfault
2017-02-15 19:29:12 -08:00
Ralph Castain
201f8571ca Ensure we retain the peer object until we are done with it, then detect that the socket has closed due to a lost connection and cleanly release the message event
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 18:30:55 -08:00
Ralph Castain
ba47f73887 Merge pull request #2992 from rhc54/topic/pe1
Fix binding policy bug and support pe=1 modifier
2017-02-15 17:58:25 -08:00
Ralph Castain
0ae873de5c Fix a bug where we failed to compute #procs for nperXXX directives, thus resulting in an incorrect default binding
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 16:32:24 -08:00
Nathaniel Graham
f9c05bdb03 Update the mpirun man page
This update should fix the mpirun man page so all
mpirun command line options are included, and
mpirun commands that have been removed are no
longer in the man page.  I also fixed some of
the file formatting, and bolding of command
parameters.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-02-15 17:24:28 -07:00
Ralph Castain
223495325d Fix binding policy bug and support pe=1 modifier
Allow someone to specify the "pe=N" modifier to a mapping policy when N=1. This equates to just "bind-to core", but helps people who use a script to set the PE policy. Fix a bug where setting the binding policy left a lingering "if-supported" flag that shouldn't be there.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 14:55:17 -08:00
Todd Kordenbrock
048f757d9f osc-portals4: add support for noncontiguous datatypes
This commit implements onesided operations for noncontiguous
datatypes using two different algorithms.

 * If the result and/or origin datatype is noncontiguous and the
   target datatype is contiguous, then an iovec MD is created for
   the result and origin.  The operation is performed using a
   single Portals4 call (unless it exceeds the max message size).
 * If the target datatype is noncontigous, then an algorithm
   similar to the one in osc-rdma is used to loop over the
   contiguous blocks of each datatype.  The operation is
   performed using multiple Portals4 calls.

This commit ensures that individual operations do not exceed the
max atomic size or the max message size supported by the device.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-02-15 16:17:13 -06:00
Joshua Hursey
c452f68495 orte/errmgr: Improve help message on connection lost
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-15 16:36:00 -05:00
Ralph Castain
578d8819cf Merge pull request #2982 from rhc54/topic/threadstop
Cleanup PMIx shutdown
2017-02-15 06:00:55 -08:00
Ralph Castain
9cd7349d7c Instead of completely free'ing the event base, pause the PMIx progress thread before tearing down the infrastructure, and then release the event base at the end of the procedure. This allows any infrastructure objects holding events to delete them prior to free'ing the event base.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 05:02:43 -08:00
Ralph Castain
f7fe2f7189 Merge pull request #2977 from rhc54/topic/spawn
Fix comm_spawn by registering nspace info only when needed
2017-02-15 04:31:54 -08:00
Gilles Gouaillardet
6ff74dde05 Merge pull request #2978 from ggouaillardet/topic/osc_sm_align
osc/sm: fix MPI_Win_allocate_shared() alignment
2017-02-15 16:21:31 +09:00
Gilles Gouaillardet
cd4537193c osc/sm: fix MPI_Win_allocate_shared() alignment
add padding so the memory allocated by MPI_Win_allocate_shared()
is 64 bytes aligned.

Thanks Joseph Schuchart for the bug report

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-15 13:40:48 +09:00
Ralph Castain
68b53e2179 Fix comm_spawn by registering nspace info only when needed - either when we have local procs, or when job-level info is required by connecting jobs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-14 19:47:56 -08:00
Ralph Castain
404fe327be Merge pull request #2973 from rhc54/topic/cleanups
Update to newest PMIx master (includes configuration cleanups). Silence trivial Coverity warning in hwloc base.
2017-02-14 17:38:18 -08:00
Ralph Castain
0c8609ca16 Update to newest PMIx master (includes configuration cleanups). Silence trivial Coverity warning in hwloc base.
Cleanup a race condition segfault during finalize by ensuring the PMIx progress thread is stopped prior to starting to tear down the messaging components

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-14 15:14:00 -08:00
Josh Hursey
0b273c2561 Merge pull request #2808 from jjhursey/fix/ibm/reduce-local-to-coll
coll: Move reduce_local into the coll framework
2017-02-14 15:54:15 -06:00
Ralph Castain
060cc09474 Revert "orte: Fix MPI_Spawn"
This reverts commit 9f7e2098ac.
2017-02-14 13:32:28 -08:00
Josh Hursey
a17b547430 Merge pull request #2957 from jjhursey/topic/ibm/rsh-sigint-fix
plm/rsh: Fix signal handling for rsh launcher
2017-02-14 15:29:00 -06:00
Nathan Hjelm
8562b87ad3 Merge pull request #2967 from hjelmn/auto_bool
mca/base: add new base enumerator (auto_bool)
2017-02-14 12:25:56 -07:00
Nathan Hjelm
5683e7836f Merge pull request #2965 from hjelmn/deprecated_fix
mca/base: fix deprecated variable help message
2017-02-14 12:22:11 -07:00
Nathan Hjelm
1df6bdd30e schizo/alps: set orte_bound_at_launch when launched with aprun
Set the orte_bound_at_launch MCA variable. This resolves a launch
performance bug when using aprun to launch Open MPI processes. If
this variable is not set it can take minutes longer to launch with
high ppn.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-14 11:13:48 -07:00
Nathan Hjelm
3b912ea2a7 pmix/cray: performance improvements and cleanup
Do not use opal_output_verbose inside O(n) loops. This was causing us
to make O(n) calls to snprintf which was greatly slowing launch at
scale.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-14 11:13:10 -07:00