Alex Mikheev
c63137e1c0
oshmem: sshmem ucx: minor code cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
132fbd9ae9
oshmem: sshmem: add UCX allocator
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
e038e3f9e0
oshmem: sshmem: code cleanup
This commit removes unused code and an unused interface function, and moves
common code to the base.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:47:59 +02:00
Yossi
fb67c966a8
Merge pull request #2944 from alex-mikheev/topic/pml_ucx_bsend
ompi: pml ucx: add support for the buffered send
2017-02-22 12:21:03 +02:00
Artem Polyakov
717f3fef62
ompi: Avoid unnecessary PMIx lookups when adding procs.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-02-22 16:09:30 +07:00
Nathan Hjelm
60ad9d1817
rcache/base: do not free memory with the vma lock held
This commit makes the vma tree garbage collection list a LIFO. This
way we can avoid having to hold any lock when releasing vmas. In
theory this should finally fix the hold-and-wait deadlock detailed
in #1654.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-21 21:04:46 -07:00
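The message does not show the list itself. Below is a minimal sketch, assuming C11 atomics and a hypothetical `gc_node` type (this is not Open MPI's own LIFO class). A LIFO lets any thread push a vma onto the garbage-collection list with a compare-and-swap. The collector then detaches the whole list with a single atomic exchange and frees every entry without holding a lock.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical node type standing in for a vma awaiting release. */
struct gc_node {
    struct gc_node *next;
    int id;
};

/* Head of the garbage-collection LIFO; NULL means empty. */
static _Atomic(struct gc_node *) gc_head = NULL;

/* Push a node without taking any lock (compare-and-swap retry loop).
 * On failure, 'old' is refreshed with the current head and we retry. */
static void gc_push(struct gc_node *node)
{
    struct gc_node *old = atomic_load(&gc_head);
    do {
        node->next = old;
    } while (!atomic_compare_exchange_weak(&gc_head, &old, node));
}

/* Detach the whole list in one atomic step, then free it with no lock held.
 * Returns the number of nodes released. */
static int gc_drain(void)
{
    struct gc_node *node = atomic_exchange(&gc_head, NULL);
    int released = 0;
    while (node != NULL) {
        struct gc_node *next = node->next;
        free(node);
        node = next;
        ++released;
    }
    return released;
}
```

Detaching the whole list at once, rather than popping one node at a time, also sidesteps the ABA problem that a naive lock-free pop has to handle.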
Ralph Castain
4ef6563722
Merge pull request #3010 from rhc54/topic/pmixheaders
Ensure that the pmix headers and lib get installed when --with-devel-…
2017-02-21 15:44:12 -08:00
Ralph Castain
8cffdcf127
Ensure that the pmix headers and lib get installed when --with-devel-headers is given so that PMIx applications can be built and executed against the "embedded" PMIx version
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-21 13:46:46 -08:00
Alex Mikheev
b015c8bb48
ompi: pml ucx: add support for the buffered send
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-21 17:19:22 +02:00
Gilles Gouaillardet
4184c01be5
Merge pull request #2393 from bosilca/topic/no_predefined_ddt_refcount
Don't refcount the predefined datatypes.
2017-02-21 09:38:11 +09:00
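Below is a minimal sketch of the idea, using a hypothetical `struct dtype` rather than Open MPI's datatype class. Predefined types exist for the whole run, so retain and release simply skip them. Only user-defined types pay for maintaining a counter. The sketch uses a plain `int` counter for brevity; a thread-safe version would need atomic updates.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical datatype object. */
struct dtype {
    bool predefined;   /* lives for the whole run, never freed */
    int refcount;      /* meaningful only for user-defined types */
};

/* Take a reference; predefined types are not counted at all. */
static void dtype_retain(struct dtype *t)
{
    if (!t->predefined) {
        ++t->refcount;
    }
}

/* Drop a reference; returns true if the object was freed. */
static bool dtype_release(struct dtype *t)
{
    if (t->predefined) {
        return false;
    }
    if (--t->refcount == 0) {
        free(t);
        return true;
    }
    return false;
}
```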
Mark Santcroos
3895c106a7
Merge pull request #3007 from rhc54/topic/correction
Fix launch_id matching of -hosts
2017-02-20 17:32:22 +01:00
Ralph Castain
22c88f5ab5
Fix launch_id matching of -hosts
Need to check the entire value instead of just the last N digits. Otherwise, "-host 15" will match nid0015, nid0115, and any other launch id ending in 15.
It appears strtol can return either a NULL or a zero-length string, so check for both cases.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-20 07:03:53 -08:00
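The matching rule can be sketched as follows. This assumes node names are an alphabetic prefix followed by digits (as in "nid0015"), and `host_matches_launch_id` is a hypothetical helper, not the ORTE function. Both strings are converted with `strtol` and the whole values are compared, with the end pointer checked to reject empty or partially numeric input.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return true if the numeric part of nodename equals the value given to -host. */
static bool host_matches_launch_id(const char *host_arg, const char *nodename)
{
    char *end = NULL;
    long want = strtol(host_arg, &end, 10);
    /* Reject arguments with no digits ("") or trailing junk ("15x"). */
    if (end == host_arg || *end != '\0') {
        return false;
    }
    /* Skip the non-digit prefix of the node name, e.g. "nid". */
    const char *p = nodename;
    while (*p != '\0' && !isdigit((unsigned char)*p)) {
        ++p;
    }
    if (*p == '\0') {
        return false;          /* node name carries no launch id */
    }
    long have = strtol(p, &end, 10);
    if (*end != '\0') {
        return false;
    }
    /* Compare whole values: 15 matches nid0015 but not nid0115. */
    return have == want;
}
```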
Gilles Gouaillardet
5c64c0bc3b
Merge pull request #3008 from ggouaillardet/topic/pmix_f57d9b2953b3da09a892cd69e9e607f15298935a
pmix2x: synchronize to the latest PMIx master
2017-02-20 11:29:35 +09:00
Gilles Gouaillardet
bb2481a84b
pmix2x: synchronize to the latest PMIx master
pmix/master@f57d9b2953
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-20 10:45:17 +09:00
Ralph Castain
af7e2cc33b
Merge pull request #3004 from jjhursey/topic/oob-tcp-timeout
oob/tcp: Adjust TCP keepalive default values
2017-02-19 14:28:01 -08:00
Ralph Castain
26c366a7c0
Merge pull request #2964 from rhc54/topic/copyright
Protect the embedded libraries when updating copyrights
2017-02-18 08:07:55 -08:00
Ralph Castain
665850ed69
Use regex to define the protected files
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-18 06:45:13 -08:00
Ralph Castain
2f0aec709a
Protect the embedded libraries when updating copyrights; we shouldn't be overwriting their copyrights with our own
bot:notest
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-18 06:45:13 -08:00
Mark Santcroos
7762c21c23
Merge pull request #3006 from rhc54/topic/lid
Support -host launch_id
2017-02-18 09:46:24 +01:00
Ralph Castain
bf0f274f06
Allow -host to look for the number of a host when running in a managed environment that supports launch IDs. For example, this will allow someone who has been allocated a node named "nid0015" to refer to it with "-host 15".
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-17 18:35:54 -08:00
Ralph Castain
95bfc7b7c6
Merge pull request #2991 from jjhursey/fix/ibm/errmgr-help-msg
orte/errmgr: Improve help message on connection lost
2017-02-17 11:34:18 -08:00
Nathaniel Graham
91810173b3
Merge pull request #2993 from nrgraham23/man_page_update
Update the mpirun man page
2017-02-17 11:49:17 -07:00
Joshua Hursey
df0f8e95cd
oob/tcp: Adjust TCP keepalive default values
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-17 11:02:25 -06:00
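The message does not state the new default values. The sketch below shows the mechanism being tuned on a Linux-style socket API: enabling keepalive, then setting its idle time, probe interval and probe count. The per-probe options are guarded because not every platform defines them. Any numbers passed in are arbitrary examples, not the defaults the commit chose.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable TCP keepalive on fd; returns 0 on success, -1 on error. */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int probes)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0) {
        return -1;
    }
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Seconds of idleness before the first probe. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0) {
        return -1;
    }
    /* Seconds between probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) != 0) {
        return -1;
    }
    /* Unanswered probes before the connection is declared dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) != 0) {
        return -1;
    }
#else
    (void)idle_s; (void)intvl_s; (void)probes;
#endif
    return 0;
}
```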
Ralph Castain
5eb3ebdf6d
Merge pull request #3000 from rhc54/topic/configclean
Fix some pmix configuration code
2017-02-16 12:30:12 -08:00
Ralph Castain
f49118eaab
Fix some pmix configuration code
Remove a stale file reference that caused a check to always fail. Update the psm2 function check to the new libs.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-16 10:54:47 -08:00
Howard Pritchard
b272f87926
Merge pull request #2968 from hjelmn/pmix_cray
pmix/cray: performance improvements and cleanup
2017-02-16 11:41:59 -07:00
Todd Kordenbrock
ac3c2c5030
Merge pull request #2986 from tkordenbrock/topic/master/implement.osc.noncontig
osc-portals4: add support for noncontiguous datatypes
2017-02-16 07:32:23 -06:00
Ralph Castain
ba420758fe
Merge pull request #2994 from rhc54/topic/queue
Take a shot at fixing a segfault
2017-02-15 19:29:12 -08:00
Ralph Castain
201f8571ca
Ensure we retain the peer object until we are done with it, then detect that the socket has closed due to a lost connection and cleanly release the message event
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 18:30:55 -08:00
Ralph Castain
ba47f73887
Merge pull request #2992 from rhc54/topic/pe1
Fix binding policy bug and support pe=1 modifier
2017-02-15 17:58:25 -08:00
Ralph Castain
0ae873de5c
Fix a bug where we failed to compute #procs for nperXXX directives, thus resulting in an incorrect default binding
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 16:32:24 -08:00
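Why a missing process count changes the binding can be sketched as follows. This assumes the default rule described in the mpirun man page of this period: bind to core for np <= 2, bind to socket for np > 2, and no binding when oversubscribed. The helpers are hypothetical, not ORTE's mapper code. A count left at zero for an "N per resource" directive falls into the np <= 2 branch and yields core binding even for large jobs.

```c
/* Possible default bindings under the assumed rule. */
enum bind_default { BIND_DEFAULT_NONE, BIND_DEFAULT_CORE, BIND_DEFAULT_SOCKET };

/* For an "N per resource" directive the job size is N times the number
 * of resources. */
static int nprocs_for_nper(int per_resource, int nresources)
{
    return per_resource * nresources;
}

/* Default binding: core for np <= 2, socket for np > 2,
 * none when there are more processes than slots. */
static enum bind_default default_binding(int nprocs, int nslots)
{
    if (nprocs > nslots) {
        return BIND_DEFAULT_NONE;
    }
    return nprocs <= 2 ? BIND_DEFAULT_CORE : BIND_DEFAULT_SOCKET;
}
```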
Nathaniel Graham
f9c05bdb03
Update the mpirun man page
This update should fix the mpirun man page so that all
mpirun command-line options are included, and
mpirun options that have been removed are no
longer in the man page. I also fixed some of
the file formatting and the bolding of command
parameters.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-02-15 17:24:28 -07:00
Ralph Castain
223495325d
Fix binding policy bug and support pe=1 modifier
Allow someone to specify the "pe=N" modifier to a mapping policy when N=1. This equates to just "bind-to core", but helps people who use a script to set the PE policy. Fix a bug where setting the binding policy left a lingering "if-supported" flag that shouldn't be there.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 14:55:17 -08:00
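The commit does not show how the binding policy is stored. The sketch below assumes a hypothetical encoding: the policy sits in the low bits, and modifiers such as "if-supported" are separate bit flags. Under that assumption, the fix amounts to clearing the stale modifier whenever a policy is explicitly set. A "pe=1" request would then simply set `BIND_CORE`.

```c
#include <stdint.h>

/* Hypothetical encoding: low byte = policy, higher bits = modifiers. */
enum { BIND_NONE = 0, BIND_CORE = 1, BIND_SOCKET = 2 };
#define BIND_POLICY_MASK   0x00ffu
#define BIND_IF_SUPPORTED  0x0100u
#define BIND_GIVEN         0x0200u

/* Replace the policy, keep unrelated modifiers, drop any lingering
 * IF_SUPPORTED flag left by a default, and mark the policy as user-given. */
static uint16_t set_binding(uint16_t current, uint16_t policy)
{
    uint16_t flags = (uint16_t)(current & ~(BIND_POLICY_MASK | BIND_IF_SUPPORTED));
    return (uint16_t)(flags | (policy & BIND_POLICY_MASK) | BIND_GIVEN);
}
```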
Todd Kordenbrock
048f757d9f
osc-portals4: add support for noncontiguous datatypes
This commit implements one-sided operations for noncontiguous
datatypes using two different algorithms.
* If the result and/or origin datatype is noncontiguous and the
target datatype is contiguous, then an iovec MD is created for
the result and origin. The operation is performed using a
single Portals4 call (unless it exceeds the max message size).
* If the target datatype is noncontiguous, then an algorithm
similar to the one in osc-rdma is used to loop over the
contiguous blocks of each datatype. The operation is
performed using multiple Portals4 calls.
This commit ensures that individual operations do not exceed the
max atomic size or the max message size supported by the device.
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2017-02-15 16:17:13 -06:00
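The second algorithm, together with the size limit, can be sketched in simplified form. This version walks the contiguous blocks of a single flattened datatype, whereas the real code walks the origin and target datatypes together. Each block is split so that no single operation exceeds `max_size` bytes. `struct block`, `issue_fn` and `count_bytes` are hypothetical; `issue_fn` stands in for one Portals4 call.

```c
#include <stddef.h>

/* One contiguous region of a flattened datatype. */
struct block {
    size_t offset;   /* byte offset from the datatype start */
    size_t length;   /* bytes in this contiguous run */
};

/* Hypothetical stand-in for a single put/get/atomic call. */
typedef void (*issue_fn)(size_t offset, size_t length, void *ctx);

/* Example callback: accumulate the total number of bytes issued. */
static void count_bytes(size_t offset, size_t length, void *ctx)
{
    (void)offset;
    *(size_t *)ctx += length;
}

/* Issue one operation per chunk of at most max_size bytes.
 * Returns the number of operations issued. */
static size_t issue_noncontig(const struct block *blocks, size_t nblocks,
                              size_t max_size, issue_fn issue, void *ctx)
{
    size_t nops = 0;
    if (max_size == 0) {
        return 0;                     /* avoid an infinite loop */
    }
    for (size_t i = 0; i < nblocks; ++i) {
        size_t done = 0;
        while (done < blocks[i].length) {
            size_t len = blocks[i].length - done;
            if (len > max_size) {
                len = max_size;       /* respect the device limit */
            }
            if (issue != NULL) {
                issue(blocks[i].offset + done, len, ctx);
            }
            done += len;
            ++nops;
        }
    }
    return nops;
}
```

With blocks of 100 and 10 bytes and a 64-byte limit, the first block needs two operations and the second needs one.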
Joshua Hursey
c452f68495
orte/errmgr: Improve help message on connection lost
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-15 16:36:00 -05:00
Ralph Castain
578d8819cf
Merge pull request #2982 from rhc54/topic/threadstop
Cleanup PMIx shutdown
2017-02-15 06:00:55 -08:00
Ralph Castain
9cd7349d7c
Instead of completely freeing the event base, pause the PMIx progress thread before tearing down the infrastructure, and then release the event base at the end of the procedure. This allows any infrastructure objects holding events to delete them before the event base is freed.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 05:02:43 -08:00
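The ordering can be sketched with POSIX threads. `event_base_stub` and the functions below are hypothetical stand-ins, not the PMIx progress-thread API. The progress thread is stopped first, and only afterwards is the event base released. In between, any infrastructure objects can delete their events while the base is still valid.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event base driven by the progress thread. */
struct event_base_stub { int nevents; };

static struct event_base_stub *evbase;
static atomic_bool progress_active;
static pthread_t progress_tid;

static void *progress_loop(void *arg)
{
    (void)arg;
    while (atomic_load(&progress_active)) {
        /* a real thread would dispatch events from evbase here */
        sched_yield();
    }
    return NULL;
}

static int progress_start(void)
{
    evbase = calloc(1, sizeof *evbase);
    if (evbase == NULL) {
        return -1;
    }
    atomic_store(&progress_active, true);
    if (pthread_create(&progress_tid, NULL, progress_loop, NULL) != 0) {
        free(evbase);
        evbase = NULL;
        return -1;
    }
    return 0;
}

static void progress_shutdown(void)
{
    /* 1. Stop the progress thread so nothing touches the base concurrently. */
    atomic_store(&progress_active, false);
    pthread_join(progress_tid, NULL);
    /* 2. Placeholder: infrastructure objects delete their events here. */
    evbase->nevents = 0;
    /* 3. Release the event base last. */
    free(evbase);
    evbase = NULL;
}
```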
Ralph Castain
f7fe2f7189
Merge pull request #2977 from rhc54/topic/spawn
Fix comm_spawn by registering nspace info only when needed
2017-02-15 04:31:54 -08:00
Gilles Gouaillardet
6ff74dde05
Merge pull request #2978 from ggouaillardet/topic/osc_sm_align
osc/sm: fix MPI_Win_allocate_shared() alignment
2017-02-15 16:21:31 +09:00
Gilles Gouaillardet
cd4537193c
osc/sm: fix MPI_Win_allocate_shared() alignment
Add padding so that the memory allocated by MPI_Win_allocate_shared()
is 64-byte aligned.
Thanks to Joseph Schuchart for the bug report.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-02-15 13:40:48 +09:00
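The padding can be sketched as rounding each process's segment size up to a multiple of 64. This assumes segments are laid out back to back in one shared region whose base is itself 64-byte aligned; `align_up` and `segment_offsets` are hypothetical helpers, not the osc/sm code. Every segment then starts on a 64-byte boundary.

```c
#include <stddef.h>

#define WIN_ALIGN 64   /* alignment stated in the commit message */

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Compute each rank's offset inside the shared region so that every
 * segment starts on a WIN_ALIGN boundary. */
static void segment_offsets(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t off = 0;
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = off;
        off += align_up(sizes[i], WIN_ALIGN);
    }
}
```

For requested sizes of 10, 100 and 1 bytes, the segments start at offsets 0, 64 and 192.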
Ralph Castain
68b53e2179
Fix comm_spawn by registering nspace info only when needed - either when we have local procs, or when job-level info is required by connecting jobs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-14 19:47:56 -08:00
Ralph Castain
404fe327be
Merge pull request #2973 from rhc54/topic/cleanups
Update to newest PMIx master (includes configuration cleanups). Silence trivial Coverity warning in hwloc base.
2017-02-14 17:38:18 -08:00
Ralph Castain
0c8609ca16
Update to newest PMIx master (includes configuration cleanups). Silence trivial Coverity warning in hwloc base.
Cleanup a race condition segfault during finalize by ensuring the PMIx progress thread is stopped prior to starting to tear down the messaging components
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-14 15:14:00 -08:00
Josh Hursey
0b273c2561
Merge pull request #2808 from jjhursey/fix/ibm/reduce-local-to-coll
coll: Move reduce_local into the coll framework
2017-02-14 15:54:15 -06:00
Ralph Castain
060cc09474
Revert "orte: Fix MPI_Spawn"
This reverts commit 9f7e2098ac.
2017-02-14 13:32:28 -08:00
Josh Hursey
a17b547430
Merge pull request #2957 from jjhursey/topic/ibm/rsh-sigint-fix
plm/rsh: Fix signal handling for rsh launcher
2017-02-14 15:29:00 -06:00
Nathan Hjelm
8562b87ad3
Merge pull request #2967 from hjelmn/auto_bool
mca/base: add new base enumerator (auto_bool)
2017-02-14 12:25:56 -07:00
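The merge title gives only the enumerator's name. The sketch below assumes "auto_bool" means a variable that accepts true, false or auto. `enum auto_bool` and `parse_auto_bool` are hypothetical, not the MCA base API. The accepted spellings are illustrative, and the actual enumerator may accept a different set.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical tri-state value. */
enum auto_bool { AUTO_BOOL_FALSE = 0, AUTO_BOOL_TRUE = 1, AUTO_BOOL_AUTO = -1 };

/* Parse a user-supplied string (case-insensitive).
 * Returns false if the string is not a recognised spelling. */
static bool parse_auto_bool(const char *s, enum auto_bool *out)
{
    static const char *const yes[] = { "1", "true", "yes", "enabled" };
    static const char *const no[]  = { "0", "false", "no", "disabled" };
    for (size_t i = 0; i < sizeof yes / sizeof yes[0]; ++i) {
        if (strcasecmp(s, yes[i]) == 0) {
            *out = AUTO_BOOL_TRUE;
            return true;
        }
        if (strcasecmp(s, no[i]) == 0) {
            *out = AUTO_BOOL_FALSE;
            return true;
        }
    }
    if (strcasecmp(s, "auto") == 0) {
        *out = AUTO_BOOL_AUTO;   /* let the component decide at runtime */
        return true;
    }
    return false;
}
```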
Nathan Hjelm
5683e7836f
Merge pull request #2965 from hjelmn/deprecated_fix
mca/base: fix deprecated variable help message
2017-02-14 12:22:11 -07:00
Nathan Hjelm
1df6bdd30e
schizo/alps: set orte_bound_at_launch when launched with aprun
Set the orte_bound_at_launch MCA variable. This resolves a launch
performance bug when using aprun to launch Open MPI processes. If
this variable is not set, launch can take minutes longer at
high ppn.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-14 11:13:48 -07:00
Nathan Hjelm
3b912ea2a7
pmix/cray: performance improvements and cleanup
Do not use opal_output_verbose inside O(n) loops. This was causing us
to make O(n) calls to snprintf which was greatly slowing launch at
scale.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-14 11:13:10 -07:00
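The pattern can be sketched as follows. `debug_log` is a hypothetical helper standing in for the real output call, and `format_calls` is a counter standing in for the cost of each `snprintf`. The verbosity level is evaluated once, outside the loop. A quiet run therefore does no formatting at all, however many entries the loop visits.

```c
#include <stdbool.h>
#include <stdio.h>

static int verbosity = 0;               /* hypothetical framework verbosity */
static unsigned long format_calls = 0;  /* counts formatting work done */

/* Hypothetical logger that always pays the formatting cost. */
static void debug_log(int rank, int node)
{
    char buf[64];
    ++format_calls;
    snprintf(buf, sizeof buf, "rank %d on node %d", rank, node);
    if (verbosity >= 10) {
        puts(buf);
    }
}

/* O(n) loop with the verbosity test hoisted out of the loop body. */
static void process_ranks(int n)
{
    const bool verbose = (verbosity >= 10);
    for (int r = 0; r < n; ++r) {
        /* ... per-rank work would go here ... */
        if (verbose) {
            debug_log(r, r % 4);
        }
    }
}
```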