Open MPI Team
dba106ee10
pmix nightly tarball: only save 7 days
...
We don't have infinite disk space: only save 7 days of builds, not 28.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-20 19:16:43 +00:00
Open MPI Team
96a90ffab3
remove-old.pl: update / fix minor bugs
...
- Ensure that $to_delete is always defined
- Re-indent to 4 spaces for readability
- Don't only delete files -- it's ok to delete directories, too
- Print the directory from which we are deleting
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-20 19:16:43 +00:00
Open MPI Team
e642d1d91c
nightly tarball: put the SSH target in a variable
...
Just to make the scripts a little less error-prone. Also split up the
ssh/scp lines just for readability.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-20 19:16:43 +00:00
Jeff Squyres
328b654626
snapshot: fix hash comparison
...
- Don't use "-i" CLI option to perl; it's unnecessary here and causes
a warning
- Branch names may not be entirely letters (e.g., "v1.11"), so take
any character in the regexp to match the branch name
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-20 13:26:42 -05:00
Ralph Castain
cfce565ce9
Merge pull request #2763 from naughtont3/tjn-ortedvm-daemonize
...
dvm: add daemonize and set-sid options
2017-01-20 08:08:21 -08:00
Thomas Naughton
39d335a277
dvm: add daemonize and set-sid options
...
Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-01-20 09:28:26 -05:00
Ralph Castain
33d97b22bc
Merge pull request #2766 from rhc54/topic/zlib
...
Compress the xcast message if bigger than a defined size to further improve launch performance at scale
2017-01-19 23:14:04 -08:00
Ralph Castain
668421b6ec
Compress the xcast message if bigger than a defined size to further improve launch performance at scale
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 22:08:02 -08:00
Ralph Castain
37ee823a0f
Merge pull request #2765 from rhc54/topic/bypass
...
Allow parallel processing of launch msg while relaying
2017-01-19 20:23:28 -08:00
Ralph Castain
1f46e48b94
Have mpirun and orteds activate the oob/tcp progress thread by default, leaving a way to turn it off via MCA param. Provide a method by which the add_procs command can be processed in parallel with relaying the cmd message to the next daemons down the tree.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 18:52:58 -08:00
Ralph Castain
bb132f6d03
Merge pull request #2764 from rhc54/topic/dvm
...
If a tool sees the HNP it is attached to die (thereby losing connecti…
2017-01-19 15:39:30 -08:00
Ralph Castain
ca50b31de1
Merge pull request #2762 from rhc54/topic/oobfast
...
Speed up the OOB/TCP communications by using writev instead of writing the header and then separately writing the body
2017-01-19 15:39:06 -08:00
Ralph Castain
63caeba84d
Merge pull request #2747 from rhc54/topic/topo
...
Try a different approach for scalably dealing with hetero clusters
2017-01-19 14:22:36 -08:00
Ralph Castain
19bb64cfb8
If a tool sees the HNP it is attached to die (thereby losing connection), then stop the event loop instead of going through the abort code path. This will allow the tool to clean up before exiting
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 14:04:06 -08:00
Mark Santcroos
cbb28f372a
Merge pull request #2760 from marksantcroos/topic/python_bindings
...
Expose opal_set_using_threads in python bindings
2017-01-19 22:26:38 +01:00
Ralph Castain
e5f687f896
Speed up the OOB/TCP communications by using writev instead of writing the header and then separately writing the body
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 13:03:44 -08:00
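The writev change described above can be illustrated with a minimal sketch. The `msg_hdr_t` layout and the `send_msg` helper are hypothetical names chosen for illustration and are not the actual OOB/TCP structures; only `writev()` itself is the real POSIX call.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical fixed-size header; NOT the real OOB/TCP header layout. */
typedef struct {
    uint32_t type;
    uint32_t nbytes;
} msg_hdr_t;

/* Hand header and body to the kernel in a single writev() call instead of
 * two separate write() calls. Returns the number of bytes written, or -1
 * with errno set. A short write is possible; the caller must handle it. */
ssize_t send_msg(int fd, const msg_hdr_t *hdr, const void *body, size_t len)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;   /* first segment: the header */
    iov[0].iov_len  = sizeof(*hdr);
    iov[1].iov_base = (void *)body;  /* second segment: the payload */
    iov[1].iov_len  = len;

    return writev(fd, iov, 2);
}
```

One system call replaces two, and on a TCP socket the header and body are less likely to go out as separate small segments.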
Ralph Castain
16dc2e8c79
Merge pull request #2759 from ggouaillardet/topic/dirpath_create
...
opal/util: fix a race condition in opal_os_dirpath_create()
2017-01-19 10:40:37 -08:00
Mark Santcroos
656bdcfc54
Expose opal_set_using_threads and improve error message on missing ompi_info.
...
Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>
2017-01-19 07:57:58 -05:00
Gilles Gouaillardet
dffaad9de2
opal/util: fix a race condition in opal_os_dirpath_create()
...
always check the permissions of the created directory,
in case someone else created the very same directory but
with incompatible permissions
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-19 14:02:47 +09:00
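The idea of always checking the permissions after creation can be sketched as follows. This is a simplified illustration under assumptions, not the code in opal_os_dirpath_create(); `dir_has_mode` and `create_dir_checked` are hypothetical names.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if path is a directory carrying every permission bit in wanted. */
int dir_has_mode(const char *path, mode_t wanted)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode)) {
        return 0;
    }
    return (st.st_mode & wanted) == wanted;
}

/* Create path, tolerating EEXIST (another process may have won the race),
 * then verify the permissions no matter who created the directory.
 * Returns 0 on success, -1 otherwise. Note that the process umask can strip
 * bits from a freshly created directory, which this check also reports. */
int create_dir_checked(const char *path, mode_t mode)
{
    if (mkdir(path, mode) != 0 && errno != EEXIST) {
        return -1;
    }
    return dir_has_mode(path, mode) ? 0 : -1;
}
```

Checking after the fact, instead of testing for existence before calling mkdir(), avoids the window in which another process can create the directory between the test and the creation.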
Ralph Castain
6da4dbbb33
Quick fix: save the errno from the mkdir call as the call to stat will likely overwrite it
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-18 15:42:31 -08:00
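Why the errno has to be saved can be shown with a small sketch. The `make_dir` helper is a hypothetical name and a simplification, not the actual os_dirpath code.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 0 if path exists as a directory after the call; otherwise returns
 * the errno value that mkdir() reported. */
int make_dir(const char *path, mode_t mode)
{
    struct stat st;
    int saved_errno;

    if (mkdir(path, mode) == 0) {
        return 0;
    }
    saved_errno = errno;  /* save now: stat() below may overwrite errno */

    /* The directory may exist anyway, e.g. created by another process. */
    if (stat(path, &st) == 0 && S_ISDIR(st.st_mode)) {
        return 0;
    }
    return saved_errno;   /* report why mkdir() failed, not why stat() failed */
}
```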
Ralph Castain
d1880d8ba1
Merge pull request #2755 from rhc54/topic/session
...
Update and cleanup os_dirpath
2017-01-18 13:57:44 -08:00
Ralph Castain
b257c32d2c
Cleanup the os_dirpath logic so it doesn't error out if the directory actually gets created (regardless of what mkdir returns), and pretty-prints the error if it does error out.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-18 12:05:47 -08:00
Gilles Gouaillardet
a3f21fb2aa
opal_os_dirpath_create: fix TOCTOU
...
as reported by Coverity with CID 70396
(cherry picked from commit 58d1b3f4d0)
2017-01-18 11:48:30 -08:00
Ralph Castain
368684bd63
Revert e9bc293 and try a different approach for scalably dealing with hetero clusters. Have each orted send back its topo "signature". If mpirun detects that this signature has not been seen before, then ask for that daemon to send back its full topology description. This allows the system to only get the topology once for each unique topo in the cluster.
...
Cleanup a typo, and remove no longer needed MCA params for hetero nodes and hetero apps. Hetero nodes will always be automatically detected. We don't support a mix of 32 and 64 bit apps
Modify the orte_node_t to use orte_topology_t instead of hwloc_topology_t, updating all the places that use it. Ensure that we properly update topology when we see a different one on a compute node.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-18 10:22:15 -08:00
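The signature scheme described above amounts to a "seen before?" lookup on the mpirun side. The following is a minimal sketch under assumptions: `topo_cache_t`, `topo_cache_note`, and the signature strings are hypothetical and do not reflect the actual ORTE data structures.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TOPOS 64

/* Hypothetical cache of topology signatures already seen. The strings are
 * owned by the caller and must outlive the cache. */
typedef struct {
    const char *sigs[MAX_TOPOS];
    size_t count;
} topo_cache_t;

/* Record a daemon's signature.
 * Returns 1 if the signature is new (the caller should request the full
 * topology from that daemon), 0 if it was already seen, -1 if the cache
 * is full. */
int topo_cache_note(topo_cache_t *cache, const char *sig)
{
    size_t i;

    for (i = 0; i < cache->count; i++) {
        if (strcmp(cache->sigs[i], sig) == 0) {
            return 0;
        }
    }
    if (cache->count == MAX_TOPOS) {
        return -1;
    }
    cache->sigs[cache->count++] = sig;
    return 1;
}
```

With this approach the number of full topology transfers grows with the number of distinct node types, not with the number of daemons.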
Jeff Squyres
ec51ba3133
Merge pull request #2745 from jsquyres/pr/oshmem-deprecated-names
...
oshmem: add some deprecated names in shmem.h.in
2017-01-18 10:55:17 -05:00
George Bosilca
999d4973a9
Fix an issue with extremely large data identified by tjb900.
...
Due to the conversion from ssize_t to int we were losing bytes, and
ended up writing outside the receiver buffer. Similarly on the send,
due to the conversion to a narrower type, we could misinterpret the
end of the fragment.
2017-01-18 10:33:12 -05:00
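The class of bug described above can be illustrated with a small sketch. It assumes a platform where ssize_t is wider than int (typical for 64-bit Linux); `ssize_to_int` is a hypothetical helper and not part of the fix itself.

```c
#include <limits.h>
#include <sys/types.h>

/* Convert a byte count to int only when the value is preserved.
 * Returns 0 and stores the value on success, -1 if it would not fit.
 * A plain cast such as (int)n silently discards the high bits of
 * counts above INT_MAX. */
int ssize_to_int(ssize_t n, int *out)
{
    if (n > INT_MAX || n < INT_MIN) {
        return -1;
    }
    *out = (int)n;
    return 0;
}
```

The simpler remedy is to keep the count in ssize_t or size_t end to end, so that no narrowing conversion takes place at all.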
Jeff Squyres
e79e478447
oshmem: add some deprecated names in shmem.h.in
...
Per https://github.com/openshmem-org/tests-uh/issues/17, add some
deprecated constant names that we didn't previously support in Open
MPI.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-17 16:13:38 -08:00
Ralph Castain
c8768e3dab
Merge pull request #2740 from rhc54/topic/hnp
...
Add an MCA param "hnp_on_smgmt_node"
2017-01-17 05:49:51 -08:00
Ralph Castain
817e0fff82
Merge pull request #2739 from rhc54/topic/wrappers
...
Add some missing qualifiers to the wrapper compilers for -lopen-rte and -lopen-pal
2017-01-17 05:49:35 -08:00
Ralph Castain
e9bc2934be
Add an MCA param "hnp_on_smgmt_node" that mpirun can use to tell the orteds to ignore its topology signature as mpirun is executing on a system mgmt node, and hence a different topology than the compute nodes
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-16 19:32:01 -08:00
Ralph Castain
2fb9e7cc2b
Add some missing qualifiers to the wrapper compilers for -lopen-rte and -lopen-pal
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-16 19:24:16 -08:00
Ralph Castain
568b58af75
Merge pull request #2738 from rhc54/topic/cancel
...
Cancel the waitpid callback once the waitpid on a process has fired to avoid multiple notifications
2017-01-16 15:53:53 -08:00
Ralph Castain
74a285be83
Cancel the waitpid callback once the waitpid on a process has fired to avoid multiple notifications
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-16 14:32:02 -08:00
Mike Dubman
bd6da46821
Merge pull request #2736 from alex-mikheev/topic/memheap_init_fix
...
oshmem: memheap: refactor component selection code
2017-01-16 16:37:52 +02:00
Alex Mikheev
83c2ab76a5
oshmem: memheap: refactor component selection code
...
Do not call component's init function until the component has been
selected.
Use mca_base_select() instead of the custom component selection code.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-16 13:48:58 +02:00
Ralph Castain
9e8c7d6295
Silence Coverity warning
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-15 07:51:37 -08:00
Ralph Castain
6b34cc67d6
Correct typo
...
Fixes #2691
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-15 07:48:31 -08:00
Ralph Castain
d9b30e429f
Merge pull request #2735 from rhc54/topic/sigh
...
Plug fd leaks
2017-01-14 18:28:32 -08:00
Ralph Castain
3a157f0496
One more time - we "push" IOF for stdout, stderr, and stddiag with separate calls. However, we were creating the sinks for all three of them each time, which caused them to leak. Create the sinks only once for each channel.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-14 17:40:36 -08:00
Ralph Castain
d9fc88c2c7
Merge pull request #2733 from rhc54/topic/dvm
...
Cleanup DVM leaks
2017-01-12 21:29:54 -08:00
Ralph Castain
5c87fc10dc
Update the mpirun man page to accurately reflect the use of -host
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-12 20:28:53 -08:00
Ralph Castain
b55c03255a
Strange - I had created a new IOF API "complete" for cleaning up at the end of jobs, but somehow the implementation is missing. It also appears that the orteds never actually cleaned up their job-related information. These things are fine for normal mpirun-based operations, but cause significant resource leaks for the DVM.
...
Complete the implementation and seal the leaks
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-12 19:54:18 -08:00
Ralph Castain
6a092a41c2
Merge pull request #2732 from rhc54/topic/fudge
...
Missed one spot - plug fd leaks in orteds
2017-01-12 16:28:33 -08:00
Ralph Castain
0e2df3be3e
Missed one spot - plug fd leaks in orteds
...
Fixes #2691
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-12 13:45:46 -08:00
Ralph Castain
9ad02b5d13
Merge pull request #2718 from rhc54/topic/leaks
...
Don't remove the IOF framework's tracking info for a proc until the state machine tells it to do so.
2017-01-12 09:57:17 -08:00
Nathan Hjelm
110840fc87
ess/hnp: add support for forwarding additional signals (#2712)
...
* ess/hnp: add support for forwarding additional signals
This commit adds support to the hnp ess module to forward additional
signals beyond the default SIGUSR1, SIGUSR2, SIGTSTP, and SIGCONT.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* Generalize this a bit to allow a broader range of signals to be forwarded. Turns out that SIGURG is now a "standard" signal, though the value differs across systems. So set up to forward it (and some friends) if they are defined. Allow users to provide the signal name (instead of the integer value) as the value of even the more common signals does vary across systems. Don't limit the number that can be supported.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
* ess/hnp: fix some bugs in the signal forwarding code
This commit fixes two bugs:
- signals_set needs to be set even if no signals are being
forwarded. If it is not set we will SEGV in libevent if
ess_hnp_forward_signals == none.
- SIGTERM and SIGHUP are handled with a different type of handler. Do
not allow the user to specify these to be forwarded.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* We are sure to get "dinged" if error messages aren't nicely output via show_help, so do so here
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-12 10:09:41 -07:00
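Resolving signals by name, as described above, can be sketched with a lookup table. `signal_from_name` is a hypothetical helper and the table contents are an illustrative assumption, not the list used by the ess/hnp module.

```c
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Map a signal name to this system's number, or return -1 if the name is
 * unknown or not forwardable. SIGTERM and SIGHUP are left out on purpose,
 * mirroring the note above that they use a different type of handler. */
int signal_from_name(const char *name)
{
    static const struct {
        const char *name;
        int signo;
    } table[] = {
        {"SIGUSR1", SIGUSR1},
        {"SIGUSR2", SIGUSR2},
        {"SIGTSTP", SIGTSTP},
        {"SIGCONT", SIGCONT},
#ifdef SIGURG
        {"SIGURG", SIGURG},  /* only present where the system defines it */
#endif
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].signo;
        }
    }
    return -1;
}
```

Because the numbers come from the local `<signal.h>`, the same name resolves correctly on systems that assign different values.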
Nathan Hjelm
91c34c8df6
Merge pull request #2703 from hjelmn/rcache_fix
...
rcache/base: do not release vma structures in vma_tree_delete
2017-01-12 09:53:34 -07:00
Ralph Castain
fa419d3c0d
Don't remove the IOF framework's tracking info for a proc until the
...
state machine tells it to do so. This plugs leaked file descriptors as
we were losing track prior to destructing the resources.
Fixes #2691
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-12 08:34:29 -08:00
Nysal Jan K A
16ca8c18c6
Merge pull request #2706 from nysal/ppc_atomic_master
...
asm/ppc: Fix a regression in powerpc atomics
2017-01-12 19:43:33 +05:30
Jeff Squyres
938ab01ad6
Merge pull request #2714 from hjelmn/timer_rollover
...
timer/linux: prevent 64-bit overflow
2017-01-12 06:40:52 -05:00
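The body of the timer change is not shown here, so the following only illustrates one common way a 64-bit overflow arises when converting cycle counts to time, and one way to avoid it. `cycles_to_usec` is a hypothetical function and is not claimed to be the code in timer/linux.

```c
#include <stdint.h>

/* Convert a cycle count to microseconds.
 * The naive form, cycles * 1000000 / freq_hz, overflows 64 bits once
 * cycles exceeds about 1.8e13. Splitting into whole seconds and a
 * remainder keeps every intermediate product small for realistic
 * frequencies. freq_hz must be nonzero. */
uint64_t cycles_to_usec(uint64_t cycles, uint64_t freq_hz)
{
    uint64_t secs = cycles / freq_hz;  /* whole seconds */
    uint64_t rem  = cycles % freq_hz;  /* leftover cycles, < freq_hz */

    return secs * UINT64_C(1000000) + (rem * UINT64_C(1000000)) / freq_hz;
}
```

At 2 GHz the naive multiplication wraps after roughly two and a half hours of uptime, so a long-running node would see the reported time jump backwards.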