Commit Graph

26455 Commits

Author SHA1 Message Date
Edgar Gabriel
d3a8d38cc6 common/ompio: correctly position shared fp in append mode
Fixes a bug reported on the mailing list. ompio repositioned only the individual
file pointer when the file was opened in append mode. Set the shared file
pointer to the end of the file as well, as is already done for the individual
file pointer.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-01-23 08:59:05 -06:00
Ralph Castain
0b4648b3a7 Merge pull request #2779 from hjelmn/oob_param
oob/base: fix num_threads registration type
2017-01-22 14:09:06 -08:00
Nathan Hjelm
954a4b7be3 oob/base: fix num_threads registration type
This commit fixes a bug in the registration of the num_threads MCA
variable. The variable is of type int and was being registered as
a boolean.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-22 14:02:34 -07:00
Ralph Castain
c549f82cdc Merge pull request #2778 from rhc54/topic/threads
Ensure that oob/base level data is always accessed in the oob/base event thread. Make debruijn the default routed component
2017-01-22 11:21:34 -08:00
Ralph Castain
ac4fcd3f97 Ensure that oob/base level data is always accessed in the oob/base event thread. Make debruijn the default routed component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-22 10:33:32 -08:00
Ralph Castain
adbcefebf8 Merge pull request #2777 from rhc54/topic/spawn
Fix comm_spawn and orte-dvm by resetting all used "node mapped" flags after building the child list
2017-01-22 08:07:08 -08:00
Ralph Castain
6560617c04 Fix comm_spawn and orte-dvm by resetting all used "node mapped" flags after building the child list
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-22 05:55:53 -08:00
Ralph Castain
59eafebf66 Merge pull request #2776 from rhc54/topic/fix
Add missing flag set to ensure nodes do not get double-added to job map.
2017-01-21 20:54:37 -08:00
Ralph Castain
639cdd4f9d Add missing flag set to ensure nodes do not get double-added to job map.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-21 20:06:50 -08:00
Ralph Castain
164fc6436d Merge pull request #2775 from rhc54/topic/oob3
More scaling efficiencies
2017-01-21 15:45:57 -08:00
Ralph Castain
e8e5f81abd Something not quite right about the revised allocation algos, so revert them while retaining the larger initial and threshold sizes
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-21 14:37:45 -08:00
Ralph Castain
be3ef77739 Improve packing efficiency by raising the initial buffer size and modifying the extension code. Flag if a job map has had its nodes added so we don't have to loop repeatedly to check it.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-21 14:03:19 -08:00
Ralph Castain
466cbd4d29 Rework the threading in oob/tcp so that daemons (including mpirun) use multiple progress threads to get messages out to their children, and so that the oob/base uses a separate one to setup sends. This allows the daemon cmd processor to execute in parallel with relay of messages, which significantly reduces launch times at scale
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-21 13:26:19 -08:00
Ralph Castain
917b88a2d5 Merge pull request #2771 from rhc54/topic/zlib
Check for zlib.h
2017-01-20 13:44:43 -08:00
Ralph Castain
08b5fe46db Check for zlib.h
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-20 11:55:11 -08:00
Open MPI Team
dba106ee10 pmix nightly tarball: only save 7 days
We don't have infinite disk space: only save 7 days of builds, not 28.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-20 19:16:43 +00:00
Open MPI Team
96a90ffab3 remove-old.pl: update / fix minor bugs
- Ensure that $to_delete is always defined
- Re-indent to 4 spaces for readability
- Don't only delete files -- it's ok to delete directories, too
- Print the directory from which we are deleting

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-20 19:16:43 +00:00
Open MPI Team
e642d1d91c nightly tarball: put the SSH target in a variable
Just to make the scripts a little less error-prone.  Also split up the
ssh/scp lines just for readability.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-20 19:16:43 +00:00
Jeff Squyres
328b654626 snapshot: fix hash comparison
- Don't use "-i" CLI option to perl; it's unnecessary here and causes
  a warning
- Branch names may not be entirely letters (e.g., "v1.11"), so take
  any character in the regexp to match the branch name

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-20 13:26:42 -05:00
Ralph Castain
cfce565ce9 Merge pull request #2763 from naughtont3/tjn-ortedvm-daemonize
dvm: add daemonize and set-sid options
2017-01-20 08:08:21 -08:00
Thomas Naughton
39d335a277 dvm: add daemonize and set-sid options
Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-01-20 09:28:26 -05:00
Ralph Castain
33d97b22bc Merge pull request #2766 from rhc54/topic/zlib
Compress the xcast message if bigger than a defined size to further improve launch performance at scale
2017-01-19 23:14:04 -08:00
Ralph Castain
668421b6ec Compress the xcast message if bigger than a defined size to further improve launch performance at scale
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 22:08:02 -08:00
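
The idea in the commit above (compress a message only when it exceeds a defined size) can be illustrated with a minimal Python sketch. This is not the ORTE implementation: the threshold value, the one-byte flag, and the names `pack_message` and `unpack_message` are all assumptions made for illustration.

```python
import zlib

# Hypothetical threshold; the commit only says "a defined size".
COMPRESS_THRESHOLD = 4096


def pack_message(payload: bytes, threshold: int = COMPRESS_THRESHOLD) -> bytes:
    """Prefix the payload with a 1-byte flag and compress it only if it is large."""
    if len(payload) > threshold:
        return b"\x01" + zlib.compress(payload)
    return b"\x00" + payload


def unpack_message(wire: bytes) -> bytes:
    """Reverse pack_message(): decompress only when the flag byte says so."""
    flag, body = wire[:1], wire[1:]
    if flag == b"\x01":
        return zlib.decompress(body)
    return body
```

Small messages skip compression entirely, so they pay only the cost of the flag byte.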
Ralph Castain
37ee823a0f Merge pull request #2765 from rhc54/topic/bypass
Allow parallel processing of launch msg while relaying
2017-01-19 20:23:28 -08:00
Ralph Castain
1f46e48b94 Have mpirun and orteds activate the oob/tcp progress thread by default, leaving a way to turn it off via MCA param. Provide a method by which the add_procs command can be processed in parallel with relaying the cmd message to the next daemons down the tree.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 18:52:58 -08:00
Ralph Castain
bb132f6d03 Merge pull request #2764 from rhc54/topic/dvm
If a tool sees the HNP it is attached to die (thereby losing connecti…
2017-01-19 15:39:30 -08:00
Ralph Castain
ca50b31de1 Merge pull request #2762 from rhc54/topic/oobfast
Speed-up the OOB/TCP communications by using writev instead of writing the header, and then separately write the body
2017-01-19 15:39:06 -08:00
Ralph Castain
63caeba84d Merge pull request #2747 from rhc54/topic/topo
Try a different approach for scalably dealing with hetero clusters
2017-01-19 14:22:36 -08:00
Ralph Castain
19bb64cfb8 If a tool sees the HNP it is attached to die (thereby losing connection), then stop the event loop instead of going through the abort code path. This will allow the tool to cleanup before exiting
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 14:04:06 -08:00
Mark Santcroos
cbb28f372a Merge pull request #2760 from marksantcroos/topic/python_bindings
Expose opal_set_using_threads in python bindings
2017-01-19 22:26:38 +01:00
Ralph Castain
e5f687f896 Speed-up the OOB/TCP communications by using writev instead of writing the header, and then separately write the body
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 13:03:44 -08:00
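
As a rough illustration of the technique named in the commit above, the following Python sketch sends a header and a body with a single `writev` call instead of two separate writes. The 4-byte length header and the function names are hypothetical and are not taken from the OOB/TCP code; `os.writev` is available on Unix-like systems only.

```python
import os
import struct


def send_framed(fd: int, payload: bytes) -> int:
    """Send a length header and the payload with one writev() system call."""
    header = struct.pack("!I", len(payload))  # hypothetical 4-byte header
    # A production sender would also loop on partial writes; omitted here.
    return os.writev(fd, [header, payload])


def _read_exact(fd: int, count: int) -> bytes:
    """Read exactly count bytes, looping over short reads."""
    chunks = []
    while count > 0:
        chunk = os.read(fd, count)
        if not chunk:
            raise EOFError("peer closed the connection")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_framed(fd: int) -> bytes:
    """Read one header-plus-payload message written by send_framed()."""
    (length,) = struct.unpack("!I", _read_exact(fd, 4))
    return _read_exact(fd, length)
```

Gathering both buffers into one call saves a system call per message and avoids putting the header on the wire separately from the body.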
Ralph Castain
16dc2e8c79 Merge pull request #2759 from ggouaillardet/topic/dirpath_create
opal/util: fix a race condition in opal_os_dirpath_create()
2017-01-19 10:40:37 -08:00
Mark Santcroos
656bdcfc54 Expose opal_set_using_threads and improve error message on missing ompi_info.
Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>
2017-01-19 07:57:58 -05:00
Gilles Gouaillardet
dffaad9de2 opal/util: fix a race condition in opal_os_dirpath_create()
Always check the permissions of the created directory,
in case someone else created the very same directory
with incompatible permissions.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-19 14:02:47 +09:00
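
The race described in the commit above can be sketched in Python as follows. This is only an illustration of the pattern (create, tolerate "already exists", then always verify), assuming a single directory level and a hypothetical helper name `ensure_dir`; it is not the code of `opal_os_dirpath_create()`.

```python
import os
import stat


def ensure_dir(path: str, mode: int = 0o700) -> None:
    """Create path if needed, then always verify its type and permissions."""
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        # Another process may have created the directory first.
        pass
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        raise NotADirectoryError(path)
    # Verify the permission bits even if this process created the directory.
    if (st.st_mode & mode) != mode:
        raise PermissionError(f"{path} exists but lacks mode {oct(mode)}")
```

Checking after the fact, rather than testing for existence before calling `mkdir`, removes the window in which another process can create the directory between the check and the creation.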
Ralph Castain
6da4dbbb33 Quick fix: save the errno from the mkdir call as the call to stat will likely overwrite it
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-18 15:42:31 -08:00
Ralph Castain
d1880d8ba1 Merge pull request #2755 from rhc54/topic/session
Update and cleanup os_dirpath
2017-01-18 13:57:44 -08:00
Ralph Castain
b257c32d2c Cleanup the os_dirpath logic so it doesn't error out if the directory actually gets created (regardless of what mkdir returns), and pretty-prints the error if it does error out.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-18 12:05:47 -08:00
Gilles Gouaillardet
a3f21fb2aa opal_os_dirpath_create: fix TOCTOU
as reported by Coverity with CID 70396

(cherry picked from commit 58d1b3f4d0)
2017-01-18 11:48:30 -08:00
Ralph Castain
368684bd63 Revert e9bc293 and try a different approach for scalably dealing with hetero clusters. Have each orted send back its topo "signature". If mpirun detects that this signature has not been seen before, then ask for that daemon to send back its full topology description. This allows the system to only get the topology once for each unique topo in the cluster.
Cleanup a typo, and remove no longer needed MCA params for hetero nodes and hetero apps. Hetero nodes will always be automatically detected. We don't support a mix of 32 and 64 bit apps

Modify the orte_node_t to use orte_topology_t instead of hwloc_topology_t, updating all the places that use it. Ensure that we properly update topology when we see a different one on a compute node.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-18 10:22:15 -08:00
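
The signature scheme described in the commit above amounts to a cache keyed by topology signature. The Python sketch below is a simplified model under stated assumptions: the class name `TopologyCache`, the `fetch_full` callback, and the use of plain strings and dicts are hypothetical and do not reflect the actual ORTE data structures.

```python
from typing import Callable, Dict


class TopologyCache:
    """Request a full topology only the first time a signature is seen."""

    def __init__(self, fetch_full: Callable[[str], dict]):
        # fetch_full(daemon) stands in for asking a daemon for its topology.
        self._fetch_full = fetch_full
        self._by_signature: Dict[str, dict] = {}

    def on_daemon_report(self, daemon: str, signature: str) -> dict:
        """Handle a daemon's signature and return the matching topology."""
        if signature not in self._by_signature:
            # Unseen signature: ask this daemon for the full description.
            self._by_signature[signature] = self._fetch_full(daemon)
        return self._by_signature[signature]
```

With this structure, a homogeneous cluster triggers one full topology transfer regardless of the number of daemons.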
Jeff Squyres
ec51ba3133 Merge pull request #2745 from jsquyres/pr/oshmem-deprecated-names
oshmem: add some deprecated names in shmem.h.in
2017-01-18 10:55:17 -05:00
George Bosilca
999d4973a9 Fix an issue with extremely large data identified by tjb900.
Due to the conversion from ssize_t to int we were losing bytes, and
ended up writing outside the receiver buffer. Similarly on the send,
due to the conversion to a lesser type, we could misinterpret the
end of the fragment.
2017-01-18 10:33:12 -05:00
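
The failure mode named in the commit above, a length narrowed from a wide type to a 32-bit int, can be reproduced in a few lines of Python using `ctypes`. The helpers `narrow_to_int` and `split_length` are hypothetical, and the chunking shown is just one possible way to keep lengths in range; the commit message does not say how the actual fix was made.

```python
import ctypes
from typing import Iterator

INT32_MAX = 2**31 - 1


def narrow_to_int(n: int) -> int:
    """Mimic storing a wide length in a 32-bit signed int (high bits lost)."""
    return ctypes.c_int32(n).value


def split_length(total: int, max_chunk: int = INT32_MAX) -> Iterator[int]:
    """Yield chunk sizes that each fit in a 32-bit int and sum to total."""
    remaining = total
    while remaining > 0:
        chunk = min(remaining, max_chunk)
        yield chunk
        remaining -= chunk
```

For example, `narrow_to_int(2**32 + 5)` evaluates to `5`, which shows how a byte count above 4 GiB can silently turn into a tiny one.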
Jeff Squyres
e79e478447 oshmem: add some deprecated names in shmem.h.in
Per https://github.com/openshmem-org/tests-uh/issues/17, add some
deprecated constant names that we didn't previously support in Open
MPI.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-17 16:13:38 -08:00
Ralph Castain
c8768e3dab Merge pull request #2740 from rhc54/topic/hnp
Add an MCA param "hnp_on_smgmt_node"
2017-01-17 05:49:51 -08:00
Ralph Castain
817e0fff82 Merge pull request #2739 from rhc54/topic/wrappers
Add some missing qualifiers to the wrapper compilers for -lopen-rte and -lopen-pal
2017-01-17 05:49:35 -08:00
Ralph Castain
e9bc2934be Add an MCA param "hnp_on_smgmt_node" that mpirun can use to tell the orteds to ignore its topology signature as mpirun is executing on a system mgmt node, and hence a different topology than the compute nodes
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-16 19:32:01 -08:00
Ralph Castain
2fb9e7cc2b Add some missing qualifiers to the wrapper compilers for -lopen-rte and -lopen-pal
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-16 19:24:16 -08:00
Ralph Castain
568b58af75 Merge pull request #2738 from rhc54/topic/cancel
Cancel the waitpid callback once the waitpid on a process has fired to avoid multiple notifications
2017-01-16 15:53:53 -08:00
Ralph Castain
74a285be83 Cancel the waitpid callback once the waitpid on a process has fired to avoid multiple notifications
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-16 14:32:02 -08:00
Mike Dubman
bd6da46821 Merge pull request #2736 from alex-mikheev/topic/memheap_init_fix
oshmem: memheap: refactor component selection code
2017-01-16 16:37:52 +02:00
Alex Mikheev
83c2ab76a5 oshmem: memheap: refactor component selection code
Do not call component's init function until the component has been
selected.

Use mca_base_select() instead of the custom component selection code.

Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-16 13:48:58 +02:00