1
1

180 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
8f496b01b7 Try automatically adding local spawn threads to parallelize the fork/exec process to speed up the launch on large SMPs. Harvest the threads after initial spawn to minimize any impact on running jobs.
Change the determination of #spawn threads to be done on basis of #local procs in first job being spawned. Someone can look at an optimization that handles subsequent dynamic spawns that might be larger in size.

Leave the threads running, but blocked, for the life of the daemon, and use them to harvest the local procs as they terminate. This helps short-lived jobs in particular.

Add MCA params to set:
  * max number of spawn threads (default: 4)
  * set a specific number of spawn threads (default: -1, indicating no set number)
  * cutoff - minimum number of local procs before using spawn threads (default: 32)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-29 19:54:00 -08:00
Artem Polyakov
10d6e90bf5 Revert "plm/rsh: Propagate PMIx prefix to orted's"
This reverts commit 71da0fcbef3e19e2c2f10fcc15da83229112a0e0.
(per https://github.com/open-mpi/ompi/pull/4052).
Refs: https://github.com/open-mpi/ompi/issues/3980

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-08-14 21:37:57 +07:00
Artem Polyakov
71da0fcbef plm/rsh: Propagate PMIx prefix to orted's
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-08-02 08:06:13 +03:00
Ralph Castain
93cf3c7203 Update OPAL and ORTE for thread safety
(I swear, if I look this over one more time, I'll puke)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 12:30:57 -07:00
Ralph Castain
180809f2ef Do not pass topologies during tree spawn of daemons as there is no way the HNP can know the backend topologies at that point. Any needed topologies will be sent along with the launch_apps command
Do not pass param file MCA params if the user has requested that no param files be read - required when trying to avoid launch time penalties from large numbers of processes reading default param files. The daemon picks them up and passes them along anyway, so it isn't clear what value we gain from having them all read the defaults

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-07 21:14:43 -07:00
Ralph Castain
92c996487c Update how we pass the node regex so we pass _all_ nodes, even those without daemons. This allows the backend daemons to form a complete picture of the allocation. Include info on which nodes have daemons on them, and populate that info on the backend as well.
Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap

Get the DVM running again

Fix direct modex by eliminating race condition caused by releasing data while sending it

Up the size limit before compressing

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-03 19:25:15 -07:00
Ralph Castain
52c9e631de Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 07:30:42 -07:00
Ralph Castain
105fb152e1 Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 08:38:51 -07:00
Ralph Castain
48fc339718 Create an alternative mapping method that pushes responsibility
onto the backend daemons. By default, let mpirun only pack the app_context
info and send that to the backend daemons where the mapping will
be done. This significantly reduces the computational time on mpirun as it isn't
running up/down the topology tree computing thousands of binding
locations, and it reduces the launch message to a very small number of
bytes.

When running -novm, fall back to the old way of doing things
where mpirun computes the entire map and binding, and then sends
the full info to the backend daemon.

Add a new cmd line option/mca param --fwd-mpirun-port that allows
mpirun to dynamically select a port, but then passes that back to
all the other daemons so they will use that port as a static port
for their own wireup. In this mode, we no longer "phone home" directly
to mpirun, but instead use the static port to wireup at daemon
start. We then use the routing tree to rollup the initial
launch report, and limit the number of open sockets on mpirun's node.

Update ras simulator to track the new nidmap code

Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found.

Update gadget platform file

Initialize the range count when starting a new range

Fix the no-np case in managed allocation

Ensure DVM node usage gets cleaned up after each job

Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-07 20:43:12 -08:00
Josh Hursey
a17b547430 Merge pull request #2957 from jjhursey/topic/ibm/rsh-sigint-fix
plm/rsh: Fix signal handling for rsh launcher
2017-02-14 15:29:00 -06:00
Joshua Hursey
843fcca03c plm/rsh: Fix signal handling for rsh launcher
* Similar to the other launchers (i.e., slurm, alps) we need to put the
   children in a separate process group to prevent SIGINT (from a CTRL-C)
   from being delivered to the whole process group and prematurely
   killing the rsh/ssh connections to the remote daemons.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-14 08:54:17 -06:00
Ralph Castain
dee2d8646d Fix plm/rsh runtime check
Fix the check for rsh/ssh so we allow the check for SGE and LoadLeveler to occur if user doesn't specify their own launch agent. Fix a Coverity warning

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-13 16:54:03 -08:00
Ralph Castain
b59ae14a2a Fix static port and partial allocation operations
Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.

Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-28 10:09:44 -08:00
Ralph Castain
7c795f4416 If the HNP is going to request topology info, it cannot do so via a routed OOB message as the intervening daemons may not be ready. So disable routing until the VM is ready, and have daemons start routing as they receive the xcast launch msg (which includes the data they need to talk to their peers).
Do a little optimization and minimize recomputation of the routing plan.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 15:37:16 -08:00
Ralph Castain
d672fad849 Repair rsh/ssh tree spawn
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.

Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 11:35:00 -08:00
Ralph Castain
86ab751c5e Next step in reducing launch time: begin reducing the size of the launch message itself. Start by expressing the daemon map as a set of three regular expression strings. On an 8k cluster, this reduces the nidmap contribution from over 200kBytes to 21 bytes in size.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-23 19:54:47 -08:00
Gilles Gouaillardet
6b9343a966 plm/rsh: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
Gilles Gouaillardet
1846c2d8ad plm/rsh: use an alternate port if the ORTE_NODE_PORT attribute is set 2016-10-19 16:18:52 +09:00
Ralph Castain
de7b1494d9 Clean out old cruft from the ORCM project 2016-09-21 00:13:30 -07:00
Gilles Gouaillardet
c09899f6af plm: plus resource leaks
as reported by Coverity with CIDs 72274 and 1196733
2016-09-07 10:08:44 +09:00
Ralph Castain
9f43db7303 Further cleanup getpwuid usage - try it first (unless completely disabled), and then silently failover to try other methods. 2016-08-15 07:51:36 -07:00
Ralph Castain
5717b75b45 Restore the rsh template creation code 2016-08-12 12:43:40 -07:00
Ralph Castain
0140ff048d Now that we have an "isolated" PLM component, we cannot just let rsh silently decline to run when it cannot find a launch agent - if we do, then we will -always- run on the local node. So if the user specifies a launch agent and we can't find it, then generate a pretty error message, report a fatal error back to the component select, and exit out.
This required modifying the mca_component_select function to actually check the return code on a component query - it was blissfully ignoring it.

Also do a little cleanup to avoid bombarding the user with multiple error messages.

Thanks to Patrick Begou for reporting the problem
2015-09-24 07:16:48 -07:00
Ralph Castain
0b1d4b62be Cleanup some cruft and update to coordinate with CM operations:
* don't pass --tree-spawn to the orted cmd line. If someone doesn't want tree-spawn, it shows up as an MCA param anyway
* ensure state/orted component disqualifies itself from CM operations
* clarify the DVM proc_type definitions
* ensure we stop littering the tmp dir with session directories
2015-08-12 10:32:14 -07:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Ralph Castain
c21cd1c91e Ensure the ssh session is dead 2015-05-23 08:14:29 -07:00
Ralph Castain
920562d9b4 Ensure that all ssh sessions are terminated when abnormally terminating the job 2015-05-23 08:14:29 -07:00
Ralph Castain
8e3f0b1d33 Ensure the --tree-spawn option is inside any parens from the sh and ksh shell support 2015-05-06 15:18:15 -07:00
Ralph Castain
34b53ac3dc Silence Coverity warnings 2015-04-18 07:48:22 -07:00
Ralph Castain
12bfb27161 Redo in cleaner form: Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command 2015-04-17 16:11:37 -07:00
Ralph Castain
d9c555b547 Revert "Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command"
This reverts commit open-mpi/ompi@278324c52a.

Revert "Add the ability to pass args to the rsh/ssh command line"

This reverts commit open-mpi/ompi@6f227f8564.
2015-04-16 08:03:14 -06:00
Ralph Castain
278324c52a Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command 2015-04-15 20:30:04 -06:00
Ralph Castain
6f227f8564 Add the ability to pass args to the rsh/ssh command line 2015-04-15 20:07:13 -06:00
Gilles Gouaillardet
7de3f35b90 pml/rsh: fix misc memory leaks
as reported by Coverity with CIDs 71091, 71230, 71231, 72274, 72389,
1196718 and 1196719
2015-03-05 20:03:37 +09:00
Ralph Castain
894acb0aa8 configury: new OPAL_SET_MCA_PREFIX/ORTE_SET_MCA_CMD_LINE_ID macros
These two macros set the MCA prefix and MCA cmd line id,
   respectively.  Specifically, MCA parameters will be named
   PREFIX<foo> in the environment, and the cmd line will use
   -ID foo bar.

   These macros must be called during configure.ac and a value
   supplied. In the case of Open MPI, the values given are
   PREFIX=OMPI_MCA_ and ID=mca.

   Other projects (such as ORCM) will call these macros with
   their own unique values.  For example, ORCM uses PREFIX=ORCM_MCA_
   and ID=omca

   This scheme is necessary to allow running Open MPI applications under
   systems that use their own versions of ORTE and OPAL.  For example,
   when running OMPI applications under ORCM, we need the MCA params passed
   to the ORCM daemons to be separated from those recognized by the OMPI application.
2014-10-22 18:57:40 -07:00
Ralph Castain
b6aa691e0a Fix incorrect implementation of new MCA param mca_base_env_list - it was not picking up envars and forwarding them, but only worked if you explicitly set a value for the envar. Ensure it works for both direct and indirect launch modes. Remove stale code as this replaced orte_forward_envars. Ensure it doesn't get passed to the ORTE daemons. 2014-10-16 12:58:56 -07:00
Jeff Squyres
e95ed94a94 plm_rsh_module.c: output to the framework output
Trivial fix from r32686: don't output to stream 0, but rather to
orte_plm_base_framework.framework_output (this is the way it was
before r32686).  In reality, this is going to end up being stream 0,
anyway, but we might as well be pedantically correct...

Refs trac:4897.

This commit was SVN r32726.

The following SVN revision numbers were found above:
  r32686 --> open-mpi/ompi@4df1aa63f7

The following Trac tickets were found above:
  Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897
2014-09-13 00:46:35 +00:00
Ralph Castain
4df1aa63f7 Since we've run into the situation where someone puts a script wrapper around a launcher such as srun, we need to always protect MCA cmd line params with quotes. This means we also need to protect the backend from quotes coming into the system as part of a value, or else the parser gets confused.
So add a new function for wrapping MCA arguments, and tell the backend parser to ignore/remove leading/trailing quotes.

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32686.
2014-09-08 20:38:46 +00:00
Ralph Castain
039b7acfb5 Fix the quoting algorithm so only rsh command lines get quoted values
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32586.
2014-08-22 22:47:38 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Jeff Squyres
1551339eba rsh: revert part of r32517: keep the quoting
As part of reviewing CMR #4860, I talked through r32517 with Ralph.

In attempt to fix various rsh quoting problems, r32517 removed all the
quoting from the main code path and then only added it back in at the
end in some cases.

This commit puts back the quoting parts that were removed in r32517
(r32517 fixed 2 other important bugs: a) change "--<foo>" to "--mca
<foo_equivalent> 1" so that de-duplication works, and b) change a !=
to ==).

refs trac:4860

This commit was SVN r32524.

The following SVN revision numbers were found above:
  r32517 --> open-mpi/ompi@7342bce58f

The following Trac tickets were found above:
  Ticket 4860 --> https://svn.open-mpi.org/trac/ompi/ticket/4860
2014-08-13 19:27:10 +00:00
Ralph Castain
7342bce58f Cleanup the over-aggressive quoting of params on the orted cmd line. Remove duplicates caused by passing on both cmd line shortcuts and the mca param version of the same thing.
Fixes trac:4857

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32517.

The following Trac tickets were found above:
  Ticket 4857 --> https://svn.open-mpi.org/trac/ompi/ticket/4857
2014-08-13 03:51:04 +00:00
Ralph Castain
0cad281a92 Single-word cmd line values for orted are dealt with in orte_plm_base_orted_append_basic_args, so protect against special characters there. Have the rsh module only deal with multi-word arguments as those were skipped by orte_plm_base_orted_append_basic_args.
Refs trac:4802

This commit was SVN r32293.

The following Trac tickets were found above:
  Ticket 4802 --> https://svn.open-mpi.org/trac/ompi/ticket/4802
2014-07-23 17:06:51 +00:00
Ralph Castain
a94a97bd50 Cleanup the passing of MCA params on the orted cmd line in ssh by ensuring that we quote all values since they could be multi-word and/or contain special characters. Thanks to Dirk Schubert for pointing it out.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32280.
2014-07-22 18:22:06 +00:00
Ralph Castain
5216bd5558 Multiple sigchld reports can occur within a single event callback, so have to reap them until none remain. Also, need to ensure the daemon is flagged as alive prior to calling wait_cb
Refs trac:4717

This commit was SVN r32020.

The following Trac tickets were found above:
  Ticket 4717 --> https://svn.open-mpi.org/trac/ompi/ticket/4717
2014-06-17 18:46:40 +00:00
Ralph Castain
42bf7466fc This isn't as big a change as it appears - a change in one place caused a whole bunch of files to require updated #include's due to some arcane linkage. Rework the orte_wait code to reflect the introduction of the state machine. If we are in cleanup mode and just want to kill all our local children, then there is no reason to be polite about it as that introduces *very* long delays at scale. Just kill the procs and move on.
Refs trac:4717

This commit was SVN r32019.

The following Trac tickets were found above:
  Ticket 4717 --> https://svn.open-mpi.org/trac/ompi/ticket/4717
2014-06-17 17:57:51 +00:00
Ralph Castain
f1978fba7c Cleanup a set of typos on the orte_get_attribute call
This commit was SVN r31942.
2014-06-03 20:36:38 +00:00
Ralph Castain
5668f085a3 Silence some useless warnings, and fix a missed updated in the tm plm
This commit was SVN r31930.
2014-06-02 17:57:56 +00:00