Commit graph

752 commits

Author SHA1 Message Date
Ralph Castain
16c8931ec9
Daemonize the orteds during tree-spawn
Somehow, the line of code that actually added the daemonize option to
the orted cmd line was removed.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-30 15:48:27 -07:00
Joshua Hursey
05d003b109
plm/rsh: Fix segv on missing agent.
* Additionally, fixes the `NULL` option to `OMPI_MCA_plm_rsh_agent`,
   which would also lead to a segv. Now it operates as intended by
   disqualifying the `rsh` component and falling back onto the `isolated`
   component.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 62d0058738)
2020-01-27 10:34:28 -06:00
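The fallback described in commit 05d003b109 can be illustrated with a small sketch. This is not the ORTE code: `select_plm_component` and its `find_executable` parameter are hypothetical names, and treating the agent value as a colon-separated list of candidates is an assumption made for the example.

```python
from shutil import which


def select_plm_component(rsh_agent, find_executable=which):
    """Return "rsh" if a usable agent exists, otherwise "isolated" (hypothetical helper)."""
    # A missing value or the literal string "NULL" disqualifies rsh outright
    # instead of being dereferenced later.
    if rsh_agent is None or rsh_agent.strip() in ("", "NULL"):
        return "isolated"
    # Assumption: the agent is a colon-separated list of candidate commands,
    # each optionally followed by arguments.
    for candidate in rsh_agent.split(":"):
        parts = candidate.split()
        if parts and find_executable(parts[0]) is not None:
            return "rsh"
    # No candidate could be located: fall back rather than crash.
    return "isolated"
```

Passing `find_executable` explicitly keeps the decision testable without depending on what is installed on the local machine.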
Scott Miller
8eae54fd27 plm/rsh: Add chdir option to change directory before orted exec
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
(cherry picked from commit c1b8599528)

Conflicts:
	orte/mca/plm/rsh/plm_rsh_module.c
2019-10-29 15:49:41 -04:00
Joshua Hursey
4c1160e257 Fix tree spawn routed component issue
* Fix #6618
   - See comments on Issue #6618 for finer details.
 * The `plm/rsh` component uses the highest priority `routed` component
   to construct the launch tree. The remote orted's will activate all
   available `routed` components when updating routes. This allows the
   opportunity for the parent vpid on the remote `orted` to not match
   that which was expected in the tree launch. The result is that the
   remote orted tries to contact their parent with the wrong contact
   information and orted wireup will fail.
 * This fix forces the orteds to use the same `routed` component as
   the HNP used when constructing the tree, if tree launch is enabled.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-08-29 16:26:43 -04:00
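The idea behind commit 4c1160e257 can be sketched as follows: the launcher picks one `routed` component and tells every remote daemon to use that same one. The function names and the priority values are hypothetical, and passing the choice as `-mca routed <name>` is an assumption about how it could be forwarded, not a statement about the actual patch.

```python
def pick_routed_component(priorities):
    """Return the name of the highest-priority routed component."""
    return max(priorities, key=priorities.get)


def orted_routed_args(priorities, tree_spawn):
    """Extra orted arguments that pin the routed component (illustrative only)."""
    if not tree_spawn:
        # Without tree launch the daemons do not depend on the launch tree.
        return []
    # Pin the daemons to the component the HNP used to build the tree,
    # so parent vpids computed on both sides agree.
    return ["-mca", "routed", pick_routed_component(priorities)]
```

For example, `orted_routed_args({"radix": 70, "binomial": 30}, tree_spawn=True)` pins the daemons to `radix`.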
Jordan Hayes
e00d0abe56 plm_slurm_module: adjust for new SLURM CLI options
SLURM 19 discontinued the use of --cpu_bind (and changed it to
--cpu-bind).  There's no easy way to test at run time which one is
accepted, so set the environment variable SLURM_CPU_BIND to "none",
which should do the same thing as the srun CLI parameter.

Signed-off-by: Jordan Hayes <jhayes@ucr.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7dad74032e)
2019-05-16 09:13:28 -07:00
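A minimal sketch of the workaround in commit e00d0abe56: rather than choosing between `--cpu_bind` and `--cpu-bind` on the srun command line, the launcher sets `SLURM_CPU_BIND` in the environment handed to srun. The helper name `srun_environment` is hypothetical.

```python
import os


def srun_environment(base_env=None):
    """Return a copy of the environment with CPU binding disabled for srun."""
    env = dict(os.environ if base_env is None else base_env)
    # Same effect as the srun CLI option, but independent of its spelling.
    env["SLURM_CPU_BIND"] = "none"
    return env
```

The returned dictionary could then be passed as the `env` argument of `subprocess.Popen` when starting srun.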
Gilles Gouaillardet
a05456ab5e orte: only set the ORTE_NODE_ALIAS attribute when required
When there is no alias for a given node, do not set the
ORTE_NODE_ALIAS attribute to an empty string any more.

Thanks Erico for reporting this issue.
Thanks Ralph for the guidance.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-04-25 11:43:46 +09:00
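The behavior described in commit a05456ab5e amounts to leaving the attribute unset when there is nothing to store. In this sketch the dictionary stands in for a node's attribute list, `set_node_alias` is a hypothetical helper, and joining aliases with commas is an assumption.

```python
def set_node_alias(attributes, aliases):
    """Set ORTE_NODE_ALIAS only when at least one non-empty alias exists."""
    names = [alias for alias in aliases if alias]
    if names:
        attributes["ORTE_NODE_ALIAS"] = ",".join(names)
    # With no aliases the key stays absent, so readers never see "".
    return attributes
```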
Jeff Squyres
5394845ce6 plm/slurm: slightly improve verbose warning message
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-04-16 17:04:12 -07:00
Nathan Hjelm
664ba32435 plm/base: fix typo in variable name
An incorrectly named variable caused all pml variables to disappear
from ompi_info. This commit fixes the typo. We may add some logic into
the MCA base to catch these sorts of things in the future.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-04-10 17:53:16 -06:00
Ralph Castain
322f6c5056 Fix a breakage in the ranking system
While it may be faster to reverse the order of the assignment loops, it also results in the wrong answer

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-25 15:55:56 -07:00
Boris Karasev
6afc7099a0 plm/base: fixed the hosts filtering
Reset the `ORTE_NODE_FLAG_MAPPED` flag after hosts filtering. This
flag is used subsequently and can affect the node mapping logic.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-03-23 09:41:16 +03:00
Ralph Castain
0434b615b5 Update ORTE to support PMIx v3
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":

* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch

* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.

* Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later.

* Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allows envars to be excluded.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-02 02:00:31 -08:00
Artem Polyakov
7333f128f6
Merge pull request #4815 from artpol84/slurm/plm_fix
plm/slurm:
2018-02-15 12:45:43 -08:00
Gilles Gouaillardet
dd24c746dc output-filename: cleanup obsolete code.
Since output-filename has been moved to a per-job attribute,
remove the orte_output_filename global variable, and stop passing
this option to orted.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-15 10:40:44 +09:00
Artem Polyakov
ab8bb4b0a3 plm/slurm:
Sync command line output for Slurm with RSH launcher.
Currently the Slurm launch cmdline is only visible in debug mode, while for RSH
it is always enabled.
The cmdline is useful for troubleshooting and should be enabled.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2018-02-15 03:09:18 +07:00
Ralph Castain
e9cd7fd7e6 Update orte
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:53:43 -08:00
Ralph Castain
75eb56522c Continue resolving add_host behavior
Fix a problem in packing/unpacking job updates. There remains a race condition that causes messages to attempt to be sent to the second new daemon before it is completely ready. Not entirely sure where it is coming from.

Refs #4665

Rebase to master. Reset orte_nidmap_communicated if hosts are added. Check for duplicate hostnames in an add_host command. Turn off tree_spawn for dynamic launch of additional daemons.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-15 08:21:01 -08:00
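One of the changes listed in commit 75eb56522c is a check for duplicate hostnames in an add_host command. A minimal sketch of such a check follows. The commit message does not say whether duplicates are skipped or rejected; this example assumes they are skipped, and `filter_new_hosts` is a hypothetical name.

```python
def filter_new_hosts(known_hosts, requested):
    """Return the requested hosts that are not already known, in order."""
    seen = set(known_hosts)
    added = []
    for host in requested:
        if host in seen:
            # Already in the pool, or repeated within the same request.
            continue
        seen.add(host)
        added.append(host)
    return added
```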
Ralph Castain
4cd7f3b202 Convert nidmap to regx framework
Handle the need for different regex generator/parsers by moving the
orte/util/nidmap and orte/util/regex code into a new "regx" framework.
Use the original code to complete a "fwd" component, and create a
scaffold for IBM's "reverse" component.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-10 20:28:21 -08:00
Ralph Castain
e2bc941f1e Silence some warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-05 11:28:20 -08:00
Gilles Gouaillardet
03da5218ea orte: remove some dead code related to the new tree_spawn method
Now that the daemon calls remote_spawn itself, there is no longer
a need for the "tree_spawn" command nor the associated command
processing code since the HNP is no longer sending a tree-spawn
message to the orted.

Thanks Ralph for the guidance !

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:35:17 +09:00
Gilles Gouaillardet
4527584840 orted: fix tree-spawn when the node regex is too long
When the node regex is too long to be sent on the command line,
retrieve it first from the parent, and then spawn the remote orted.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:33:46 +09:00
Gilles Gouaillardet
799152e7fb plm/base: add the orte_plm_base_node_regex_threshold MCA parameter
This parameter can be used to set the node regex max length that can
be passed to the orted command line.
For testing purposes, it can be set to zero in order to force orted to
retrieve the node regex from its parent.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:33:46 +09:00
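Commits 4527584840 and 799152e7fb together describe a length-based decision: a short node regex travels on the orted command line, a long one is fetched from the parent. The sketch below shows that decision only. `node_regex_delivery` is a hypothetical name, and the exact boundary comparison is an assumption, chosen so that a threshold of zero always forces retrieval from the parent.

```python
def node_regex_delivery(node_regex, threshold):
    """Decide how the node regex reaches a remote orted (illustrative)."""
    if len(node_regex) < threshold:
        # Short enough to be passed on the command line.
        return ("command-line", node_regex)
    # Too long (or threshold == 0): the orted asks its parent for it.
    return ("from-parent", None)
```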
Gilles Gouaillardet
c4cd12bc43 plm/rsh: fix parameter handling in rsh_wait_daemon()
Since open-mpi/ompi@8f496b01b7, rsh_wait_daemon is invoked with an
orte_wait_tracker_t *, which must be used to reach the
orte_plm_rsh_caddy_t *.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-04 09:33:46 +09:00
Ralph Castain
7a58f91ab9 Fix the tree-spawn-with-rollup
Somehow, the code for passing a daemon's parent was accidentally removed, thus breaking the tree-spawn callback sequence and causing all daemons to phone directly home. Note that this is noticeably slower than no-tree-spawn for small clusters where direct ssh launch of the child daemons from the HNP doesn't overload the available file descriptors.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-15 16:03:43 -08:00
Ralph Castain
4316213805 Fix add-host support by including the location for procs of prior jobs when spawning new daemons.
Thanks to CalugaruVaxile for the report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-07 14:48:58 -08:00
Gilles Gouaillardet
8e17127258 plm/alps: fix orte_wait_cb() usage
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-11-30 13:49:22 +09:00
Ralph Castain
8f496b01b7 Try automatically adding local spawn threads to parallelize the fork/exec process to speed up the launch on large SMPs. Harvest the threads after initial spawn to minimize any impact on running jobs.
Change the determination of #spawn threads to be done on basis of #local procs in first job being spawned. Someone can look at an optimization that handles subsequent dynamic spawns that might be larger in size.

Leave the threads running, but blocked, for the life of the daemon, and use them to harvest the local procs as they terminate. This helps short-lived jobs in particular.

Add MCA params to set:
  * max number of spawn threads (default: 4)
  * set a specific number of spawn threads (default: -1, indicating no set number)
  * cutoff - minimum number of local procs before using spawn threads (default: 32)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-29 19:54:00 -08:00
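The three MCA parameters listed in commit 8f496b01b7 interact roughly as sketched below. The defaults (4, -1, 32) come from the commit message. The way the thread count scales between the cutoff and the maximum is not stated there, so one thread per `cutoff` local procs is purely an assumption, and `spawn_thread_count` is a hypothetical name.

```python
def spawn_thread_count(num_local_procs, max_threads=4, fixed_threads=-1, cutoff=32):
    """Estimate how many spawn threads to start for the first job."""
    if fixed_threads >= 0:
        # An explicit setting overrides the automatic choice.
        return fixed_threads
    if num_local_procs < cutoff:
        # Below the cutoff, fork/exec stays on the main thread.
        return 0
    # Assumed scaling rule: one thread per `cutoff` procs, capped at the maximum.
    return min(max_threads, num_local_procs // cutoff)
```

Under these assumptions, 32 local procs yield one thread and 1000 local procs yield the maximum of four.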
Matt Ezell
e45761d498 Disable the LSF plm if CSM is detected
LSF running on top of CSM does not provide LSF daemons on the compute nodes.

Signed-off-by: Matt Ezell <ezellma@ornl.gov>
2017-11-02 13:48:46 -04:00
Josh Hursey
252be7ffb0 Merge pull request #4215 from jjhursey/fix/plm-lsf-rc
plm/lsf: Improve error message if lsb_launch fails
2017-09-18 11:14:25 -05:00
Ralph Castain
3c914a7a97 Complete the fix of the ORTE DVM. We will now use "prun" instead of "orterun -hnp foo" to execute jobs. This provides the feature of automatic discovery of the orte-dvm so you don't need to manually enter URI's or contact file locations. All IO is forwarded to prun.
Still in the "needs to be done" category:

* mapping/ranking/binding options aren't correctly supported

* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-16 13:13:07 -07:00
Joshua Hursey
89c1aaf646 plm/lsf: Improve error message if lsb_launch fails
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-09-15 09:45:58 -05:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la
```

Note: The changes in this commit were automated by the
`libadd_mca_comp_update.py` script from the commit that precedes it.
Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
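The mapping in commit e1d079544b from project directory to project-level library can be written as a lookup table. This is only an illustration of the rule, not the `libadd_mca_comp_update.py` script itself; `libadd_entry` is a hypothetical helper, and the library paths are copied from the commit message.

```python
PROJECT_LIBS = {
    "ompi": "$(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la",
    "orte": "$(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la",
    "opal": "$(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la",
    "oshmem": "$(top_builddir)/oshmem/liboshmem.la",
}


def libadd_entry(component_path):
    """Return the library a component under this path should add to LIBADD."""
    # The first path element names the project the component belongs to.
    project = component_path.split("/", 1)[0]
    return PROJECT_LIBS[project]
```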
Artem Polyakov
10d6e90bf5 Revert "plm/rsh: Propagate PMIx prefix to orted's"
This reverts commit 71da0fcbef.
(per https://github.com/open-mpi/ompi/pull/4052).
Refs: https://github.com/open-mpi/ompi/issues/3980

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-08-14 21:37:57 +07:00
Artem Polyakov
71da0fcbef plm/rsh: Propagate PMIx prefix to orted's
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-08-02 08:06:13 +03:00
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Ralph Castain
b225366012 Bring the ofi/rml component online by completing the wireup protocol for the daemons. Cleanup the current confusion over how connection info gets created and passed to make it all flow thru the opal/pmix "put/get" operations. Update the PMIx code to latest master to pickup some required behaviors.

Remove the no-longer-required get_contact_info and set_contact_info from the RML layer.

Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing will be beneficial at large scales. Leave it "off" by default.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 21:01:57 -07:00
Gilles Gouaillardet
9f29f3bff4 hwloc: since WHOLE_SYSTEM is no longer used, remove useless
checks related to offline and disallowed elements

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-20 17:39:21 +09:00
Ralph Castain
543c16b28d Fix the isolated pmix component. Cleanup the ess/singleton component - we shouldn't be automatically discovering the local topology as that is now done on-demand.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-19 12:14:29 -07:00
Gilles Gouaillardet
823382f5d7 plm/base: do not abort when configure'd with --enable-heterogeneous
and a mix of BE/LE is detected

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-07 10:43:54 +09:00
Ralph Castain
2753f53e6d Detect that we have a mix of BE/LE in the system, provide a warning that OMPI doesn't currently support this environment, and error out
Fixes #2817

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-03 15:47:05 -07:00
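Commit 2753f53e6d, together with the later commit 823382f5d7, describes a check over the byte orders reported by the daemons: a mix of big-endian and little-endian nodes leads to a warning and an error, unless the build was configured with `--enable-heterogeneous`. A minimal sketch follows, in which the function name, the string labels, and the return values are all hypothetical.

```python
def byte_order_verdict(daemon_byte_orders, heterogeneous=False):
    """Classify a set of reported byte orders as "ok", "mixed-allowed" or "abort"."""
    mixed = len(set(daemon_byte_orders)) > 1
    if not mixed:
        return "ok"
    # A mix is tolerated only for heterogeneous-enabled builds.
    return "mixed-allowed" if heterogeneous else "abort"
```

On the Python side, `sys.byteorder` gives the local value (`"little"` or `"big"`) that a daemon in this sketch would report.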
Ralph Castain
8a4565874e Enable ORTE to continue running when a node fails - user takes responsibility for zombies. Minor cleanup to orte-clean
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-27 09:05:26 -07:00
Ralph Castain
f4411c4393 Enable use of OFI fabrics for launch and other collective operations. Update the PMIx repo to the latest master to get the required support for the server to "push" modex info, and to retrieve all its own "modex" values for sending back to mpirun. Have mpirun cache them in its local modex hash as OFI goes point-to-point direct and doesn't route - so the remote daemons don't need a copy of this connection info.
Remove the opal_ignore from the RML/OFI component, but disable that component unless the user specifically requests it via the "rml_ofi_desired=1" MCA param. This will let us test compile in various environments without interfering with operations while we continue to debug

Fix an error when computing the number of infos during server init

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-23 19:57:21 -07:00
Ralph Castain
1f0f03b45b Print a better error message when srun isn't found in the path. Ensure we don't segfault if -host specifies a node not included in the allocation
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-09 07:46:47 -07:00
Ralph Castain
93cf3c7203 Update OPAL and ORTE for thread safety
(I swear, if I look this over one more time, I'll puke)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 12:30:57 -07:00
Ralph Castain
657e701c65 Add debug verbosity to the orte data server and pmix pub/lookup functions
Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it).

Remove unneeded test

Fix memory corruption by re-initializing variable to NULL in loop

Resolve the race condition identified by @ggouaillardet by resetting the
mapped flag within the same event where it was set. There is no need to
retain the flag beyond that point as it isn't used again.

Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them.

Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers.

Have the mindist module add procs to the job's proc array as it is a fully described module

Protect the hnp-not-in-allocation case

Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-25 18:41:27 -07:00
Ralph Castain
29e083bffd Fix total_slots_allocated computation
On unmanaged allocations, we need to update the total_slots_allocated once the daemons have been launched and "discovered" their topology

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 08:21:52 -07:00
Ralph Castain
180809f2ef Do not pass topologies during tree spawn of daemons as there is no way the HNP can know the backend topologies at that point. Any needed topologies will be sent along with the launch_apps command
Do not pass param file MCA params if the user has requested that no param files be read - required when trying to avoid launch time penalties from large numbers of processes reading default param files. The daemon picks them up and passes them along anyway, so it isn't clear what value we gain from having them all read the defaults

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-07 21:14:43 -07:00
Ralph Castain
a143800bce Enable full operations under SLURM on Cray systems by co-locating a daemon with mpirun when mpirun is executing on a compute node in that environment. This allows local application procs to inherit their security credential from the daemon as it will have been launched via SLURM
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-06 19:08:50 -07:00
Gilles Gouaillardet
57b4144e57 orte: use compression for ORTE_DAEMON_REPORT_TOPOLOGY_CMD answer
Refs open-mpi/ompi#3414

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 17:21:59 +09:00
Gilles Gouaillardet
49cd40b2df compress the topology sent by the first orted
Refs open-mpi/ompi#3414

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 16:20:11 +09:00
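Commits 57b4144e57 and 49cd40b2df compress the topology description before it is sent. The sketch below shows the general pattern with Python's `zlib`. It makes no claim about which compression library or wire format ORTE uses, and `pack_topology` and `unpack_topology` are hypothetical names. Falling back to the raw bytes when compression does not help is also an assumption.

```python
import zlib


def pack_topology(topology_bytes):
    """Return (is_compressed, payload), compressing only when it saves space."""
    compressed = zlib.compress(topology_bytes)
    if len(compressed) < len(topology_bytes):
        return True, compressed
    return False, topology_bytes


def unpack_topology(is_compressed, payload):
    """Inverse of pack_topology."""
    return zlib.decompress(payload) if is_compressed else payload
```

The flag travels with the payload so the receiver knows whether to decompress. Topology descriptions tend to be repetitive, which is why compression is worthwhile for them.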
Ralph Castain
97e38e6d84 Move a free to a little later in case the verbose output needs it
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-11 11:21:12 -07:00