1
1
Граф коммитов

71 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
4603852740 orterun: use consistent CLI option name for --bind-to
Since the new binding option is tied to the --cpu-list orterun CLI
option, make the --bind-to option reflect the same name (vs. the
--cpu-set CLI option, which is entirely different).  For example:

    mpirun --bind-to cpu-list:ordered ...

Note that "--bind-to cpulist:ordered" is accepted as a synonym,
because people will be lazy.

Also add some minor updates to the orterun.1in man page for
clarification.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-21 08:22:00 -07:00
Ralph Castain
d2838139e4 Update man and help output for new binding option
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-21 06:36:11 -07:00
Ralph Castain
795140e590 Make use of "instant-on" feature optional
The PMIx support for "instant on" remains experimental, so disable it by default. Provide an MCA param and corresponding command line option to enable it at runtime.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-17 02:42:00 -07:00
Ralph Castain
0434b615b5 Update ORTE to support PMIx v3
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":

* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch

* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.

* Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later.

* Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allows envars to be excluded.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-02 02:00:31 -08:00
Scott Miller
d7e594fcff Fix PATH and LD_LIBRARY_PATH prefixing to use first app context value for ORTE_APP_PREFIX_DIR
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
2018-02-28 18:41:47 -05:00
Ralph Castain
af07b3df89 Update help and man pages for output-filename
Warn that relative path will be converted to absolute path, meaning that the file system on remote nodes must be the same as on the node where mpirun is executed.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-13 15:33:33 -08:00
Ralph Castain
fe9b584c05 Fully support OMPI spawn options. Fix a bug in the round-robin mappers where we weren't adding nodes to the job map node array, and so resources were not released
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 285d8cfef74ffc899e9c51e1d9c597b7fb2ceb89)
2017-09-21 10:29:27 -07:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la"
```

Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
Ralph Castain
bd4a6fee22 Attempt to detect when we are direct-launched without the necessary PMI support, and thus are incorrectly identified as being "singleton". Advise the user on the required PMI(x) support and error out.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-29 15:26:53 -07:00
Ralph Castain
9178219e6b Deregister event handlers only on final call to finalize. Ensure we pass PMIx mca params
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-28 15:00:43 -07:00
Ralph Castain
8afa1433b8 Only set the "bound" flag if we wre actually bound
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-14 13:22:01 -07:00
Ralph Castain
321abfc8c6 Fix cwd and preload-binary options
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 14:07:22 -07:00
Nathaniel Graham
01312b2f90 Additional mpirun --help changes
This commit recategorizes several mpirun arguments,
and moves the information for mpirun --help arguments
to the bottom of the general help message.  I also
added the OPAL_CMD_LINE_OTYPE field to two commands
that were missed initially because they were not
in the same area as the others.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-19 11:43:45 -06:00
Nathaniel Graham
19e5d15491 mpirun --help output revamp
This commit modifies the output from the mpirun --help
command.  The options have been split into groups, to
make the output smaller and more readable.  The groups
are: general, debug, output, input, mapping, ranking,
binding, devel, compatibility, launch, dvm, and
unsupported. There is also a special "full" command
that can be used to get the old behaviour of printing
out all of the options.  Unsupported options may only
be seen with this full output.

This commit also adds a special case for the help
argument.  It makes it possible for the user to
enter 0 or 1 arguments instead of having to always
enter an argument.  This defaults to printing out
the "general" help options so the user can then
see what help arguments there are.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-04 10:59:32 -06:00
Ralph Castain
35f817911e Fix coverity issues
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 08:09:46 -07:00
Ralph Castain
d645557fa0 Update to include the PMIx 2.0 APIs for monitoring and job control. Include required integration, but leave the monitors off for now. Move the sensor framework out of ORTE as it is being absorbed into PMIx
Fix typo and silence warnings

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 17:47:08 -07:00
Ralph Castain
70591bf4dc Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 20:48:04 -08:00
Ralph Castain
c6bc3ccb76 Sync to latest PMIx master and PMIx reference server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 12:50:38 -08:00
Ralph Castain
48fc339718 Create an alternative mapping method that pushes responsibility
onto the backend daemons. By default, let mpirun only pack the app_context
info and send that to the backend daemons where the mapping will
be done. This significantly reduces the computational time on mpirun as it isn't
running up/down the topology tree computing thousands of binding
locations, and it reduces the launch message to a very small number of
bytes.

When running -novm, fall back to the old way of doing things
where mpirun computes the entire map and binding, and then sends
the full info to the backend daemon.

Add a new cmd line option/mca param --fwd-mpirun-port that allows
mpirun to dynamically select a port, but then passes that back to
all the other daemons so they will use that port as a static port
for their own wireup. In this mode, we no longer "phone home" directly
to mpirun, but instead use the static port to wireup at daemon
start. We then use the routing tree to rollup the initial
launch report, and limit the number of open sockets on mpirun's node.

Update ras simulator to track the new nidmap code

Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found.

Update gadget platform file

Initialize the range count when starting a new range

Fix the no-np case in managed allocation

Ensure DVM node usage gets cleaned up after each job

Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-07 20:43:12 -08:00
Thomas Naughton
006be92df5 dvm: Add envvar 'ORTE_HNP_DVM_URI' to schizo:ompi
Add ability to pass DVM URI purely via environment
to simplify invocation from command-line (e.g., start dvm,
export URI, mpirun w/o needing to add `--hnp` arg).
If user passes both envvar *and* cmdline, the cmdline wins.

Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-02-24 16:55:32 -05:00
Nathan Hjelm
1df6bdd30e schizo/alps: set orte_bound_at_launch when launched with aprun
Set the orte_bound_at_launch MCA variable. This resolves a launch
performance bug when using aprun to launch Open MPI processes. If
this variable is not set it can take minutes longer to launch with
high ppn.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-14 11:13:48 -07:00
Ralph Castain
ef86707fbe Deprecate the --slot-list paramaeter in favor of --cpu-list. Remove the --cpu-set param (mark it as deprecated) and use --cpu-list instead as it was confusing having the two params. The --cpu-list param defines the cpus to be used by procs of this job, and the binding policy will be overlayed on top of it.
Note: since the discovered cpus are filtered against this list, #slots will be set to the #cpus in the list if no slot values are given in a -host or -hostname specification.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-24 13:33:22 -08:00
Ralph Castain
368684bd63 Revert e9bc293 and try a different approach for scalably dealing with hetero clusters. Have each orted send back its topo "signature". If mpirun detects that this signature has not been seen before, then ask for that daemon to send back its full topology description. This allows the system to only get the topology once for each unique topo in the cluster.
Cleanup a typo, and remove no longer needed MCA params for hetero nodes and hetero apps. Hetero nodes will always be automatically detected. We don't support a mix of 32 and 64 bit apps

Modify the orte_node_t to use orte_topology_t instead of hwloc_topology_t, updating all the places that use it. Ensure that we properly update topology when we see a different one on a compute node.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-18 10:22:15 -08:00
Ralph Castain
9eab9a1ed3 Remove stale global variables
Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers.

Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation).

Begin first cut at memory profiler

Some minor cleanups of memprobe

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-02 14:04:24 -08:00
Ralph Castain
269753f5c1 Transfer back changes from debugger attach work
Silence warning

Remove debug

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-17 10:00:52 -08:00
Ralph Castain
215d6290e0 Add a flux component for LLNL
Fine tuning of flux component
Fix a few minor issues with the initial cut:
* Job id could be obtained from the PMI kvsname like SLURM,
  but simpler to getenv (FLUX_JOB_ID)
* Flux pmi-1 doesn't define PMI_BOOL, PMI_TRUE, PMI_FALSE
* Flux pmi-1 maps the deprecated PMI_Get_kvs_domain_id() to
  PMI_KVS_Get_my_name() internally, so just call that instead.
* Drop residual slurm references.

Add wrappers for PMI functions so that if HAVE_FLUX_PMI_LIBRARY
is not defined, the component can dlopen libpmi.so at location
specified by the FLUX_PMI_LIBRARY_PATH env variable, which adds
flexibility.  If HAVE_FLUX_PMI_LIBRARY is defined, link with
libpmi.so at build time in the usual way.

Update configury for flux component

Update m4 so the configure options work as follows:

 --with-flux-pmi
      Build Flux PMI support (default: yes)

 --with-flux-pmi-library
      Link Flux PMI support with PMI library at build
      time. Otherwise the library is opened at runtime at
      location specified by FLUX_PMI_LIBRARY_PATH environment
      variable. Use this option to enable Flux support when
      building statically or without dlopen support (default: no)

If the latter option is provided, the library/header is located at
build time using the pkg-config module 'flux-pmi'.  Otherwise there
is no library/header dependency.

Handle the case where ompi is configured with --disable-dlopen
or --enable-statkc.  In those cases, don't build the component
unless --with-flux-pmi-library is provided.

It is fatal if the user explicitly requests --with-flux-pmi but
it cannot be built (e.g. due to --disable-dlopen).

Add a schizo/flux component

Update schizo/flux component

Eliminate slurm-specific usage cases.

Since the module is only loaded if FLUX_JOB_ID is set, there are
only two cases to handle:

1) App was launched indirectly through mpirun.  This is not yet
supported with Flux, but hook remains in case this mode is supported
in the future.

2) App was launched directly by Flux, with Flux providing
CPU binding, if any.

Fix up white space in pmix/flux component

Drop non-blocking fence from pmix:flux component

The flux PMI-1 library is not thread safe, therefore
register a regular blocking fence callback instead of the
thread-shifting fencenb().

pmix/flux component avoids extra PMI_KVS_Gets

Keys stored into the base cache under the wildcard
rank are not intended to be part of the global key namespace.
These keys therefore should not trigger a PMI_KVS_Get() if they
are not found in the cache.

Minor pmix/flux component cleanup

pmix/flux: drop code for fetching unused pmix_id

pmix/flux: err_exit must return error

Problem: in flux_init(), although 'ret' (variable holding
err_exit return code) is initialized to OPAL_ERROR, the
variable is reused as a temporary result code, so if there are
some successes followed by a failure that doesn't set 'ret',
flux_init() could return success with PMI not initialized.

Ensure that a "goto err_exit" returns OPAL_ERROR if 'ret'
is not set to some other error code.

pmix/flux: don't mix OPAL_ and PMI_ return codes

Problem: flux_init() can return both PMI_ and OPAL_ return
codes.  Although OPAL_SUCCESS and PMI_SUCCESS are both defined
as 0, other codes are not compatible.

Ensure that flux_init() consistently uses 'rc' for PMI_
return codes and 'ret' for OPAL_ return codes.

pmix/flux: factor out repeated code for cache put

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-16 18:26:38 -08:00
Ralph Castain
d5fd635efe Bring forward the debugger-related changes
Refs https://github.com/open-mpi/ompi/pull/2425

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-29 13:15:20 -08:00
Howard Pritchard
2cbc0e8472 pmix/cray: fix disable-dlopen problem
PR open-mpi/ompi#2432 introduced a regression where configure
and build with --disable-dlopn caused build failure owing
to unresolved alps lli symbols in the libopal-pal shared library.

This commit fixes this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-11-21 13:45:10 -06:00
Ralph Castain
9c6c2fa61d Bring the v2.0.x debugger patch up to the master branch
Ensure the personality gets set as specified by user, or defaults to
"ompi"

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-18 12:45:45 -08:00
Ralph Castain
57114a09ae Pickup the npernode and npersocket options and include them in the job object 2016-10-17 12:26:21 -07:00
Gregory M. Kurtzer
16794cc260 Updates to support Singularity containers v2.2 2016-09-15 09:52:06 -07:00
Artem Polyakov
81195ab724 Several fixes related to session directories:
* enable OMPI to retrieve paths from RM through PMIx
* cleanups related to tempdirs.
2016-09-05 07:48:44 +03:00
Artem Polyakov
55ac3b0be3 orte/schizo: fix binding detection in slurm component
in SLURM 16.05 the SLURM_CPU_BIND_TYPE is equal to "mask_cpu:"
instead of "mask_cpu". Account for that.
2016-08-26 09:55:52 +03:00
Ralph Castain
ae2af61ee3 Update the session dir structure. Restore the creation of a top-level dir based on userid so that everything is contained under the user's top-level dir. Make the next level down (the "job family" level) be either the pid (indicated by a name of "pid.N") or the job family if not launched by mpirun. This allows for proper rendezvous by direct-launched procs. 2016-08-15 22:46:46 -05:00
Ralph Castain
20a91c2baf Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program.
Cleanup debug message
2016-07-13 15:28:33 -07:00
Ralph Castain
ee56d9dc1a Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field 2016-07-05 14:59:50 -07:00
Ralph Castain
a6e6c37484 Remove stale map-reduce support 2016-06-12 07:41:57 -07:00
Ralph Castain
3913595e10 Enable simulation of large-scale clusters by allowing multiple daemons/node. Specifying the ras_base_multiplier parameter to be greater than 1 will cause ORTE to replicate each allocated node by that factor. A daemon will be spawned for each replica, thus letting ORTE function as if it were on a much larger cluster.
Note that this cannot be used for MPI performance testing. It is really only useful for ORTE scaling tests. It also only works with the rsh/ssh launcher.
2016-05-29 18:56:18 -07:00
Ralph Castain
ebe159acef Add a timeout cmd line option and an option to report state info upon timeout to assist with debugging Jenkins tests
If requested, obtain stacktraces for each application process and report it to stderr upon timeout

stack traces: minor improvements

- Also include the hostname and PID of the each process for which
  we're sending the stack traces (vs. just including the ORTE process
  name)
- Send a specific error message if we couldn't find "gstack" in the
  $PATH (e.g., on OS X)
- Send a sepcific error message if gstack fails to run
- Print a message that obtaining the stack traces may take a few
  seconds so that users don't wonder what's happening

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

help-orterun.txt: minor tweaks

Trivial update: show "--timeout" (instead of "-timeout") in the help
message, just to encourage the use of double-dash options.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

trivial: stacktrace -> stack trace

Trivial word smything.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-28 08:36:25 -07:00
George Bosilca
50b37758d4 Don't overwrite the function argument.
In a MPMD setup the app in the jdata can be NULL, so make sure we
don't leave the main argument to an inconsistent value.
2016-05-19 10:35:23 -04:00
Ralph Castain
7e5ef6a240 Fix the env_list support - the MCA param was being set way too early, so provide a "backdoor" way of providing the value 2016-05-06 15:38:39 -07:00
Ralph Castain
58dd41facf Repair the processing of cmd line options that mapped to MCA params. This was responsible for breaking things like map-by <foo>.
Remove debug, let orterun send terminate cmd to DVM

Recover the DVM support
2016-05-06 13:14:03 -07:00
Ralph Castain
6ac7929bd0 Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need.
Cleanups per @jjhursey review
2016-05-01 11:30:25 -07:00
Ralph Castain
8c14df2328 Revert "Modify singularity support per patch from Greg Kurtzer"
This reverts commit open-mpi/ompi@f7257a8310.

Ensure that we properly cleanup the session directory tree. Prior code had issues with symlinks, especially if the file that the link points to was already removed as we traverse the tree. Also found that the dirent checks for directory type weren't fully portable, and so fall back to the stat-based approach which is known to be portable.

Fix singularity singletons by detecting we are in a container and properly setting the pmix selection to pick the isolated component. Remove a stale restriction blocking use of the sm btl
2016-03-24 11:27:18 -07:00
Ralph Castain
f7257a8310 Modify singularity support per patch from Greg Kurtzer 2016-03-09 07:52:11 -08:00
Ralph Castain
4d0cc27eb7 Update the singularity support to match that of the latest singularity master. Remove the restriction on shared memory components by instructing singularity to not isolate the PID space. Add a new schizo API to allow setting up the original app_context. Ensure the container is installed prior to execution. 2016-03-05 21:47:42 -08:00
Ralph Castain
ce0a05d7d1 Minor cleanup - Singularity now has an internal check for installed, so we no longer need to do so. 2016-03-04 19:07:53 -08:00
Ralph Castain
c9f7bb6751 Add the include file to all the schizo components 2016-03-01 13:18:23 -08:00
Ralph Castain
625083fe18 Add include file 2016-03-01 13:04:20 -08:00
Ralph Castain
011403c04a Fix a number of issues, some of which have lingered for a long time:
* provide a more reliable way of determining that a process is a singleton by leveraging the schizo framework. Add new components for slurm, alps, and orte to detect when we are in a managed environment, and if we have been launched by mpirun or a native launcher. Set the correct envars to control ess and pmix selection in each case.

* change the relative priority of the pmix120 and pmix112 components to make pmix120 the default

* fix singleton comm-spawn by correctly setting the num_apps field of the orte_job_t created by the daemon - this fixes a segfault in register_nspace on newly created daemons

* ensure orterun doesn't propagate any ess or pmix directives in its environment

* Cleanup a few valgrind issues and memory leaks

* Fix a race condition that prevented the client from completing notification registrations (missing thread shift)

* Ensure the shizo/alps component detects launch by mpirun
2016-03-01 06:53:00 -08:00