openmpi

Author	SHA1	Message	Date
Jeff Squyres	4603852740	orterun: use consistent CLI option name for --bind-to Since the new binding option is tied to the --cpu-list orterun CLI option, make the --bind-to option reflect the same name (vs. the --cpu-set CLI option, which is entirely different). For example: mpirun --bind-to cpu-list:ordered ... Note that "--bind-to cpulist:ordered" is accepted as a synonym, because people will be lazy. Also add some minor updates to the orterun.1in man page for clarification. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-21 08:22:00 -07:00
Ralph Castain	d2838139e4	Update man and help output for new binding option Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-21 06:36:11 -07:00
Ralph Castain	795140e590	Make use of "instant-on" feature optional The PMIx support for "instant on" remains experimental, so disable it by default. Provide an MCA param and corresponding command line option to enable it at runtime. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-17 02:42:00 -07:00
Ralph Castain	0434b615b5	Update ORTE to support PMIx v3 This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on": * initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch * IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet. * Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later. * Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allows envars to be excluded. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-03-02 02:00:31 -08:00
Scott Miller	d7e594fcff	Fix PATH and LD_LIBRARY_PATH prefixing to use first app context value for ORTE_APP_PREFIX_DIR Signed-off-by: Scott Miller <scott.miller1@ibm.com>	2018-02-28 18:41:47 -05:00
Ralph Castain	af07b3df89	Update help and man pages for output-filename Warn that relative path will be converted to absolute path, meaning that the file system on remote nodes must be the same as on the node where mpirun is executed. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-02-13 15:33:33 -08:00
Ralph Castain	fe9b584c05	Fully support OMPI spawn options. Fix a bug in the round-robin mappers where we weren't adding nodes to the job map node array, and so resources were not released Signed-off-by: Ralph Castain <rhc@open-mpi.org> (cherry picked from commit 285d8cfef74ffc899e9c51e1d9c597b7fb2ceb89)	2017-09-21 10:29:27 -07:00
Joshua Hursey	e1d079544b	mca: Dynamic components link against project lib * Resolves #3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la MCA components in orte/ $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la MCA components in opal/ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la ``` Note: The changes in this commit were automated by the `libadd_mca_comp_update.py` script in the commit that precedes it. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-08-24 11:56:16 -04:00
Ralph Castain	bd4a6fee22	Attempt to detect when we are direct-launched without the necessary PMI support, and thus are incorrectly identified as being "singleton". Advise the user on the required PMI(x) support and error out. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-29 15:26:53 -07:00
Ralph Castain	9178219e6b	Deregister event handlers only on final call to finalize. Ensure we pass PMIx mca params Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-28 15:00:43 -07:00
Ralph Castain	8afa1433b8	Only set the "bound" flag if we were actually bound Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-14 13:22:01 -07:00
Ralph Castain	321abfc8c6	Fix cwd and preload-binary options Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-30 14:07:22 -07:00
Nathaniel Graham	01312b2f90	Additional mpirun --help changes This commit recategorizes several mpirun arguments, and moves the information for mpirun --help arguments to the bottom of the general help message. I also added the OPAL_CMD_LINE_OTYPE field to two commands that were missed initially because they were not in the same area as the others. Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>	2017-04-19 11:43:45 -06:00
Nathaniel Graham	19e5d15491	mpirun --help output revamp This commit modifies the output from the mpirun --help command. The options have been split into groups, to make the output smaller and more readable. The groups are: general, debug, output, input, mapping, ranking, binding, devel, compatibility, launch, dvm, and unsupported. There is also a special "full" command that can be used to get the old behaviour of printing out all of the options. Unsupported options may only be seen with this full output. This commit also adds a special case for the help argument. It makes it possible for the user to enter 0 or 1 arguments instead of having to always enter an argument. This defaults to printing out the "general" help options so the user can then see what help arguments there are. Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>	2017-04-04 10:59:32 -06:00
Ralph Castain	35f817911e	Fix coverity issues Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-24 08:09:46 -07:00
Ralph Castain	d645557fa0	Update to include the PMIx 2.0 APIs for monitoring and job control. Include required integration, but leave the monitors off for now. Move the sensor framework out of ORTE as it is being absorbed into PMIx Fix typo and silence warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-21 17:47:08 -07:00
Ralph Castain	70591bf4dc	Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-11 20:48:04 -08:00
Ralph Castain	c6bc3ccb76	Sync to latest PMIx master and PMIx reference server Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-11 12:50:38 -08:00
Ralph Castain	48fc339718	Create an alternative mapping method that pushes responsibility onto the backend daemons. By default, let mpirun only pack the app_context info and send that to the backend daemons where the mapping will be done. This significantly reduces the computational time on mpirun as it isn't running up/down the topology tree computing thousands of binding locations, and it reduces the launch message to a very small number of bytes. When running -novm, fall back to the old way of doing things where mpirun computes the entire map and binding, and then sends the full info to the backend daemon. Add a new cmd line option/mca param --fwd-mpirun-port that allows mpirun to dynamically select a port, but then passes that back to all the other daemons so they will use that port as a static port for their own wireup. In this mode, we no longer "phone home" directly to mpirun, but instead use the static port to wireup at daemon start. We then use the routing tree to rollup the initial launch report, and limit the number of open sockets on mpirun's node. Update ras simulator to track the new nidmap code Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found. Update gadget platform file Initialize the range count when starting a new range Fix the no-np case in managed allocation Ensure DVM node usage gets cleaned up after each job Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-07 20:43:12 -08:00
Thomas Naughton	006be92df5	dvm: Add envvar 'ORTE_HNP_DVM_URI' to schizo:ompi Add ability to pass DVM URI purely via environment to simplify invocation from command-line (e.g., start dvm, export URI, mpirun w/o needing to add `--hnp` arg). If user passes both envvar and cmdline, the cmdline wins. Signed-off-by: Thomas Naughton <naughtont@ornl.gov>	2017-02-24 16:55:32 -05:00
Nathan Hjelm	1df6bdd30e	schizo/alps: set orte_bound_at_launch when launched with aprun Set the orte_bound_at_launch MCA variable. This resolves a launch performance bug when using aprun to launch Open MPI processes. If this variable is not set it can take minutes longer to launch with high ppn. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-14 11:13:48 -07:00
Ralph Castain	ef86707fbe	Deprecate the --slot-list parameter in favor of --cpu-list. Remove the --cpu-set param (mark it as deprecated) and use --cpu-list instead as it was confusing having the two params. The --cpu-list param defines the cpus to be used by procs of this job, and the binding policy will be overlaid on top of it. Note: since the discovered cpus are filtered against this list, #slots will be set to the #cpus in the list if no slot values are given in a -host or -hostname specification. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-24 13:33:22 -08:00
Ralph Castain	368684bd63	Revert `e9bc293` and try a different approach for scalably dealing with hetero clusters. Have each orted send back its topo "signature". If mpirun detects that this signature has not been seen before, then ask for that daemon to send back its full topology description. This allows the system to only get the topology once for each unique topo in the cluster. Cleanup a typo, and remove no longer needed MCA params for hetero nodes and hetero apps. Hetero nodes will always be automatically detected. We don't support a mix of 32 and 64 bit apps Modify the orte_node_t to use orte_topology_t instead of hwloc_topology_t, updating all the places that use it. Ensure that we properly update topology when we see a different one on a compute node. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-18 10:22:15 -08:00
Ralph Castain	9eab9a1ed3	Remove stale global variables Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers. Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation). Begin first cut at memory profiler Some minor cleanups of memprobe Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-02 14:04:24 -08:00
Ralph Castain	269753f5c1	Transfer back changes from debugger attach work Silence warning Remove debug Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-17 10:00:52 -08:00
Ralph Castain	215d6290e0	Add a flux component for LLNL Fine tuning of flux component Fix a few minor issues with the initial cut: * Job id could be obtained from the PMI kvsname like SLURM, but simpler to getenv (FLUX_JOB_ID) * Flux pmi-1 doesn't define PMI_BOOL, PMI_TRUE, PMI_FALSE * Flux pmi-1 maps the deprecated PMI_Get_kvs_domain_id() to PMI_KVS_Get_my_name() internally, so just call that instead. * Drop residual slurm references. Add wrappers for PMI functions so that if HAVE_FLUX_PMI_LIBRARY is not defined, the component can dlopen libpmi.so at location specified by the FLUX_PMI_LIBRARY_PATH env variable, which adds flexibility. If HAVE_FLUX_PMI_LIBRARY is defined, link with libpmi.so at build time in the usual way. Update configury for flux component Update m4 so the configure options work as follows: --with-flux-pmi Build Flux PMI support (default: yes) --with-flux-pmi-library Link Flux PMI support with PMI library at build time. Otherwise the library is opened at runtime at location specified by FLUX_PMI_LIBRARY_PATH environment variable. Use this option to enable Flux support when building statically or without dlopen support (default: no) If the latter option is provided, the library/header is located at build time using the pkg-config module 'flux-pmi'. Otherwise there is no library/header dependency. Handle the case where ompi is configured with --disable-dlopen or --enable-static. In those cases, don't build the component unless --with-flux-pmi-library is provided. It is fatal if the user explicitly requests --with-flux-pmi but it cannot be built (e.g. due to --disable-dlopen). Add a schizo/flux component Update schizo/flux component Eliminate slurm-specific usage cases. Since the module is only loaded if FLUX_JOB_ID is set, there are only two cases to handle: 1) App was launched indirectly through mpirun. This is not yet supported with Flux, but hook remains in case this mode is supported in the future. 
2) App was launched directly by Flux, with Flux providing CPU binding, if any. Fix up white space in pmix/flux component Drop non-blocking fence from pmix:flux component The flux PMI-1 library is not thread safe, therefore register a regular blocking fence callback instead of the thread-shifting fencenb(). pmix/flux component avoids extra PMI_KVS_Gets Keys stored into the base cache under the wildcard rank are not intended to be part of the global key namespace. These keys therefore should not trigger a PMI_KVS_Get() if they are not found in the cache. Minor pmix/flux component cleanup pmix/flux: drop code for fetching unused pmix_id pmix/flux: err_exit must return error Problem: in flux_init(), although 'ret' (variable holding err_exit return code) is initialized to OPAL_ERROR, the variable is reused as a temporary result code, so if there are some successes followed by a failure that doesn't set 'ret', flux_init() could return success with PMI not initialized. Ensure that a "goto err_exit" returns OPAL_ERROR if 'ret' is not set to some other error code. pmix/flux: don't mix OPAL_ and PMI_ return codes Problem: flux_init() can return both PMI_ and OPAL_ return codes. Although OPAL_SUCCESS and PMI_SUCCESS are both defined as 0, other codes are not compatible. Ensure that flux_init() consistently uses 'rc' for PMI_ return codes and 'ret' for OPAL_ return codes. pmix/flux: factor out repeated code for cache put Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-16 18:26:38 -08:00
Ralph Castain	d5fd635efe	Bring forward the debugger-related changes Refs https://github.com/open-mpi/ompi/pull/2425 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 13:15:20 -08:00
Howard Pritchard	2cbc0e8472	pmix/cray: fix disable-dlopen problem PR open-mpi/ompi#2432 introduced a regression where configure and build with --disable-dlopen caused build failure owing to unresolved alps lli symbols in the libopal-pal shared library. This commit fixes this problem. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-21 13:45:10 -06:00
Ralph Castain	9c6c2fa61d	Bring the v2.0.x debugger patch up to the master branch Ensure the personality gets set as specified by user, or defaults to "ompi" Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-18 12:45:45 -08:00
Ralph Castain	57114a09ae	Pickup the npernode and npersocket options and include them in the job object	2016-10-17 12:26:21 -07:00
Gregory M. Kurtzer	16794cc260	Updates to support Singularity containers v2.2	2016-09-15 09:52:06 -07:00
Artem Polyakov	81195ab724	Several fixes related to session directories: * enable OMPI to retrieve paths from RM through PMIx * cleanups related to tempdirs.	2016-09-05 07:48:44 +03:00
Artem Polyakov	55ac3b0be3	orte/schizo: fix binding detection in slurm component in SLURM 16.05 the SLURM_CPU_BIND_TYPE is equal to "mask_cpu:" instead of "mask_cpu". Account for that.	2016-08-26 09:55:52 +03:00
Ralph Castain	ae2af61ee3	Update the session dir structure. Restore the creation of a top-level dir based on userid so that everything is contained under the user's top-level dir. Make the next level down (the "job family" level) be either the pid (indicated by a name of "pid.N") or the job family if not launched by mpirun. This allows for proper rendezvous by direct-launched procs.	2016-08-15 22:46:46 -05:00
Ralph Castain	20a91c2baf	Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program. Cleanup debug message	2016-07-13 15:28:33 -07:00
Ralph Castain	ee56d9dc1a	Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field	2016-07-05 14:59:50 -07:00
Ralph Castain	a6e6c37484	Remove stale map-reduce support	2016-06-12 07:41:57 -07:00
Ralph Castain	3913595e10	Enable simulation of large-scale clusters by allowing multiple daemons/node. Specifying the ras_base_multiplier parameter to be greater than 1 will cause ORTE to replicate each allocated node by that factor. A daemon will be spawned for each replica, thus letting ORTE function as if it were on a much larger cluster. Note that this cannot be used for MPI performance testing. It is really only useful for ORTE scaling tests. It also only works with the rsh/ssh launcher.	2016-05-29 18:56:18 -07:00
Ralph Castain	ebe159acef	Add a timeout cmd line option and an option to report state info upon timeout to assist with debugging Jenkins tests If requested, obtain stacktraces for each application process and report it to stderr upon timeout stack traces: minor improvements - Also include the hostname and PID of each process for which we're sending the stack traces (vs. just including the ORTE process name) - Send a specific error message if we couldn't find "gstack" in the $PATH (e.g., on OS X) - Send a specific error message if gstack fails to run - Print a message that obtaining the stack traces may take a few seconds so that users don't wonder what's happening Signed-off-by: Jeff Squyres <jsquyres@cisco.com> help-orterun.txt: minor tweaks Trivial update: show "--timeout" (instead of "-timeout") in the help message, just to encourage the use of double-dash options. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> trivial: stacktrace -> stack trace Trivial word smithing. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-28 08:36:25 -07:00
George Bosilca	50b37758d4	Don't overwrite the function argument. In a MPMD setup the app in the jdata can be NULL, so make sure we don't leave the main argument to an inconsistent value.	2016-05-19 10:35:23 -04:00
Ralph Castain	7e5ef6a240	Fix the env_list support - the MCA param was being set way too early, so provide a "backdoor" way of providing the value	2016-05-06 15:38:39 -07:00
Ralph Castain	58dd41facf	Repair the processing of cmd line options that mapped to MCA params. This was responsible for breaking things like map-by <foo>. Remove debug, let orterun send terminate cmd to DVM Recover the DVM support	2016-05-06 13:14:03 -07:00
Ralph Castain	6ac7929bd0	Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need. Cleanups per @jjhursey review	2016-05-01 11:30:25 -07:00
Ralph Castain	8c14df2328	Revert "Modify singularity support per patch from Greg Kurtzer" This reverts commit open-mpi/ompi@f7257a8310. Ensure that we properly cleanup the session directory tree. Prior code had issues with symlinks, especially if the file that the link points to was already removed as we traverse the tree. Also found that the dirent checks for directory type weren't fully portable, and so fall back to the stat-based approach which is known to be portable. Fix singularity singletons by detecting we are in a container and properly setting the pmix selection to pick the isolated component. Remove a stale restriction blocking use of the sm btl	2016-03-24 11:27:18 -07:00
Ralph Castain	f7257a8310	Modify singularity support per patch from Greg Kurtzer	2016-03-09 07:52:11 -08:00
Ralph Castain	4d0cc27eb7	Update the singularity support to match that of the latest singularity master. Remove the restriction on shared memory components by instructing singularity to not isolate the PID space. Add a new schizo API to allow setting up the original app_context. Ensure the container is installed prior to execution.	2016-03-05 21:47:42 -08:00
Ralph Castain	ce0a05d7d1	Minor cleanup - Singularity now has an internal check for installed, so we no longer need to do so.	2016-03-04 19:07:53 -08:00
Ralph Castain	c9f7bb6751	Add the include file to all the schizo components	2016-03-01 13:18:23 -08:00
Ralph Castain	625083fe18	Add include file	2016-03-01 13:04:20 -08:00
Ralph Castain	011403c04a	Fix a number of issues, some of which have lingered for a long time: * provide a more reliable way of determining that a process is a singleton by leveraging the schizo framework. Add new components for slurm, alps, and orte to detect when we are in a managed environment, and if we have been launched by mpirun or a native launcher. Set the correct envars to control ess and pmix selection in each case. * change the relative priority of the pmix120 and pmix112 components to make pmix120 the default * fix singleton comm-spawn by correctly setting the num_apps field of the orte_job_t created by the daemon - this fixes a segfault in register_nspace on newly created daemons * ensure orterun doesn't propagate any ess or pmix directives in its environment * Cleanup a few valgrind issues and memory leaks * Fix a race condition that prevented the client from completing notification registrations (missing thread shift) * Ensure the schizo/alps component detects launch by mpirun	2016-03-01 06:53:00 -08:00

71 Commits