openmpi

Автор	SHA1	Сообщение	Дата
Ralph Castain	8f496b01b7	Try automatically adding local spawn threads to parallelize the fork/exec process to speed up the launch on large SMPs. Harvest the threads after initial spawn to minimize any impact on running jobs. Change the determination of #spawn threads to be done on basis of #local procs in first job being spawned. Someone can look at an optimization that handles subsequent dynamic spawns that might be larger in size. Leave the threads running, but blocked, for the life of the daemon, and use them to harvest the local procs as they terminate. This helps short-lived jobs in particular. Add MCA params to set: * max number of spawn threads (default: 4) * set a specific number of spawn threads (default: -1, indicating no set number) * cutoff - minimum number of local procs before using spawn threads (default: 32) Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-11-29 19:54:00 -08:00
Joshua Hursey	e1d079544b	mca: Dynamic components link against project lib * Resolves #3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la MCA components in orte/ $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la MCA components in opal/ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la" ``` Note: The changes in this commit were automated by the script in the commit that proceeds it with the `libadd_mca_comp_update.py` script. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-08-24 11:56:16 -04:00
Mark Allen	552216f9ba	scripted symbol name change (ompi_ prefix) Passed the below set of symbols into a script that added ompi_ to them all. Note that if processing a symbol named "foo" the script turns foo into ompi_foo but doesn't turn foobar into ompi_foobar But beyond that the script is blind to C syntax, so it hits strings and comments etc as well as vars/functions. coll_base_comm_get_reqs comm_allgather_pml comm_allreduce_pml comm_bcast_pml fcoll_base_coll_allgather_array fcoll_base_coll_allgatherv_array fcoll_base_coll_bcast_array fcoll_base_coll_gather_array fcoll_base_coll_gatherv_array fcoll_base_coll_scatterv_array fcoll_base_sort_iovec mpit_big_lock mpit_init_count mpit_lock mpit_unlock netpatterns_base_err netpatterns_base_verbose netpatterns_cleanup_narray_knomial_tree netpatterns_cleanup_recursive_doubling_tree_node netpatterns_cleanup_recursive_knomial_allgather_tree_node netpatterns_cleanup_recursive_knomial_tree_node netpatterns_init netpatterns_register_mca_params netpatterns_setup_multinomial_tree netpatterns_setup_narray_knomial_tree netpatterns_setup_narray_tree netpatterns_setup_narray_tree_contigous_ranks netpatterns_setup_recursive_doubling_n_tree_node netpatterns_setup_recursive_doubling_tree_node netpatterns_setup_recursive_knomial_allgather_tree_node netpatterns_setup_recursive_knomial_tree_node pml_v_output_close pml_v_output_open intercept_extra_state_t odls_base_default_wait_local_proc _event_debug_mode_on _evthread_cond_fns _evthread_id_fn _evthread_lock_debugging_enabled _evthread_lock_fns cmd_line_option_t cmd_line_param_t crs_base_self_checkpoint_fn crs_base_self_continue_fn crs_base_self_restart_fn event_enable_debug_output event_global_current_base_ event_module_include eventops sync_wait_mt trigger_user_inc_callback var_type_names var_type_sizes Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-07-11 02:13:23 -04:00
Ralph Castain	8a4565874e	Enable ORTE to continue running when a node fails - user takes responsibility for zombies. Minor cleanup to orte-clean Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-27 09:05:26 -07:00
Ralph Castain	952726c121	Update to latest PMIx master - equivalent to 2.0rc2. Update the thread support in the opal/pmix framework to protect the framework-level structures. This now passes the loop test, and so we believe it resolves the random hangs in finalize. Changes in PMIx master that are included here: * Fixed a bug in the PMIx_Get logic * Fixed self-notification procedure * Made pmix_output functions thread safe * Fixed a number of thread safety issues * Updated configury to use 'uname -n' when hostname is unavailable Work on cleaning up the event handler thread safety problem Rarely used functions, but protect them anyway Fix the last part of the intercomm problem Ensure we don't cover any PMIx calls with the framework-level lock. Protect against NULL argv comm_spawn Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-20 09:02:15 -07:00
Ralph Castain	81ab79f311	Ensure the orted doesn't go into an infinite loop during force-terminate Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-07 21:44:49 -07:00
Ralph Castain	919d7fcf49	We cannot use OFI to determine when daemons can finalize as we don't see the "sockets" go away. So always use the OOB for the mgmt conduit - this provides the necessary termination signal AND ensures that IOF and other mgmt messages go solely across TCP. Cleanup the way we look for matching OFI addresses by using the opal_net_samenetwork helper function. This now works for multi-network environments, but only using the socket provider Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-07 13:51:30 -07:00
Ralph Castain	93cf3c7203	Update OPAL and ORTE for thread safety (I swear, if I look this over one more time, I'll puke) Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-06 12:30:57 -07:00
Ralph Castain	2ab4f93f6a	Instead of "forced_terminate" just quietly causing the daemon to disappear, let's at least attempt to let the user know where the problem occurred. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-02 08:28:16 -07:00
Ralph Castain	ef0e0171c9	Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c). Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-08 10:04:50 -07:00
Thomas Naughton	beb5b250bf	orte dvm: debug fix for DVM early quit Ensure that job errors do not cause the DVM to fail unless the failed job is the DVM itself. Refs #2987, with improvements from Ralph Signed-off-by: Thomas Naughton <naughtont@ornl.gov> Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-23 10:17:53 -05:00
Joshua Hursey	c452f68495	orte/errmgr: Improve help message on connection lost Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-15 16:36:00 -05:00
Ralph Castain	b59ae14a2a	Fix static port and partial allocation operations Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message. Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-28 10:09:44 -08:00
Ralph Castain	d672fad849	Repair rsh/ssh tree spawn Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn. Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-27 11:35:00 -08:00
Ralph Castain	19bb64cfb8	If a tool sees the HNP it is attached to die (thereby losing connection), then stop the event loop instead of going through the abort code path. This will allow the tool to cleanup before exiting Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-19 14:04:06 -08:00
Ralph Castain	9f69b0183f	Ensure jobs that fail always return a non-zero exit code. Thanks to Ashley Pittman for the report. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 09:41:06 -08:00
Ralph Castain	d031946c46	When mpirun operates in --continuous mode, we won't terminate the job when a remote process dies. In that case, we have to activate both the waitpid _and_ the IOF complete states to ensure we properly mark the proc as dead and perform any required notifications Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-10-25 12:18:14 -07:00
Ralph Castain	649301a3a2	Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier). Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.	2016-10-23 21:52:39 -07:00
Ralph Castain	df8ac7b747	Properly mark a node as down and decrease the number of daemons so any subsequent grpcomm collectives can correctly operate. Note that only the direct grpcomm component knows how to deal with down nodes.	2016-10-21 09:53:37 -07:00
Ralph Castain	48d35a9627	Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Cleanup some conversion issues on notification info. Add a new orte_notify.c test program	2016-08-12 21:14:29 -07:00
Ralph Castain	20a91c2baf	Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program. Cleanup debug message	2016-07-13 15:28:33 -07:00
Ralph Castain	5d330d5220	Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler. Add PMIx 2.0 Remove PMIx 1.1.4 Cleanup copying of component Add missing file Touchup a typo in the Makefile.am Update the pmix ext114 component Minor cleanups and resync to master Update to latest PMIx 2.x Update to the PMIx event notification branch latest changes	2016-06-14 13:08:41 -07:00
Ralph Castain	378d9cbb5e	Extend the abort on non zero status flag to apply to processes which die as the result of signals.	2016-03-24 08:33:55 -07:00
Ralph Castain	4a55fba414	Fix registration of error handlers thru the pmix120 component. A thread-shift operation was hanging on the sync_event_base, which made it dependent on someone calling opal_progress. Unfortunately, a process in "sleep" or spinning outside the MPI library won't do that, and so we never complete errhandler registration.	2016-03-02 15:01:01 -08:00
Ralph Castain	d653cf2847	Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM.	2016-02-21 11:55:49 -08:00
Ralph Castain	06c3dfc052	Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. * Clean up the DVM so it continues to run even when applications error out and we would ordinarily abort the daemons. * Create a new errmgr component for the DVM to handle the differences. * Cleanup the DVM state component. * Add ORTE bindings directory and brief README * Pass a local tool index around to match jobs. * Pass the jobid on job completion. * Fix initialization logic. * Add framework for python wrapper. * Fix terminate-with-non-zero-exit behavior so it properly terminates only the indicated procs, notifies orte-submit, and orte-dvm continues executing. * Add some missing options to orte-dvm * Fix a bug in -host processing that caused us to ignore the #slots designator. Add a new attribute to indicate "do not expand the DVM" when submitting job spawn requests. * It actually makes no sense that we treat the termination of all children differently than terminating the children of a specific job - it only creates confusion over the difference in behavior. So terminate children the same way regardless. Extend the cmd_line utility to easily allow layering of command line definitions Catch up with ORTE interface change and make build more generic. Disable "fixed dvm" logic for now. Add another cmd_line function to merge a table of cmd line options with another one, reporting as errors any duplicate entries. Use this to allow orterun to reuse the orted_submit code Fix the "fixed_dvm" logic by ensuring we reset num_new_daemons to zero. Also ensure that the nidmap is sent with the first job so the downstream daemons get the node info. Remove a duplicate cmd line entry in orterun. Revise the DVM startup procedure to pass the nidmap only once, at the startup of the DVM. This reduces the overhead on each job launch and ensures that the nidmap doesn't get overwritten. Add new commands to get_orted_comm_cmd_str(). Move ORTE command line options to orte_globals.[ch]. Catch up with extra orte_submit_init parameter. Add example code. Add documentation. Bump version. The nidmap and routing data must be updated prior to propagating the xcast or else the xcast will fail. Fix the return code so it is something more expected when an error occurs. Ensure we get an error returned to us when we fail to launch for some reason. In this case, we will always get a launch_cb as we did indeed attempt to spawn it. The error code will be returned in the complete_cb. Fix the return code from orte_submit_job - it was returning the tracker index instead of "success". Take advantage of ORTE's pretty-print capabilities to provide a nice error output explaining why we failed to launch. Ensure we always get a launch_cb when we fail to launch, but no complete_cb as the job never launched. Extend the error reporting capability to job completion as well. Add index parameter to orte_submit_job(). Add orte_job_cancel and implement ORTE_DAEMON_TERMINATE_JOB_CMD. Factor out dvm termination. Parse the terminate option at tool level. Add error string for ORTE_ERR_JOB_CANCELLED. Add some safeguards. Cleanup and/of comments. Enable the return. Properly ORTE_DECLSPEC orte_submit_halt. Add orte_submit_halt and orte_submit_cancel to interface. Use the plm interface to terminate the job	2016-02-13 08:10:44 -08:00
Nathan Hjelm	8b5810f7f7	mca/base: add priority output to mca_base_select The mca_base_select function uses returned priorities to select the best component/module. This priority may be of use to the caller so pass that information back in an optional argument. If the priority is not needed pass NULL. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-19 12:32:41 -06:00
Ralph Castain	459f169e06	Fix segfault upon job error Silence some unnecessary error-logs	2015-09-08 14:03:06 -07:00
Jeff Squyres	bc9e5652ff	whitespace: purge whitespace at end of lines Generated by running "./contrib/whitespace-purge.sh".	2015-09-08 09:47:17 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	5040f47ef3	Use the correct verbosity in an output_verbose	2015-08-13 22:33:25 -07:00
Ralph Castain	a2243dcddd	Add an opal/errhandler so opal-level errors can be up-leveled	2015-07-11 07:09:11 -07:00
Nathan Hjelm	4d92c9989e	more c99 updates This commit does two things. It removes checks for C99 required headers (stdlib.h, string.h, signal.h, etc). Additionally it removes definitions for required C99 types (intptr_t, int64_t, int32_t, etc). Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-25 10:14:13 -06:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Jeff Squyres	5e52ce26b5	help-errmgr-base.txt: remove trailing newline Removed spurrious newline at end of file so that the emitted help message doesn't contain a blank line before the final "-----" output.	2015-05-23 03:33:23 -07:00
Ralph Castain	55cd2a07f6	Update exit code	2015-05-22 21:06:43 -07:00
Ralph Castain	3510bb4ced	Set the exit code when a daemon fails	2015-05-22 21:05:23 -07:00
Ralph Castain	bc7a7f3de5	Fix abnormal shutdown when a node dies	2015-05-22 17:29:06 -07:00
Gilles Gouaillardet	2e384a3b65	initialize common symbols from orte A few uninitialized common symbols are remaining (generated by flex) : * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_leng * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_text * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_leng * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_text	2015-05-08 10:11:58 +09:00
Nathan Hjelm	45e053dbce	orte: use C99 subobject naming for component initialization This commit helps future-proof orte components by initializing each component member by name. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-18 10:29:58 -06:00
Nathan Hjelm	b68d66bb9b	MCA: Add the project/project version to the MCA base component This commit adds support for project_framework_component_* parameter matching. This is the first step in allowing the same framework name in multiple projects. This change also bumps the MCA component version to 2.1.0. All master frameworks have been updated to use the new component versioning macro. An mca.h has been added to each project to add a project specific versioning macro of the form PROJECT_MCA_VERSION_2_1_0. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-03-27 10:59:04 -06:00
Howard Pritchard	bf89131f9e	add owner files to opa/ompi/orte mca directories This commit adds an owner file in each of the component directories for each framework. This allows for a simple script to parse the contents of the files and generate, among other things, tables to be used on the project's wiki page. Currently there are two "fields" in the file, an owner and a status. A tool to parse the files and generate tables for the wiki page will be added in a subsequent commit.	2015-02-22 15:10:23 -07:00
Ralph Castain	2b0b012460	Continue refinement of the DVM operations. Send the spawn request to the right place (it helps) as it isn't a comm_spawn request and has to be treated a little differently. Ensure IO gets forwarded back to the tool. Ensure the tool outputs show_help locally as there is no place to send it.	2015-02-04 06:21:54 -08:00
Rolf vandeVaart	f4aecdbfd2	Change logging function name from log to logfn. Fixes issue with PGI compile	2014-12-12 09:46:44 -05:00
Ralph Castain	54c955c92d	Fix a race condition that only appears to be affecting certain setups. The pmix.finalize function closes the file descriptor to the server, which then triggers the errhandler callback. Since the errmgr is about to be unloaded, it might be getting hit.	2014-12-03 12:19:00 -08:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Ralph Castain	f2b26bde4c	Resolve a race condition that could cause us to hang during abnormal terminations due to multi-counting num_terminated This commit was SVN r32660.	2014-09-02 00:32:52 +00:00
Ralph Castain	2b225e3776	Cleanup a race condition regarding marking that waitpid_fired. We should always mark it as fired when we enter the wait_local_proc routine, and also mark it as no longer alive if iof_complete has also been found. If other places in the code also update those flags, there is no harm done. This commit was SVN r32643.	2014-08-29 17:03:31 +00:00
Ralph Castain	5a13cdb739	Fix a race condition caused by a bad attribute flag that created an OR instead of an AND condition check This commit was SVN r32587.	2014-08-22 22:48:16 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “lmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00

1 2 3 4 5 ...

308 Коммитов