openmpi

Автор	SHA1	Сообщение	Дата
Ralph Castain	503e1274a9	Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements Ensure the returned exit status is non-zero if we fail to map If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported. If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules. If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row #7 on Jeff's spreadsheet is incorrect. With that one correction, this now passes all the given use-cases on that spreadsheet. Make things behave under unmanaged allocations more like their managed cousins - if the #slots is given, then no-np shall fill things up. Fixes #1344	2016-03-29 11:21:57 -07:00
Ralph Castain	d72c1c72ff	Do not push child processes into separate process groups so that any host RM can still "see" them, and ensure that any signal sent to the orted's themselves will be provided to all child processes. Forward all signals from mpirun to the child processes, removing the old MCA parameter required to turn that behavior "on".	2016-03-06 17:55:09 -08:00
Ralph Castain	011403c04a	Fix a number of issues, some of which have lingered for a long time: * provide a more reliable way of determining that a process is a singleton by leveraging the schizo framework. Add new components for slurm, alps, and orte to detect when we are in a managed environment, and if we have been launched by mpirun or a native launcher. Set the correct envars to control ess and pmix selection in each case. * change the relative priority of the pmix120 and pmix112 components to make pmix120 the default * fix singleton comm-spawn by correctly setting the num_apps field of the orte_job_t created by the daemon - this fixes a segfault in register_nspace on newly created daemons * ensure orterun doesn't propagate any ess or pmix directives in its environment * Cleanup a few valgrind issues and memory leaks * Fix a race condition that prevented the client from completing notification registrations (missing thread shift) * Ensure the shizo/alps component detects launch by mpirun	2016-03-01 06:53:00 -08:00
Ralph Castain	e8d347d7bd	Add missing includes	2016-02-24 08:56:02 -06:00
Ralph Castain	77f800b7e8	Tools don't create the orte_job_data table, so don't remove jobs from it	2016-02-21 16:29:00 -08:00
Ralph Castain	d653cf2847	Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM.	2016-02-21 11:55:49 -08:00
Ralph Castain	8f9508cace	Further enhance the support for Singularity containers. Extend the "personality" command-line option to allow specifying both model (e.g., "ompi") and container (e.g., "singularity"), and add the necessary logic to support multiple options. Add a new pmix "isolated" component to handle singletons where no HNP is available since containers cannot launch the HNP.	2016-02-17 13:33:06 -08:00
Ralph Castain	50431001a3	Modify the IOF subsystem to handle per-job directives for redirecting IO to files, tagging IO, and timestamping IO. Fix stdin reader	2016-02-16 18:54:38 -08:00
Ralph Castain	06c3dfc052	Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. * Clean up the DVM so it continues to run even when applications error out and we would ordinarily abort the daemons. * Create a new errmgr component for the DVM to handle the differences. * Cleanup the DVM state component. * Add ORTE bindings directory and brief README * Pass a local tool index around to match jobs. * Pass the jobid on job completion. * Fix initialization logic. * Add framework for python wrapper. * Fix terminate-with-non-zero-exit behavior so it properly terminates only the indicated procs, notifies orte-submit, and orte-dvm continues executing. * Add some missing options to orte-dvm * Fix a bug in -host processing that caused us to ignore the #slots designator. Add a new attribute to indicate "do not expand the DVM" when submitting job spawn requests. * It actually makes no sense that we treat the termination of all children differently than terminating the children of a specific job - it only creates confusion over the difference in behavior. So terminate children the same way regardless. Extend the cmd_line utility to easily allow layering of command line definitions Catch up with ORTE interface change and make build more generic. Disable "fixed dvm" logic for now. Add another cmd_line function to merge a table of cmd line options with another one, reporting as errors any duplicate entries. Use this to allow orterun to reuse the orted_submit code Fix the "fixed_dvm" logic by ensuring we reset num_new_daemons to zero. Also ensure that the nidmap is sent with the first job so the downstream daemons get the node info. Remove a duplicate cmd line entry in orterun. Revise the DVM startup procedure to pass the nidmap only once, at the startup of the DVM. This reduces the overhead on each job launch and ensures that the nidmap doesn't get overwritten. Add new commands to get_orted_comm_cmd_str(). Move ORTE command line options to orte_globals.[ch]. Catch up with extra orte_submit_init parameter. Add example code. Add documentation. Bump version. The nidmap and routing data must be updated prior to propagating the xcast or else the xcast will fail. Fix the return code so it is something more expected when an error occurs. Ensure we get an error returned to us when we fail to launch for some reason. In this case, we will always get a launch_cb as we did indeed attempt to spawn it. The error code will be returned in the complete_cb. Fix the return code from orte_submit_job - it was returning the tracker index instead of "success". Take advantage of ORTE's pretty-print capabilities to provide a nice error output explaining why we failed to launch. Ensure we always get a launch_cb when we fail to launch, but no complete_cb as the job never launched. Extend the error reporting capability to job completion as well. Add index parameter to orte_submit_job(). Add orte_job_cancel and implement ORTE_DAEMON_TERMINATE_JOB_CMD. Factor out dvm termination. Parse the terminate option at tool level. Add error string for ORTE_ERR_JOB_CANCELLED. Add some safeguards. Cleanup and/of comments. Enable the return. Properly ORTE_DECLSPEC orte_submit_halt. Add orte_submit_halt and orte_submit_cancel to interface. Use the plm interface to terminate the job	2016-02-13 08:10:44 -08:00
Gilles Gouaillardet	1d38430e43	opal: replace opal_convert_jobid_to_string with opal_snprintf_jobid	2016-01-14 10:39:03 +09:00
Ralph Castain	64b695669a	Cleanup warnings in opal and orte layers when building optimized on Mac	2015-12-17 07:51:24 -08:00
Ralph Castain	3a56f0d34b	Create the pmix external component. Fix a few places where opal/util/argv.h were required when building with an external pmix (go figure). NOTE: Building with external pmix requires that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated. Closes #1204 (replaces it) Fixes #1064	2015-12-15 15:26:13 -08:00
Ralph Castain	6a607d42a6	Prevent a segfault on tools if a connection attempt fails - tools don't open the opal/pmix framework and thus have no way of looking up a proc hostname	2015-11-10 09:11:34 -08:00
Ralph Castain	169c44258d	Fix missing check	2015-11-03 19:00:28 -08:00
Ralph Castain	fe0c995f6b	Fix a couple of minor issues identified by Jeff	2015-11-03 17:30:51 -08:00
Ralph Castain	8bfbe7f16c	Add a new MCA parameter for default_dash_host to offer a mirror of the default_hostfile	2015-10-31 19:09:54 -07:00
Ralph Castain	6506b0a5e5	Resolve a race condition that prevented the sigchild callback from being registered before short-lived apps terminated Thanks to Mark Santcroos for the assistance in tracking it down.	2015-10-23 21:02:31 -07:00
Ralph Castain	8f6855459d	Cleanup some coverity warnings	2015-09-30 10:33:53 -07:00
Ralph Castain	f872e99315	Fix orte-submit so it allows application procs to select the correct ess component. Protect orte_data_server from multiple calls to finalize.	2015-09-21 20:31:57 -07:00
Ralph Castain	c1bbbb5e2f	Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds	2015-09-15 13:08:35 -07:00
Ralph Castain	37c3ed68e7	Cleanup connect/disconnect and bring comm_spawn back online!	2015-09-06 10:27:39 -07:00
rhc54	665b30376a	Merge pull request #868 from rhc54/topic/hwloc Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 17:58:07 -07:00
Ralph Castain	d97bc29102	Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 16:54:40 -07:00
Ralph Castain	f6948c2bb4	Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working	2015-09-04 10:07:17 -07:00
Ralph Castain	a772b46c15	Bring the MPI_Publish and friends online	2015-09-02 12:04:07 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	683efcb850	Rename the current opal_event_base to opal_sync_event_base in preparation for adding an async progress thread to opal. No functional changes made here - just a simple rename.	2015-07-11 10:08:19 -07:00
Ralph Castain	ed93154e43	Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line.	2015-07-07 12:52:16 -07:00
Ralph Castain	a4557d4ed2	Add new component to support OpenMP envars per request from IBM and LLNL	2015-06-27 17:57:04 -07:00
Nathan Hjelm	4d92c9989e	more c99 updates This commit does two things. It removes checks for C99 required headers (stdlib.h, string.h, signal.h, etc). Additionally it removes definitions for required C99 types (intptr_t, int64_t, int32_t, etc). Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-25 10:14:13 -06:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	869b2891c4	When doing comm-spawn, track the last object we bound to and ensure that we start the next job on the next object so we avoid overload situations when they aren't necessary	2015-06-17 09:20:08 -07:00
Ralph Castain	ea35e47228	Fat SMPs (i.e., systems with nodes containing large numbers of cpus) were failing to start due to connection failures of the opal/pmix support. Root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts, and also did the handshake while in that event. So the server would backup beyond the connection backlog limit, and we would fail. Changing the client to leave its socket as blocking during the connect doesn't solve the problem by itself - you also have to introduce a sleep delay once the backlog is hit to avoid simply machine-gunning your way thru retries. This gets somewhat difficult to adjust as you don't want to unnecessarily prolong startup time. We've solved this before by adding a listening thread that simply reaps accepts and shoves them into the event library for subsequent processing. This would resolve the problem, but meant yet another daemon-level thread. So I centralized the listening thread support and let multiple elements register listeners on it. Thus, each daemon now has a single listening thread that reaps accepts from multiple sources - for now, the orte/pmix server and the oob/usock support are using it. I'll add in the oob/tcp component later. This still didn't fully resolve the SMP problem, especially on coprocessor cards (e.g., KNC). Removing the shared memory dstore support helped further improve the behavior - it looks like there is some kind of memory paging issue there that needs further understanding. Given that the shared memory support was about to be lost when I bring over the PMIx integration (until it is restored in that library), it seemed like a reasonable thing to just remove it at this point.	2015-05-29 14:37:14 -07:00
Nathan Hjelm	7db48c581d	orte_quit: Remove logically dead code CID 71993 Logically dead code (DEADCODE) As indicated by coverity proc can not be NULL at any point after the continue. Removed dead code. CID 1269682 Unchecked return value (CHECKED_RETURN) Check the return code of orte_get_attribute. I assume we still need to check for a NULL proc in case the aborted proc attribute is set to NULL. This might be better as an assert (). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-05-26 12:16:12 -06:00
Gilles Gouaillardet	2e384a3b65	initialize common symbols from orte A few uninitialized common symbols are remaining (generated by flex) : * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_leng * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_text * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_leng * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_text	2015-05-08 10:11:58 +09:00
Ralph Castain	e26e7ad736	Better support automated tests for map, rank, and bind options	2015-04-30 14:01:13 -07:00
Ralph Castain	d5e4fd059f	Ensure the binding and locale strings are always defined	2015-04-23 07:43:37 -07:00
Ralph Castain	cb7330a543	Get the output to lineup properly	2015-04-23 07:38:51 -07:00
Jeff Squyres	79243aca4e	display-devel-map: minor output tweak hwloc output can get fairly long, especially on machines with lots of cores and/or hyperthreads. So put the Locale and Binding output on separate lines.	2015-04-23 06:14:57 -07:00
Ralph Castain	58e646ccfd	Reduce confusion by having the devel-map display in the same format as report-bindings	2015-04-23 04:30:00 -07:00
Nathan Hjelm	a7b0c00ab6	fix memory leaks and valgrind errors This commit fixes several vagrind errors. Included: - installdirs did not correctly reinitialize all pointers to NULL at close. This causes valgrind errors on a subsequent call to opal_init_tool. - several opal strings were leaked by opal_deregister_params which was setting them to NULL instead of letting them be freed by the MCA variable system. - move opal_net_init to AFTER the variable system is initialized and opal's MCA variables have been registered. opal_net_init uses a variable registered by opal_register_params! - do not leak ompi_mpi_main_thread when it is allocated by MPI_T_init_thread. - do not overwrite ompi_mpi_main_thread if it is already set (by MPI_T_init_thread). - mca_base_var: read_files was overwritting mca_base_var_file_list even if it was non-NULL. - mca_base_var: set all file global variables to initial states on finalize. - btl/vader: decrement enumerator reference count to ensure that it is freed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-11 09:28:35 -06:00
Nathan Hjelm	9cd955badf	opal: fix multiple bugs in MCA and opal This commit fixes the following bugs: - opal_output_finalize did not properly set internal state. This caused problems when calling the sequence opal_output_init (), opal_output_finalize (), opal_output_init (). - opal_info support called mca_base_open () but never called the matching mca_base_close (). mca_base_open () and mca_base_close () have been updated to use a open count instead of an open flag to allow mca_base_open to be called through multiple paths (as may be the case when MPI_T is in use). - orte_info support did not register opal variables. This can cause orte-info to not return opal variables. - opal_info, orte_info, and ompi_info support have been updated to use a register count. - When opening the dl framework the reference count was added to ensure the framework stuck around. The framework being closed prematurely was a bug in the MCA base that has since been corrected. The increment (and associated decrement) have been removed. - dl/dlopen did not set the value of mca_dl_dlopen_component.filename_suffixes_mca_storage on each call to register. Instead the value was set in the component structure. This caused the value to be lost when re-loading the component. Fixed by setting the default value in register. - Reset shmem framework state on close to avoid returning a stale component after reloading opal/shmem. - MCA base parameters were not properly deregistered when the MCA base was closed. This commit may fix #374. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-07 19:13:20 -06:00
Adrian Reber	f45dd069bd	FT: fix compilation using --with-ft (1/5) Enabling the FT code breaks compilation (again). This series tries to fix the compiler errors. This is again only fixing the compiler errors without any warranty that the result might actually support FT again. This first patch moves orte_cr_continue_like_restart from ORTE to opal_cr_continue_like_restart in OPAL. This only leaves three calls from OPAL to ORTE in the FT code. As it is not yet 100% clear how to handle these calls the code orte_sstore.set_attr() has been #ifdef'd out for now.	2015-03-11 14:23:33 +01:00
Ralph Castain	df2cd96772	Display the local/global attribute flag more prominently. Mark the attributes as global in orte-submit so they will be communicated	2015-02-10 10:47:32 -08:00
Ralph Castain	74385302c0	Add the personality to the orte_job_t datatype support	2015-01-27 09:29:42 -06:00
Ralph Castain	028b00154d	Complete implementation of the schizo framework to support OMPI component	2015-01-27 09:29:42 -06:00
Ralph Castain	9ac39b63cc	Use the opal_progress_threads support for the ORTE progress thread in applications	2015-01-15 07:55:19 -08:00
Ralph Castain	bb529ebd8e	Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pickup major differences (e.g., where we have different numbers of sockets or assigned core bindings). Retain the hetero-nodes flag for those cases where the user knows that there are differences and our automated system isn't good enough to see it. Will obviously require further refinement as we find out which variances it can detect, and which it cannot.	2014-12-08 15:38:14 -08:00
Elena	6cf3925b09	restored _process_name_print_for_opal function in orte_init: it's required for opal output from daemons which never called ompi_init so didn't set opal_process_name_print pointer	2014-12-08 13:13:35 +02:00
Ralph Castain	14cdb04327	Revise the ess/pmi selection logic as all APPs must select it, and no daemons. Cleanup some of the mca param levels in ess so we don't printout the topology quite as easily.	2014-12-01 21:19:11 -08:00

1 2 3 4 5 ...

644 Коммитов