openmpi

Автор	SHA1	Сообщение	Дата
Ralph Castain	649301a3a2	Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier). Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.	2016-10-23 21:52:39 -07:00
Ralph Castain	de7b1494d9	Clean out old cruft from the ORCM project	2016-09-21 00:13:30 -07:00
Artem Polyakov	81195ab724	Several fixes related to session directories: * enable OMPI to retrieve paths from RM through PMIx * cleanups related to tempdirs.	2016-09-05 07:48:44 +03:00
Ralph Castain	92102304b6	Minor typo - init the job_data stdin_target field to 0 for default behavior. Add test.	2016-08-22 21:03:45 -07:00
Ralph Castain	ae2af61ee3	Update the session dir structure. Restore the creation of a top-level dir based on userid so that everything is contained under the user's top-level dir. Make the next level down (the "job family" level) be either the pid (indicated by a name of "pid.N") or the job family if not launched by mpirun. This allows for proper rendezvous by direct-launched procs.	2016-08-15 22:46:46 -05:00
Howard Pritchard	ff669e7b15	code cleanup: clang is now a happier panda Clang 5.1 on my mac was a sad panda compiling a couple of files, complaining about uninitialized stack variables. This commit makes clang a happier panda (or at least not so sad). Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-04 19:34:44 -06:00
Ralph Castain	aa78f902f2	Add some missing info to the job map so remote procs get their app_rank	2016-07-12 09:50:12 -07:00
Ralph Castain	e3e4d73986	Need to be a little more careful when checking the range on a publish/lookup operation. If the range was constrained at publish, then we need to check that the lookup fits within that constraint. Otherwise, we should provide the data. More detailed constraint checking will be provided later.	2016-06-24 17:01:49 -07:00
Ralph Castain	3913595e10	Enable simulation of large-scale clusters by allowing multiple daemons/node. Specifying the ras_base_multiplier parameter to be greater than 1 will cause ORTE to replicate each allocated node by that factor. A daemon will be spawned for each replica, thus letting ORTE function as if it were on a much larger cluster. Note that this cannot be used for MPI performance testing. It is really only useful for ORTE scaling tests. It also only works with the rsh/ssh launcher.	2016-05-29 18:56:18 -07:00
Ralph Castain	503e1274a9	Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the #found processing elements Ensure the returned exit status is non-zero if we fail to map If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported. If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules. If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row #7 on Jeff's spreadsheet is incorrect. With that one correction, this now passes all the given use-cases on that spreadsheet. Make things behave under unmanaged allocations more like their managed cousins - if the #slots is given, then no-np shall fill things up. Fixes #1344	2016-03-29 11:21:57 -07:00
Ralph Castain	d72c1c72ff	Do not push child processes into separate process groups so that any host RM can still "see" them, and ensure that any signal sent to the orted's themselves will be provided to all child processes. Forward all signals from mpirun to the child processes, removing the old MCA parameter required to turn that behavior "on".	2016-03-06 17:55:09 -08:00
Ralph Castain	011403c04a	Fix a number of issues, some of which have lingered for a long time: * provide a more reliable way of determining that a process is a singleton by leveraging the schizo framework. Add new components for slurm, alps, and orte to detect when we are in a managed environment, and if we have been launched by mpirun or a native launcher. Set the correct envars to control ess and pmix selection in each case. * change the relative priority of the pmix120 and pmix112 components to make pmix120 the default * fix singleton comm-spawn by correctly setting the num_apps field of the orte_job_t created by the daemon - this fixes a segfault in register_nspace on newly created daemons * ensure orterun doesn't propagate any ess or pmix directives in its environment * Cleanup a few valgrind issues and memory leaks * Fix a race condition that prevented the client from completing notification registrations (missing thread shift) * Ensure the shizo/alps component detects launch by mpirun	2016-03-01 06:53:00 -08:00
Ralph Castain	e8d347d7bd	Add missing includes	2016-02-24 08:56:02 -06:00
Ralph Castain	77f800b7e8	Tools don't create the orte_job_data table, so don't remove jobs from it	2016-02-21 16:29:00 -08:00
Ralph Castain	d653cf2847	Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM.	2016-02-21 11:55:49 -08:00
Ralph Castain	8f9508cace	Further enhance the support for Singularity containers. Extend the "personality" command-line option to allow specifying both model (e.g., "ompi") and container (e.g., "singularity"), and add the necessary logic to support multiple options. Add a new pmix "isolated" component to handle singletons where no HNP is available since containers cannot launch the HNP.	2016-02-17 13:33:06 -08:00
Ralph Castain	50431001a3	Modify the IOF subsystem to handle per-job directives for redirecting IO to files, tagging IO, and timestamping IO. Fix stdin reader	2016-02-16 18:54:38 -08:00
Ralph Castain	06c3dfc052	Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. * Clean up the DVM so it continues to run even when applications error out and we would ordinarily abort the daemons. * Create a new errmgr component for the DVM to handle the differences. * Cleanup the DVM state component. * Add ORTE bindings directory and brief README * Pass a local tool index around to match jobs. * Pass the jobid on job completion. * Fix initialization logic. * Add framework for python wrapper. * Fix terminate-with-non-zero-exit behavior so it properly terminates only the indicated procs, notifies orte-submit, and orte-dvm continues executing. * Add some missing options to orte-dvm * Fix a bug in -host processing that caused us to ignore the #slots designator. Add a new attribute to indicate "do not expand the DVM" when submitting job spawn requests. * It actually makes no sense that we treat the termination of all children differently than terminating the children of a specific job - it only creates confusion over the difference in behavior. So terminate children the same way regardless. Extend the cmd_line utility to easily allow layering of command line definitions Catch up with ORTE interface change and make build more generic. Disable "fixed dvm" logic for now. Add another cmd_line function to merge a table of cmd line options with another one, reporting as errors any duplicate entries. Use this to allow orterun to reuse the orted_submit code Fix the "fixed_dvm" logic by ensuring we reset num_new_daemons to zero. Also ensure that the nidmap is sent with the first job so the downstream daemons get the node info. Remove a duplicate cmd line entry in orterun. Revise the DVM startup procedure to pass the nidmap only once, at the startup of the DVM. This reduces the overhead on each job launch and ensures that the nidmap doesn't get overwritten. Add new commands to get_orted_comm_cmd_str(). Move ORTE command line options to orte_globals.[ch]. Catch up with extra orte_submit_init parameter. Add example code. Add documentation. Bump version. The nidmap and routing data must be updated prior to propagating the xcast or else the xcast will fail. Fix the return code so it is something more expected when an error occurs. Ensure we get an error returned to us when we fail to launch for some reason. In this case, we will always get a launch_cb as we did indeed attempt to spawn it. The error code will be returned in the complete_cb. Fix the return code from orte_submit_job - it was returning the tracker index instead of "success". Take advantage of ORTE's pretty-print capabilities to provide a nice error output explaining why we failed to launch. Ensure we always get a launch_cb when we fail to launch, but no complete_cb as the job never launched. Extend the error reporting capability to job completion as well. Add index parameter to orte_submit_job(). Add orte_job_cancel and implement ORTE_DAEMON_TERMINATE_JOB_CMD. Factor out dvm termination. Parse the terminate option at tool level. Add error string for ORTE_ERR_JOB_CANCELLED. Add some safeguards. Cleanup and/of comments. Enable the return. Properly ORTE_DECLSPEC orte_submit_halt. Add orte_submit_halt and orte_submit_cancel to interface. Use the plm interface to terminate the job	2016-02-13 08:10:44 -08:00
Gilles Gouaillardet	1d38430e43	opal: replace opal_convert_jobid_to_string with opal_snprintf_jobid	2016-01-14 10:39:03 +09:00
Ralph Castain	64b695669a	Cleanup warnings in opal and orte layers when building optimized on Mac	2015-12-17 07:51:24 -08:00
Ralph Castain	3a56f0d34b	Create the pmix external component. Fix a few places where opal/util/argv.h were required when building with an external pmix (go figure). NOTE: Building with external pmix requires that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated. Closes #1204 (replaces it) Fixes #1064	2015-12-15 15:26:13 -08:00
Ralph Castain	6a607d42a6	Prevent a segfault on tools if a connection attempt fails - tools don't open the opal/pmix framework and thus have no way of looking up a proc hostname	2015-11-10 09:11:34 -08:00
Ralph Castain	169c44258d	Fix missing check	2015-11-03 19:00:28 -08:00
Ralph Castain	fe0c995f6b	Fix a couple of minor issues identified by Jeff	2015-11-03 17:30:51 -08:00
Ralph Castain	8bfbe7f16c	Add a new MCA parameter for default_dash_host to offer a mirror of the default_hostfile	2015-10-31 19:09:54 -07:00
Ralph Castain	6506b0a5e5	Resolve a race condition that prevented the sigchild callback from being registered before short-lived apps terminated Thanks to Mark Santcroos for the assistance in tracking it down.	2015-10-23 21:02:31 -07:00
Ralph Castain	8f6855459d	Cleanup some coverity warnings	2015-09-30 10:33:53 -07:00
Ralph Castain	f872e99315	Fix orte-submit so it allows application procs to select the correct ess component. Protect orte_data_server from multiple calls to finalize.	2015-09-21 20:31:57 -07:00
Ralph Castain	c1bbbb5e2f	Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds	2015-09-15 13:08:35 -07:00
Ralph Castain	37c3ed68e7	Cleanup connect/disconnect and bring comm_spawn back online!	2015-09-06 10:27:39 -07:00
rhc54	665b30376a	Merge pull request #868 from rhc54/topic/hwloc Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 17:58:07 -07:00
Ralph Castain	d97bc29102	Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 16:54:40 -07:00
Ralph Castain	f6948c2bb4	Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working	2015-09-04 10:07:17 -07:00
Ralph Castain	a772b46c15	Bring the MPI_Publish and friends online	2015-09-02 12:04:07 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	683efcb850	Rename the current opal_event_base to opal_sync_event_base in preparation for adding an async progress thread to opal. No functional changes made here - just a simple rename.	2015-07-11 10:08:19 -07:00
Ralph Castain	ed93154e43	Fix hetero operations. An error in the hwloc utilities only allocated memory for the first display of a binding map, and then assumed that all nodes had the same number of cores in them. This resulted in memory corruption whenever someone displayed a binding pattern for a hetero cluster, and a smaller node was first in line.	2015-07-07 12:52:16 -07:00
Ralph Castain	a4557d4ed2	Add new component to support OpenMP envars per request from IBM and LLNL	2015-06-27 17:57:04 -07:00
Nathan Hjelm	4d92c9989e	more c99 updates This commit does two things. It removes checks for C99 required headers (stdlib.h, string.h, signal.h, etc). Additionally it removes definitions for required C99 types (intptr_t, int64_t, int32_t, etc). Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-25 10:14:13 -06:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	869b2891c4	When doing comm-spawn, track the last object we bound to and ensure that we start the next job on the next object so we avoid overload situations when they aren't necessary	2015-06-17 09:20:08 -07:00
Ralph Castain	ea35e47228	Fat SMPs (i.e., systems with nodes containing large numbers of cpus) were failing to start due to connection failures of the opal/pmix support. Root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts, and also did the handshake while in that event. So the server would backup beyond the connection backlog limit, and we would fail. Changing the client to leave its socket as blocking during the connect doesn't solve the problem by itself - you also have to introduce a sleep delay once the backlog is hit to avoid simply machine-gunning your way thru retries. This gets somewhat difficult to adjust as you don't want to unnecessarily prolong startup time. We've solved this before by adding a listening thread that simply reaps accepts and shoves them into the event library for subsequent processing. This would resolve the problem, but meant yet another daemon-level thread. So I centralized the listening thread support and let multiple elements register listeners on it. Thus, each daemon now has a single listening thread that reaps accepts from multiple sources - for now, the orte/pmix server and the oob/usock support are using it. I'll add in the oob/tcp component later. This still didn't fully resolve the SMP problem, especially on coprocessor cards (e.g., KNC). Removing the shared memory dstore support helped further improve the behavior - it looks like there is some kind of memory paging issue there that needs further understanding. Given that the shared memory support was about to be lost when I bring over the PMIx integration (until it is restored in that library), it seemed like a reasonable thing to just remove it at this point.	2015-05-29 14:37:14 -07:00
Nathan Hjelm	7db48c581d	orte_quit: Remove logically dead code CID 71993 Logically dead code (DEADCODE) As indicated by coverity proc can not be NULL at any point after the continue. Removed dead code. CID 1269682 Unchecked return value (CHECKED_RETURN) Check the return code of orte_get_attribute. I assume we still need to check for a NULL proc in case the aborted proc attribute is set to NULL. This might be better as an assert (). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-05-26 12:16:12 -06:00
Gilles Gouaillardet	2e384a3b65	initialize common symbols from orte A few uninitialized common symbols are remaining (generated by flex) : * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_leng * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_text * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_leng * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_text	2015-05-08 10:11:58 +09:00
Ralph Castain	e26e7ad736	Better support automated tests for map, rank, and bind options	2015-04-30 14:01:13 -07:00
Ralph Castain	d5e4fd059f	Ensure the binding and locale strings are always defined	2015-04-23 07:43:37 -07:00
Ralph Castain	cb7330a543	Get the output to lineup properly	2015-04-23 07:38:51 -07:00
Jeff Squyres	79243aca4e	display-devel-map: minor output tweak hwloc output can get fairly long, especially on machines with lots of cores and/or hyperthreads. So put the Locale and Binding output on separate lines.	2015-04-23 06:14:57 -07:00
Ralph Castain	58e646ccfd	Reduce confusion by having the devel-map display in the same format as report-bindings	2015-04-23 04:30:00 -07:00
Nathan Hjelm	a7b0c00ab6	fix memory leaks and valgrind errors This commit fixes several vagrind errors. Included: - installdirs did not correctly reinitialize all pointers to NULL at close. This causes valgrind errors on a subsequent call to opal_init_tool. - several opal strings were leaked by opal_deregister_params which was setting them to NULL instead of letting them be freed by the MCA variable system. - move opal_net_init to AFTER the variable system is initialized and opal's MCA variables have been registered. opal_net_init uses a variable registered by opal_register_params! - do not leak ompi_mpi_main_thread when it is allocated by MPI_T_init_thread. - do not overwrite ompi_mpi_main_thread if it is already set (by MPI_T_init_thread). - mca_base_var: read_files was overwritting mca_base_var_file_list even if it was non-NULL. - mca_base_var: set all file global variables to initial states on finalize. - btl/vader: decrement enumerator reference count to ensure that it is freed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-11 09:28:35 -06:00
Nathan Hjelm	9cd955badf	opal: fix multiple bugs in MCA and opal This commit fixes the following bugs: - opal_output_finalize did not properly set internal state. This caused problems when calling the sequence opal_output_init (), opal_output_finalize (), opal_output_init (). - opal_info support called mca_base_open () but never called the matching mca_base_close (). mca_base_open () and mca_base_close () have been updated to use a open count instead of an open flag to allow mca_base_open to be called through multiple paths (as may be the case when MPI_T is in use). - orte_info support did not register opal variables. This can cause orte-info to not return opal variables. - opal_info, orte_info, and ompi_info support have been updated to use a register count. - When opening the dl framework the reference count was added to ensure the framework stuck around. The framework being closed prematurely was a bug in the MCA base that has since been corrected. The increment (and associated decrement) have been removed. - dl/dlopen did not set the value of mca_dl_dlopen_component.filename_suffixes_mca_storage on each call to register. Instead the value was set in the component structure. This caused the value to be lost when re-loading the component. Fixed by setting the default value in register. - Reset shmem framework state on close to avoid returning a stale component after reloading opal/shmem. - MCA base parameters were not properly deregistered when the MCA base was closed. This commit may fix #374. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-07 19:13:20 -06:00
Adrian Reber	f45dd069bd	FT: fix compilation using --with-ft (1/5) Enabling the FT code breaks compilation (again). This series tries to fix the compiler errors. This is again only fixing the compiler errors without any warranty that the result might actually support FT again. This first patch moves orte_cr_continue_like_restart from ORTE to opal_cr_continue_like_restart in OPAL. This only leaves three calls from OPAL to ORTE in the FT code. As it is not yet 100% clear how to handle these calls the code orte_sstore.set_attr() has been #ifdef'd out for now.	2015-03-11 14:23:33 +01:00
Ralph Castain	df2cd96772	Display the local/global attribute flag more prominently. Mark the attributes as global in orte-submit so they will be communicated	2015-02-10 10:47:32 -08:00
Ralph Castain	74385302c0	Add the personality to the orte_job_t datatype support	2015-01-27 09:29:42 -06:00
Ralph Castain	028b00154d	Complete implementation of the schizo framework to support OMPI component	2015-01-27 09:29:42 -06:00
Ralph Castain	9ac39b63cc	Use the opal_progress_threads support for the ORTE progress thread in applications	2015-01-15 07:55:19 -08:00
Ralph Castain	bb529ebd8e	Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pickup major differences (e.g., where we have different numbers of sockets or assigned core bindings). Retain the hetero-nodes flag for those cases where the user knows that there are differences and our automated system isn't good enough to see it. Will obviously require further refinement as we find out which variances it can detect, and which it cannot.	2014-12-08 15:38:14 -08:00
Elena	6cf3925b09	restored _process_name_print_for_opal function in orte_init: it's required for opal output from daemons which never called ompi_init so didn't set opal_process_name_print pointer	2014-12-08 13:13:35 +02:00
Ralph Castain	14cdb04327	Revise the ess/pmi selection logic as all APPs must select it, and no daemons. Cleanup some of the mca param levels in ess so we don't printout the topology quite as easily.	2014-12-01 21:19:11 -08:00
Gilles Gouaillardet	a6744b8177	fix misc memory leaks specific to the master	2014-11-25 13:52:10 +09:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Elena	03fc809bc9	This commit contains new dstore component sm which is used for communication between pmix server and clients at the same node via shared memory.	2014-11-06 16:01:19 +02:00
Ralph Castain	6fbc68c830	Update the grpcomm direct component's priority so it sits at the bottom of the list, as it should now that the other components are active. Cleanup up the signature print function a touch to make it more readable. Remove the unneeded xcast functions in brks and rcd components as we will just fall thru to using the "direct" one	2014-11-03 14:43:17 -08:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
Jeff Squyres	01fd96bfa5	Revert "Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build." This reverts commit `63f619f871`.	2014-10-22 10:32:11 -07:00
Ralph Castain	b6aa691e0a	Fix incorrect implementation of new MCA param mca_base_env_list - it was not picking up envars and forwarding them, but only worked if you explicitly set a value for the envar. Ensure it works for both direct and indirect launch modes. Remove stale code as this replaced orte_forward_envars. Ensure it doesn't get passed to the ORTE daemons.	2014-10-16 12:58:56 -07:00
Ralph Castain	1ae34da5e5	Add an attributes parameter to the dstore.open function so we can pass directives to the active storage component. This can, for example, include the backing file info for a new shared memory segment.	2014-10-10 12:13:25 -07:00
Ralph Castain	63f619f871	Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build.	2014-10-10 11:39:08 -07:00
Ralph Castain	1be1654e5f	Correctly identify the synonym for orte_direct_modex_cutoff as ompi_hostname_cutoff	2014-10-10 06:05:06 -07:00
Elena	c905fe9b78	pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff	2014-10-09 06:15:31 +02:00
Elena	e319c95267	fixes for grpcomm rcd/brucks algorithms	2014-10-09 06:12:26 +02:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Jeff Squyres	413e775dbf	version configury: make dist now works Update the VERSION file scheme: * Remove "want_repo_rev". * Add "tarball_version". All values are now always included (major, minor, release, greek, repo_rev). However, configure.ac now runs "opal_get_version.sh ... --tarball", which will return the value of tarball_version (if it is non-empty) or the "full" version string (i.e., "major.minor.releasegreek").	2014-10-02 11:32:54 -07:00
Ralph Castain	a74428513d	Provide a better help message when we are unable to complete a connection due to a firewall. cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32743.	2014-09-16 16:28:29 +00:00
Ralph Castain	dfb952fa78	[Contribution from Artem - moved it to svn from git for him] Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup. This commit was SVN r32738.	2014-09-15 18:00:46 +00:00
Ralph Castain	731a878ff3	Add a bunch of debug to help track down the problem, and eventually find another place where comparison of signatures was incorrectly performed - use the dss compare operation to be consistent and safe This commit was SVN r32620.	2014-08-27 19:52:20 +00:00
Ralph Castain	3c24770bce	Protect debug printing on backend nodes This commit was SVN r32618.	2014-08-27 16:17:28 +00:00
Ralph Castain	1221e8a96f	Compare the full signature - thanks to Gilles for identifying the problem This commit was SVN r32595.	2014-08-25 14:52:06 +00:00
Ralph Castain	8f1b9b463e	Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound. This commit was SVN r32577.	2014-08-22 05:17:51 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “lmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Gilles Gouaillardet	f24699623f	check-help-strings cleanup This commit was SVN r32495.	2014-08-11 03:25:22 +00:00
Ralph Castain	daeb9b6c4f	Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain. Thanks to Gilles for pointing out some of the discrepancies. This commit was SVN r32398.	2014-08-01 14:44:11 +00:00
Ralph Castain	6c5e592785	Revert r32222, r32210, and r32203 as they created a problem when daemon collectives did not involve app procs on every node. Instead, modify the ompi/mca/rte/orte/rte_orte.h to add a new function that allows apps to request new daemon collective ids for use in barrier and modex operations. This will only appear in ORTE-based installations, but it is only being used by a couple of researchers at the moment. Update the orte/test/mpi/coll_test.c test to show the revised example. This commit was SVN r32234. The following SVN revision numbers were found above: r32203 --> open-mpi/ompi@a523dba41d r32210 --> open-mpi/ompi@2ce11ed5c4 r32222 --> open-mpi/ompi@d55f16db50	2014-07-15 03:48:00 +00:00
Ralph Castain	a523dba41d	NOTE: this modifies the MPI-RTE interface We have been getting several requests for new collectives that need to be inserted in various places of the MPI layer, all in support of either checkpoint/restart or various research efforts. Until now, this would require that the collective id's be generated at launch. which required modification s to ORTE and other places. We chose not to make collectives reusable as the race conditions associated with resetting collective counters are daunti ng. This commit extends the collective system to allow self-generation of collective id's that the daemons need to support, thereby allowing developers to request any number of collectives for their work. There is one restriction: RTE collectives must occur at the process level - i.e., we don't curren tly have a way of tagging the collective to a specific thread. From the comment in the code: * In order to allow scalable * generation of collective id's, they are formed as: * * top 32-bits are the jobid of the procs involved in * the collective. For collectives across multiple jobs * (e.g., in a connect_accept), the daemon jobid will * be used as the id will be issued by mpirun. This * won't cause problems because daemons don't use the * collective_id * * bottom 32-bits are a rolling counter that recycles * when the max is hit. The daemon will cleanup each * collective upon completion, so this means a job can * never have more than 2*32 collectives going on at a time. If someone needs more than that - they've got * a problem. * * Note that this means (for now) that RTE-level collectives * cannot be done by individual threads - they must be * done at the overall process level. This is required as * there is no guaranteed ordering for the collective id's, * and all the participants must agree on the id of the * collective they are executing. So if thread A on one * process asks for a collective id before thread B does, * but B asks before A on another process, the collectives will * be mixed and not result in the expected behavior. We may * find a way to relax this requirement in the future by * adding a thread context id to the jobid field (maybe taking the * lower 16-bits of that field). This commit includes a test program (orte/test/mpi/coll_test.c) that cycles 100 times across barrier and modex collectives. This commit was SVN r32203.	2014-07-10 18:53:12 +00:00
Ralph Castain	356e7ea904	Move all collective id's into the attributes and let the job pack/unpack take care of them instead of singling them out. Add the envars just prior to forking the children instead of into the launch message itself. Remove a few #if CR as the attributes functionality can handle this condition now. This commit was SVN r32133.	2014-07-03 15:58:13 +00:00
George Bosilca	2883adcdf3	Remove useless variables. This commit was SVN r32123.	2014-07-03 00:30:54 +00:00
Ralph Castain	5216bd5558	Multiple sigchld reports can occur within a single event callback, so have to reap them until none remain. Also, need to ensure the daemon is flagged as alive prior to calling wait_cb Refs trac:4717 This commit was SVN r32020. The following Trac tickets were found above: Ticket 4717 --> https://svn.open-mpi.org/trac/ompi/ticket/4717	2014-06-17 18:46:40 +00:00
Ralph Castain	42bf7466fc	This isn't as big a change as it appears - a change in one place caused a whole bunch of files to require updated #include's due to some arcane linkage. Rework the orte_wait code to reflect the introduction of the state machine. If we are in cleanup mode and just want to kill all our local children, then there is no reason to be polite about it as that introduces very long delays at scale. Just kill the procs and move on. Refs trac:4717 This commit was SVN r32019. The following Trac tickets were found above: Ticket 4717 --> https://svn.open-mpi.org/trac/ompi/ticket/4717	2014-06-17 17:57:51 +00:00
Ralph Castain	f1978fba7c	Cleanup a set of typos on the orte_get_attribute call This commit was SVN r31942.	2014-06-03 20:36:38 +00:00
Ralph Castain	742c0d2284	Fix typo that would cause a segfault if orte_startup_timeout was set This commit was SVN r31929.	2014-06-02 15:59:18 +00:00
Ralph Castain	8736a1c138	Per RFC: http://www.open-mpi.org/community/lists/devel/2014/05/14822.php Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root). This commit was SVN r31916.	2014-06-01 16:14:10 +00:00
Ralph Castain	1107f9099e	Per the RFC issued here: http://www.open-mpi.org/community/lists/devel/2014/05/14827.php Refactor PMI support This commit was SVN r31907.	2014-06-01 04:28:17 +00:00
Gilles Gouaillardet	5b9364fc12	Fix a memory leak in orte_register_params() mca_base_var_register (..., MCA_BASE_VAR_TYPE_STRING, ...) will dup() the orte_set_slots string, so there is no need to do this in the first place. cmr=v1.8.2:reviewer=rhc This commit was SVN r31773.	2014-05-15 10:31:19 +00:00
Ralph Castain	aaae4841e9	Flush the show_help system on our way out - this also restores the opal_show_help function pointer to the OPAL layer for any subsequent processing. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31685.	2014-05-08 14:37:47 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	60c554e097	Ugh - protect that --display-devel print with some NULL checks This commit was SVN r31604.	2014-05-02 14:28:45 +00:00
Ralph Castain	c7f55be387	Per a user request, add binding info to the simple --diplay-map option This commit was SVN r31603.	2014-05-02 14:25:59 +00:00
Ralph Castain	c4c9bc1573	As per the RFC: http://www.open-mpi.org/community/lists/devel/2014/04/14496.php Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM This commit was SVN r31557.	2014-04-29 21:49:23 +00:00
Ralph Castain	50c30d62ca	Repair builds without hwloc cmr=v1.7.5:reviewer=jsquyres This commit was SVN r30940.	2014-03-05 02:48:15 +00:00
Ralph Castain	0ac97761cc	Now that we are binding by default, the issue of #slots and what to do when oversubscribed has become a bit more complicated. This isn't a problem in managed environments as we are always provided an accurate assignment for the #slots, or when -host is used to define the allocation since we automatically assume one slot for every time a node is named. The problem arises when a hostfile is used, and the user provides host names without specifying the slots= paramater. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default. In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information. Also cleanup some a couple of issues in the mapping/binding system: * ensure we only override the binding directive if we are oversubscribed and overload is not allowed * ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch * minor cleanup to the warning message when oversubscribed and binding was requested cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system This commit was SVN r30909.	2014-03-03 16:46:37 +00:00

1 2 3 4 5 ...

703 Коммитов