openmpi

Author	SHA1	Message	Date
Artem Polyakov	71da0fcbef	plm/rsh: Propagate PMIx prefix to orted's Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-08-02 08:06:13 +03:00
Ralph Castain	f39ce67982	Merge pull request #3951 from rhc54/topic/hwloc2 Update to hwloc 2.0.0a	2017-08-01 15:18:31 -06:00
Ralph Castain	e94786f4b7	Revert "Check for OPAL_PREFIX and set corresponding PMIX_PREFIX if found" This reverts commit `3744967adb`. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-01 08:14:12 -06:00
Ralph Castain	3744967adb	Check for OPAL_PREFIX and set corresponding PMIX_PREFIX if found Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-31 09:14:01 -06:00
Boris Karasev	e20b581529	pmix: fixed immediate request This commit fixes a hang when using external PMIx v1 module Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2017-07-28 15:53:48 +06:00
Ralph Castain	7a83fdb9bb	Update to hwloc 2.0.0a with shmem support. Update to support passing of HWLOC shmem topology to client procs Update use of distance API per @bgoglin Have the openib component lookup its object in the distance matrix Bring usnic up-to-date Restore binding for hwloc2 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-25 20:26:22 -07:00
Ralph Castain	0042c758f1	Update the tools support so it allows tools to access PMIx Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-25 17:10:08 -07:00
Ralph Castain	b225366012	Bring the ofi/rml component online by completing the wireup protocol for the daemons. Cleanup the current confusion over how connection info gets created and passed to make it all flow thru the opal/pmix "put/get" operations. Update the PMIx code to latest master to pickup some required behaviors. Remove the no-longer-required get_contact_info and set_contact_info from the RML layer. Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing will be beneficial at large scales. Leave it "off" by default. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-20 21:01:57 -07:00
Gilles Gouaillardet	60aa9cfcb6	hwloc: add support for hwloc v2 API Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-20 17:39:44 +09:00
Gilles Gouaillardet	9f29f3bff4	hwloc: since WHOLE_SYSTEM is no longer used, remove useless checks related to offline and disallowed elements Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-20 17:39:21 +09:00
Gilles Gouaillardet	1a34224948	hwloc: do not set the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-20 17:39:16 +09:00
Ralph Castain	fca68b070b	Merge pull request #3934 from rhc54/topic/singleton Fix the isolated pmix component. Cleanup the ess/singleton component …	2017-07-19 16:02:37 -05:00
Ralph Castain	543c16b28d	Fix the isolated pmix component. Cleanup the ess/singleton component - we shouldn't be automatically discovering the local topology as that is now done on-demand. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-19 12:14:29 -07:00
Geoffrey Paulsen	71333a4b14	Transitioning ownership of rmaps/seq and rmaps/rank_file from Intel to IBM.	2017-07-18 21:31:01 -04:00
Gilles Gouaillardet	da34e2f109	ess/base: silence a warning by fixing a static initializer Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-19 09:30:53 +09:00
Ralph Castain	8a98aab6cc	Fix signal forwarding on ORTE daemons so that _all_ daemons do it, regardless of environment. Add missing support for SIGTSTP and a few others. Thanks to Eugene Dedits for reporting the problem. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-18 09:58:55 -07:00
Jeff Squyres	ccf17808b6	Merge pull request #3258 from markalle/pr/symbol_name_pollution symbol name pollution	2017-07-12 16:19:25 -05:00
Artem Polyakov	832f1b03a4	Merge pull request #3790 from artpol84/orte/iof_sbatch orte/iof: Address the case when output is a regular file	2017-07-12 09:38:01 -05:00
Gilles Gouaillardet	626e94b689	oob/tcp: make mca_oob_tcp_msg_type_t an uint8_t so no conversion is required when heterogeneous mode is enabled Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
Mark Allen	552216f9ba	scripted symbol name change (ompi_ prefix) Passed the below set of symbols into a script that added ompi_ to them all. Note that if processing a symbol named "foo" the script turns foo into ompi_foo but doesn't turn foobar into ompi_foobar But beyond that the script is blind to C syntax, so it hits strings and comments etc as well as vars/functions. coll_base_comm_get_reqs comm_allgather_pml comm_allreduce_pml comm_bcast_pml fcoll_base_coll_allgather_array fcoll_base_coll_allgatherv_array fcoll_base_coll_bcast_array fcoll_base_coll_gather_array fcoll_base_coll_gatherv_array fcoll_base_coll_scatterv_array fcoll_base_sort_iovec mpit_big_lock mpit_init_count mpit_lock mpit_unlock netpatterns_base_err netpatterns_base_verbose netpatterns_cleanup_narray_knomial_tree netpatterns_cleanup_recursive_doubling_tree_node netpatterns_cleanup_recursive_knomial_allgather_tree_node netpatterns_cleanup_recursive_knomial_tree_node netpatterns_init netpatterns_register_mca_params netpatterns_setup_multinomial_tree netpatterns_setup_narray_knomial_tree netpatterns_setup_narray_tree netpatterns_setup_narray_tree_contigous_ranks netpatterns_setup_recursive_doubling_n_tree_node netpatterns_setup_recursive_doubling_tree_node netpatterns_setup_recursive_knomial_allgather_tree_node netpatterns_setup_recursive_knomial_tree_node pml_v_output_close pml_v_output_open intercept_extra_state_t odls_base_default_wait_local_proc _event_debug_mode_on _evthread_cond_fns _evthread_id_fn _evthread_lock_debugging_enabled _evthread_lock_fns cmd_line_option_t cmd_line_param_t crs_base_self_checkpoint_fn crs_base_self_continue_fn crs_base_self_restart_fn event_enable_debug_output event_global_current_base_ event_module_include eventops sync_wait_mt trigger_user_inc_callback var_type_names var_type_sizes Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-07-11 02:13:23 -04:00
Mark Allen	efc25168cd	symbol name pollution: making some vars static As part of addressing symbol name pollution, I'm switching a few vars/functions to static. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-07-11 02:13:22 -04:00
Gilles Gouaillardet	823382f5d7	plm/base: do not abort when configure'd with --enable-heterogeneous and a mix of BE/LE is detected Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-07 10:43:54 +09:00
Ralph Castain	2a580fa71e	Merge pull request #3801 from rhc54/topic/hetero Detect that we have a mix of BE/LE in the system	2017-07-06 15:29:06 -07:00
Ralph Castain	8979bfe71e	Silence Coverity warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-06 06:07:28 -07:00
anandhi	793ebc272e	When opening a conduit, check for the transport preference in the order below - (1) rml_ofi_transports mca parameter. This parameter should have the list of transports (currently ethernet,fabric are valid); fabric is higher priority if provided. (2) ORTE_RML_TRANSPORT_TYPE key with values "ethernet" or "fabric". "fabric" is higher priority. If a specific provider is required, use the ORTE_RML_OFI_PROV_NAME key with values "socket" or "OPA" or any other supported in the system. modified: ../orte/mca/rml/ofi/rml_ofi.h modified: ../orte/mca/rml/ofi/rml_ofi_component.c modified: ../orte/mca/rml/ofi/rml_ofi_send.c On send_msg, choose the provider on local and peer to follow the rules below - 1. if the user specified the transport for this conduit (even giving us a prioritized list of candidates), then the one we selected is the _only_ one we will use. If the remote peer has a matching endpoint, then we use it - otherwise, we error out 2. if the user didn't specify a transport, then we look for matches against _all_ of our available transports, starting with fabric and then going to Ethernet, taking the first one that matches. 3. if we can't find any match, then we error out modified: ../orte/mca/rml/ofi/rml_ofi.h modified: ../orte/mca/rml/ofi/rml_ofi_component.c modified: ../orte/mca/rml/ofi/rml_ofi_send.c send_msg() -> Fixed case when the local provider chosen at time of opening conduit is not present in peer (destination) node modified: ../orte/mca/rml/ofi/rml_ofi.h modified: ../orte/mca/rml/ofi/rml_ofi_send.c Signed-off-by: Anandhi Jayakumar <anandhi.s.jayakumar@intel.com>	2017-07-05 15:40:14 -07:00
Ralph Castain	2753f53e6d	Detect that we have a mix of BE/LE in the system, provide a warning that OMPI doesn't currently support this environment, and error out Fixes #2817 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-03 15:47:05 -07:00
Artem Polyakov	374c824a5c	orte/iof: Generalize the fix related to always-ready fds Reference: https://bugzilla.kernel.org/show_bug.cgi?id=15272. Work with both stdin/stdout fds that are known to be always ready using libevent timers. Such fds cannot be effectively used with non-blocking I/O functions like epoll, poll, select: - for poll/select the event will be triggered immediately; - for epoll `epoll_ctl` will reject an attempt to add this fd to the working set. Reference: http://www.wangafu.net/~nickm/libevent-book/Ref4_event.html Libevent suggests using timers over event_active for the reasons provided by the link above. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-07-01 02:24:14 +07:00
Artem Polyakov	d9ad918a14	orte/iof: Address the case when output is a regular file Regular files are always write-ready, so non-blocking I/O does not give any benefits for them. More than that - if libevent is using "epoll" to track fd events, epoll_ctl will refuse an attempt to add an fd that refers to a regular file, failing with EPERM. This fix checks the object referenced by the fd and avoids event_add, using event_active instead. In the original configuration that uncovered this issue, libevent was using "epoll" and triggered the following warning message: "[warn] Epoll ADD(1) on fd 0 failed. Old events were 0; read change was 1 (add); write change was 0 (none): Operation not permitted" The side effect was that all output accumulated in mpirun memory and was actually written only at mpirun exit. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-07-01 02:24:14 +07:00
Ralph Castain	7cbea77238	Merge pull request #3778 from rhc54/topic/warn Attempt to detect when we are direct-launched without the necessary P…	2017-06-29 16:53:12 -07:00
Ralph Castain	85f8eb4c6b	Stop all progress threads prior to releasing the peer objects to avoid a race condition whereby a lost connection could be reported after a peer object was freed and before the threads were stopped. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-29 15:48:18 -07:00
Ralph Castain	bd4a6fee22	Attempt to detect when we are direct-launched without the necessary PMI support, and thus are incorrectly identified as being "singleton". Advise the user on the required PMI(x) support and error out. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-29 15:26:53 -07:00
Ralph Castain	9178219e6b	Deregister event handlers only on final call to finalize. Ensure we pass PMIx mca params Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-28 15:00:43 -07:00
Ralph Castain	c6c0258cd8	Need to signal -pgrp to get to all members of a process group. Thanks to Ted Sussman for the report and patience in tracking it down Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-27 12:10:34 -07:00
Ralph Castain	8a4565874e	Enable ORTE to continue running when a node fails - user takes responsibility for zombies. Minor cleanup to orte-clean Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-27 09:05:26 -07:00
Ralph Castain	6e2778ad3b	Silence coverity warnings, correctly transfer the endpoint blob bytes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-26 08:32:06 -07:00
Ralph Castain	9dad3f7cbf	Add the modex code to combine all info from local providers into a single modex send, and then retrieve them on recv Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-25 07:24:29 -07:00
Ralph Castain	f4411c4393	Enable use of OFI fabrics for launch and other collective operations. Update the PMIx repo to the latest master to get the required support for the server to "push" modex info, and to retrieve all its own "modex" values for sending back to mpirun. Have mpirun cache them in its local modex hash as OFI goes point-to-point direct and doesn't route - so the remote daemons don't need a copy of this connection info. Remove the opal_ignore from the RML/OFI component, but disable that component unless the user specifically requests it via the "rml_ofi_desired=1" MCA param. This will let us test compile in various environments without interfering with operations while we continue to debug Fix an error when computing the number of infos during server init Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-23 19:57:21 -07:00
Ralph Castain	38636f4f0a	Ensure we properly cleanup on termination, including when terminating due to ctrl-c Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-21 06:33:37 -07:00
Ralph Castain	501ba8faad	Merge pull request #3704 from rhc54/topic/signal Control distribution of signals to children vs grandchildren	2017-06-20 11:11:43 -07:00
Ralph Castain	952726c121	Update to latest PMIx master - equivalent to 2.0rc2. Update the thread support in the opal/pmix framework to protect the framework-level structures. This now passes the loop test, and so we believe it resolves the random hangs in finalize. Changes in PMIx master that are included here: * Fixed a bug in the PMIx_Get logic * Fixed self-notification procedure * Made pmix_output functions thread safe * Fixed a number of thread safety issues * Updated configury to use 'uname -n' when hostname is unavailable Work on cleaning up the event handler thread safety problem Rarely used functions, but protect them anyway Fix the last part of the intercomm problem Ensure we don't cover any PMIx calls with the framework-level lock. Protect against NULL argv comm_spawn Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-20 09:02:15 -07:00
Ralph Castain	206aec6083	By default, apply signals to all direct children _and_ any children they might have spawned (so long as they remain in the same process group). Provide an MCA param (odls_base_signal_direct_children_only) to indicate that the signal is to go _only_ to our direct children, and not be delivered to any children spawned by those procs. Refs https://www.mail-archive.com/users@lists.open-mpi.org/msg31221.html Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-15 12:26:11 -07:00
Ralph Castain	8f09929469	Fix rank-file mapper launch by correctly setting up the remote map from the provided data Put a simple protection for the case where procs fail while we are trying to deregister handlers Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-15 08:33:29 -07:00
Ralph Castain	8afa1433b8	Only set the "bound" flag if we were actually bound Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-14 13:22:01 -07:00
Ralph Castain	1f0f03b45b	Print a better error message when srun isn't found in the path. Ensure we don't segfault if -host specifies a node not included in the allocation Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-09 07:46:47 -07:00
Ralph Castain	7b39f19f60	Fix the backend mapper algorithm for comm_spawn. The front and back ends need to get the nodes into the job map in the same order so that the ranking algorithms will reach the same results Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-08 08:00:52 -07:00
Ralph Castain	81ab79f311	Ensure the orted doesn't go into an infinite loop during force-terminate Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-07 21:44:49 -07:00
Ralph Castain	919d7fcf49	We cannot use OFI to determine when daemons can finalize as we don't see the "sockets" go away. So always use the OOB for the mgmt conduit - this provides the necessary termination signal AND ensures that IOF and other mgmt messages go solely across TCP. Cleanup the way we look for matching OFI addresses by using the opal_net_samenetwork helper function. This now works for multi-network environments, but only using the socket provider Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-07 13:51:30 -07:00
Ralph Castain	bd1793ad17	Get the pmix/ext2x component to work. Fix a minor problem in the libevent external component. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-06 20:06:28 -07:00
Ralph Castain	93cf3c7203	Update OPAL and ORTE for thread safety (I swear, if I look this over one more time, I'll puke) Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-06 12:30:57 -07:00
Ralph Castain	a28eaf914a	Silence warnings when terminating Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-05 13:53:07 -07:00
Ralph Castain	8f526968c2	Do not hang if we cannot relay messages. Eliminate extra error log message Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-05 06:35:19 -07:00
Ralph Castain	51b4078b70	Merge pull request #3648 from rhc54/topic/ofi Clean up the conduit open code so we return detectable errors when co…	2017-06-02 18:08:55 -07:00
Ralph Castain	e884cbf5f5	Even though the ofi component doesn't do any routing itself, the rest of the code base (e.g., grpcomm) needs to know what routing module this component is using. So set it to the "direct" module, and don't allow ofi to be used if that module isn't available. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-02 15:47:25 -07:00
Ralph Castain	ba9a6078c2	Add ability to select transport, and only compare the first one in the conduit list for a match. This lets you select which conduit to use for OFI - if you set "-mca rml_ofi_transports ethernet" you'll pickup the mgmt conduit. If you set "-mca rml_ofi_transports fabric", you'll get the coll conduit Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-02 14:31:23 -07:00
Jeff Squyres	af9565ec25	ess: add missing <signal.h> header Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-06-02 14:11:40 -07:00
Ralph Castain	066d5eedce	Shift the signal forwarding code to ess/base so it can be available to more than just the hnp component. Extend the slurm component to use it so that any signals given directly to the daemons by their slurmstepd get forwarded to their local clients Check for NULL Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-02 10:59:14 -07:00
Ralph Castain	6b3bbd30c5	Clean up the conduit open code so we return detectable errors when conduit not opened. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-02 10:40:51 -07:00
Ralph Castain	2ab4f93f6a	Instead of "forced_terminate" just quietly causing the daemon to disappear, let's at least attempt to let the user know where the problem occurred. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-02 08:28:16 -07:00
anandhi	6ddb487744	Cleaned up the send_msg(), moved checking for send to self into the send_nb() and send_buffer_nb() modified: orte/mca/rml/ofi/rml_ofi_send.c Signed-off-by: Anandhi Jayakumar <anandhi.s.jayakumar@intel.com>	2017-06-01 17:50:54 -07:00
Ralph Castain	9d6b929894	Fix uninitialized variable. Set exit codes for failed launch so we get pretty error messages Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-31 07:38:37 -07:00
Ralph Castain	26e7515a5e	Don't sweat the "sync" settings on file descriptors as those flags aren't apparently fully portable Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-30 20:37:26 -07:00
Ralph Castain	5d990b557c	Reorg ordering so that bare executable names also are found Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-30 15:58:55 -07:00
Ralph Castain	321abfc8c6	Fix cwd and preload-binary options Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-30 14:07:22 -07:00
Ralph Castain	ad108ba44d	Fix the DVM Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-30 11:42:42 -07:00
Ralph Castain	9a8811a246	Ensure that data from a job that was stored in ompi-server is purged once that job completes. Cleanup a few typos. Silence a Coverity warning Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-30 09:43:01 -07:00
Ralph Castain	f3ab326b4a	Add some debug code for detecting leaking file descriptors. At the end of each job (and if MCA param is set), have each daemon compute the number of open fds and their characteristics and print a summary Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-29 11:25:20 -07:00
Ralph Castain	87201a80ff	Silence coverity warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-27 11:45:53 -07:00
Ralph Castain	8c2a06477c	Fix ompi-server operations Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-26 08:57:55 -07:00
Ralph Castain	657e701c65	Add debug verbosity to the orte data server and pmix pub/lookup functions Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it). Remove unneeded test Fix memory corruption by re-initializing variable to NULL in loop Resolve the race condition identified by @ggouaillardet by resetting the mapped flag within the same event where it was set. There is no need to retain the flag beyond that point as it isn't used again. Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them. Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers. Have the mindist module add procs to the job's proc array as it is a fully described module Protect the hnp-not-in-allocation case Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-25 18:41:27 -07:00
Gilles Gouaillardet	22ab73cb1a	Merge pull request #3471 from ggouaillardet/topic/execve_cmd odls: fix handling of the orte fork agent	2017-05-15 15:07:39 +09:00
Ralph Castain	b527c40dae	Remove debug Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-12 12:41:36 -07:00
Ralph Castain	23af6c9d02	Merge pull request #3519 from rhc54/topic/nolocal Fix --nolocal	2017-05-12 09:57:52 -07:00
Ralph Castain	45bbd598c1	Fix --nolocal Fix the --nolocal option by ensuring we always check/remove the HNP from the list of available nodes if the flag is set Ensure that the HNP node is included as available when nothing else is given Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-12 09:03:26 -07:00
Ralph Castain	29e083bffd	Fix total_slots_allocated computation On unmanaged allocations, we need to update the total_slots_allocated once the daemons have been launched and "discovered" their topology Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-12 08:21:52 -07:00
Ralph Castain	9164afbb08	When a daemon force-terminates, we don't get the show_help message it was trying to send because the message is at a lower priority than the termination event. Resolve this by putting the oob in its own progress thread. Also, use only that one thread by default - if someone needs more progress threads in the OOB, they can use the MCA param to get them. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-11 06:52:55 -07:00
Ralph Castain	911961ee21	Sigh - remove debug Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-10 11:26:42 -07:00
Ralph Castain	2d93d15aa7	Merge pull request #3502 from rhc54/topic/cisco Fix nidmap computation to deal with hetero nodes	2017-05-10 11:21:12 -07:00
Ralph Castain	50646b07ce	Update the RML OFI by copying the updated files from @anandhis branch Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-10 09:17:06 -07:00
Ralph Castain	442e307a6e	Fix the nidmap computation to deal with hetero nodes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-10 08:43:28 -07:00
Ralph Castain	ef0e0171c9	Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c). Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-08 10:04:50 -07:00
Gilles Gouaillardet	16fc0996e6	odls: fix handling of the orte fork agent Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-05-08 16:07:13 +09:00
Ralph Castain	180809f2ef	Do not pass topologies during tree spawn of daemons as there is no way the HNP can know the backend topologies at that point. Any needed topologies will be sent along with the launch_apps command Do not pass param file MCA params if the user has requested that no param files be read - required when trying to avoid launch time penalties from large numbers of processes reading default param files. The daemon picks them up and passes them along anyway, so it isn't clear what value we gain from having them all read the defaults Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-07 21:14:43 -07:00
Ralph Castain	a143800bce	Enable full operations under SLURM on Cray systems by co-locating a daemon with mpirun when mpirun is executing on a compute node in that environment. This allows local application procs to inherit their security credential from the daemon as it will have been launched via SLURM Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-06 19:08:50 -07:00
Ralph Castain	3a434d75d6	By default, use the system default snd/recv buffer sizes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-05 09:58:05 -07:00
Gilles Gouaillardet	57b4144e57	orte: use compression for ORTE_DAEMON_REPORT_TOPOLOGY_CMD answer Refs open-mpi/ompi#3414 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-27 17:21:59 +09:00
Gilles Gouaillardet	49cd40b2df	compress the topology sent by the first orted Refs open-mpi/ompi#3414 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-27 16:20:11 +09:00
Gilles Gouaillardet	c38ef3d46f	oob/tcp: fix short writev handling in send_msg() Fixes open-mpi/ompi#3414 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-27 10:24:38 +09:00
Howard Pritchard	462342d148	Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi common/libfabric: move libfabric to ofi	2017-04-21 07:50:38 -06:00
Howard Pritchard	841192645b	common/libfabric: move libfabric to ofi This PR renames the common library for OFI libfabric from libfabric to ofi. There are a number of reasons this is good to do: 1) it's shorter and replaces nine characters with three in function names for what may eventually be a fairly extensive interface 2) OFI is the term used for MTL and RML components that use the OFI libfabric interface 3) A planned OSC component will also use the OFI term. 4) Other HPC libraries that can use OFI libfabric tend to use the term "ofi" internally and also in their configure options relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM) There seem to be comments in places in the Open MPI source code that indicate that this common library will be going away. Far from it, as we will want to be able to share things like AV objects between OMPI and possibly OSHMEM components that use the OFI libfabric interface. This PR also adds a synonym to the --with-libfabric(-libdir) configury options: --with-ofi and --with-ofi-libdir. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-04-20 13:07:16 -06:00
Nathaniel Graham	34b4aeb17f	Merge pull request #3339 from nrgraham23/mpirun_help_improvements Additional mpirun --help changes	2017-04-19 14:05:07 -06:00
Nathaniel Graham	01312b2f90	Additional mpirun --help changes This commit recategorizes several mpirun arguments, and moves the information for mpirun --help arguments to the bottom of the general help message. I also added the OPAL_CMD_LINE_OTYPE field to two commands that were missed initially because they were not in the same area as the others. Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>	2017-04-19 11:43:45 -06:00
Howard Pritchard	3918b7a796	Merge pull request #3213 from hppritcha/topic/remove_loadleveer orte/ras: remove loadleveler support	2017-04-18 09:18:54 -06:00
Ralph Castain	bb1aaa3286	Use the node index to compare to daemon vpid when identifying procs to bind Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-14 02:37:25 -07:00
Ralph Castain	67156556ce	On behalf of Josh, ensure we flag that the child is no longer alive since we are killing it with SIGKILL Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-13 21:07:26 -07:00
Ralph Castain	1585854335	Minor coverity cleanups Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-12 19:31:35 -07:00
Ralph Castain	0500cc1c66	Update the debugger launch code to reflect the new backend mapping method. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-12 13:31:18 -07:00
Ralph Castain	539f71d0cc	Merge pull request #3310 from marksantcroos/fix/alps_wdir Bring ALPS ODLS up to par regarding wdir.	2017-04-11 17:30:04 -07:00
Mark Santcroos	27fa8aabd6	Hardcode basename to "orted" for error reporting. Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>	2017-04-11 18:59:23 -04:00
Mark Santcroos	af3a6e1a29	Verify that the chdir(2) succeeds. Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>	2017-04-12 00:37:37 +02:00
Ralph Castain	97e38e6d84	Move a free to a little later in case the verbose output needs it Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-11 11:21:12 -07:00
Mark Santcroos	36ac54b5d8	Bring ALPS ODLS up to par regarding wdir. Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>	2017-04-10 08:15:07 -04:00
Ralph Castain	95ae0d1df3	Cleanup timing macros for portability across compilers. Rename the --enable-timing configure option to be --enable-pmix-timing so it doesn't pickup external timing requests. Remove a stale function reference in PMIx so it can compile with timing enabled. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-10 12:56:38 +06:00
Artem Polyakov	482d7c9322	opal/timing: remove RML timings Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-04-07 21:16:21 +06:00
Artem Polyakov	79100de014	opal/timing: Remove oob tracing Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-04-07 21:16:21 +06:00
Ralph Castain	b33b4607df	Correctly identify the source of the event when notifying of abnormal termination by a process Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-06 20:50:38 -07:00
Ralph Castain	a29ca2bb0d	Enable slurm operations on Cray with constraints Cleanup some errors in the nidmap code that caused us to send unnecessary topologies Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-06 08:58:06 -07:00
Ralph Castain	40ca43e157	Set the PARENT vpid for direct routed module Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-04 19:03:28 -07:00
Ralph Castain	9cb18b8348	Merge pull request #3280 from rhc54/topic/dvm Fix the DVM by ensuring that all nodes, even those that didn't partic…	2017-04-04 18:15:33 -07:00
Ralph Castain	74863a0ea4	Fix the DVM by ensuring that all nodes, even those that didn't participate (i.e., didn't have any local children) in a job, clean up all resources associated with that job upon its completion. With the advent of backend distributed mapping, nodes that weren't part of the job would still allocate resources on other nodes - and then start from that point when mapping the next job. This change ensures that all daemons start from the same point each time. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-04 17:31:38 -07:00
Nathaniel Graham	7063f3021f	Merge pull request #3231 from nrgraham23/revamp_mpirun_help mpirun --help output revamp	2017-04-04 12:32:20 -06:00
Nathaniel Graham	19e5d15491	mpirun --help output revamp This commit modifies the output from the mpirun --help command. The options have been split into groups, to make the output smaller and more readable. The groups are: general, debug, output, input, mapping, ranking, binding, devel, compatibility, launch, dvm, and unsupported. There is also a special "full" command that can be used to get the old behaviour of printing out all of the options. Unsupported options may only be seen with this full output. This commit also adds a special case for the help argument. It makes it possible for the user to enter 0 or 1 arguments instead of having to always enter an argument. This defaults to printing out the "general" help options so the user can then see what help arguments there are. Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>	2017-04-04 10:59:32 -06:00
Ralph Castain	393c4536eb	Remove stale code line Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-04 08:13:15 -07:00
Ralph Castain	92c996487c	Update how we pass the node regex so we pass _all_ nodes, even those without daemons. This allows the backend daemons to form a complete picture of the allocation. Include info on which nodes have daemons on them, and populate that info on the backend as well. Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap Get the DVM running again Fix direct modex by eliminating race condition caused by releasing data while sending it Up the size limit before compressing Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-03 19:25:15 -07:00
Ralph Castain	583dbe954c	Silence coverity dead-code warning Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-26 20:36:43 -07:00
Ralph Castain	ecc8000136	Silence a flood of warnings when compiling with gcc on Cray Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-24 13:37:11 -06:00
Ralph Castain	35f817911e	Fix coverity issues Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-24 08:09:46 -07:00
Ralph Castain	ea84a53faa	Merge pull request #3218 from rhc54/topic/pmix2 Update to include the PMIx 2.0 APIs for monitoring and job control.	2017-03-21 20:11:10 -07:00
Ralph Castain	d645557fa0	Update to include the PMIx 2.0 APIs for monitoring and job control. Include required integration, but leave the monitors off for now. Move the sensor framework out of ORTE as it is being absorbed into PMIx Fix typo and silence warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-21 17:47:08 -07:00
Ralph Castain	10d401b6ec	Merge pull request #3217 from rhc54/topic/wdirs Resolve a race condition for setting our working directory when fork/exec'ing application procs.	2017-03-21 17:39:54 -07:00
Ralph Castain	74fd2c30af	Cleanup alps odls module Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-21 17:41:11 -06:00
Ralph Castain	f8e1e3bed3	Ensure we properly exit with error if we cannot map the job Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-21 15:15:32 -07:00
Ralph Castain	75684dc260	Resolve a race condition for setting our working directory when fork/exec'ing application procs. We have to ensure we do it after the fork occurs since we want to use multiple threads in the odls. Otherwise, the different threads are bouncing the entire process around. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-21 13:54:03 -07:00
Howard Pritchard	9350aa5d71	orte/ras: remove loadleveler support Remove loadleveler as it is obsolescent and is no longer supported. Fixes #3167 We'll wait for final check of whether or not loadleveler even compiles/functions before merging this. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-03-21 10:32:28 -06:00
Ralph Castain	dc85e7fde7	Provide a little more help on the error messages when an executable isn't found so we have some better idea where we were looking for it. Don't double-report such errors. Ensure the ORTE_ERROR_NAME doesn't get a NULL back for the string name of an error code as that might cause some systems to segfault Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-17 09:54:37 -07:00
Howard Pritchard	1709febdea	Merge pull request #3166 from hppritcha/topic/swat_state_orted_comp_warning ORTED: swat another compiler warning	2017-03-15 08:40:59 -06:00
Ralph Castain	96d7d10c1d	Merge pull request #3170 from rhc54/topic/reg Ensure the backend daemons know if we are in a managed allocation and if the HNP was included in the allocation	2017-03-14 12:48:09 -07:00
Ralph Castain	61a71e25ef	Ensure the backend daemons know if we are in a managed allocation and if the HNP was included in the allocation Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-14 10:06:43 -07:00
Howard Pritchard	5daaf7f3fd	ORTED: swat another compiler warning Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-03-14 08:41:51 -06:00
Ralph Castain	52c9e631de	Silence Coverity warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-14 07:30:42 -07:00
Ralph Castain	b1a01d77ae	Update the TM module to support regex passing Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-13 21:50:40 -07:00
Ralph Castain	bb574a41df	Update launchers to get correct regex Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-13 11:21:44 -07:00
Ralph Castain	105fb152e1	Silence Coverity warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-13 08:38:51 -07:00
Ralph Castain	b9f5cab710	Add a minor debug statement Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-12 18:15:44 -07:00
Gilles Gouaillardet	23d44a5284	sensor/base: initialize orte_sensor_base global variable Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-03-13 09:39:43 +09:00
Ralph Castain	6d6bc9bd07	Update alps module to new APIs Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-12 09:43:07 -07:00
Ralph Castain	70591bf4dc	Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-11 20:48:04 -08:00
Ralph Castain	ab50665222	Restore sensor framework Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-11 17:46:32 -08:00
Ralph Castain	c6bc3ccb76	Sync to latest PMIx master and PMIx reference server Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-11 12:50:38 -08:00
Howard Pritchard	f8183f71f7	rmaps/base: swat compiler warning gcc was complaining about variables possibly used uninitialized Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-03-09 14:30:06 -06:00
Ralph Castain	48fc339718	Create an alternative mapping method that pushes responsibility onto the backend daemons. By default, let mpirun only pack the app_context info and send that to the backend daemons where the mapping will be done. This significantly reduces the computational time on mpirun as it isn't running up/down the topology tree computing thousands of binding locations, and it reduces the launch message to a very small number of bytes. When running -novm, fall back to the old way of doing things where mpirun computes the entire map and binding, and then sends the full info to the backend daemon. Add a new cmd line option/mca param --fwd-mpirun-port that allows mpirun to dynamically select a port, but then passes that back to all the other daemons so they will use that port as a static port for their own wireup. In this mode, we no longer "phone home" directly to mpirun, but instead use the static port to wireup at daemon start. We then use the routing tree to rollup the initial launch report, and limit the number of open sockets on mpirun's node. Update ras simulator to track the new nidmap code Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found. Update gadget platform file Initialize the range count when starting a new range Fix the no-np case in managed allocation Ensure DVM node usage gets cleaned up after each job Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-07 20:43:12 -08:00
Ralph Castain	83199979ba	Remove the stale opal/sec framework Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-02 15:41:56 -08:00
Ralph Castain	c757c3d260	Fix double-free in rml/ofi shutdown Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-01 11:53:46 -08:00
Jeff Squyres	fec519a793	hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h Per a prior commit, the presence of "hwloc.h" can cause ambiguity when using --with-hwloc=external (i.e., whether to include opal/mca/hwloc/hwloc.h or whether to include the system-installed hwloc.h). This commit: 1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h. 2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc. 3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the rest of the code base. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-02-28 07:48:42 -08:00
Ralph Castain	f054261590	Merge pull request #3027 from naughtont3/tjn-envvar-dvmuri dvm: Add envvar 'ORTE_HNP_DVM_URI' to schizo:ompi	2017-02-27 06:56:44 -08:00
Ralph Castain	efc3a98ea6	Merge pull request #3031 from rhc54/topic/ofi Add CPPFLAGS to build of rml/ofi component.	2017-02-25 11:23:03 -08:00
Ralph Castain	9f8f7f3189	Add CPPFLAGS to build of rml/ofi component. Fix finalize to ensure we only destruct the msg queue list once. Update platform file Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-25 09:17:41 -08:00
Thomas Naughton	006be92df5	dvm: Add envvar 'ORTE_HNP_DVM_URI' to schizo:ompi Add ability to pass DVM URI purely via environment to simplify invocation from command-line (e.g., start dvm, export URI, mpirun w/o needing to add `--hnp` arg). If user passes both envvar and cmdline, the cmdline wins. Signed-off-by: Thomas Naughton <naughtont@ornl.gov>	2017-02-24 16:55:32 -05:00
Thomas Naughton	beb5b250bf	orte dvm: debug fix for DVM early quit Ensure that job errors do not cause the DVM to fail unless the failed job is the DVM itself. Refs #2987, with improvements from Ralph Signed-off-by: Thomas Naughton <naughtont@ornl.gov> Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-23 10:17:53 -05:00
Ralph Castain	22c88f5ab5	Fix launch_id matching of -hosts. Need to check the entire value instead of just the last N digits. Otherwise, "-host 15" will match nid0015, nid0115, and any other launch id ending in 15. It appears strtol can return either a NULL or a zero-length string, so check for both cases. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-20 07:03:53 -08:00
Ralph Castain	af7e2cc33b	Merge pull request #3004 from jjhursey/topic/oob-tcp-timeout oob/tcp: Adjust TCP keepalive default values	2017-02-19 14:28:01 -08:00
Ralph Castain	95bfc7b7c6	Merge pull request #2991 from jjhursey/fix/ibm/errmgr-help-msg orte/errmgr: Improve help message on connection lost	2017-02-17 11:34:18 -08:00
Joshua Hursey	df0f8e95cd	oob/tcp: Adjust TCP keepalive default values Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-17 11:02:25 -06:00
Howard Pritchard	b272f87926	Merge pull request #2968 from hjelmn/pmix_cray pmix/cray: performance improvements and cleanup	2017-02-16 11:41:59 -07:00
Ralph Castain	0ae873de5c	Fix a bug where we failed to compute #procs for nperXXX directives, thus resulting in an incorrect default binding Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-15 16:32:24 -08:00
Ralph Castain	223495325d	Fix binding policy bug and support pe=1 modifier Allow someone to specify the "pe=N" modifier to a mapping policy when N=1. This equates to just "bind-to core", but helps people who use a script to set the PE policy. Fix a bug where setting the binding policy left a lingering "if-supported" flag that shouldn't be there. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-15 14:55:17 -08:00
Joshua Hursey	c452f68495	orte/errmgr: Improve help message on connection lost Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-15 16:36:00 -05:00
Ralph Castain	68b53e2179	Fix comm_spawn by registering nspace info only when needed - either when we have local procs, or when job-level info is required by connecting jobs Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-14 19:47:56 -08:00
Josh Hursey	a17b547430	Merge pull request #2957 from jjhursey/topic/ibm/rsh-sigint-fix plm/rsh: Fix signal handling for rsh launcher	2017-02-14 15:29:00 -06:00
Nathan Hjelm	1df6bdd30e	schizo/alps: set orte_bound_at_launch when launched with aprun Set the orte_bound_at_launch MCA variable. This resolves a launch performance bug when using aprun to launch Open MPI processes. If this variable is not set it can take minutes longer to launch with high ppn. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-14 11:13:48 -07:00
Joshua Hursey	843fcca03c	plm/rsh: Fix signal handling for rsh launcher * Similar to the other launchers (i.e., slurm, alps) we need to put the children in a separate process group to prevent SIGINT (from a CTRL-C) from being delivered to the whole process group and prematurely killing the rsh/ssh connections to the remote daemons. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-14 08:54:17 -06:00
Ralph Castain	dee2d8646d	Fix plm/rsh runtime check Fix the check for rsh/ssh so we allow the check for SGE and LoadLeveler to occur if user doesn't specify their own launch agent. Fix a Coverity warning Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-13 16:54:03 -08:00
Nathan Hjelm	2c1980ae39	Merge pull request #2923 from hjelmn/oob_fix oob/tcp: cleanup peers before event bases	2017-02-06 09:34:10 -07:00
Nathan Hjelm	b928a6b9ea	ras/slurm: fix compile error due to missing header On some systems this component fails to build due to the missing netdb.h header. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-03 15:22:34 -07:00
Nathan Hjelm	1c4b735f5f	oob/tcp: cleanup peers before event bases This commit fixes an error in teardown where the event bases are torn down before the peer structures are released. This causes us to call event_del on an invalid event base. At best this makes valgrind complain and at worst this causes aborts or segvs. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-03 15:18:41 -07:00
Ralph Castain	b661275dba	For performance, try to send the oob/tcp message a few times before dropping back into the event library Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-02 06:44:15 -08:00
Ralph Castain	50ca9fb66b	Merge pull request #2893 from rhc54/topic/sim Cleanup the ras simulator capability, and the relay route thru grpcomm	2017-02-01 16:17:40 -08:00
Ralph Castain	230d15f0d9	Cleanup the ras simulator capability, and the relay route thru grpcomm direct. Don't resend wireup info if nothing has changed Fix release of buffer Correct the unpacking order Fix the DVM - now minimized data transfer to it Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-01 15:01:58 -08:00
Ralph Castain	8bf3ac828c	Correct the path to the ORTE data dir - allows master to be built with --no-ompi Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-01 07:30:18 -07:00
Howard Pritchard	db4039f565	ess/alps: fix problem in makefile ./autogen.pl --no-ompi doesn't work without this fix when alps can be configured. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-01-31 21:56:16 -06:00
Ralph Castain	b59ae14a2a	Fix static port and partial allocation operations Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message. Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-28 10:09:44 -08:00
Ralph Castain	c803af5d3d	Minor change to allow qrsh to tree spawn, if supported Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-27 16:34:08 -08:00
Ralph Castain	7c795f4416	If the HNP is going to request topology info, it cannot do so via a routed OOB message as the intervening daemons may not be ready. So disable routing until the VM is ready, and have daemons start routing as they receive the xcast launch msg (which includes the data they need to talk to their peers). Do a little optimization and minimize recomputation of the routing plan. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-27 15:37:16 -08:00
Ralph Castain	d672fad849	Repair rsh/ssh tree spawn Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn. Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-27 11:35:00 -08:00
Josh Hursey	2e64bf42fb	Merge pull request #2810 from jjhursey/fix/ibm/stdiag-to-stdout Extend options for stddiag routing	2017-01-26 14:29:16 -06:00
Nathan Hjelm	fe1c6bd881	Merge pull request #2840 from hjelmn/event_fix verbs: remove extra event user increment/decrement operation	2017-01-26 07:30:24 -08:00
Ralph Castain	399de0738e	Cleanup launch Given that we only set OOB contact info from inside of events, or before we begin threaded operations (e.g., in the ess), allow set_contact_info to directly update the oob/base framework globals. Correct the nidmap regex decompression routine. Ensure that rank=1 daemon always sends back its topology as this is the most common use-case. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-25 22:06:09 -08:00
Nathan Hjelm	9f28c0af39	verbs: remove extra event user increment/decrement operation Since the oob and connections systems do not work the same way they did in older versions of Open MPI these operations are no longer necessary. At best they do nothing and at worst they hurt performance by making us enter the event library more often in opal_progress(). Fixes #2839 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-01-25 18:37:06 -07:00
Ralph Castain	2f4e87eae9	Have rank=1 daemon always send its topology back as this is the most common use-case Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-25 09:33:11 -08:00
Jeff Squyres	230bbc597d	plm base: make sure to assign "node" early enough Make sure to assign "node" before using it in ORTE_FLAG_SET. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-01-25 08:02:59 -08:00
Ralph Castain	184ccc8e91	Cleanup some code so it is clear that it is executing in an event. Ensure that peer event base is properly set on incoming connections Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-25 06:55:11 -08:00
Gilles Gouaillardet	ef10d3fd7b	orte: add missing include file Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-25 16:15:20 +09:00
Joshua Hursey	0e9a06d2c3	orte/iof: Add app stderr to stdout redirection at source * Add an MCA parameter to combine stdout and stderr at the source - `iof_base_redirect_app_stderr_to_stdout` * Aids in user debugging when using libraries that mix stderr with stdout Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-24 16:23:48 -06:00
Joshua Hursey	dcd9801f7c	orte/iof: Add orte_map_stddiag_to_stdout option * Similar to `orte_map_stddiag_to_stderr` except it redirects `stddiag` to `stdout` instead of `stderr`. * Add protection so that the user cannot supply both: - `orte_map_stddiag_to_stderr` - `orte_map_stddiag_to_stdout` Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-24 16:22:59 -06:00
Ralph Castain	ef86707fbe	Deprecate the --slot-list parameter in favor of --cpu-list. Remove the --cpu-set param (mark it as deprecated) and use --cpu-list instead as it was confusing having the two params. The --cpu-list param defines the cpus to be used by procs of this job, and the binding policy will be overlaid on top of it. Note: since the discovered cpus are filtered against this list, #slots will be set to the #cpus in the list if no slot values are given in a -host or -hostname specification. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-24 13:33:22 -08:00
Ralph Castain	4e9364b9a4	Merge pull request #2794 from rhc54/topic/regs Next step in reducing launch time	2017-01-24 03:19:57 -08:00
Ralph Castain	86ab751c5e	Next step in reducing launch time: begin reducing the size of the launch message itself. Start by expressing the daemon map as a set of three regular expression strings. On an 8k cluster, this reduces the nidmap contribution from over 200kBytes to 21 bytes in size. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-23 19:54:47 -08:00
Gilles Gouaillardet	0bdc594b2e	rml/base: plug a memory leak in orte_rml_API_recv_cancel() simply return when the orte event thread has gone Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-24 09:12:47 +09:00
Ralph Castain	a61f7bdb26	Merge pull request #2780 from rhc54/topic/conn Ensure we properly set the "shutting down" flag so connection drops by downstream peers are properly handled.	2017-01-23 06:40:28 -08:00
Ralph Castain	e7b12913b4	Ensure we properly set the "shutting down" flag so connection drops by downstream peers are properly handled. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-23 04:00:24 -08:00
Nathan Hjelm	954a4b7be3	oob/base: fix num_threads registration type This commit fixes a bug in the registration of the num_threads MCA variable. The variable is of type int and was being registered as a boolean. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-01-22 14:02:34 -07:00
Ralph Castain	ac4fcd3f97	Ensure that oob/base level data is always accessed in the oob/base event thread. Make debruijn the default routed component Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-22 10:33:32 -08:00
Ralph Castain	6560617c04	Fix comm_spawn and orte-dvm by resetting all used "node mapped" flags after building the child list Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-22 05:55:53 -08:00
Ralph Castain	639cdd4f9d	Add missing flag set to ensure nodes do not get double-added to job map. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-21 20:06:50 -08:00
Ralph Castain	be3ef77739	Improve packing efficiency by raising the initial buffer size and modifying the extension code. Flag if a job map has had its nodes added so we don't have to loop repeatedly to check it. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-21 14:03:19 -08:00
Ralph Castain	466cbd4d29	Rework the threading in oob/tcp so that daemons (including mpirun) use multiple progress threads to get messages out to their children, and so that the oob/base uses a separate one to setup sends. This allows the daemon cmd processor to execute in parallel with relay of messages, which significantly reduces launch times at scale Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-21 13:26:19 -08:00
Ralph Castain	668421b6ec	Compress the xcast message if bigger than a defined size to further improve launch performance at scale Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-19 22:08:02 -08:00
Ralph Castain	1f46e48b94	Have mpirun and orteds activate the oob/tcp progress thread by default, leaving a way to turn it off via MCA param. Provide a method by which the add_procs command can be processed in parallel with relaying the cmd message to the next daemons down the tree. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-19 18:52:58 -08:00
Ralph Castain	bb132f6d03	Merge pull request #2764 from rhc54/topic/dvm If a tool sees the HNP it is attached to die (thereby losing connecti…	2017-01-19 15:39:30 -08:00
Ralph Castain	ca50b31de1	Merge pull request #2762 from rhc54/topic/oobfast Speed-up the OOB/TCP communications by using writev instead of writing the header, and then separately write the body	2017-01-19 15:39:06 -08:00
Ralph Castain	19bb64cfb8	If a tool sees the HNP it is attached to die (thereby losing connection), then stop the event loop instead of going through the abort code path. This will allow the tool to cleanup before exiting Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-19 14:04:06 -08:00
Ralph Castain	e5f687f896	Speed-up the OOB/TCP communications by using writev instead of writing the header, and then separately write the body Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-19 13:03:44 -08:00
Ralph Castain	368684bd63	Revert `e9bc293` and try a different approach for scalably dealing with hetero clusters. Have each orted send back its topo "signature". If mpirun detects that this signature has not been seen before, then ask for that daemon to send back its full topology description. This allows the system to only get the topology once for each unique topo in the cluster. Cleanup a typo, and remove no longer needed MCA params for hetero nodes and hetero apps. Hetero nodes will always be automatically detected. We don't support a mix of 32 and 64 bit apps Modify the orte_node_t to use orte_topology_t instead of hwloc_topology_t, updating all the places that use it. Ensure that we properly update topology when we see a different one on a compute node. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-18 10:22:15 -08:00
Ralph Castain	e9bc2934be	Add an MCA param "hnp_on_smgmt_node" that mpirun can use to tell the orteds to ignore its topology signature as mpirun is executing on a system mgmt node, and hence a different topology than the compute nodes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-16 19:32:01 -08:00
Ralph Castain	74a285be83	Cancel the waitpid callback once the waitpid on a process has fired to avoid multiple notifications Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-16 14:32:02 -08:00
Ralph Castain	9e8c7d6295	Silence Coverity warning Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-15 07:51:37 -08:00
Ralph Castain	6b34cc67d6	Correct typo Fixes #2691 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-15 07:48:31 -08:00
Ralph Castain	3a157f0496	One more time - we "push" IOF for stdout, stderr, and stddiag with separate calls. However, we were creating the sinks for all three of them each time, which caused them to leak. Create the sinks only once for each channel. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-14 17:40:36 -08:00
Ralph Castain	b55c03255a	Strange - I had created a new IOF API "complete" for cleaning up at the end of jobs, but somehow the implementation is missing. It also appears that the orted's never actually cleaned up their job-related information. These things are fine for normal mpirun-based operations, but cause significant resource leaks for the DVM. Complete the implementation and seal the leaks Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-12 19:54:18 -08:00
Ralph Castain	0e2df3be3e	Missed one spot - plug fd leaks in orteds Fixes #2691 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-12 13:45:46 -08:00
Ralph Castain	9ad02b5d13	Merge pull request #2718 from rhc54/topic/leaks Don't remove the IOF framework's tracking info for a proc until the state machine tells it to do so.	2017-01-12 09:57:17 -08:00
Nathan Hjelm	110840fc87	ess/hnp: add support for forwarding additional signals (#2712) * ess/hnp: add support for forwarding additional signals This commit adds support to the hnp ess module to forward additional signals beyond the default SIGUSR1, SIGUSR2, SIGTSTP, and SIGCONT. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * Generalize this a bit to allow a broader range of signals to be forwarded. Turns out that SIGURG is now a "standard" signal, though the value differs across systems. So set up to forward it (and some friends) if they are defined. Allow users to provide the signal name (instead of the integer value) as the value of even the more common signals does vary across systems. Don't limit the number that can be supported. Signed-off-by: Ralph Castain <rhc@open-mpi.org> * ess/hnp: fix some bugs in the signal forwarding code This commit fixes two bugs: - signals_set needs to be set even if no signals are being forwarded. If it is not set we will SEGV in libevent if ess_hnp_forward_signals == none. - SIGTERM and SIGHUP are handled with a different type of handler. Do not allow the user to specify these to be forwarded. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * We are sure to get "dinged" if error messages aren't nicely output via show_help, so do so here Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-12 10:09:41 -07:00
Ralph Castain	fa419d3c0d	Don't remove the IOF framework's tracking info for a proc until the state machine tells it to do so. This plugs leaked file descriptors as we were losing track prior to destructing the resources. Fixes #2691 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-12 08:34:29 -08:00
Ralph Castain	aff3a00059	Protect default mapping/binding options for cases where no NUMA or SOCKET objects exist - like VMs Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-11 09:44:44 -08:00
Ralph Castain	93e4935902	Be a tad more cautious before releasing objects when running in DVM mode Fixes #2700 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-10 14:04:27 -08:00
Gilles Gouaillardet	44c1ff60f1	Merge pull request #2672 from ggouaillardet/topic/misc_memory_leaks Plug misc memory leaks	2017-01-10 13:16:04 +09:00
Joshua Ladd	3e23380bba	Merge pull request #2675 from artpol84/orte/state/exit_1_fix orte/odls: Fix ORTE state machine for the non-zero exit case	2017-01-09 12:32:37 -05:00
Joshua Ladd	7fc9f9bbac	Merge pull request #2620 from karasevb/fix_rmaps_mindist rmaps/mindist: fix pmix errors	2017-01-06 17:26:48 -05:00
Ralph Castain	684e69695f	Minor cleanups to eliminate warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-06 08:44:10 -08:00
Artem Polyakov	3eb6c98542	orte/odls: Fix ORTE state machine for the non-zero exit case This commit fixes a rare race condition that occurs when the process that is calling `exit(-1)` has a delay between fd cleanup and actual OS-level exit. This may happen if the process has some work to do `on_exit()`. Problem description: Consider an application process that has called `exit(nonzero)`; its fds were closed but its actual termination at OS level is delayed by some cleanups (e.g. in callbacks registered via `on_exit()`). Observed sequence of events was the following: * orted gets the stdio disconnection and activates the `IOF COMPLETE` state. * a parallel OOB disconnection causes the `COMMUNICATION FAILURE` state to be activated. * during `COMMUNICATION FAILURE` processing `odls_base_default_wait_local_proc` is called even though the real waitpid wasn't yet called (code mentions that waitpid might not be called for an unspecified reason). Because of that the real exit code is unknown and set to 0. The `odls_base_default_wait_local_proc` callback sees the `IOF COMPLETE` flag and in conjunction with the 0 exit code it activates the `WAITPID FIRED` state. * processing of `WAITPID FIRED` leads to `NORMALLY TERMINATED` being activated. * the `NORMALLY TERMINATED` state in particular causes the `ORTE_PROC_FLAG_ALIVE` flag for this proc to be dropped. * when the application process finally exits, `wait_signal_callback` is launched. It sets the real exit code and calls `odls_base_default_wait_local_proc` again, but this time, since the process has the `ORTE_PROC_FLAG_ALIVE` flag dropped, the `WAITPID FIRED` state is activated (instead of `EXITED WITH NON-ZERO`), leading to the hang that was observed. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2017-01-06 11:12:55 +02:00
Gilles Gouaillardet	6b9343a966	plm/rsh: plug a memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:45 +09:00
Gilles Gouaillardet	8ba92d7516	iof/base: plug a memory leak in orte_iof_base_close() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:45 +09:00
Gilles Gouaillardet	7fe6840232	state/hnp: plug a memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:45 +09:00
Gilles Gouaillardet	4d58b8dcae	ess/pmi: plug a memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:45 +09:00
Gilles Gouaillardet	c0c5dd8ccc	orte: plug a memory leak in orte_rml.recv_cancel do not invoke orte_rml.recv_cancel after the orte progress thread has gone Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:44 +09:00
Gilles Gouaillardet	17fac4bfd1	grpcomm/base: get rid of the seq_num field of the orte_grpcomm_signature_t struct Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:44 +09:00
Gilles Gouaillardet	fe25f50871	grpcomm/base: plug a memory leak on finalize manually allocate sequence numbers to be stored into the orte_grpcomm_base.sig_table hash table, and manually release them on orte_grpcomm_base_close() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 15:38:44 +09:00
Gilles Gouaillardet	0ee5d56ab1	grpcomm/direct: plug a memory leak in barrier_release() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:35 +09:00
Gilles Gouaillardet	f2d6584189	grpcomm/base: plug misc memory leaks - add a destructor to orte_grpcomm_caddy_t in order to plug a memory leak - plug a memory leak in barrier_release() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 13:46:21 +09:00
Gilles Gouaillardet	58f2a764f9	ess/hnp: plug memory leaks Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 11:35:59 +09:00
Gilles Gouaillardet	24c61b0625	oob/tcp: plug a memory leak in mca_oob_tcp_component_lost_connection() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 11:35:59 +09:00
Gilles Gouaillardet	c7d9e62d47	rml/base: plug a memory leak add a destructor to orte_rml_send_request_t in order to plug a memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-01-06 11:35:59 +09:00
Ralph Castain	6509f60929	Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node. Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given. Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example: $ mpirun -npernode 2 ./mpi_memprobe Sampling memory usage after MPI_Init Data for node rhc001 Daemon: 12.483398 Client: 6.514648 Data for node rhc002 Daemon: 11.865234 Client: 4.643555 Sampling memory usage after MPI_Barrier Data for node rhc001 Daemon: 12.520508 Client: 6.576660 Data for node rhc002 Daemon: 11.879883 Client: 4.703125 Note that the client value on node rhc001 is larger - this is where rank=0 is housed, and apparently it gets a larger footprint for some reason. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-05 10:32:17 -08:00
Ralph Castain	9eab9a1ed3	Remove stale global variables Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers. Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation). Begin first cut at memory profiler Some minor cleanups of memprobe Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-01-02 14:04:24 -08:00
Ralph Castain	fe68f23099	Only instantiate the HWLOC topology in an MPI process if it actually will be used. There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced: * shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string instead of the topology itself, if available, thus avoiding instantiating the topology * openib BTL. This uses the distance matrix. At present, I haven't developed a method for replacing that reference. Thus, this component will instantiate the topology * usnic BTL. Uses the distance matrix. * treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate the topology * ess base functions. If a process is direct launched and not bound at launch, this code attempts to bind it. Thus, procs in this scenario will instantiate the topology Note that instantiating the topology on complex chips such as KNL can consume megabytes of memory. Fix pernode binding policy Properly handle the unbound case Correct pointer usage Do not free static error messages! Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-29 10:33:29 -08:00
Ralph Castain	3a2d6a5ab6	Begin to reduce reliance of application procs on the topology tree itself by having the daemon provide more detailed info. In this case, provide the topology description string so that procs can readily determine the number of types of objects on the node, and a "locality" string that describes which objects this process is executing upon. The latter allows a process to compute the objects of overlap between itself and another proc without consulting the topology tree. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-28 09:14:26 -08:00
Ralph Castain	7866bb1119	Add debug, cleanup cpus/rank Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-27 21:25:52 -08:00
Ralph Castain	1e4bffd937	Fix mapping directive checks Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-27 20:42:47 -08:00
Ralph Castain	791f4f1ce3	Adjust debug output for clarity Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-26 14:04:20 -08:00
Ralph Castain	ef3f748d0d	Transfer some minor cleanups back from the PMIx reference server Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-23 08:46:04 -08:00
Boris Karasev	5fb3e0a9b6	rmaps/mindist: fix pmix errors Fixed the case where only part of the nodes in the allocation are used by the application processes. Force PMIx nodemap key to only contain nodes that are actually used by the application processes. Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2016-12-21 06:42:04 +02:00
Ralph Castain	ea133206ec	Sync the internal OMPI component to PMIx master Update external PMIx v2.x component Add missing Makefile Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-19 19:14:16 -08:00
Ralph Castain	256b5adac5	Transfer across final fixes from debugger attach work Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-19 00:34:27 -08:00
Ralph Castain	c6f6f40529	Transfer debugger support changes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-17 18:14:46 -08:00
Ralph Castain	269753f5c1	Transfer back changes from debugger attach work Silence warning Remove debug Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-17 10:00:52 -08:00
Ralph Castain	215d6290e0	Add a flux component for LLNL Fine tuning of flux component Fix a few minor issues with the initial cut: * Job id could be obtained from the PMI kvsname like SLURM, but simpler to getenv (FLUX_JOB_ID) * Flux pmi-1 doesn't define PMI_BOOL, PMI_TRUE, PMI_FALSE * Flux pmi-1 maps the deprecated PMI_Get_kvs_domain_id() to PMI_KVS_Get_my_name() internally, so just call that instead. * Drop residual slurm references. Add wrappers for PMI functions so that if HAVE_FLUX_PMI_LIBRARY is not defined, the component can dlopen libpmi.so at location specified by the FLUX_PMI_LIBRARY_PATH env variable, which adds flexibility. If HAVE_FLUX_PMI_LIBRARY is defined, link with libpmi.so at build time in the usual way. Update configury for flux component Update m4 so the configure options work as follows: --with-flux-pmi Build Flux PMI support (default: yes) --with-flux-pmi-library Link Flux PMI support with PMI library at build time. Otherwise the library is opened at runtime at location specified by FLUX_PMI_LIBRARY_PATH environment variable. Use this option to enable Flux support when building statically or without dlopen support (default: no) If the latter option is provided, the library/header is located at build time using the pkg-config module 'flux-pmi'. Otherwise there is no library/header dependency. Handle the case where ompi is configured with --disable-dlopen or --enable-static. In those cases, don't build the component unless --with-flux-pmi-library is provided. It is fatal if the user explicitly requests --with-flux-pmi but it cannot be built (e.g. due to --disable-dlopen). Add a schizo/flux component Update schizo/flux component Eliminate slurm-specific usage cases. Since the module is only loaded if FLUX_JOB_ID is set, there are only two cases to handle: 1) App was launched indirectly through mpirun. This is not yet supported with Flux, but hook remains in case this mode is supported in the future. 2) App was launched directly by Flux, with Flux providing CPU binding, if any. Fix up white space in pmix/flux component Drop non-blocking fence from pmix:flux component The flux PMI-1 library is not thread safe, therefore register a regular blocking fence callback instead of the thread-shifting fencenb(). pmix/flux component avoids extra PMI_KVS_Gets Keys stored into the base cache under the wildcard rank are not intended to be part of the global key namespace. These keys therefore should not trigger a PMI_KVS_Get() if they are not found in the cache. Minor pmix/flux component cleanup pmix/flux: drop code for fetching unused pmix_id pmix/flux: err_exit must return error Problem: in flux_init(), although 'ret' (variable holding err_exit return code) is initialized to OPAL_ERROR, the variable is reused as a temporary result code, so if there are some successes followed by a failure that doesn't set 'ret', flux_init() could return success with PMI not initialized. Ensure that a "goto err_exit" returns OPAL_ERROR if 'ret' is not set to some other error code. pmix/flux: don't mix OPAL_ and PMI_ return codes Problem: flux_init() can return both PMI_ and OPAL_ return codes. Although OPAL_SUCCESS and PMI_SUCCESS are both defined as 0, other codes are not compatible. Ensure that flux_init() consistently uses 'rc' for PMI_ return codes and 'ret' for OPAL_ return codes. pmix/flux: factor out repeated code for cache put Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-16 18:26:38 -08:00
Ralph Castain	2af677b1cf	Ensure that we don't bind-by-default in an oversubscribed condition Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-15 07:58:52 -08:00
Ralph Castain	884fb7fcf2	Update the PMIx2 support to include the latest shared memory optimizations Update ORTE support for dynamic PMIx operations e.g., PMIx_Spawn Update to track master Ensure that --disable-pmix-dstore actually disables the dstore. Sync to a few debugger updates Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 15:00:10 -08:00
Ralph Castain	9f69b0183f	Ensure jobs that fail always return a non-zero exit code. Thanks to Ashley Pittman for the report. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 09:41:06 -08:00
rhc54	341ab683de	Merge pull request #2532 from rhc54/topic/pmixptl Update to latest PMIx master + PTL branch	2016-12-07 17:28:22 -08:00
Ralph Castain	e1aa7939ef	Correctly cleanup the local children and node map info on remote orteds upon job completion. Ensure that register_nspace only includes procs from that job in the proc map Thanks to Ashley Pittman for the report Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-07 13:53:00 -08:00

4431 commits