openmpi

Author	SHA1	Message	Date
Joshua Hursey	0e8a97c598	Fix the sigkill timeout sleep to prevent SIGCHLD from preventing completion. * The user can set `-mca odls_base_sigkill_timeout 30` to have ORTE wait 30 seconds before sending SIGTERM then another 30 seconds before sending SIGKILL to remaining processes. This usually happens on an abnormal termination. Sometimes the user wants to delay the cleanup to give the system time to write out corefile or run other diagnostics. * The problem is that child processes may be completing while ORTE is in this loop. The SIGCHLD will interrupt the `sleep` system call. Without the loop the sleep could effectively be ignored in this case. - Sleep returns the amount of time remaining to sleep. If it was interrupted by a signal then it is a positive number less than or equal to the parameter passed to it. If it slept the whole time then it returns 0. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2019-10-02 14:41:34 -04:00
Ralph Castain	7444b32494	Remove stale references to orte_oob_base.ev_base The oob is restricted to the main event base Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-09-29 18:52:03 -07:00
Ralph Castain	5e9d07d4e0	Merge pull request #7010 from rhc54/topic/oob Cleanup stale code in ORTE/OOB	2019-09-25 14:23:43 -07:00
Ralph Castain	41eb41c3f2	Cleanup stale code in ORTE/OOB Remove code for multiple OOB progress threads as it is an optimization nobody uses. Also turns out to have a race condition that can cause segfault on finalize, so maybe good that nobody is using it. Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-09-25 13:27:41 -07:00
Austen Lauria	77144689f0	Add 'orte_' prefix to noop_mpir_breakpoint_ptr. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2019-09-18 17:44:40 -04:00
Austen Lauria	067adfa417	Conform MPIR_Breakpoint to MPIR standard. - Fix MPIR_Breakpoint standard violation by returning void instead of a void*. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2019-09-17 15:19:00 -04:00
Ralph Castain	06d188ebf3	Be a little less restrictive on interface requirements If both types of interfaces are enabled, don't error out if one of them isn't able to open listener sockets. Only one interface family may be available on some machines, but someone might want to build the code to run more generally. Refs https://github.com/pmix/prrte/pull/249 Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-09-06 08:27:05 -07:00
Ralph Castain	373e816b37	Ensure buffer_unload leaves the buffer in a clean state Silence a warning in orte/nidmap Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-09-04 08:32:27 -07:00
Jeff Squyres	197beb30d5	orterun: remove duplicate code https://github.com/open-mpi/ompi/pull/6895 fixed the code in orterun.c to allow running as root if both OMPI_ALLOW_RUN_AS_ROOT and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM env vars are set. However, this env-var-checking code already exists in orte_submit.c:orte_submit_init() -- it looks like the geteuid()/getenv()-checking code here in orterun is now duplicate code. So let's just get rid of the duplicate code. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-08-19 15:36:59 -04:00
Simon Byrne	9c8671c48b	Run-as-root env vars in orterun.c I found that I needed to apply the same change as #5597 to orterun.c for the environment variables to work correctly. Signed-off-by: Simon Byrne <simonbyrne@gmail.com>	2019-08-12 18:52:52 -07:00
Ralph Castain	bd5a1765ee	Fix typos Provide a missing header and paren Thanks to @zerothi for the assistance Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-08-07 05:47:12 -07:00
Ralph Castain	ea0dfc3218	Allow individual jobs to set their map/rank/bind policies Override the defaults when provided. Ignore LSF binding file if user overrides by specifying a policy. Fixes #6631 Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-08-06 07:48:58 -07:00
William Zhang	4ebb37a26c	opal/util: Change opal/util/if.h macro IF_NAMESIZE to OPAL_IF_NAMESIZE Due to IF_NAMESIZE being a reused and conditionally defined macro, issues could arise from macro mismatches. In particular, in cases where opal/util/if.h is included, but net/if.h is not, IF_NAMESIZE will be 32. If net/if.h is included on Linux systems, IF_NAMESIZE will be 16. This can cause a mismatch when using the same macro on a system. Thus different parts of the code can have differing ideas on the size of a structure containing a char name[IF_NAMESIZE]. To avoid this error case, we avoid reusing the IF_NAMESIZE macro and instead define our own as OPAL_IF_NAMESIZE. Signed-off-by: William Zhang <wilzhang@amazon.com>	2019-07-29 21:24:39 +00:00
Austen Lauria	00106f5ac9	Try to prevent the compiler from optimizing out MPIR_Breakpoint(). Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2019-07-24 09:16:54 -04:00
Gilles Gouaillardet	24f2961156	ess/base: fix a misc memory leak in orte_ess_base_proc_binding() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-07-12 17:01:04 +09:00
Jeff Squyres	b738fa295d	Merge pull request #6796 from orivej/fix-tcp_component_close-segfault Fix oob_tcp tcp_component_close segfault with active listeners	2019-07-08 18:13:52 -04:00
Orivej Desh	78b7e342bd	Fix oob_tcp tcp_component_close segfault with active listeners oob_tcp in non-HNP mode shares libevent event_base with oob_base [1]. orte_oob_base_close calls: (1) oob_tcp component_shutdown, then (2) opal_progress_thread_finalize, then (3) oob_tcp tcp_component_close [2]. opal_progress_thread_finalize calls tracker_destructor [3] that frees the event_base [4]. If any oob_tcp event listeners are active at this time, oob_tcp will crash trying to delete them at [5] [6]. This change moves oob_tcp event listener cleanup from component_close to component_shutdown so that it happens before the event_base is freed. [1] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L160 [2] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/base/oob_base_frame.c#L95 [3] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L232 [4] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L65 [5] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_component.c#L192 [6] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L955 Signed-off-by: Orivej Desh <orivej@gmx.fr>	2019-07-04 20:45:47 +00:00
Orivej Desh	de522545c0	Fix ORTE_FORCED_TERMINATE message The format string expects to see the file and line before the error text and code. Signed-off-by: Orivej Desh <orivej@gmx.fr>	2019-06-28 00:46:55 +00:00
Jordan Hayes	7dad74032e	plm_slurm_module: adjust for new SLURM CLI options SLURM 19 discontinued the use of --cpu_bind (and changed it to --cpu-bind). There's no easy way to test at run time which one is accepted, so set the environment variable SLURM_CPU_BIND to "none", which should do the same thing as the srun CLI parameter. Signed-off-by: Jordan Hayes <jhayes@ucr.edu> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-05-15 14:52:33 -07:00
Eisuke Kawashima	027f74bc39	orterun.1in: Fix typo and other minor updates Signed-off-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-04-22 09:35:57 -04:00
Mark Allen	bf3980d70c	fix hang in -np 3 --rank-by core The following command hangs: % mpirun --rank-by core -np 3 --report-bindings hostname because of a loop where i is supposed to cycle through an array of size num_objs, but for some reason it's only looking at node->num_procs entries. I changed the counter so it stays in the loop (stays on this node) until it makes a full cycle through the array of objects without any assignments then it ends the loop so it can go to the next node. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2019-04-12 15:34:02 -04:00
Mark Allen	bdd92a7a64	-cpu-set as a constraint rather than as a binding The first category of issue I'm addressing is that recent code changes seem to only consider -cpu-set as a binding option. Eg a command like this % mpirun -np 2 --report-bindings --use-hwthread-cpus \ --bind-to cpulist:ordered --map-by hwthread --cpu-set 6,7 hostname which just round robins over the --cpu-set list. Example output which seems fine to me: > MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] > MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] It should also be possible though to pass a --cpu-set to most other map/bind options and have it be a constraint on that binding. Eg % mpirun -np 2 --report-bindings \ --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname % mpirun -np 2 --report-bindings \ --bind-to hwthread --map-by ppr:2:node,pe=2 --cpu-set 6,7,12,13 hostname The first command above errors that > Conflicting directives for mapping policy are causing the policy > to be redefined: > New policy: RANK_FILE > Prior policy: BYHWTHREAD The error check in orte_rmaps_rank_file_open() is likely too aggressive. The intent seems to be that any option like "--map-by whatever" will check to see if a rankfile is in use, and report that mapping via rmaps and using an explicit rankfile is a conflict. But the check has been expanded to not just check NULL != orte_rankfile but also errors out if (NULL != opal_hwloc_base_cpu_list && !OPAL_BIND_ORDERED_REQUESTED(opal_hwloc_binding_policy)) which seems to be only recognizing -cpu-set as a binding option and ignoring -cpu-set as a constraint on other binding policies. 
For now I've changed the NULL != opal_hwloc_base_cpu_list to OPAL_BIND_TO_CPUSET == OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy) so it hopefully only errors out if -cpu-set is being used as a binding policy. Whether I did that right or not it's enough to get to the next stage of testing the example commands I have above. Another place similar logic is used is hwloc_base_frame.c where it has /* did the user provide a slot list? */ if (NULL != opal_hwloc_base_cpu_list) { OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CPUSET); } where it used to (long ago) only do that if !OPAL_BINDING_POLICY_IS_SET(opal_hwloc_binding_policy) I think the new code is making it impossible to use --cpu-set as anything other than a binding policy. That brings us past the error detection and into the real functionality, some of which has been stripped out, probably in moving to hwloc-2: % mpirun -np 2 --report-bindings \ --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname > MCW rank 0: [B.../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] > MCW rank 1: [.B../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] The rank_by() function in rmaps_base_ranking.c makes an array out of objects returned from opal_hwloc_base_get_obj_by_type(,,,i,) which uses df_search(). That function changed quite a bit from hwloc-1 to 2 but it used to include a check for available = opal_hwloc_base_get_available_cpus(topo, start) which is where the bitmask from --cpu-set goes. And it used to skip objs that had hwloc_bitmap_iszero(available). So I restored that behavior in ds_search() by adding a "constrained_cpuset" to replace start->cpuset that it was otherwise processing. 
With that change in place the first command works: % mpirun -np 2 --report-bindings \ --bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname > MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] > MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] The other command uses a different path though that still ignored the available mask: % mpirun -np 2 --report-bindings \ --bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname > MCW rank 0: [BB../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] > MCW rank 1: [..BB/..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] In bind_generic() the code used to call opal_hwloc_base_find_min_bound_target_under_obj() which used opal_hwloc_base_get_ncpus(), and that's where it would intersect objects with the available cpuset and skip over ones that weren't available. To match the old behavior I added a few lines in bind_generic() to skip over objects that don't intersect the available mask. After that we get % mpirun -np 2 --report-bindings \ --bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname > MCW rank 0: [..../..BB/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] > MCW rank 1: [..../..../..../BB../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....] I think the above changes are improvements, but I don't feel like they're comprehensive. I only traced through enough code to fix the two specific bugs I was dealing with. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2019-04-12 15:33:56 -04:00
James Clark	20f5840cbb	Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init. This is so when a debugger attaches using MPIR, it can step out of this stack back into main. This cannot be done with certain aggressive optimisations and missing debug information. Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Co-authored-by: Jeff Squyres <jsquyres@cisco.com>	2019-03-27 14:32:15 +00:00
Ralph Castain	8174286530	Sync nidmap to PRRTE to fix hetero topo problem Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-03-26 08:24:09 -07:00
Ralph Castain	dfbc14430d	Merge pull request #6440 from ggouaillardet/topic/yield_when_idle schizo/ompi: correctly handle the yield_when_idle option	2019-03-25 12:17:34 -07:00
Ralph Castain	5aa775c02e	Correctly set the byte_object size Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-03-18 14:29:37 -07:00
Ralph Castain	aed06e68b9	Protect against NULL node pointer Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-03-16 01:31:28 -07:00
Ralph Castain	2794ae43b3	Update nidmap Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-03-16 01:20:15 -07:00
Ralph Castain	35a597178d	Ensure that nodes are always used in order provided If a user provides a list of nodes to use via -host or -hostfile, then ensure that the ranks are placed according to that order. Also fix a bug where the number of slots on a node was incorrectly computed for localhost if the name given didn't exactly match the return from get_hostname. Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-03-15 12:58:10 -07:00
Gilles Gouaillardet	cc97c0f611	schizo/ompi: correctly handle the yield_when_idle option In schizo/ompi, set the new OMPI_MCA_mpi_oversubscribe environment variable according to the node oversubscription state. This MCA parameter is used to set the default value of the mpi_yield_when_idle parameter. This two-step tango is needed so the mpi_yield_when_idle setting is always honored when set in a config file. Refs. open-mpi/ompi#6433 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-28 09:53:29 +09:00
Ralph Castain	60961ceb41	Fix cross-mpirun connect/accept operations Ensure we publish all the info required to be returned to the other mpirun when executing this operation. We need to know the daemon (and its URI) that is hosting each of the other procs so we can do a direct modex operation and retrieve their connection info. Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-26 17:08:48 -08:00
Ralph Castain	2f15379171	Remove stale singularity/schizo component Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-20 17:38:24 -08:00
Ralph Castain	2da5651869	Restore orted hnp_uri cmd line option Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-18 13:24:03 -08:00
Ralph Castain	e56ee1e06a	Remove the remaining cruft from dual oob transport * When we moved to allowing dual rml/oob transports, we added a bunch of stuff that is no longer needed. Remove it so as to simplify the messaging system. * Fix the routed/radix component so it correctly returns the parent's vpid Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-08 11:12:31 -08:00
Gilles Gouaillardet	b80210c36a	orte/util: strdup() in orte_util_decode_nidmap() since opal_argv_free() will free() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-08 11:11:25 -08:00
Gilles Gouaillardet	78152aec85	orte/nidmap: do not use compressed when uninitialized Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-08 11:11:25 -08:00
Ralph Castain	1ee6c185f7	Remove stale code Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-08 11:11:25 -08:00
Ralph Castain	01e9aca40f	Add topology support for hetero systems Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-08 11:11:25 -08:00
Gilles Gouaillardet	88ac05fca6	misc fixes Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-08 11:11:25 -08:00
Ralph Castain	125d236173	Move from the use of regex to compression We've been fighting the battle of trying to create a regex generator and parser that can handle arbitrary hostname schemes - without long-term success. The worst of it is that there is no way of checking to see if the computed regex is correct short of parsing it and doing a character-by-character comparison with the original string. Ugh...there has to be a better solution. One option is to investigate using 3rd-party regex libraries as those are coming from communities whose sole focus is resolving that problem. However, someone would need to spend the time to investigate it, and we'd have to find a license-friendly implementation. Another option is to quit beating our heads against the wall and just compress the information. It won't be as much of a reduction, but we also won't keep hitting scenarios where things break. In this case, it seems that "perfection" is definitely the enemy of "good enough". This PR implements the compression option while retaining the possibility of people adding regex-generating components. The compression code used in ORTE is consolidated into the opal/compress framework. That framework currently held bzip and gzip components for use in compressing checkpoint files - since we no longer support C/R, I have .opal_ignore'd those components. However, I have left the original framework APIs alone in case someone ever decides to redo C/R. The APIs of interest here are added to the framework - specifically, the "compress_block" and "decompress_block" functions. I then moved the ORTE zlib compression code into a new component in this framework. Unfortunately, the framework currently is a single-select one - i.e., only one active component at a time. Since I .opal_ignore'd the other two and made the priority of zlib high, this isn't a problem. 
However, if someone wants to re-enable bzip/gzip or add another component, they might need to transition opal/compress to a multi-select framework. Included changes: * Consolidate the compression code into the opal/compress framework * Move the ORTE zlib compression code into a new opal/compress/zlib component * Ignore the bzip and gzip components in opal/compress framework * Add a "compress_base_limit" MCA param to set the threshold above which we compress data - defaults to 4096 bytes * Delete stale brucks and rcd components from orte/grpcomm framework * Delete the orte/regx framework * Update the launch system to use opal/compress instead of string regex * Provide a default module if no zlib is available * Fix some misc multi-node issues * Properly generate the nidmap in response to a "connection warmup" message so the remote daemon knows the children it needs to launch. * Remove stale references to orte_node_regex * opal_byte_object_t's are not OPAL objects - properly release allocated memory. * Set the topology * Currently only handling homogeneous case * Update the compress framework files to conform * Consolidate open/close into one "frame" file. Ensure we open/close the framework Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-08 11:11:14 -08:00
Ralph Castain	fcbc7ea298	Merge pull request #6306 from karasevb/regx_host_ordering_fix regex: fixed host ordering for different prefixes	2019-02-08 11:09:55 -08:00
Ralph Castain	8794077520	Remove stale rml/ofi component Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-01-30 12:41:50 -08:00
Boris Karasev	46e38b9193	regx: fixed the order of hosts for ranges with different prefixes Example: For the list of hosts `a01,b00,a00` a regex is generated: `a[2:1.0],b[2:0]`, where the `a`-prefixed hosts are moved to the beginning, which breaks the host ordering. This commit fixes the regex for that case to `a[2:1],b[2:0],a[2:0]` Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2019-01-30 15:06:30 +06:00
Boris Karasev	1967e41a71	regx/reverse: fixed adding an empty range for non-numerical hostnames Example: For the nodelist `jjss,jjss0000001,jjss0000003,jjss0000002` the regular expression was `jjss[0:0],jjss[7:1,3,2]`, which led to the first host being incorrectly unpacked as `jjs0`. This commit stops adding an empty range for non-numeric hostnames. Here is the fixed regex for this example: `jjss,jjss[7:1,3,2]` Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2019-01-30 09:41:00 +06:00
Boris Karasev	d1ad90f47e	regx/test: update regex test Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2019-01-30 09:40:59 +06:00
Jason Williams	98d81a5f7a	Adding changes for issue #6303 for branch master. Signed-off-by: Jason Williams <uberlinuxguy@gmail.com>	2019-01-26 10:49:47 -05:00
Howard Pritchard	b46e15535a	orte: shutdown: be more careful about closing frameworks as part of orte_finalize. Owing to recent restructuring in opal to handle finalize in a more general fashion, the missing framework closes were causing meltdowns as the mca vars subsystem was cleaning itself up. This problem was recently reported by Siegmar: https://www.mail-archive.com/users@lists.open-mpi.org//msg32946.html Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2019-01-04 11:04:13 -07:00
Ralph Castain	b19e5edf76	Correct parsing of ppr directives Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2019-01-02 09:03:13 -08:00
Jeff Squyres	f96c04244d	odls_base_default_fns.c: put the free() in the right place Fixes CID 1441826. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-22 06:40:05 -08:00
Ralph Castain	d728380741	If job is fully described, there will be no ppn string to unpack Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-12-17 16:13:55 -08:00
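
The sleep-retry behavior described in commit 0e8a97c598 (the first entry in this log) can be sketched as follows. This is a minimal illustration, not the actual ORTE code: the function name `sleep_uninterrupted` and its return value (a count of interruptions) are hypothetical. It relies only on the POSIX behavior quoted in the commit message, namely that `sleep()` returns 0 after a full sleep and the unslept time if a signal such as SIGCHLD interrupted it.

```c
#include <unistd.h>

/* Hypothetical helper (not taken from ORTE): sleep for the full interval
 * even if signals such as SIGCHLD interrupt sleep() along the way.
 * Returns how many times the sleep was cut short, for illustration only. */
unsigned int sleep_uninterrupted(unsigned int seconds)
{
    unsigned int interruptions = 0;
    unsigned int remaining = seconds;

    while (remaining > 0) {
        /* sleep() returns 0 if it slept the whole time, or the number of
         * seconds left if a signal handler ran and interrupted it. */
        remaining = sleep(remaining);
        if (remaining > 0) {
            ++interruptions;  /* interrupted: loop to sleep the remainder */
        }
    }
    return interruptions;
}
```

Without the loop, a single SIGCHLD arriving early would end the wait almost immediately, which is the failure mode the commit describes. Because `sleep()` reports the remainder in whole seconds, the total wait is only accurate to roughly one second per interruption.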


5886 Commits