openmpi

Author	SHA1	Message	Date
Ralph Castain	d97bc29102	Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 16:54:40 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online. Bring the s2 component online. Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online. Get comm_spawn working again. Ensure we always provide a cpuset, even if it is NULL. pmix/cray: adjust cray pmix component for pmix. Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet. Cleanup comm_spawn - procs now starting, error in connect_accept. Complete integration.	2015-08-29 16:04:10 -07:00
Ralph Castain	0b1d4b62be	Cleanup some cruft and update to coordinate with CM operations: * don't pass --tree-spawn to the orted cmd line. If someone doesn't want tree-spawn, it shows up as an MCA param anyway * ensure state/orted component disqualifies itself from CM operations * clarify the DVM proc_type definitions * ensure we stop littering the tmp dir with session directories	2015-08-12 10:32:14 -07:00
Howard Pritchard	1b55d14dff	plm/alps: remove unneeded env. variable setting. In order to address issue #741, the orted's are now always launched with the Cray PMI environment variables PMI_NO_FORK and PMI_NO_PREINITIALIZE set to disable running of the library's ctor. So there's no longer a need to set these for the application(s) being launched by the orted's. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2015-08-05 13:27:18 -07:00
Ralph Castain	023936e84b	Silence coverity warnings	2015-07-29 07:28:08 -07:00
Howard Pritchard	70096d3753	plm/alps: fix orted based launch failures. Turns out that when one builds Open MPI with --disable-dlopen for Cray, a whole bunch of cray specific libraries get linked in to the orted executable. One of these is Cray PMI. The Cray PMI has a ctor which, if run, causes job launches using mpirun to fail. This commit suppresses the running of the ctor and thus prevents failure to launch. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2015-07-23 15:07:57 -07:00
Nathan Hjelm	4d92c9989e	More C99 updates. This commit does two things. It removes checks for C99 required headers (stdlib.h, string.h, signal.h, etc). Additionally it removes definitions for required C99 types (intptr_t, int64_t, int32_t, etc). Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-25 10:14:13 -06:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Ralph Castain	869b2891c4	When doing comm-spawn, track the last object we bound to and ensure that we start the next job on the next object so we avoid overload situations when they aren't necessary	2015-06-17 09:20:08 -07:00
Ralph Castain	c21cd1c91e	Ensure the ssh session is dead	2015-05-23 08:14:29 -07:00
Ralph Castain	920562d9b4	Ensure that all ssh sessions are terminated when abnormally terminating the job	2015-05-23 08:14:29 -07:00
Gilles Gouaillardet	2e384a3b65	Initialize common symbols from orte. A few uninitialized common symbols remain (generated by flex): * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_leng * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_text * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_leng * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_text	2015-05-08 10:11:58 +09:00
Ralph Castain	8e3f0b1d33	Ensure the --tree-spawn option is inside any parens from the sh and ksh shell support	2015-05-06 15:18:15 -07:00
Ralph Castain	7d1980ba83	Add the ability to specify the number of desired slots in the --host option. Just giving a host name => one slot (multiple copies of the name yield one slot per copy). Giving "foo:3" indicates you want three slots - a shorthand notation for saying "foo" three times. Giving "foo:*" indicates you want the topology to set the number of slots based on the orte_set_slots param.	2015-04-30 20:35:23 -07:00
Jeff Squyres	11e8c2096b	plm rsh: assign some levels to the rsh PLM MCA params	2015-04-20 16:18:57 -07:00
Nathan Hjelm	45e053dbce	orte: use C99 subobject naming for component initialization. This commit helps future-proof orte components by initializing each component member by name. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-18 10:29:58 -06:00
Ralph Castain	34b53ac3dc	Silence Coverity warnings	2015-04-18 07:48:22 -07:00
Ralph Castain	12bfb27161	Redo in cleaner form: Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command	2015-04-17 16:11:37 -07:00
Nathan Hjelm	3436f2917d	Merge pull request #449 from hjelmn/mca_base_update mca/base update	2015-04-16 08:41:48 -06:00
Ralph Castain	d9c555b547	Revert "Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command" This reverts commit open-mpi/ompi@278324c52a. Revert "Add the ability to pass args to the rsh/ssh command line" This reverts commit open-mpi/ompi@6f227f8564.	2015-04-16 08:03:14 -06:00
Ralph Castain	278324c52a	Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command	2015-04-15 20:30:04 -06:00
Ralph Castain	6f227f8564	Add the ability to pass args to the rsh/ssh command line	2015-04-15 20:07:13 -06:00
Ralph Castain	91e1cbf284	Init variable	2015-04-11 07:44:57 -07:00
Ralph Castain	3e44d3c9e3	Enable singletons to run without any active OOB module until they attempt to comm_spawn	2015-04-10 14:06:42 -07:00
Ralph Castain	9f8ae59162	Properly enclose the different && clauses	2015-04-01 18:48:25 -07:00
Ralph Castain	57c21d5209	Ensure the DVM flows thru the "daemons reported" state	2015-04-01 16:47:34 -07:00
Mike Dubman	58d002098b	Merge pull request #474 from elenash/master Introduce -tune command line option to set env vars and mca params from ...	2015-04-01 08:23:34 +03:00
Ralph Castain	6f9140a341	Add a little more debug to launch	2015-03-31 20:10:21 -07:00
Nathan Hjelm	b68d66bb9b	MCA: Add the project/project version to the MCA base component. This commit adds support for project_framework_component_* parameter matching. This is the first step in allowing the same framework name in multiple projects. This change also bumps the MCA component version to 2.1.0. All master frameworks have been updated to use the new component versioning macro. An mca.h has been added to each project to add a project specific versioning macro of the form PROJECT_MCA_VERSION_2_1_0. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-03-27 10:59:04 -06:00
Elena	90f5b2bb84	Introduce -tune command line option to set env vars and mca params from file	2015-03-26 18:33:53 +02:00
Ralph Castain	6aa33deafb	Remove debug	2015-03-25 19:58:51 -07:00
Ralph Castain	6ba76ed8d8	Per user request, we allow -host to specify a host that is not included in a hostfile (however, we reject it if we were given an allocation by a resource manager). Since we cannot know if an IP addr form references the same node that was previously given as a string name, we have no choice but to assume they are different. Get the topology from the right place in that situation so mpirun can succeed.	2015-03-25 06:16:01 -07:00
Ralph Castain	43a3baad5e	Ensure we use the first compute node's topology for mapping. Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes. Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset. Correctly count the number of available PUs under each object when given a cpuset. Fix the default binding settings, and correctly count PUs when no cpuset is given. Ensure the binding policy gets set in all cases.	2015-03-19 16:30:36 -07:00
Gilles Gouaillardet	2ab9a411f8	plm/base: fix misc memory leaks as reported by Coverity with CIDs 1196733 and 1196745	2015-03-09 16:25:07 +09:00
Gilles Gouaillardet	7de3f35b90	pml/rsh: fix misc memory leaks as reported by Coverity with CIDs 71091, 71230, 71231, 72274, 72389, 1196718 and 1196719	2015-03-05 20:03:37 +09:00
Jeff Squyres	05f00aface	plm base: ensure mca_base_var_get_value() and mca_base_var_find() succeed. This was CID 993712.	2015-02-24 15:48:50 -05:00
Jeff Squyres	e2223cd9bf	plm_rsh: ensure cwd array is \0-terminated. This was CID 72257.	2015-02-24 15:24:08 -05:00
Howard Pritchard	bf89131f9e	Add owner files to opal/ompi/orte mca directories. This commit adds an owner file in each of the component directories for each framework. This allows for a simple script to parse the contents of the files and generate, among other things, tables to be used on the project's wiki page. Currently there are two "fields" in the file, an owner and a status. A tool to parse the files and generate tables for the wiki page will be added in a subsequent commit.	2015-02-22 15:10:23 -07:00
Ralph Castain	3ae3b96c17	Fix master compilation - a buried header dependency must have been removed.	2015-02-10 07:22:10 -08:00
Ralph Castain	a3275aa867	Once again, fix the blasted singleton comm_spawn	2015-02-05 17:34:25 -08:00
Ralph Castain	2b0b012460	Continue refinement of the DVM operations. Send the spawn request to the right place (it helps) as it isn't a comm_spawn request and has to be treated a little differently. Ensure IO gets forwarded back to the tool. Ensure the tool outputs show_help locally as there is no place to send it.	2015-02-04 06:21:54 -08:00
Ralph Castain	ec5ccb76cf	Enable persistent ORTE DVM so users can execute multiple OMPI jobs within an allocation without restarting the DVM every time.	2015-01-30 11:00:43 -08:00
Howard Pritchard	f34dd5f5fd	plm/alps: update copyright	2015-01-07 12:33:38 -07:00
Howard Pritchard	c454d11b01	plm/alps: fix orted abort hang problem. Turns out the alps plm component wasn't changing the state of the job upon terminating the orted's in the case of an abnormal termination. This caused mpirun to hang with a zombie'd aprun process if an orted on a node in the job was killed via signal.	2015-01-07 12:31:41 -07:00
Jeff Squyres	7b43bdc984	plm base: move flag inside the #if in which it is used. Avoid a compiler warning by declaring the tflag only inside the #if in which it is used (i.e., if hwloc support is built).	2014-12-18 10:56:23 -08:00
Ralph Castain	bb529ebd8e	Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pickup major differences (e.g., where we have different numbers of sockets or assigned core bindings). Retain the hetero-nodes flag for those cases where the user knows that there are differences and our automated system isn't good enough to see it. Will obviously require further refinement as we find out which variances it can detect, and which it cannot.	2014-12-08 15:38:14 -08:00
Ralph Castain	c4002a8485	Further cleanups on the LSF integration - the affinity file is apparently always present, but simply empty if affinity wasn't set.	2014-12-04 12:24:35 -08:00
Ralph Castain	c88f181efe	Fix singleton comm-spawn, yet again. The new grpcomm collectives require a complete knowledge of every active proc in the system in case they participate in a collective. So ensure we pass the required job info when we spawn new daemons, and construct the necessary connections to allow grpcomm to operate.	2014-12-03 18:11:17 -08:00
Jeff Squyres	a3af7d6dbb	Revert "lsf configury: add dependent libraries for static linking" This reverts commit `56cfa90dda`.	2014-12-03 13:32:56 -08:00
Jeff Squyres	92c2ff91ec	Revert "Cleanup static build requirements by adding the wrapper flags back to the component configure.m4's. Minor cleanup of the lsf configure logic." This reverts commit open-mpi/ompi@32bf0e7b7e.	2014-12-03 13:15:20 -08:00


634 Commits