1
1
Граф коммитов

633 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
29bc24bdd5 Improve the transport key print statement to ensure that we don't get zero fields as this can be a problem for PSM 2016-04-28 20:11:12 -07:00
Ralph Castain
75dc4c305a Correctly set the #procs in the job to "job_size", and the max_procs to "univ_size" 2016-04-27 12:00:19 -07:00
Ralph Castain
2432daf065 Some minor cleanups of a memory leak and error output 2016-04-08 07:46:18 -07:00
Rainer Keller
52080a5736 As per the pull request to pmix/master:
https://github.com/pmix/master/pull/71

Have OMPI's current version of pmix120 nicely fail in case of
too long sun_path (longer than 108 or in case of OSX 103 chars).
And have OMPI return proper error messages with hints how to
amend.
2016-04-07 22:12:53 +02:00
Ralph Castain
a3fea58d1c Minor cleanups to prior PR commit 2016-03-24 15:55:14 -07:00
rhc54
6756e19aa2 Merge pull request #1457 from anandhis/master
rml changes
2016-03-24 15:17:29 -07:00
Ralph Castain
8c14df2328 Revert "Modify singularity support per patch from Greg Kurtzer"
This reverts commit open-mpi/ompi@f7257a8310.

Ensure that we properly cleanup the session directory tree. Prior code had issues with symlinks, especially if the file that the link points to was already removed as we traverse the tree. Also found that the dirent checks for directory type weren't fully portable, and so fall back to the stat-based approach which is known to be portable.

Fix singularity singletons by detecting we are in a container and properly setting the pmix selection to pick the isolated component. Remove a stale restriction blocking use of the sm btl
2016-03-24 11:27:18 -07:00
Ralph Castain
c146c4969b Revert part of open-mpi/ompi@c1bbbb5e2f to restore the usock component, thus fixing show_help aggregation.
Fixes #1467

Restore debugger attach operations

Fixes #1225
2016-03-18 21:49:04 -07:00
Ralph Castain
a4c8e8c28a Cleanup the proposed change:
* qos framework is moving to the scon layer and is no longer required in ORTE

* remove the rml/ftrm component as we now have multiple active components, and so the wrapper needs to be rethought

* no need for separating the "base" from "API" module definition. The two are identical

* move the "stub" functions into their own file for cleanliness

* general cleanup to meet coding standards

* cleanup some logic in the stubs
2016-03-10 13:14:17 -08:00
Ralph Castain
f3ae30ff39 Fix singletons yet again... 2016-03-08 10:33:35 -08:00
Ralph Castain
d72c1c72ff Do not push child processes into separate process groups so that any host RM can still "see" them, and ensure that any signal sent to the orted's themselves will be provided to all child processes. Forward all signals from mpirun to the child processes, removing the old MCA parameter required to turn that behavior "on". 2016-03-06 17:55:09 -08:00
Gilles Gouaillardet
80bdbfd9e7 add missing include file 2016-03-03 13:46:28 +09:00
Ralph Castain
4a55fba414 Fix registration of error handlers thru the pmix120 component. A thread-shift operation was hanging on the sync_event_base, which made it dependent on someone calling opal_progress. Unfortunately, a process in "sleep" or spinning outside the MPI library won't do that, and so we never complete errhandler registration. 2016-03-02 15:01:01 -08:00
Ralph Castain
011403c04a Fix a number of issues, some of which have lingered for a long time:
* provide a more reliable way of determining that a process is a singleton by leveraging the schizo framework. Add new components for slurm, alps, and orte to detect when we are in a managed environment, and if we have been launched by mpirun or a native launcher. Set the correct envars to control ess and pmix selection in each case.

* change the relative priority of the pmix120 and pmix112 components to make pmix120 the default

* fix singleton comm-spawn by correctly setting the num_apps field of the orte_job_t created by the daemon - this fixes a segfault in register_nspace on newly created daemons

* ensure orterun doesn't propagate any ess or pmix directives in its environment

* Cleanup a few valgrind issues and memory leaks

* Fix a race condition that prevented the client from completing notification registrations (missing thread shift)

* Ensure the shizo/alps component detects launch by mpirun
2016-03-01 06:53:00 -08:00
Ralph Castain
cdb494566d Provide an option to allow isolated singletons 2016-02-25 11:33:26 -06:00
Ralph Castain
d653cf2847 Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM. 2016-02-21 11:55:49 -08:00
Ralph Castain
8f9508cace Further enhance the support for Singularity containers. Extend the "personality" command-line option to allow specifying both model (e.g., "ompi") and container (e.g., "singularity"), and add the necessary logic to support multiple options. Add a new pmix "isolated" component to handle singletons where no HNP is available since containers cannot launch the HNP. 2016-02-17 13:33:06 -08:00
Ralph Castain
8ab28cdc82 Fix a typo that causes segfaults on multi-node executions 2015-12-24 08:43:47 -08:00
annu13
43f44f31c1 moved code to job setup first before enabling comm 2015-12-22 14:30:59 -08:00
Ralph Castain
94ffe10808 Do not override any external settings for PMIx component selection 2015-12-21 08:36:12 -08:00
Ralph Castain
1db3db022a Don't be so prescriptive about the ess component to be used - we just need to protect against the proc incorrectly taking the singleton component, so rule that one out. Ensure that the other components understand that they are only for use by daemons. 2015-12-09 19:54:44 -08:00
Ralph Castain
8823069fe9 Provide a mechanism by which a tool can request async progress thread support for ORTE 2015-12-04 08:26:57 -08:00
Nathan Hjelm
9602484568 Merge pull request #1040 from hjelmn/mtl_priority
Change how cm's priority is calculated
2015-10-19 14:18:36 -06:00
Nathan Hjelm
8b5810f7f7 mca/base: add priority output to mca_base_select
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Ralph Castain
363f62a506 Fix singleton operations when running under a SLURM allocation. Sadly, SLURM's PMI will return success even if the PMI server isn't actually available. This leads to erroneous selection of pmix and ess components. So add a further requirement (namely, that we see a job_step envar) to the SLURM pmix components along with some modification of ess selection code to avoid the problem 2015-10-17 20:24:03 -07:00
Ralph Castain
0140ff048d Now that we have an "isolated" PLM component, we cannot just let rsh silently decline to run when it cannot find a launch agent - if we do, then we will -always- run on the local node. So if the user specifies a launch agent and we can't find it, then generate a pretty error message, report a fatal error back to the component select, and exit out.
This required modifying the mca_component_select function to actually check the return code on a component query - it was blissfully ignoring it.

Also do a little cleanup to avoid bombarding the user with multiple error messages.

Thanks to Patrick Begou for reporting the problem
2015-09-24 07:16:48 -07:00
Ralph Castain
749bd4e6fe Plug a few memory leaks identified by valgrind 2015-09-23 15:21:04 -07:00
Ralph Castain
f872e99315 Fix orte-submit so it allows application procs to select the correct ess component. Protect orte_data_server from multiple calls to finalize. 2015-09-21 20:31:57 -07:00
Howard Pritchard
8d7e759b85 oob/alps: swat compiler warning
swat some alps related compiler warnings when using --enable-picky

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-21 14:24:26 -07:00
Ralph Castain
8b88ea9b13 Fix singletons by removing stale code 2015-09-16 00:58:05 -07:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Ralph Castain
dc5796b8a1 Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
Fix the locality computation by correctly computing the vpid of the local peer

This reverts commit open-mpi/ompi@6a8fad49e5.
2015-09-11 08:29:51 -07:00
Ralph Castain
6a8fad49e5 Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"
This reverts commit f94f3cda21.
2015-09-11 02:01:25 -07:00
Ralph Castain
f94f3cda21 Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local 2015-09-10 10:25:30 -07:00
Ralph Castain
1cdb86b8c7 Cleanup s1 and s2 components, and ensure that mpirun and orteds only use non-direct-launch pmix components. 2015-09-08 18:37:09 -07:00
Ralph Castain
37c3ed68e7 Cleanup connect/disconnect and bring comm_spawn back online! 2015-09-06 10:27:39 -07:00
rhc54
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
Ralph Castain
f6948c2bb4 Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working 2015-09-04 10:07:17 -07:00
Ralph Castain
38ba54366c Fix shared memory operations by resolving local peers 2015-08-30 12:07:14 -07:00
Ralph Castain
0d5814b5ca Cleanup Coverity issues 2015-08-29 21:19:27 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Ralph Castain
0b1d4b62be Cleanup some cruft and update to coordinate with CM operations:
* don't pass --tree-spawn to the orted cmd line. If someone doesn't want tree-spawn, it shows up as an MCA param anyway
* ensure state/orted component disqualifies itself from CM operations
* clarify the DVM proc_type definitions
* ensure we stop littering the tmp dir with session directories
2015-08-12 10:32:14 -07:00
Jeff Squyres
09f7434491 ORTE: update for the new opal_progress_thread API 2015-08-07 10:13:40 -07:00
Ralph Castain
219c4dfba5 Create a new opal_async_event_base and have the pmix/native and ORTE level use it. This reduces our thread count by one. 2015-07-12 08:23:34 -07:00
Ralph Castain
836f49597d There is no reason for tools to have an async progress thread as they can loop the event library themselves. This has the added benefit of causing the tool to "block" while waiting for events so they don't use cpu.
Also, fix orte-submit so it appropriately handles --help option
2015-07-05 10:45:28 -07:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Howard Pritchard
3382d3ce61 ess/alps: remove unnecessary vpid calc
There was a redundant computation of the vpid
for orted's happening in ess/alps rte_init
method.  Keep the more efficient alps based
method.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-05-09 20:07:38 -07:00
Ralph Castain
b5382c9bf9 Rework the OOB selection logic to allow a component (e.g., usock) to direct that it be the sole active component. Remove prior disqualifying code in the oob/tcp component as it was too restrictive - if usock wasn't able to run, it left apps with no way to communicate to their daemon. Have the local daemon check the global modex for the RML URI info of the local procs so it can route messages between them when tcp is the primary channel.
A few other minor cleanups included.
2015-05-08 11:15:21 -07:00
Ralph Castain
1f8de276de Consolidate all the QOS changes into one clean commit 2015-05-06 19:48:42 -07:00
Nathan Hjelm
359a282e7d ess/singleton: MCA variable synonyms can not currently have NULL for both framework and component
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-20 16:50:52 -06:00
Nathan Hjelm
45e053dbce orte: use C99 subobject naming for component initialization
This commit helps future-proof orte components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
Nathan Hjelm
3436f2917d Merge pull request #449 from hjelmn/mca_base_update
mca/base update
2015-04-16 08:41:48 -06:00
Ralph Castain
cd686057f6 If the HNP is on a coprocessor, record it so we don't get an error log later 2015-04-11 15:30:15 -07:00
Ralph Castain
3e44d3c9e3 Enable singletons to run without any active OOB module until they attempt to comm_spawn 2015-04-10 14:06:42 -07:00
Ralph Castain
e4f6f83b9d Attempt to silence new Coverity complaint by ensuring the string read from file is NULL terminated. 2015-04-10 07:54:37 -07:00
Nathan Hjelm
c416c423bb ess/singleton: do not put component strings into the environment
putenv requires that any string put into the environment is not
changed or freed. That is not the case with constant strings as they
will go away when dlclose is called on the component. Instead, just
use opal_setenv which does not have this restriction.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-09 11:00:47 -06:00
Ralph Castain
6d205a3c80 Ensure that singletons pickup the oob/tcp component 2015-03-30 18:10:08 -07:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Ralph Castain
43a3baad5e Ensure we use the first compute node's topology for mapping
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.

Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.

Correctly count the number of available PUs under each object when given a cpuset

Fix the default binding settings, and correctly count PUs when no cpuset is given

Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Ralph Castain
a0487e014c Further reduce the RARP load by removing getaddrinfo for IPv6 connections. Correct typo when checking return on inet_pton. Don't consider the TCP component for apps that are launched via mpirun as it will never be used. 2015-03-16 19:42:05 -07:00
Adrian Reber
c08e234af7 FT: fix compilation using --with-ft (5/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

With the changes introduced in the previous patches in this series
some goto constructs for cleanup are no longer necessary and removed.
2015-03-11 14:23:33 +01:00
Adrian Reber
1c5a8df724 FT: fix compilation using --with-ft (2/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

The FT code used barrier mechanisms which have been removed
with aec5cd08bd. This patch replaces
all those different barriers with opal_pmix.fence(NULL, 0);
I am not sure this is completely correct but at least a starting
point for a review.
2015-03-11 14:23:33 +01:00
Adrian Reber
f45dd069bd FT: fix compilation using --with-ft (1/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This first patch moves orte_cr_continue_like_restart from ORTE
to opal_cr_continue_like_restart in OPAL. This only leaves three
calls from OPAL to ORTE in the FT code. As it is not yet 100%
clear how to handle these calls the code orte_sstore.set_attr()
has been #ifdef'd out for now.
2015-03-11 14:23:33 +01:00
Howard Pritchard
bf89131f9e add owner files to opa/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Howard Pritchard
b62d9c2c70 ess/alps: fix compile issue for pgi
remote -fi-noident cflag option.  Wasn't helping anyway
and caused pgi compiles to break.
2015-02-09 20:49:04 -08:00
Ralph Castain
a3275aa867 Once again, fix the blasted singleton comm_spawn 2015-02-05 17:34:25 -08:00
Ralph Castain
f28238af59 Fix a race condition seen by Absoft during finalize. Stop the orte progress thread without cleaning it up, thus allowing the frameworks to still cancel their posted recv's. Then cleanup the memory footprint afterwards. 2015-02-05 11:41:37 -08:00
Ralph Castain
2b0b012460 Continue refinement of the DVM operations. Send the spawn request to the right place (it helps) as it isn't a comm_spawn request and has to be treated a little differently. Ensure IO gets forwarded back to the tool. Ensure the tool outputs show_help locally as there is no place to send it. 2015-02-04 06:21:54 -08:00
Ralph Castain
7299cc3ab9 Cleanup the communications handshake so that orte-submit properly terminates upon job completion, and properly sends the terminate command to orte-dvm 2015-02-03 07:25:43 -08:00
Ralph Castain
ec5ccb76cf Enable persistent ORTE DVM so users can execute multiple OMPI jobs within an allocation without restarting the DVM every time. 2015-01-30 11:00:43 -08:00
Ralph Castain
b838df9eb8 Get slurm to stay out of the way on singletons 2015-01-27 09:29:43 -06:00
Ralph Castain
294ebc907a Fix singleton operations so they can work inside a slurm environment 2015-01-27 09:29:42 -06:00
Ralph Castain
3eca55caec Continue fixing singletons in slurm environments 2015-01-27 09:29:42 -06:00
Ralph Castain
88c38f87d2 Get the orteds to use schizo as well 2015-01-27 09:29:42 -06:00
Ralph Castain
028b00154d Complete implementation of the schizo framework to support OMPI component 2015-01-27 09:29:42 -06:00
Ralph Castain
e7ff21b3aa The opal_stop_progress_thread function releases the event base, so don't do it again 2015-01-15 10:48:40 -08:00
Ralph Castain
9ac39b63cc Use the opal_progress_threads support for the ORTE progress thread in applications 2015-01-15 07:55:19 -08:00
Ralph Castain
bb529ebd8e Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pickup major differences (e.g., where we have different numbers of sockets or assigned core bindings).
Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.

Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
2014-12-08 15:38:14 -08:00
Howard Pritchard
c67afadcfc Merge pull request #289 from hppritcha/topic/remove_pmi
Topic/remove pmi
2014-12-03 16:58:35 -07:00
Jeff Squyres
a3af7d6dbb Revert "lsf configury: add dependent libraries for static linking"
This reverts commit 56cfa90dda.
2014-12-03 13:32:56 -08:00
Jeff Squyres
92c2ff91ec Revert "Cleanup static build requirements by adding the wrapper flags back to the component configure.m4's. Minor cleanup of the lsf configure logic."
This reverts commit open-mpi/ompi@32bf0e7b7e.
2014-12-03 13:15:20 -08:00
Ralph Castain
54c955c92d Fix a race condition that only appears to be affecting certain setups. The pmix.finalize function closes the file descriptor to the server, which then triggers the errhandler callback. Since the errmgr is about to be unloaded, it might be getting hit. 2014-12-03 12:19:00 -08:00
Howard Pritchard
191fe0f949 alps configury changes
Clean up the orte_check_alps.m4.  There was a little of
unnecesary stuff for handling cle 5, since it wasn't actually
doing the right thing, which would be to use pkg-config to
find dependencies both for dynamic and static linking.

Decouple the searching for alps libs, etc. from cray pmi.

Switch the alps ess  and alps odls components' config files
to use the ALPS m4 macro.

alps configury fixes

Improve a check for detecting CLE release.
Improve an error message.
2014-12-03 09:44:17 -07:00
Howard Pritchard
e0487e7702 orte/common/alps: add an alps common lib to orte
Add an alps common lib to orte.  Add a function
to determine whether or not a process is in a
PAGG container.

Note: we need a better naming convention for
common libs, since right now they use a "flat"
naming convention.
2014-12-03 09:44:17 -07:00
Howard Pritchard
a753c3ece0 ess/alps: add initial alps ess component
Note this alps ess component has nothing to do
with the old CNOS alps component used on
Cray Seastar/Portals3 (Cray XT) systems.

To work properly, changes need to be made to the
open method of the ess/pmi component to keep it
from selecting, and thus initializing, the opal/pmix/cray
component.
2014-12-03 09:44:17 -07:00
Ralph Castain
32bf0e7b7e Cleanup static build requirements by adding the wrapper flags back to the component configure.m4's. Minor cleanup of the lsf configure logic. 2014-12-03 07:14:06 -08:00
Ralph Castain
6294ed991b Fix singletons - still working on singleton comm_spawn 2014-12-02 14:12:24 -08:00
Ralph Castain
14cdb04327 Revise the ess/pmi selection logic as all APPs must select it, and no daemons. Cleanup some of the mca param levels in ess so we don't printout the topology quite as easily. 2014-12-01 21:19:11 -08:00
Jeff Squyres
56cfa90dda lsf configury: add dependent libraries for static linking
Ensure to add the LSF dependent libraries and LD flags for the wrapper
compiler static linking case.
2014-12-01 14:59:10 -08:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Ralph Castain
2a90788724 Support physical processor ids in rankfile 2014-11-10 14:00:40 -08:00
Ralph Castain
894acb0aa8 configury: new OPAL_SET_MCA_PREFIX/ORTE_SET_MCA_CMD_LINE_ID macros
These two macros set the MCA prefix and MCA cmd line id,
   respectively.  Specifically, MCA parameters will be named
   PREFIX<foo> in the environment, and the cmd line will use
   -ID foo bar.

   These macros must be called during configure.ac and a value
   supplied. In the case of Open MPI, the values given are
   PREFIX=OMPI_MCA_ and ID=mca.

   Other projects (such as ORCM) will call these macros with
   their own unique values.  For example, ORCM uses PREFIX=ORCM_MCA_
   and ID=omca

   This scheme is necessary to allow running Open MPI applications under
   systems that use their own versions of ORTE and OPAL.  For example,
   when running OMPI applications under ORCM, we need the MCA params passed
   to the ORCM daemons to be separated from those recognized by the OMPI application.
2014-10-22 18:57:40 -07:00
Howard Pritchard
d2bb8d8829 remove alps ess component
The alps ess component is obsolete.  It relies on header
files only present in very old CLE (Cray Linux)  3.X for
the Cray XT series.  As support for these systems is being
dropped starting with release 1.9, this code is being removed.
2014-10-03 13:17:33 -06:00
Ralph Castain
4320457394 Fix the debug output - you can't print the cpuset pointer using the %p format without generating warnings
This commit was SVN r32811.
2014-09-29 17:10:38 +00:00
Howard Pritchard
f8ac8bb6b0 remove improper use of hwloc_bitmap_free
When using the native aprun launcher, it was observed that
there were frequent memory corruption errors occuring either
during a PMI kvs-fence operation, or at mpi termation during
opal cleanup of allocated objects.  This was especially bad
when using

aprun --c none

In some cases, the application would even just hang in finalize
if using ptmalloc, owing to some kind of infinite loop in
cleanup of small blocks, etc.

It turns out that the proble was in orte_ess_base_proc_binding's
improper use of opal_hwloc_base_get_available_cpus.  The cpuset
(bitmap) returned from that function is not meant to be freed
by the caller.

This problem is likely never observed when using the mpirun launcher
as there's an early exit if the OMPI_MCA_orte_bound_at_launch
environment variable is set.

This commit was SVN r32809.
2014-09-29 16:10:37 +00:00
Ralph Castain
17846411c3 Now that we have an ORTE thread running in apps, we can't just call "exit"
during RTE abort as that is happening in a thread, and (at least in some
environments) doesn't result in the main thread being immediately
terminated. Instead, we wind up going thru orte_finalize in the main
thread, which isn't what we want.

So replace the call to "exit" with the "quick exit" variant "_exit", which
causes the entire process to exit immediately.

(custom patch has been posted for 1.8.3)

This commit was SVN r32780.
2014-09-23 22:51:10 +00:00
Howard Pritchard
1508a01325 Fixes to enable mpirun to work again on Cray
The ess pmi module was not handling aprun launched
daemons.  All daemons were thinking they were vpid 1.

Also, turns out that on cray systems using MOM nodes
for launched jobs, just detecting whether or not a
process is in a PAGG container is not sufficient.

Crank up the priority of the alps PLM component in the
event that the configure detected the presence of both
slurm and alps.

Have the ESS pmi component open the pmix framework and
select a pmix component.

This commit was SVN r32773.
2014-09-23 15:37:26 +00:00
Ralph Castain
5cdbc00136 Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others.
This commit was SVN r32650.
2014-08-30 19:33:46 +00:00
Ralph Castain
a2085a5916 Fix the PSM transport key generator to match prior releases
This commit was SVN r32649.
2014-08-30 00:48:25 +00:00
Ralph Castain
8f1b9b463e Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound.
This commit was SVN r32577.
2014-08-22 05:17:51 +00:00
Ralph Castain
c6f78d6e54 The PMI ess component now gets used for more than direct launch, so only set standalone_operation flag if no daemon uri is available so we aggregate show_help messages
This commit was SVN r32574.
2014-08-22 03:00:56 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Gilles Gouaillardet
a873f45a90 Fix r32460 race condition resolution when procs call MPI_Abort.
do not invoke orte_session_dir_finalize(...) so
orte_ess_base_app_abort(...) can successfully createi
<orte_process_info.proc_session_dir>/aborted

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32498.

The following SVN revision numbers were found above:
  r32460 --> open-mpi/ompi@abedb97be4
2014-08-11 05:50:32 +00:00
Gilles Gouaillardet
d139f75db4 check-help-strings cleanup
This commit was SVN r32492.
2014-08-11 03:20:37 +00:00
Ralph Castain
d29a5ab69d Okay, now handle the non-MPI apps
This commit was SVN r32399.
2014-08-01 14:49:25 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
Ralph Castain
8cfadd1842 Per Paul Hargrove, add missing include to fix build under FreeBSD
RM-approved

cmr=v1.8.2:reviewer=ompi-gk1.8

This commit was SVN r32397.
2014-08-01 13:37:41 +00:00
Ralph Castain
6c5e592785 Revert r32222, r32210, and r32203 as they created a problem when daemon collectives did not involve app procs on every node. Instead, modify the ompi/mca/rte/orte/rte_orte.h to add a new function that allows apps to request new daemon collective ids for use in barrier and modex operations. This will only appear in ORTE-based installations, but it is only being used by a couple of researchers at the moment.
Update the orte/test/mpi/coll_test.c test to show the revised example.

This commit was SVN r32234.

The following SVN revision numbers were found above:
  r32203 --> open-mpi/ompi@a523dba41d
  r32210 --> open-mpi/ompi@2ce11ed5c4
  r32222 --> open-mpi/ompi@d55f16db50
2014-07-15 03:48:00 +00:00
Ralph Castain
1feaffbb15 Get the blasted singleton comm_spawn working again. There remain problems with the Slurm interaction in this use-case as the PMI components (if configured to build) try to run even when a Slurm allocation hasn't been made, but I leave that to someone else to resolve. I did, however, tell the Slurm ess to quit interfering with applications launched in this use-case by ORTE daemons, so things do work when inside a Slurm allocation.
Also discovered that the rsh launcher is not picking up --enable-orterun-prefix-by-default when invoked during singleton comm_spawn, but I was unable to see why that was happening and ran out of time.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32229.
2014-07-13 14:47:22 +00:00
Ralph Castain
2ce11ed5c4 Fix one spot missed by recent commit
This commit was SVN r32210.
2014-07-10 20:08:57 +00:00
Ralph Castain
a523dba41d NOTE: this modifies the MPI-RTE interface
We have been getting several requests for new collectives that need to be inserted in various places of the MPI layer, all in support of either checkpoint/restart or various research efforts. Until now, this would require that the collective id's be generated at launch. which required modification
s to ORTE and other places. We chose not to make collectives reusable as the race conditions associated with resetting collective counters are daunti
ng.

This commit extends the collective system to allow self-generation of collective id's that the daemons need to support, thereby allowing developers to request any number of collectives for their work. There is one restriction: RTE collectives must occur at the process level - i.e., we don't curren
tly have a way of tagging the collective to a specific thread. From the comment in the code:

 * In order to allow scalable
 * generation of collective id's, they are formed as:
 *
 * top 32-bits are the jobid of the procs involved in
 * the collective. For collectives across multiple jobs
 * (e.g., in a connect_accept), the daemon jobid will
 * be used as the id will be issued by mpirun. This
 * won't cause problems because daemons don't use the
 * collective_id
 *
 * bottom 32-bits are a rolling counter that recycles
 * when the max is hit. The daemon will cleanup each
 * collective upon completion, so this means a job can
 * never have more than 2**32 collectives going on at
 * a time. If someone needs more than that - they've got
 * a problem.
 *
 * Note that this means (for now) that RTE-level collectives
 * cannot be done by individual threads - they must be
 * done at the overall process level. This is required as
 * there is no guaranteed ordering for the collective id's,
 * and all the participants must agree on the id of the
 * collective they are executing. So if thread A on one
 * process asks for a collective id before thread B does,
 * but B asks before A on another process, the collectives will
 * be mixed and not result in the expected behavior. We may
 * find a way to relax this requirement in the future by
 * adding a thread context id to the jobid field (maybe taking the
 * lower 16-bits of that field).

This commit includes a test program (orte/test/mpi/coll_test.c) that cycles 100 times across barrier and modex collectives.

This commit was SVN r32203.
2014-07-10 18:53:12 +00:00
Adrian Reber
4b25e92194 get the FT code to compile again by adding/removing #includes
This commit was SVN r32086.
2014-06-25 18:42:17 +00:00
Ralph Castain
b618b36a2f Fix potential issue if opal_hwloc_topology is NULL
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32050.
2014-06-19 18:52:41 +00:00
Ralph Castain
42bf7466fc This isn't as big a change as it appears - a change in one place caused a whole bunch of files to require updated #include's due to some arcane linkage. Rework the orte_wait code to reflect the introduction of the state machine. If we are in cleanup mode and just want to kill all our local children, then there is no reason to be polite about it as that introduces *very* long delays at scale. Just kill the procs and move on.
Refs trac:4717

This commit was SVN r32019.

The following Trac tickets were found above:
  Ticket 4717 --> https://svn.open-mpi.org/trac/ompi/ticket/4717
2014-06-17 17:57:51 +00:00
Ralph Castain
8736a1c138 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php

Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root).

This commit was SVN r31916.
2014-06-01 16:14:10 +00:00
Ralph Castain
1107f9099e Per the RFC issued here:
http://www.open-mpi.org/community/lists/devel/2014/05/14827.php

Refactor PMI support

This commit was SVN r31907.
2014-06-01 04:28:17 +00:00
Nathan Hjelm
59d09ad9de orte: fix several small memory leaks
grpcomm: fix memory leaks

We were leaking the caddy object used to pass data to the callback
function. This commit fixes these leaks.

oob,rml: fix memory leaks

This commit fixes several leaks:

 - Both the oob/base and oob/tcp were leaking objects on their peer
   hash tables. Iterate on the hash tables and free any objects.

 - Leaked sent messages because of missing OBJ_RELEASE. I placed the
   release in ORTE_RML_SEND_COMPLETE to catch all the possible
   paths.

ess/base: close the state framework

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31776.
2014-05-15 15:06:27 +00:00
Ralph Castain
3a1c2fff3e Correct a misplaced bracket - daemons shouldn't be doing app-related operations
This may need a patch for 1.8.2, but we can try to directly apply it

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31754.
2014-05-14 15:23:30 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
87d809eefe Add a new "run-time controls" framework for setting controls on processes. Initially, just move the process binding code there under a new "hwloc" component. Additional components to support cgroups, power settings, etc. to follow
This commit was SVN r31633.
2014-05-05 19:22:06 +00:00
Ralph Castain
fae39a658d Add third flag for open when using O_CREAT. Thanks to "robi" for reporting it and providing a patch.
Fixes trac:4596

Reviewed by rhc, RM-approved

cmr=v1.8.2:reviewer=ompi-gk1.8

This commit was SVN r31626.

The following Trac tickets were found above:
  Ticket 4596 --> https://svn.open-mpi.org/trac/ompi/ticket/4596
2014-05-02 21:58:38 +00:00
Ralph Castain
ccd33a17b8 Since we cannot block when calling abort, and we want to ensure any "show_help" message at least has a chance to get out before we exit, introduce a slight delay into the abort procedure.
Refs trac:4576

This commit was SVN r31601.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-05-02 10:46:25 +00:00
Ralph Castain
c1383ca1f3 Protect against NULL cpuset when not bound
This commit was SVN r31600.
2014-05-02 10:45:11 +00:00
Ralph Castain
d04a102ab8 Silence warnings
This commit was SVN r31573.
2014-04-30 20:55:46 +00:00
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Ralph Castain
e05b88fd18 Take another stab at resolving the "called-abort" requirement without getting stuck. Return to "drop a turd" mode, perhaps with a little more intelligence behind it. Don't worry about catching it if session dirs weren't created
cmr=v1.8.2:reviewer=jsquyres:subject=cleanup MPI_Abort hangs

This commit was SVN r31543.
2014-04-29 17:29:46 +00:00
Jeff Squyres
d8715f1e3a Close 3 more fd's that were leaking into child processes.
Child processes now look clean; I can't find any more fd's that are
leaking from the parent to children.

Refs trac:4550

This commit was SVN r31515.

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-24 15:36:24 +00:00
Ralph Castain
8594f5d738 Correctly set a non-zero exit status when mpirun is terminated by signal
Fixes trac:4537

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31423.

The following Trac tickets were found above:
  Ticket 4537 --> https://svn.open-mpi.org/trac/ompi/ticket/4537
2014-04-18 16:39:08 +00:00
Ralph Castain
8d72633acf Ensure that the session directory fields of orte_process_info have been initialized prior to cleaning up those directories as part of the initialization process that deals with stale session directory trees.
Fixes trac:4534

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31421.

The following Trac tickets were found above:
  Ticket 4534 --> https://svn.open-mpi.org/trac/ompi/ticket/4534
2014-04-18 14:25:48 +00:00
Ralph Castain
a368e84e70 Per the RFC, remove the sensor framework from the ORTE code area, relocating it offsite to the ORCM code area. Also update some ignores to ensure we don't pickup crosstalk in components
This commit was SVN r31403.
2014-04-15 21:48:24 +00:00
Ralph Castain
554da83865 Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join.
This provides full locality - i.e., not just node-level, but all the way down to whatever common binding level exists between the procs.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31106.
2014-03-18 14:51:07 +00:00
Ralph Castain
fbc5e3b773 Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before daemons callback to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31068.
2014-03-14 15:32:30 +00:00
Ralph Castain
f56f37d364 Shifting to an event-driven RTE raises some interesting issues during shutdown. We want the last messages to get thru, but also need to correctly shutdown the virtual machine. This requires a delicate balancing act across event priorities, and the need to check for termination conditions in places where related events get processed.
Change the priority of comm_failure and job_termination events to ensure we process final messages prior to terminating. Check for termination conditions when processing proc termination events as we may order proc termination when the daemon gets an exit command, but we can't see the proc actually terminate until we get out of that message event.

Jeff: probably easiest to review this by testing. I tested it under both Slurm and rsh on v1.7.5 as well as trunk

cmr=v1.7.5:reviewer=jsquyres:subject=resolve event priorities during VM shutdown

This commit was SVN r31042.
2014-03-12 16:49:58 +00:00
Ralph Castain
103a5c6df1 Output the bindings if ess verbosity is high enough
Refs trac:4356

This commit was SVN r30982.

The following Trac tickets were found above:
  Ticket 4356 --> https://svn.open-mpi.org/trac/ompi/ticket/4356
2014-03-11 01:21:14 +00:00
Ralph Castain
081669b440 When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it
cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings

This commit was SVN r30968.
2014-03-10 15:53:07 +00:00
Adrian Reber
f17ec1ab10 ESS/BASE: orte-restart needs sstore
Running orte-restart requires an initialized sstore.
This opens the sstore component for FT builds just like
the snapc component.

This commit was SVN r30796.
2014-02-21 21:23:26 +00:00
Ralph Castain
63803f5e61 Fix the leader data for PMI direct-launch as well
This commit was SVN r30778.
2014-02-20 01:41:19 +00:00
Ralph Castain
262c927778 Define a new key and store the process name of the local_rank=0 process on each node so that the MPI layer can retrieve it as desired.
This commit was SVN r30759.
2014-02-18 00:32:58 +00:00
Ralph Castain
c3df744a3b Shift the orte_db_localrank key to the opal level. Add the job and proc-level session directory names to the database using opal_db keys.
This commit was SVN r30746.
2014-02-17 01:40:56 +00:00
Ralph Castain
0ee38353ba In case there are stale session directories around, do a purge of the relevant session directory tree when an orted, HNP, or singleton start. This won't help in the case of direct-launched apps, but it's the best we can do.
cmr=v1.7.5:reviewer=jsquyres:subject=purge stale session dirs at startup

This commit was SVN r30642.
2014-02-09 02:10:31 +00:00
Ralph Castain
1d8c061687 Fix a race condition that could result in assert failures during finalize. Ensure we shutdown the orte progress thread prior to finalizing the rml/oob frameworks so that no async operations are executing during destruct of the base-level lists and objects.
cmr=v1.7.5:reviewer=jsquyres:subject=fix race condition in finalize

This commit was SVN r30641.
2014-02-08 22:04:19 +00:00
Ralph Castain
a94920276d Fix singleton MPI_Abort. Singletons no longer immediately start an HNP, but only launch one when they need it for comm_spawn. So there isn't anyone to send the "abort" report to, and thus we just exit after emitting our message.
cmr=v1.7.5:reviewer=jsquyres:subject=Fix singleton MPI_Abort

This commit was SVN r30635.
2014-02-08 18:15:07 +00:00
Adrian Reber
fde1040d2f Use unique collective ids for the checkpoint/restart code
This commit was SVN r30552.
2014-02-04 14:03:05 +00:00
Ralph Castain
4e3d12d9c1 Fix suicide operation when MPI app loses connection to its local daemon. In that scenario, we correctly callback up to the MPI layer notifying it of the lost connection. However, when the MPI layer calls back down to tell the RTE to abort, it is passing back a flag indicating we should report that error to our local daemon - which is dead. This leads to an infinite loop. Break it by using checking the flag indicating an abnormal term was ordered by the RTE and thus don't attempt to send the message.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30475.
2014-01-29 16:56:54 +00:00
Adrian Reber
8c93ebffeb orte_snapc_base_select() wants to know if it is an application
The function

   int orte_snapc_base_select(bool seed, bool app);

wants to know if it called by an application or not. Therefore
it expects as second paremeter 'bool app'. It used to be
'!ORTE_PROC_IS_DAEMON' which is not always correct if it is
a tool or a HNP. This patch changes it to ORTE_PROC_IS_APP, which
has the correct information if it is an application.

This commit was SVN r30404.
2014-01-24 17:14:41 +00:00
Ralph Castain
657796f9e0 Revert r30327 - turns out it isn't quite right just yet. :-(
Closes trac:4138

This commit was SVN r30328.

The following SVN revision numbers were found above:
  r30327 --> open-mpi/ompi@87d5f86025

The following Trac tickets were found above:
  Ticket 4138 --> https://svn.open-mpi.org/trac/ompi/ticket/4138
2014-01-18 23:38:39 +00:00