Commit graph

392 commits

Author SHA1 Message Date
Ralph Castain
13665bffe8 Per an off-list discussion, it appears possible for a system to report failure when executing getpwuid. There are several reasons for this error to occur, most notably if the system uses a network-based authentication protocol (e.g., NIS) and that system gets overwhelmed when we launch on a lot of nodes.
There is no good way to recover from this scenario, and from past experience, using the user's name in the session directory (as opposed to the uid) is very helpful when things go wrong. So print a help message when this happens (it is extremely rare, but has happened at least once now) and return an error.

cmr:v1.7.3,reviewer=jsquyres
cmr:v1.6.5,reviewer=jsquyres

This commit was SVN r28658.
2013-06-20 04:30:42 +00:00
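The failure handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Open MPI code: `demo_lookup_user_name` and its message text are hypothetical, and only the POSIX `getpwuid` call is real. The point is that a failed lookup is reported and returned as an error rather than silently replaced by the numeric uid.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the Open MPI function): copy the login name for
 * `uid` into `buf`. Returns 0 on success, -1 if the lookup fails or the
 * name does not fit. On a failed lookup it prints a short help message. */
int demo_lookup_user_name(uid_t uid, char *buf, size_t len)
{
    errno = 0;                      /* errno stays 0 when there is simply no entry */
    struct passwd *pw = getpwuid(uid);
    if (pw == NULL) {
        fprintf(stderr,
                "help: getpwuid(%ld) failed (%s); cannot build a session "
                "directory name\n",
                (long)uid, errno ? strerror(errno) : "no matching entry");
        return -1;
    }
    if (strlen(pw->pw_name) >= len) {
        return -1;                  /* caller's buffer is too small */
    }
    strcpy(buf, pw->pw_name);
    return 0;
}
```

A caller would pass `getuid()` and abort session directory setup when the function returns -1.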
Jeff Squyres
089c632cce Remove a bunch of dead code: gcc 4.7 warns of set-but-unused
variables.  So get rid of them.

This commit was SVN r28538.
2013-05-17 21:45:49 +00:00
Ralph Castain
45af6cf59e The move of the orte_db framework to opal required that we create an opaque opal_identifier_t type as OPAL cannot know anything about the ORTE process name. However, passing a value down to opal and then having the db components reference it causes alignment issues on Solaris Sparc platforms. So pass the pointer instead and do the old "memcpy" trick to avoid the problem.
This commit was SVN r28308.
2013-04-08 23:34:16 +00:00
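The "memcpy" trick mentioned in this commit can be illustrated with a small sketch. The type and function names below (`demo_identifier_t`, `demo_name_t`, `demo_load_id`, `demo_store_name`) are hypothetical stand-ins, and the two 32-bit fields are an assumption about the layout of a process name. What the sketch shows is the general technique: copy bytes through a pointer instead of dereferencing a cast pointer that may not be 8-byte aligned.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t demo_identifier_t;   /* opaque 64-bit id (hypothetical name) */

typedef struct {                      /* hypothetical stand-in for a process name */
    uint32_t jobid;
    uint32_t vpid;
} demo_name_t;

/* Copy the bytes instead of dereferencing a cast pointer: `src` may be only
 * 4-byte aligned, and a direct 64-bit load from such an address can trap on
 * strict-alignment CPUs such as SPARC. */
demo_identifier_t demo_load_id(const void *src)
{
    demo_identifier_t id;
    memcpy(&id, src, sizeof(id));
    return id;
}

/* The reverse direction: unpack the opaque id into the two-field struct. */
void demo_store_name(demo_name_t *dst, const demo_identifier_t *id)
{
    memcpy(dst, id, sizeof(*dst));
}
```

Compilers typically turn a fixed-size `memcpy` like this into plain loads and stores where alignment allows, so the safe form usually costs nothing on platforms that tolerate unaligned access.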
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few components actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
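Two of the notes in this commit, the caller-owned backing store and the settable flag, can be pictured with a toy model. Everything below is hypothetical (`demo_var_t`, `demo_var_register`, `demo_var_set_value`, `DEMO_FLAG_SETTABLE`); it deliberately does not reproduce the real `mca_base_var` signatures and only mirrors the behaviour the notes describe.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FLAG_SETTABLE 0x1u     /* hypothetical flag, modelled on the notes */

typedef struct {
    const char *name;
    int *storage;                   /* backing store owned by the caller */
    unsigned flags;
} demo_var_t;

/* Register a variable: remember where its value lives and seed the default.
 * The storage must live at least as long as the returned variable. */
demo_var_t demo_var_register(const char *name, int *storage,
                             int default_value, unsigned flags)
{
    demo_var_t var = { name, storage, flags };
    *storage = default_value;
    return var;
}

/* Change the value only if the variable was registered as settable. */
bool demo_var_set_value(demo_var_t *var, int value)
{
    if (var == NULL || !(var->flags & DEMO_FLAG_SETTABLE)) {
        return false;
    }
    *var->storage = value;
    return true;
}
```

Because the registry only holds a pointer, a backing store declared on the stack of a function that returns before the variable is deregistered would leave a dangling reference, which is why the notes require it to outlive the variable.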
Ralph Castain
24b91839aa Ensure the process knows its local cpuset early enough to perform the locality computation
This commit was SVN r28221.
2013-03-26 19:14:23 +00:00
Ralph Castain
6ee32767d4 Restore the cpus-per-proc option for byslot and bynode mapping. Remove the bind_idx (which recorded the index of the hwloc object where the proc was bound) as this would no longer be unique, and just use the bitmap as the standard reference for location. Update the relative locality computation to take bitmaps as its argument.
This commit was SVN r28219.
2013-03-26 18:27:50 +00:00
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00
Ralph Castain
63727aa714 Use a non-blocking send in show_help as it could be called from inside an event
This commit was SVN r28135.
2013-02-28 17:19:18 +00:00
Ralph Castain
cf9796accd Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes
This commit was SVN r28134.
2013-02-28 01:35:55 +00:00
Ralph Castain
bd9265c560 Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data.
A few changes were required to support this move:

1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).

2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.

3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you separately specify the "store" and "fetch" ordering.

4. the ORTE grpcomm and ess/pmi components, and the nidmap code, were updated to work with the new db framework and to specify internal/global storage options.

No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.

This commit was SVN r28112.
2013-02-26 17:50:04 +00:00
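The fetch ordering described in item 3 (internal storage first, then the global system) can be sketched with two toy stores keyed by an opaque 64-bit identifier. The names `demo_store_t`, `demo_store_put`, `demo_store_get`, and `demo_fetch` are hypothetical, and a linear array stands in for the real hash table and PMI backends.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 8

typedef struct {
    uint64_t id[DEMO_SLOTS];        /* opaque 64-bit keys */
    int value[DEMO_SLOTS];
    size_t count;
} demo_store_t;

/* Append a key/value pair; returns -1 when the toy store is full. */
int demo_store_put(demo_store_t *s, uint64_t id, int value)
{
    if (s->count >= DEMO_SLOTS) {
        return -1;
    }
    s->id[s->count] = id;
    s->value[s->count] = value;
    s->count++;
    return 0;
}

/* Linear lookup; returns 0 and fills *value on a hit, -1 on a miss. */
int demo_store_get(const demo_store_t *s, uint64_t id, int *value)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->id[i] == id) {
            *value = s->value[i];
            return 0;
        }
    }
    return -1;
}

/* Fetch order: check internal storage first, then fall back to the global one. */
int demo_fetch(const demo_store_t *local, const demo_store_t *global,
               uint64_t id, int *value)
{
    if (demo_store_get(local, id, value) == 0) {
        return 0;
    }
    return demo_store_get(global, id, value);
}
```

When both stores hold the same key, the local copy wins, which matches the "check internal storage first" ordering in the commit message.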
Ralph Castain
afb0db5b6f Okay, Jeff - just for you...flow the show help thru the orte functions so help messages will be aggregated
This commit was SVN r28007.
2013-02-01 00:35:48 +00:00
Ralph Castain
e4673f3283 Add new job state
This commit was SVN r27878.
2013-01-20 00:30:27 +00:00
Ralph Castain
ab73d11368 Oops - push missing definitions
This commit was SVN r27688.
2012-12-18 16:43:03 +00:00
Ralph Castain
43f883cb42 Add some more detailed error output to the db_hash component and nidmap code. Ensure the local nodename is included in the HNP's aliases
This commit was SVN r27622.
2012-11-18 17:57:19 +00:00
Ralph Castain
e11f32038a Add an MCA param to retain all aliases based on IP addrs for node names so that procs can look them up by interface, if desired. If the param is set, pass aliases around to all daemons and procs for local use
This commit was SVN r27619.
2012-11-16 04:04:29 +00:00
Ralph Castain
bd887f7f56 Add a new "test" component to the DFS that treats all files as remote in order to test the app-to-daemon interactions on a single machine. Set a global param to indicate we are using staged execution. Add a param to indicate it is okay for non-MPI processes to execute without finalizing. Cleanup file map load and fetch operations.
This commit was SVN r27587.
2012-11-10 14:09:12 +00:00
Ralph Castain
81d0b06842 Strip the domain info from the hostname if that option is specified, protecting IP address-based names
This commit was SVN r27586.
2012-11-10 14:05:27 +00:00
Nathan Hjelm
842caae4c7 Fix a small leak in orte/util/name_fns.c
cmr:v1.7

This commit was SVN r27576.
2012-11-07 23:59:49 +00:00
Ralph Castain
27b41a7db4 If the nodename is an IP address, we need to retain the full name (even if keep_fqdn is false) so that the ssh tree spawn can proceed.
cmr:v1.7

This commit was SVN r27561.
2012-11-05 16:59:53 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occurring before the command line parser changes but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Ralph Castain
a080de188f Enable orterun to directly support staged execution, treating each app as a separate job. Support transfer of file maps when support exists.
This commit was SVN r27516.
2012-10-29 23:11:30 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Ralph Castain
54db4c35eb Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two components not to build if hwloc is disabled:
ompi/mca/sbgp/basesmsocket
orte/mca/rmaps/lama

Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files.

This commit was SVN r27377.
2012-09-26 23:24:27 +00:00
Ralph Castain
67f34c3be6 Record the bind_level recvd by the daemon for each job so it can be correctly sent to the procs. Add test in get_relative_locality to avoid descending into an infinite loop if the level is NODE (==0).
This commit was SVN r27252.
2012-09-06 20:50:07 +00:00
Ralph Castain
efa50346c8 Error out if we are filtering a hostfile and encounter a node that is not in the resource-managed allocation, giving an error message identifying the file and the node. Don't filter managed allocations thru a default hostfile as this can lead to "hidden" errors.
Don't use dash-host info on managed allocations if we are using soft locations

This commit was SVN r27245.
2012-09-05 19:42:00 +00:00
Ralph Castain
fde83a44ab This confusion has been around for awhile, caused by a long-ago decision to track slots allocated to a specific job as opposed to allocated to the overall mpirun instance. We eliminated that quite a while ago, but never consolidated the "slots_alloc" and "slots" fields in orte_node_t. As a result, confusion has grown in the code base as to which field to look at and/or update.
So (finally) consolidate these two fields into one "slots" field. Add a field in orte_job_t to indicate when all the procs for a job will be launched together, so that staged operations can know when MPI operations are allowed.

This commit was SVN r27239.
2012-09-05 01:30:39 +00:00
Ralph Castain
bae5dab916 If (and only if) a user requests, set the default number of slots on any node to the number of objects of the specified type. This *only* takes effect in an unmanaged environment - i.e., if an external resource manager assigns us a number of slots, then that is what we use. However, if we are using a hostfile, then the user may or may not have given us a value for the number of slots on each node.
For those nodes (and *only* those nodes) where the user does *not* specify a slot count, we will set the number of slots according to their direction: either to the number of cores, numas, sockets, or hwthreads. Otherwise, the slot count is set to 1.

Note that the default behavior remains unchanged: in the absence of any value for #slots, and in the absence of any directive to set #slots, we will set #slots=1.

This commit was SVN r27236.
2012-09-04 20:58:26 +00:00
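The slot-count rule in this commit reduces to a short decision function. The sketch below is an illustration under stated assumptions: the names (`demo_default_slots`, `demo_topology_t`, `demo_slot_policy_t`) are hypothetical, and a non-positive `given_slots` is used here to mean "the user gave no slot count".

```c
typedef struct {
    int cores;
    int numas;
    int sockets;
    int hwthreads;
} demo_topology_t;

typedef enum {
    DEMO_SLOTS_UNSET,               /* no directive from the user */
    DEMO_SLOTS_CORES,
    DEMO_SLOTS_NUMAS,
    DEMO_SLOTS_SOCKETS,
    DEMO_SLOTS_HWTHREADS
} demo_slot_policy_t;

/* managed:     nonzero when an external resource manager assigned the slots
 * given_slots: RM-assigned or hostfile value; <= 0 means "not specified" */
int demo_default_slots(int managed, int given_slots,
                       demo_slot_policy_t policy, const demo_topology_t *topo)
{
    if (managed || given_slots > 0) {
        return given_slots;         /* an explicit value always wins */
    }
    switch (policy) {
    case DEMO_SLOTS_CORES:     return topo->cores;
    case DEMO_SLOTS_NUMAS:     return topo->numas;
    case DEMO_SLOTS_SOCKETS:   return topo->sockets;
    case DEMO_SLOTS_HWTHREADS: return topo->hwthreads;
    default:                   return 1;   /* unchanged default: #slots=1 */
    }
}
```

For a hostfile node with no slot count on a machine with 2 sockets and 16 cores, a cores directive would yield 16 slots, a sockets directive 2, and no directive 1.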
Ralph Castain
95019cc310 Fix a few places where we weren't completely identifying hostfile-based operations against "localhost" entries. Tell the mapper base to be silent when we don't want errors announced because nodes aren't available for mapping (sometimes it is okay if they are fully used). Fix an infinite loop in the file prepositioning code.
This commit was SVN r27210.
2012-08-31 21:28:49 +00:00
Ralph Castain
38ce23db43 Add some protection to allow NULL bytes in byte objects and NULL strings to be handled cleanly in nidmaps and modex entries. Ensure there is a valid nidmap available for the HNP to pass down to any local procs when it is operating alone.
This commit was SVN r27188.
2012-08-31 01:07:36 +00:00
Ralph Castain
05c0464dcb Add missing protections
This commit was SVN r27183.
2012-08-30 12:17:29 +00:00
Ralph Castain
1b659de132 Get staged execution working on multi-node setups. Improve efficiency by only remapping if all procs not yet mapped in the job.
This commit was SVN r27181.
2012-08-29 20:35:52 +00:00
Ralph Castain
a3b08f5800 Fix a few things relating to comm_spawn that causes new daemons to be launched. Ensure that all new daemons receive a full pidmap. Properly mark the daemon job as "updated" when daemons are added
This commit was SVN r27177.
2012-08-29 03:11:37 +00:00
Ralph Castain
98580c117b Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine.
Remove some stale configure.m4's we no longer need.

Optimize the nidmaps a bit by only sending info that has changed each time, instead of sending a complete copy of everything. Makes no difference for the typical MPI job - only impacts things like staged execution where we are sending multiple (possibly many) launch messages.

This commit was SVN r27165.
2012-08-28 21:20:17 +00:00
Ralph Castain
229e3f9b2a This will break systems like orcm, but we aren't trying to support those any more - so put the nodes back in their daemon-indexed position. Will continue working to reduce search requirements in other parts of the code
This commit was SVN r27038.
2012-08-14 22:26:40 +00:00
Ralph Castain
3cb8d55c8b We can't just lookup the node in the node pool by daemon vpid as the daemons aren't stored that way - this was done because when holes exist in daemon vpids, we can generate huge orte_node_pool arrays even when only a few daemons actually exist. So we have to search for the vpid in the array
This commit was SVN r27035.
2012-08-14 18:17:59 +00:00
Ralph Castain
3938ec5361 Remove debug
This commit was SVN r27024.
2012-08-13 21:35:21 +00:00
Ralph Castain
b9b41d8662 For cases where the alpha+non-zero prefix must be removed from a node name, be sure to do it everywhere we access node names - otherwise, modex methods such as pmi will fail to correctly identify procs on the same node
This commit was SVN r27022.
2012-08-13 20:44:56 +00:00
Ralph Castain
431d5361ed For those who really preferred our prior mode of operation that mapped procs and only launched daemons on the nodes that had procs on them, introduce the "novm" state machine component. This recreates the old mode of operation by re-ordering the launch sequence so that we allocate, then map, and then launch daemons only on the reqd nodes (instead of across the entire allocation).
This commit was SVN r26946.
2012-08-03 16:30:05 +00:00
Ralph Castain
6ee35e4977 Add num_local_peers to orte_process_info so we don't keep re-computing it, ensure it is available for direct launch via pmi as well
This commit was SVN r26931.
2012-07-31 21:21:50 +00:00
Ralph Castain
94d11e04fd Add an intermediate state when the VM is ready so that third party tools can take action prior to mapping/launching apps
This commit was SVN r26902.
2012-07-28 15:33:09 +00:00
Ralph Castain
cf4606cdd5 Add debug of nidmap subsystem
This commit was SVN r26739.
2012-07-04 00:04:16 +00:00
Ralph Castain
b83fc41d54 Add a state that allows mpirun or other tools to be notified of a job completion prior to terminating so that alternative actions can be performed.
This commit was SVN r26716.
2012-07-02 22:16:32 +00:00
Ralph Castain
0dfe29b1a6 Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.

Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.

This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Ralph Castain
0a713cd27e Add database framework to ORTE and refactor modex code to utilize it. Create the "hash" db component from the prior modex db code. Leave the other components ignored for now - will activate them later.
Modex is still a blocking operation at this point.

This commit was SVN r26618.
2012-06-19 13:38:42 +00:00
Ralph Castain
9bedb25dda Cleanup some compiler warnings, some of which are actual logic errors
This commit was SVN r26519.
2012-05-29 20:11:51 +00:00
Ralph Castain
e705de1ce6 Complete nidmap cleanup - we don't know our node until we have unpacked all the jobs since our job is always the last one, so wait until all jobs are unpacked before assigning locality
This commit was SVN r26500.
2012-05-27 18:37:57 +00:00
Ralph Castain
be6ed9c2df Allow partial use of allocations by specifying the max number of daemons (i.e., max VM size) for the job
This commit was SVN r26499.
2012-05-27 16:48:19 +00:00