
343 Commits

Author SHA1 Message Date
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Ralph Castain
0b1d4b62be Cleanup some cruft and update to coordinate with CM operations:
* don't pass --tree-spawn to the orted cmd line. If someone doesn't want tree-spawn, it shows up as an MCA param anyway
* ensure state/orted component disqualifies itself from CM operations
* clarify the DVM proc_type definitions
* ensure we stop littering the tmp dir with session directories
2015-08-12 10:32:14 -07:00
Ralph Castain
023936e84b Silence coverity warnings 2015-07-29 07:28:08 -07:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Ralph Castain
869b2891c4 When doing comm-spawn, track the last object we bound to and ensure that we start the next job on the next object so we avoid overload situations when they aren't necessary 2015-06-17 09:20:08 -07:00
Gilles Gouaillardet
2e384a3b65 initialize common symbols from orte
A few uninitialized common symbols are remaining (generated by flex) :
 * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_leng
 * orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_text
 * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_leng
 * orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_text
2015-05-08 10:11:58 +09:00
Ralph Castain
7d1980ba83 Add the ability to specify the number of desired slots in the --host option. Just giving a host name => one slot (multiple copies of the name yield one slot per copy). Giving "foo:3" indicates you want three slots - a shorthand notation for saying "foo" three times. Giving "foo:*" indicates you want the topology to set the number of slots based on the orte_set_slots param. 2015-04-30 20:35:23 -07:00
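The slot notation described in this commit can be illustrated with a minimal sketch. It is not the ORTE parser: the function name `parse_host_option` and the `AUTO` sentinel are hypothetical, host names are assumed to contain no colon, and letting a "foo:*" entry override explicit counts for the same host is an assumption made here for simplicity.

```python
# Hypothetical sentinel for the "foo:*" form: slots come from the topology.
AUTO = "auto"


def parse_host_option(value):
    """Return {host: slots} for a comma-separated --host value.

    slots is an int, or AUTO when the host was written as "host:*".
    """
    slots = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, count = entry.partition(":")
        if not sep:
            requested = 1          # bare name: one slot per copy
        elif count == "*":
            requested = AUTO       # let the topology decide
        else:
            requested = int(count) # "foo:3" is shorthand for three copies
        # Assumption: once a host is marked AUTO, it stays AUTO.
        if requested == AUTO or slots.get(host) == AUTO:
            slots[host] = AUTO
        else:
            slots[host] = slots.get(host, 0) + requested
    return slots
```

For example, `parse_host_option("foo,foo,bar")` and `parse_host_option("foo:2,bar")` describe the same request under these assumptions.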
Nathan Hjelm
3436f2917d Merge pull request #449 from hjelmn/mca_base_update
mca/base update
2015-04-16 08:41:48 -06:00
Ralph Castain
91e1cbf284 Init variable 2015-04-11 07:44:57 -07:00
Ralph Castain
3e44d3c9e3 Enable singletons to run without any active OOB module until they attempt to comm_spawn 2015-04-10 14:06:42 -07:00
Ralph Castain
9f8ae59162 Properly enclose the different && clauses 2015-04-01 18:48:25 -07:00
Ralph Castain
57c21d5209 Ensure the DVM flows thru the "daemons reported" state 2015-04-01 16:47:34 -07:00
Mike Dubman
58d002098b Merge pull request #474 from elenash/master
Introduce -tune command line option to set env vars and mca params from ...
2015-04-01 08:23:34 +03:00
Ralph Castain
6f9140a341 Add a little more debug to launch 2015-03-31 20:10:21 -07:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Elena
90f5b2bb84 Introduce -tune command line option to set env vars and mca params from file 2015-03-26 18:33:53 +02:00
Ralph Castain
6aa33deafb Remove debug 2015-03-25 19:58:51 -07:00
Ralph Castain
6ba76ed8d8 Per user request, we allow -host to specify a host that is not included in a hostfile (however, we reject it if we were given an allocation by a resource manager). Since we cannot know if an IP addr form references the same node that was previously given as a string name, we have no choice but to assume they are different. Get the topology from the right place in that situation so mpirun can succeed. 2015-03-25 06:16:01 -07:00
Ralph Castain
43a3baad5e Ensure we use the first compute node's topology for mapping
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.

Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.

Correctly count the number of available PUs under each object when given a cpuset

Fix the default binding settings, and correctly count PUs when no cpuset is given

Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Gilles Gouaillardet
2ab9a411f8 plm/base: fix misc memory leaks
as reported by Coverity with CIDs 1196733 and 1196745
2015-03-09 16:25:07 +09:00
Jeff Squyres
05f00aface plm base: ensure mca_base_var_get_value() and mca_base_var_find() succeed
This was CID 993712
2015-02-24 15:48:50 -05:00
Howard Pritchard
bf89131f9e add owner files to opal/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Ralph Castain
3ae3b96c17 Fix master compilation - a buried header dependency must have been removed. 2015-02-10 07:22:10 -08:00
Ralph Castain
a3275aa867 Once again, fix the blasted singleton comm_spawn 2015-02-05 17:34:25 -08:00
Ralph Castain
2b0b012460 Continue refinement of the DVM operations. Send the spawn request to the right place (it helps) as it isn't a comm_spawn request and has to be treated a little differently. Ensure IO gets forwarded back to the tool. Ensure the tool outputs show_help locally as there is no place to send it. 2015-02-04 06:21:54 -08:00
Ralph Castain
ec5ccb76cf Enable persistent ORTE DVM so users can execute multiple OMPI jobs within an allocation without restarting the DVM every time. 2015-01-30 11:00:43 -08:00
Jeff Squyres
7b43bdc984 plm base: move flag inside the #if in which it is used
Avoid a compiler warning by declaring the tflag only inside the #if in
which it is used (i.e., if hwloc support is built).
2014-12-18 10:56:23 -08:00
Ralph Castain
bb529ebd8e Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pick up major differences (e.g., where we have different numbers of sockets or assigned core bindings).
Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.

Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
2014-12-08 15:38:14 -08:00
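The idea of a count-based topology signature can be sketched as follows. The string format, the type names, and both function names are hypothetical; the only thing taken from the commit message is that the signature reflects the number of each type of object on the node.

```python
def topology_signature(object_counts):
    """Build a compact signature from per-type object counts.

    object_counts maps a type name to the number of such objects on the
    node, e.g. {"socket": 2, "core": 16, "pu": 32}. The "<count><name>"
    layout joined by ":" is an illustrative format, not the ORTE one.
    """
    return ":".join(
        f"{count}{name}" for name, count in sorted(object_counts.items())
    )


def matches_head_node(local_counts, head_signature):
    """True when this node's counts produce the signature mpirun sent."""
    return topology_signature(local_counts) == head_signature
```

Because only counts are compared, two nodes with the same number of each object but a different arrangement would still match, which is the kind of case the retained hetero-nodes flag covers.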
Ralph Castain
c88f181efe Fix singleton comm-spawn, yet again. The new grpcomm collectives require a complete knowledge of every active proc in the system in case they participate in a collective. So ensure we pass the required job info when we spawn new daemons, and construct the necessary connections to allow grpcomm to operate. 2014-12-03 18:11:17 -08:00
Ralph Castain
48f702827e First part of memory leak cleanups from Gilles 2014-11-24 16:53:33 -08:00
Ralph Castain
526682e2f9 Add the ability for a tool that requests spawn of a job to also request forwarding of all output to the tool. The tool is responsible for its own call to push its stdin to the new job. The push request can come -after- the job is started, but the pull request has to be done during the spawn procedure or else output can be lost. 2014-10-23 08:16:49 -07:00
Ralph Castain
894acb0aa8 configury: new OPAL_SET_MCA_PREFIX/ORTE_SET_MCA_CMD_LINE_ID macros
These two macros set the MCA prefix and MCA cmd line id,
   respectively.  Specifically, MCA parameters will be named
   PREFIX<foo> in the environment, and the cmd line will use
   -ID foo bar.

   These macros must be called during configure.ac and a value
   supplied. In the case of Open MPI, the values given are
   PREFIX=OMPI_MCA_ and ID=mca.

   Other projects (such as ORCM) will call these macros with
   their own unique values.  For example, ORCM uses PREFIX=ORCM_MCA_
   and ID=omca

   This scheme is necessary to allow running Open MPI applications under
   systems that use their own versions of ORTE and OPAL.  For example,
   when running OMPI applications under ORCM, we need the MCA params passed
   to the ORCM daemons to be separated from those recognized by the OMPI application.
2014-10-22 18:57:40 -07:00
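A small sketch of the naming scheme described above, assuming nothing beyond the PREFIX<foo> and -ID foo bar forms given in the message. The helper names are hypothetical, as is the idea of filtering an environment dictionary by prefix to show how the two projects' parameters stay separate.

```python
def mca_env_name(prefix, param):
    """Environment variable name for an MCA parameter: PREFIX<param>."""
    return prefix + param


def mca_cmd_line(cmd_id, param, value):
    """Command-line form of the same parameter: -ID param value."""
    return ["-" + cmd_id, param, value]


def params_for_project(environ, prefix):
    """Keep only the parameters that carry this project's prefix.

    Returns them with the prefix stripped, so a daemon built with
    PREFIX=ORCM_MCA_ never sees values meant for OMPI_MCA_ consumers.
    """
    return {
        name[len(prefix):]: value
        for name, value in environ.items()
        if name.startswith(prefix)
    }
```

With the values from the message, `mca_env_name("OMPI_MCA_", "foo")` gives `OMPI_MCA_foo`, and `mca_cmd_line("omca", "foo", "bar")` gives the ORCM-style `-omca foo bar`.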
Ralph Castain
039b7acfb5 Fix the quoting algorithm so only rsh command lines get quoted values
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32586.
2014-08-22 22:47:38 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “pmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
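The signature-based constraint on collectives can be illustrated with a minimal sketch. This is not the grpcomm implementation: the class name is hypothetical, procs are modelled as (jobid, vpid) tuples, and treating the signature as order-independent (a frozenset) is an assumption.

```python
class CollectiveTracker:
    """Track active collectives keyed by the set of participating procs."""

    def __init__(self):
        self._active = set()

    @staticmethod
    def signature(procs):
        # Assumption: the same group of procs yields the same signature
        # regardless of the order in which the names are listed.
        return frozenset(procs)

    def start(self, procs):
        """Begin a collective; reject a second one for the same group."""
        sig = self.signature(procs)
        if sig in self._active:
            raise RuntimeError(
                "a collective is already active for this set of procs"
            )
        self._active.add(sig)
        return sig

    def complete(self, sig):
        """Mark the collective finished so the group can start another."""
        self._active.discard(sig)
```

A proc such as (1, 0) may take part in collectives with {(1, 0), (1, 1)} and {(1, 0), (1, 2)} at the same time; only a repeat of an identical group is refused until the first one completes.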
Jeff Squyres
1551339eba rsh: revert part of r32517: keep the quoting
As part of reviewing CMR #4860, I talked through r32517 with Ralph.

In an attempt to fix various rsh quoting problems, r32517 removed all the
quoting from the main code path and then only added it back in at the
end in some cases.

This commit puts back the quoting parts that were removed in r32517
(r32517 fixed 2 other important bugs: a) change "--<foo>" to "--mca
<foo_equivalent> 1" so that de-duplication works, and b) change a !=
to ==).

refs trac:4860

This commit was SVN r32524.

The following SVN revision numbers were found above:
  r32517 --> open-mpi/ompi@7342bce58f

The following Trac tickets were found above:
  Ticket 4860 --> https://svn.open-mpi.org/trac/ompi/ticket/4860
2014-08-13 19:27:10 +00:00
Ralph Castain
7342bce58f Cleanup the over-aggressive quoting of params on the orted cmd line. Remove duplicates caused by passing on both cmd line shortcuts and the mca param version of the same thing.
Fixes trac:4857

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32517.

The following Trac tickets were found above:
  Ticket 4857 --> https://svn.open-mpi.org/trac/ompi/ticket/4857
2014-08-13 03:51:04 +00:00
Ralph Castain
0cad281a92 Single-word cmd line values for orted are dealt with in orte_plm_base_orted_append_basic_args, so protect against special characters there. Have the rsh module only deal with multi-word arguments as those were skipped by orte_plm_base_orted_append_basic_args.
Refs trac:4802

This commit was SVN r32293.

The following Trac tickets were found above:
  Ticket 4802 --> https://svn.open-mpi.org/trac/ompi/ticket/4802
2014-07-23 17:06:51 +00:00
Ralph Castain
6c5e592785 Revert r32222, r32210, and r32203 as they created a problem when daemon collectives did not involve app procs on every node. Instead, modify the ompi/mca/rte/orte/rte_orte.h to add a new function that allows apps to request new daemon collective ids for use in barrier and modex operations. This will only appear in ORTE-based installations, but it is only being used by a couple of researchers at the moment.
Update the orte/test/mpi/coll_test.c test to show the revised example.

This commit was SVN r32234.

The following SVN revision numbers were found above:
  r32203 --> open-mpi/ompi@a523dba41d
  r32210 --> open-mpi/ompi@2ce11ed5c4
  r32222 --> open-mpi/ompi@d55f16db50
2014-07-15 03:48:00 +00:00
Ralph Castain
1feaffbb15 Get the blasted singleton comm_spawn working again. There remain problems with the Slurm interaction in this use-case as the PMI components (if configured to build) try to run even when a Slurm allocation hasn't been made, but I leave that to someone else to resolve. I did, however, tell the Slurm ess to quit interfering with applications launched in this use-case by ORTE daemons, so things do work when inside a Slurm allocation.
Also discovered that the rsh launcher is not picking up --enable-orterun-prefix-by-default when invoked during singleton comm_spawn, but I was unable to see why that was happening and ran out of time.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32229.
2014-07-13 14:47:22 +00:00
Ralph Castain
a523dba41d NOTE: this modifies the MPI-RTE interface
We have been getting several requests for new collectives that need to be inserted in various places of the MPI layer, all in support of either checkpoint/restart or various research efforts. Until now, this would require that the collective id's be generated at launch, which required modifications to ORTE and other places. We chose not to make collectives reusable as the race conditions associated with resetting collective counters are daunting.

This commit extends the collective system to allow self-generation of collective id's that the daemons need to support, thereby allowing developers to request any number of collectives for their work. There is one restriction: RTE collectives must occur at the process level - i.e., we don't currently have a way of tagging the collective to a specific thread. From the comment in the code:

 * In order to allow scalable
 * generation of collective id's, they are formed as:
 *
 * top 32-bits are the jobid of the procs involved in
 * the collective. For collectives across multiple jobs
 * (e.g., in a connect_accept), the daemon jobid will
 * be used as the id will be issued by mpirun. This
 * won't cause problems because daemons don't use the
 * collective_id
 *
 * bottom 32-bits are a rolling counter that recycles
 * when the max is hit. The daemon will cleanup each
 * collective upon completion, so this means a job can
 * never have more than 2**32 collectives going on at
 * a time. If someone needs more than that - they've got
 * a problem.
 *
 * Note that this means (for now) that RTE-level collectives
 * cannot be done by individual threads - they must be
 * done at the overall process level. This is required as
 * there is no guaranteed ordering for the collective id's,
 * and all the participants must agree on the id of the
 * collective they are executing. So if thread A on one
 * process asks for a collective id before thread B does,
 * but B asks before A on another process, the collectives will
 * be mixed and not result in the expected behavior. We may
 * find a way to relax this requirement in the future by
 * adding a thread context id to the jobid field (maybe taking the
 * lower 16-bits of that field).

This commit includes a test program (orte/test/mpi/coll_test.c) that cycles 100 times across barrier and modex collectives.

This commit was SVN r32203.
2014-07-10 18:53:12 +00:00
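The id layout quoted in the code comment above (jobid in the top 32 bits, a rolling counter in the bottom 32 bits) can be sketched like this. The class and function names are hypothetical, and starting the counter at zero is an assumption.

```python
MASK32 = 0xFFFFFFFF


class CollectiveIdGenerator:
    """Produce 64-bit collective ids: jobid in the high half, counter low."""

    def __init__(self, jobid):
        self.jobid = jobid & MASK32
        self._counter = 0  # assumption: the rolling counter starts at 0

    def next_id(self):
        coll_id = (self.jobid << 32) | self._counter
        # Recycle the counter once the 32-bit maximum has been handed out.
        self._counter = (self._counter + 1) & MASK32
        return coll_id


def split_id(coll_id):
    """Return (jobid, counter) recovered from a 64-bit collective id."""
    return coll_id >> 32, coll_id & MASK32
```

Every process in a job must call `next_id()` in the same order for the ids to agree, which is why the comment restricts RTE-level collectives to the process level rather than individual threads.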
Ralph Castain
8c85ca350e Remove debug
This commit was SVN r32200.
2014-07-10 18:28:24 +00:00
Ralph Castain
356e7ea904 Move all collective id's into the attributes and let the job pack/unpack take care of them instead of singling them out. Add the envars just prior to forking the children instead of into the launch message itself. Remove a few #if CR as the attributes functionality can handle this condition now.
This commit was SVN r32133.
2014-07-03 15:58:13 +00:00
Adrian Reber
cabf1d4e68 use the orte attributes in the FT code to fix compile errors
This commit was SVN r32093.
2014-06-26 03:19:17 +00:00
Ralph Castain
42bf7466fc This isn't as big a change as it appears - a change in one place caused a whole bunch of files to require updated #include's due to some arcane linkage. Rework the orte_wait code to reflect the introduction of the state machine. If we are in cleanup mode and just want to kill all our local children, then there is no reason to be polite about it as that introduces *very* long delays at scale. Just kill the procs and move on.
Refs trac:4717

This commit was SVN r32019.

The following Trac tickets were found above:
  Ticket 4717 --> https://svn.open-mpi.org/trac/ompi/ticket/4717
2014-06-17 17:57:51 +00:00
Ralph Castain
b2413a6b88 Cannot update the proc state prior to activating the state machine as some callback functions need to compare the prior proc state against the new one.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31949.
2014-06-04 03:40:08 +00:00
Ralph Castain
f1978fba7c Cleanup a set of typos on the orte_get_attribute call
This commit was SVN r31942.
2014-06-03 20:36:38 +00:00
Ralph Castain
742c0d2284 Fix typo that would cause a segfault if orte_startup_timeout was set
This commit was SVN r31929.
2014-06-02 15:59:18 +00:00
Ralph Castain
8736a1c138 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php

Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root).

This commit was SVN r31916.
2014-06-01 16:14:10 +00:00