1
1
Граф коммитов

448 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
3107545709 Ensure that ORTE processes such as mpirun and orted never inadvertently bind themselves to cores. Change the mca param name used by the rank_file mapper to get user directives on slot lists to be different from that used by MPI procs to discover their binding. Add a cmd line option to orterun to make it easier for a user to specify the slot list (basically, hide the mca param name).
Discussed and reviewed with Lenny and Jeff.

This commit was SVN r19062.
2008-07-28 14:18:36 +00:00
Rolf vandeVaart
ed4920ba5f Fix a couple problems with orte-clean. Also add a new
--debug flag to help developers figure out possible future issues.

This fixes trac:1335.

This commit was SVN r18979.

The following Trac tickets were found above:
  Ticket 1335 --> https://svn.open-mpi.org/trac/ompi/ticket/1335
2008-07-22 17:41:06 +00:00
Rolf vandeVaart
84ede78e40 Clarify use of --path option to mpirun.
This commit fixes trac:1231.

This commit was SVN r18909.

The following Trac tickets were found above:
  Ticket 1231 --> https://svn.open-mpi.org/trac/ompi/ticket/1231
2008-07-14 21:27:23 +00:00
Ralph Castain
1edcfa970e Remove debug
This commit was SVN r18845.
2008-07-09 00:12:11 +00:00
Ralph Castain
f1114b4144 Upgrade the ability of orterun to deal with cmd line MCA params that are passed to the orteds. Help reduce the size of the cmd line by eliminating duplicates where possible, and alert to duplicate entries that can cause problems.
Add comments to both orterun and orted code explaining why we take a snapshot of the local environment and apply it to the local procs when they are spawned.

This commit was SVN r18842.
2008-07-08 22:36:39 +00:00
Ralph Castain
51da9f2980 Properly ensure that cmd line MCA params override environmental MCA params for procs local to mpirun.
Actually, the problem was that we were simply -adding- any enviro MCA params to whatever had been found on the cmd line. Thus, duplicate MCA param directives were winding up duplicated in the environment. Some shells took the first one in the environ array - others took the last! So we could get completely different behavior based on the whims of the shell.

This commit fixes trac:1373

This commit was SVN r18836.

The following Trac tickets were found above:
  Ticket 1373 --> https://svn.open-mpi.org/trac/ompi/ticket/1373
2008-07-08 13:48:47 +00:00
Lenny Verkhovsky
23da11fdcc add some documentation for mpirun's new --loadbalance option closing #1277
This commit was SVN r18832.
2008-07-08 07:56:54 +00:00
Lenny Verkhovsky
489c22b6b1 Added -rf|--rankfile <arg0> oprion to mpirun -h + man info
This commit was SVN r18812.
2008-07-07 13:46:22 +00:00
Lenny Verkhovsky
2fae770a86 information about carto framework in mpirun man page, ticket #1372
This commit was SVN r18809.
2008-07-07 12:02:10 +00:00
Ralph Castain
ba5498cdc6 Repair the MPI-2 dynamic operations. This includes:
1. repair of the linear and direct routed modules

2. repair of the ompi/pubsub/orte module to correctly init routes to the ompi-server, and correctly handle failure to correctly parse the provided ompi-server URI

3. modification of orterun to accept both "file" and "FILE" for designating where the ompi-server URI is to be found - purely a convenience feature

4. resolution of a message ordering problem during the connect/accept handshake that allowed the "send-first" proc to attempt to send to the "recv-first" proc before the HNP had actually updated its routes.

Let this be a further reminder to all - message ordering is NOT guaranteed in the OOB

5. Repair the ompi/dpm/orte module to correctly init routes during connect/accept.

Reminder to all: messages sent to procs in another job family (i.e., started by a different mpirun) are ALWAYS routed through the respective HNPs. As per the comments in orte/routed, this is REQUIRED to maintain connect/accept (where only the root proc on each side is capable of init'ing the routes), allow communication between mpirun's using different routing modules, and to minimize connections on tools such as ompi-server. It is all taken care of "under the covers" by the OOB to ensure that a route back to the sender is maintained, even when the different mpirun's are using different routed modules.

6. corrections in the orte/odls to ensure proper identification of daemons participating in a dynamic launch

7. corrections in build/nidmap to support update of an existing nidmap during dynamic launch

8. corrected implementation of the update_arch function in the ESS, along with consolidation of a number of ESS operations into base functions for easier maintenance. The ability to support info from multiple jobs was added, although we don't currently do so - this will come later to support further fault recovery strategies

9. minor updates to several functions to remove unnecessary and/or no longer used variables and envar's, add some debugging output, etc.

10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to true if the provided proc is a daemon

There is still more cleanup to be done for efficiency, but this at least works.

Tested on single-node Mac, multi-node SLURM via odin. Tests included connect/accept, publish/lookup/unpublish, comm_spawn, comm_spawn_multiple, and singleton comm_spawn.

Fixes ticket #1256

This commit was SVN r18804.
2008-07-03 17:53:37 +00:00
Ralph Castain
b118779c08 It is okay for us to init the ORTE mca params multiple times. Indeed, it is absolutely required by orterun as the first time has to be done prior to parsing the command line, which means that the mca values haven't been parsed yet!
Add ability for sys admins to prohibit putting session directories under specified locations. Thus, they can now protect parallel file systems from foolish user mistakes.

This commit was SVN r18721.
2008-06-24 17:50:56 +00:00
Jeff Squyres
24c3aa1d77 Really fix "make dist". Really.
This commit was SVN r18704.
2008-06-21 18:04:38 +00:00
Jeff Squyres
930667ac73 Ensure that orte-checkpoint and orte-restart man pages are always
included in the distribution tarball.  This ''appears'' to be an
Automake bug -- I have submitted a bug report to the bug-automake list:

http://lists.gnu.org/archive/html/bug-automake/2008-06/msg00019.html

This commit was SVN r18696.
2008-06-20 18:19:01 +00:00
Jeff Squyres
e4172a3c44 Shift the AM "if" logic down from orte/tools/Makefile.am down to the
individual orte/tools/*/Makefile.am files.  This causes "make" to
travese into every directory, even if it's not going to build anything
in that directory (which is a good thing).  It also helps cleanup and
dist issues.

This also affects orte-checkpoint and orte-restart, but I couldn't get
--with-ft to compile properly; I'll pass along a heads-up to Josh to
ensure that I didn't break anything.

This commit was SVN r18680.
2008-06-19 14:46:10 +00:00
Ralph Castain
571f483c39 Ensure that we don't breakpoint the debugger until -after- all procs have reported their contact info so we can successfully send the release message
This commit was SVN r18678.
2008-06-19 14:37:46 +00:00
Jeff Squyres
7e45b24001 MPIR_being_debugged is an int, not a bool.
This commit was SVN r18676.
2008-06-19 13:31:34 +00:00
Ralph Castain
b56f8ced4f Ensure params are registered prior to parsing global cmd line options in orterun so that debugger options are properly captured and acted upon.
Ensure that routes to remote procs are set on the HNP before completing launch so that the debugger message can be sent. Solves a race condition that can exist in those environments where the HNP does not have local procs.

This commit was SVN r18674.
2008-06-19 02:58:14 +00:00
Ralph Castain
282a220e7e Update the debugger interface per email thread with Jeff and Brian. Handoff to them for final test and validation
This commit was SVN r18670.
2008-06-18 15:28:46 +00:00
Ralph Castain
0532d799d6 Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.

This commit was SVN r18664.
2008-06-18 03:15:56 +00:00
Brian Barrett
7712b07ac4 Add perl based wrapper compilers for cross-compile environments. The default
is still to use the C based wrapper compilers (which have many more features
and are more well tested).  The Perl compilers are enabled with the option
--enable-script-wrapper-compilers, which also ignores the option
--disable-binaries (ie --enable-script-wrapper-compilers --disable-binaries
will result in perl-based wrapper compilers being installed, but no other
binaries being installed).

This commit was SVN r18655.
2008-06-13 22:52:25 +00:00
Ralph Castain
1f41069ac9 Fix CID 752 - if we can't find the daemon job object, we have to ensure we exit without attempting to dereference it
This commit was SVN r18647.
2008-06-11 14:49:58 +00:00
Ralph Castain
13ea4e4673 Be consistent - since we don't strdup the other values for param, don't strdup this one.
This allows r18645 to fix the memory corruption issue, but also allows us to resolve the memory leaks cited by CID 1039

This commit was SVN r18646.

The following SVN revision numbers were found above:
  r18645 --> open-mpi/ompi@53d83ba1c5
2008-06-11 14:42:47 +00:00
Pak Lui
53d83ba1c5 Take out a couple of free's.
This commit fixes trac:1343

This commit was SVN r18645.

The following Trac tickets were found above:
  Ticket 1343 --> https://svn.open-mpi.org/trac/ompi/ticket/1343
2008-06-11 14:02:49 +00:00
Ralph Castain
1a422995ae Fix two Coverity complaints CID 813 (value defined and not used) and 1039 (resource leak). While doing so, found and fixed another less obvious memory leak.
This commit was SVN r18641.
2008-06-10 17:53:28 +00:00
Ralph Castain
c13cadc3c7 Refs trac:1255
This commit repairs the debugger initialization procedure. I am not closing the ticket, however, pending Jeff's review of how it interfaces to the ompi_debugger code he implemented. There were duplicate symbols being created in that code, but not used anywhere. I replaced them with the ORTE-created symbols instead. However, since they aren't used anywhere, I have no way of checking to ensure I didn't break something.

So the ticket can be checked by Jeff when he returns from vacation... :-)

This commit was SVN r18625.

The following Trac tickets were found above:
  Ticket 1255 --> https://svn.open-mpi.org/trac/ompi/ticket/1255
2008-06-09 20:34:14 +00:00
Ralph Castain
2cc8b2c51f Add yet another test, this one for proper error behavior when someone call an MPI function after calling MPI_Finalize.
Add a minor debug that outputs the orterun exit status to stderr when orte_debug is set.

This commit was SVN r18622.
2008-06-09 19:21:20 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Ralph Castain
83dd3d8c6f Restore the ability to forcibly terminate by providing multiple ctrl-c's
This commit was SVN r18618.
2008-06-09 13:08:54 +00:00
Ralph Castain
0da811ce79 Initial work on xml support - allocation and job map outputs completed. More to come.
This commit was SVN r18587.
2008-06-04 20:53:12 +00:00
Josh Hursey
78f14b5255 Fix the none.checkpoint command.
orte-checkpoint/orte-restart seem to not seem to totally like orte_output so revert them to opal_output for now. Since we have no need for the additional complexity of orte_output we can drop it for now and revisit this if anyone needs it later.

It seems that if you set the verbose level on an output handle then try to call a normal orte_output() on it then the message will *not* be printed. This is the same for opal_output, and seems incorrect to me because it stops some error messages from being printed out if you do not directly specify opal_output(0, ...). Maybe someone should take a look a this.


orte-checkpoint would segv if passed an incorrect PID. Fixed the return code so it errors out properly.

Thanks to Eric Roman for bringing this to my attention.

This commit was SVN r18583.
2008-06-04 14:44:11 +00:00
Jeff Squyres
a4db97c213 More man pages fixes from our Debian Open MPI package maintainer
friends.  Woo hoo!

This commit was SVN r18559.
2008-06-03 16:44:40 +00:00
Ralph Castain
c992e99035 Remove the tags from orte_output_open and the filtering operation from orte_output - this will be handled differently to improve the XML output interface
This commit was SVN r18557.
2008-06-03 14:24:01 +00:00
Ralph Castain
b456fb2d42 Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time.
Also detect orted failed-to-start by setting timeout on launch. Currently only used in TM launcher.

Neither detection is enabled by default, but are only active if heartrate is set and/or launch timeout is set. Exception for SLURM as orted failure is always detected and reported.

More info to come on devel list.

This commit was SVN r18555.
2008-06-02 21:46:34 +00:00
Ralph Castain
2772221ae0 Add the -xml option to mpirun to indicate that xml output is desired
This commit was SVN r18539.
2008-05-29 14:11:31 +00:00
Ralph Castain
72530f8fed Cleanly handle the failed start of an orted, or its unexpected failure after start. This commit will allow mpirun to exit cleanly when this occurs, and does a best-effort attempt to cleanup the mess. However, it still has two unresolved issues that need to be eventually addressed:
1. it depends upon the ability of the native environment to alert us that the orted has died/failed to start. I have included that support for SLURM, but other environments need to be done.

2. for some yet-to-be-determined reason, the message that tells the remaining daemons to "die" isn't getting out of the RML, even though no obvious blockage is standing in the way. Work will continue on resolving that problem. For now, the orteds appear to be exiting on their own quite nicely when they see their HNP "lifeline" disappear.

This represents the best-available fix for ticket #221 so I am closing that ticket at this time.

This commit was SVN r18536.
2008-05-29 13:38:27 +00:00
Jeff Squyres
af1a9290cb Ensure to check that opal_init_util() didn't fail.
This commit was SVN r18453.
2008-05-19 11:58:48 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so you if explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
Ralph Castain
3e55fe6f6d Fold in the revised modex scheme. Move the ompi_proc_t modex portions to the RTE level since the daemons already have that info. Provide each process with the equivalent of a "nidmap" - both a map of what nodes are in the job, and a map of which node each process is on. This enables the use of static ports, though that hasn't been turned "on" in this commit.
Update the rsh tree spawn capability so we spawn the next wave of daemons before launching our own local procs.

Add an ability to encode nodenames for large clusters with contiguous node name numbering schemes - this allows communication of all node names in a few bytes instead of tens-of-bytes/node.

This commit was SVN r18338.
2008-04-30 19:49:53 +00:00
Ralph Castain
5311b13b60 Add a loadbalancing feature to the round-robin mapper - more to be sent to devel list
Fix a potential problem with RM-provided nodenames not matching returns from gethostname - ensure that the HNP's nodename gets DNS-resolved when comparing against RM-provided hostnames. Note that this may be an issue for RM-based clusters that don't have local DNS resolution, but hopefully that is more indicative of a poorly configured system.

This commit was SVN r18252.
2008-04-23 14:52:09 +00:00
Josh Hursey
cc83d41ad9 Merge in tmp/jjh-scratch
{{{
 svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}

Contains:
 * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
 * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
 * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
 * Cleanup mpi interface with missplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
 * Some other sundry cleanup items all dealing with C/R functionality in the trunk.

This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Ralph Castain
16c9100633 Add --display-allocation option to orterun that will display the node-by-node information regarding your allocation.
This commit was SVN r18216.
2008-04-20 02:25:45 +00:00
Josh Hursey
56a61bfacf switch the name of orterun to mpirun to make things more clear.
This commit was SVN r18208.
2008-04-18 12:59:23 +00:00
Ralph Castain
e7487ad533 Implement the seq rmaps module that sequentially maps process ranks to a list hosts in a hostfile.
Restore the "do-not-launch" functionality so users can test a mapping without launching it.

Add a "do-not-resolve" cmd line flag to mpirun so the opal/util/if.c code does not attempt to resolve network addresses, thus enabling a user to test a hostfile mapping without hanging on network resolve requests.

Add a function to hostfile to generate an ordered list of host names from a hostfile

This commit was SVN r18190.
2008-04-17 13:50:59 +00:00
Ralph Castain
7b91f8baff Cleanup and fix bugs in the MPI dynamics section. Modify the dpm API so it properly takes ports instead of process names (as correctly identified by Aurelien). Fix race conditions in the use of ompi-server. Fix incompatibilities between the mpi bindings and the dpm implemenation that could cause segfaults due to uninitialized memory.
Fix the ompi-server -h cmd line option so it actually tells you something!

Add two new testing codes to the orte/test/mpi area: accept and connect.

This commit was SVN r18176.
2008-04-16 14:27:42 +00:00
Ralph Castain
5e6dc24e62 Fix ompi-server so it works with unity routed module - still not working with tree routing.
Cleanup debug flag so it activates debugging on the data server code itself

This commit was SVN r18080.
2008-04-04 19:17:28 +00:00
Ralph Castain
ce96cb4800 Quite warning about uninitialized variable
This commit was SVN r18033.
2008-03-31 13:52:27 +00:00
Galen Shipman
f1e3045296 need orted_LDFLAGS as a placeholder
you will need to re autogen.sh 

This commit was SVN r17951.
2008-03-25 13:41:09 +00:00
Ralph Castain
dc7f45dafd Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.

This commit was SVN r17926.
2008-03-23 23:10:15 +00:00
Jeff Squyres
dee561d29e Per recent off-list discussions about the build system, I have done
some cleanups and standardizations in the various */tools/*/ 
Makefile.am files.  This commit:

 * Somewhat simplify the tool Makefile.am's 
 * Makes the tool Makefile.am's consistent with each other (do similar
   actions in similar ways)
 * Update the tool Makefile.am's to remove old kruft that was required
   by older versions of AM (trunk requires AM >=1.10)

This commit was SVN r17921.
2008-03-22 02:04:05 +00:00
Ralph Castain
6bb139e4f2 One more correction to mpirun exit codes - cleanup the application proc's exit codes in the orted so that non-zero exit codes generated by mpirun itself don't get "munged".
Modify the multi_abort function so they all return different exit codes - allows us to tell which one was being reported.

This commit was SVN r17895.
2008-03-20 13:54:11 +00:00
Ralph Castain
2ed0e60321 Bring some sanity to the exit code returned by mpirun. Ensure that we provide a non-zero code if something goes wrong, including someone exiting after calling mpi_init without calling mpi_finalize.
Jeff is preparing an (undoubtedly lengthy) explanation/matrix of how these codes are determined for the OMPI FAQ.

This commit was SVN r17879.
2008-03-19 19:00:51 +00:00
Ralph Castain
629b95a2fe Afraid this has a couple of things mixed into the commit. Couldn't be helped - had missed one commit prior to running out the door on vacation.
Fix race conditions in abnormal terminations. We had done a first-cut at this in a prior commit. However, the window remained partially open due to the fact that the HNP has multiple paths leading to orte_finalize. Most of our frameworks don't care if they are finalized more than once, but one of them does, which meant we segfaulted if orte_finalize got called more than once. Besides, we really shouldn't be doing that anyway.

So we now introduce a set of atomic locks that prevent us from multiply calling abort, attempting to call orte_finalize, etc. My initial tests indicate this is working cleanly, but since it is a race condition issue, more testing will have to be done before we know for sure that this problem has been licked.

Also, some updates relevant to the tool comm library snuck in here. Since those also touched the orted code (as did the prior changes), I didn't want to attempt to separate them out - besides, they are coming in soon anyway. More on them later as that functionality approaches completion.

This commit was SVN r17843.
2008-03-17 17:58:59 +00:00
Ralph Castain
57a72c412a Utilize Tim M's suggestion and use atomics to do the locking.
This commit was SVN r17767.
2008-03-06 21:36:32 +00:00
Ralph Castain
097cc83be2 Fix a race condition - ensure we don't call terminate in orterun more than once, even if the timeout fires while we are doing so
This commit was SVN r17766.
2008-03-06 19:35:57 +00:00
Ralph Castain
3883bbee06 Fix bug - must not "free" tsd-allocated memory
This commit was SVN r17754.
2008-03-06 03:10:14 +00:00
Tim Prins
f9916811ae Make it so we do not mangle the options the user passes to their executeable. Fixes trac:1124
The change also:
 - cleans up and simplifies the command line processing code
 - adds an error output if more than one hostfile passed for a single app context
 - gets rid of the superfluous orte_app_context_map_t type, and instead use a simple argv of -host options

This commit was SVN r17750.

The following Trac tickets were found above:
  Ticket 1124 --> https://svn.open-mpi.org/trac/ompi/ticket/1124
2008-03-05 22:12:27 +00:00
Rolf vandeVaart
03fdd57d5a Fix the use of --path and -x PATH so that things work properly.
Note that --path specifies extra directories where the executable
is searched for, but does not affect the PATH settings.

This commit fixes trac:1221.

This commit was SVN r17748.

The following Trac tickets were found above:
  Ticket 1221 --> https://svn.open-mpi.org/trac/ompi/ticket/1221
2008-03-05 21:07:43 +00:00
Ralph Castain
06d3145fe4 First cut at direct launch for TM. Able to launch non-ORTE procs and detect their completion for a clean shutdown.
This commit was SVN r17732.
2008-03-05 13:51:32 +00:00
Ralph Castain
edb8e32a7a Add default hostfile parameter plus --default-hostfile command line option.
Fix error message when job setup failed

This commit was SVN r17724.
2008-03-05 04:54:57 +00:00
Ralph Castain
9413d6cf5d Define a default exit code for when things fail prior to a job launch - still needs work, but a start.
Fix a deadlock loop when things really, really go bad. If we timeout trying to kill the job, then it's time to bail as cleanly as possible, not go back and keep trying.

This commit was SVN r17715.
2008-03-05 01:46:30 +00:00
Ralph Castain
a585923de1 Silence some minor compiler warnings
This commit was SVN r17662.
2008-02-29 02:39:39 +00:00
George Bosilca
7879e0b9c2 Be nice with parallel debugger, export this required symbol.
This commit was SVN r17637.
2008-02-28 05:59:07 +00:00
George Bosilca
9d421bea2a Replace all occurences of orte_pointer_array by opal_pointer_array. Remove the
implementation of orte_pointer_array.

This commit was SVN r17636.
2008-02-28 05:32:23 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Sharon Melamed
025b68becf Move the carto framework to the trunk.
This commit was SVN r17177.
2008-01-23 09:20:34 +00:00
Jeff Squyres
b6e9c99f7d Formatting fixes from Peter Breitenlohner.
This commit was SVN r17163.
2008-01-18 23:21:31 +00:00
Josh Hursey
0bf61a1b84 Move in some accumulated small features and minor bug fixes for C/R support.
{{{
svn merge -r 16447:16475 https://svn.open-mpi.org/svn/ompi/tmp/jjh-fgs .
}}}

This commit was SVN r16478.
2007-10-17 13:47:36 +00:00
Ralph Castain
b6196e8a39 When we can detect that a daemon has failed, then we would like to terminate the system without having it lock up. The "hang" is currently caused by the system attempting to send messages to the daemons (specifically, ordering them to kill their local procs and then terminate). Unfortunately, without some idea of which daemon has died, the system hangs while attempting to send a message to someone who is no longer alive.
This commit introduces the necessary logic to avoid that conflict. If a PLS component can identify that a daemon has failed, then we will set a flag indicating that fact. The xcast system will subsequently check that flag and, if it is set, will send all messages direct to the recipient. In the case of "kill local procs" and "terminate", the messages will go directly to each orted, thus bypassing any orted that has failed.

In addition, the xcast system will -not- wait for the messages to complete, but will return immediately (i.e., operate in non-blocking mode). Orterun will wait (via an event timer) for a period of time based on the number of daemons in the system to allow the messages to attempt to be delivered - at the end of that time, orterun will simply exit, alerting the user to the problem and -strongly- recommending they run orte-clean.

I could only test this on slurm for the case where all daemons unexpectedly died - srun apparently only executes its waitpid callback when all launched functions terminate. I have asked that Jeff integrate this capability into the OOB as he is working on it so that we execute it whenever a socket to an orted is unexpectedly closed. Meantime, the functionality will rarely get called, but at least the logic is available for anyone whose environment can support it.

This commit was SVN r16451.
2007-10-15 18:00:30 +00:00
Josh Hursey
31e9369e8b Fix orterun so it does not get influenced by an application's argv set.
For example, if I have an application that, internal to the application, takes
the argument '-mca foo bar' we do not want orterun to pick up this argument and
pass it through the system.

So the following
{{{
 shell$ mpirun -np 2 -mca btl tcp,self ./myapp -mca foo bar
}}}

orterun should pick up {{{-mca btl tcp,self}}} but not {{{-mca foo bar}}} which
it was previous to this commit.

I tested command line runs and runs with app files to confirm this patch works.

This commit was SVN r16431.
2007-10-11 18:33:40 +00:00
Josh Hursey
7437f37e96 This commit contains the following:
* Fix some missing includes in a few places.
 * Add the cr_request() functionality to the BLCR CRS component.
   We are now dependent upon the 0.6.* series of BLCR.
 * Made the CR notification mechanism a registered function.
   This way we can have an OPAL-only version and it can be replaced at
   runtime with the ORTE version.
 * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
   CR functionality when the user wants it. Default: Disabled.
 * Fix the placement of a checkpoint request check in MPI_Init
 * Pull the OPAL notification mechanism into the SnapC framework.
   * We no longer fork/exec the 'opal-checkpoint' command for local
   checkpointing, the Local coordinator in the orted does this directly.
   * The Local and Application coordinator talk together bypassing the OPAL
   notifiation mechanism.
   * Optimized the Local <-> App Coordinator communication.
   * Improved the structure used to track vpid_snapshots in the local coord.
 * Fix a race condition in which an application under heavy communication load
   may produce an inconsistent global checkpoint.

This commit was SVN r16389.
2007-10-08 20:53:02 +00:00
Ralph Castain
54b2cf747e These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC.
The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is know to be needed in the odls process component.

This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done:

As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in.

In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in.

The incoming changes revamp these procedures in three ways:

1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step.

The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic.

Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure.


2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed.

The size of this data has been reduced in three ways:

(a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes.

To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose.

(b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction.

(c) when the mca param requesting that nodenames be sent to support pretty error messages, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using.

While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly.


3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup.

It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging.

Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future.


There are a few minor additional changes in the commit that I'll just note in passing:

* propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details.

* requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details.

* cleanup of some stale header files

This commit was SVN r16364.
2007-10-05 19:48:23 +00:00
George Bosilca
4e66376e66 Fix memory leak (Coverty 702).
This commit was SVN r16122.
2007-09-13 20:11:38 +00:00
Rolf vandeVaart
a289ac114a 1. Remove some #ifdef 0 code.
2. Remove some unnecessary code that was causing a SEGV. 
There may be some more work to be done, but at least orte-clean is functional again. 

This commit was SVN r16111.
2007-09-12 19:50:58 +00:00
Shiqing Fan
dcee7e4229 - Should not use ORTE_DECLSPEC with initialization.
This commit was SVN r16086.
2007-09-11 10:13:53 +00:00
Josh Hursey
5a029a47bd forgot to separate the arguments
This commit was SVN r15940.
2007-08-21 19:43:41 +00:00
Josh Hursey
db79f2392e Make sure to enable C/R support for the HNP when restarting.
This commit was SVN r15931.
2007-08-19 20:43:33 +00:00
Brian Barrett
fe0d1f30d5 need errno.h
This commit was SVN r15862.
2007-08-15 02:15:33 +00:00
Brian Barrett
330003361b * Free memory from asprintf
* need to compare ERANGE to errno

This commit was SVN r15860.
2007-08-14 21:12:00 +00:00
Brian Barrett
881dd0654e * Provide a hook so that a PLS can tell the orted it's starting that it
needs to override the default umask.  By default, this is not used
    since most environments do what the user would expect without any
    help.
  * Have TM use the newly added umask hook, so that processes inherit
    the user's umask from mpirun rather than the pbs_mom's umask, which
    the user has no control over.

This commit was SVN r15858.
2007-08-14 18:44:52 +00:00
George Bosilca
d658a477af Update the help file to match the real name of the required argument.
This commit was SVN r15762.
2007-08-04 00:35:55 +00:00
Shiqing Fan
0f468f3668 - Remove the solution and project files, will commit them later.
This commit was SVN r15705.
2007-07-31 17:07:02 +00:00
Shiqing Fan
4d7b349cdb - Add VC8 solution and project files.
- If one wants to use this solution, remember to unload the project 'orte-restart' which is currently not working for Windows.

This commit was SVN r15680.
2007-07-30 11:05:34 +00:00
Josh Hursey
a24e530f8e Some C/R fixes (more to come)
r15390 - Changed the paradigm in which the runtime worked by enabling the mpirun
process to become an orted and spawn processes. This broke the C/R for this 
special case as it required that the orted start the process, and that 
the hierarchy remains.
The fix was to allow the global coordinator to be a local coordinator as well
for this case.

r15528 - Changed the selection logic for the RML. This caused the application to
segv if the 'ftrm' wrapper component was selected as it tried to modify a NULL
pointer.
The fix was to move the 'module swap' code into the init() function, and swap
when passed a NULL pointer. It sounds bad, but actually cleans up the code a bit
more.

Still have to fix the 'routed' framework.

This commit was SVN r15566.

The following SVN revision numbers were found above:
  r15390 --> open-mpi/ompi@bd65f8ba88
  r15528 --> open-mpi/ompi@39a6057fc6
2007-07-23 20:13:37 +00:00
Ralph Castain
d99c764e75 Resolve a problem where the orte daemon comm functions were being accessed by mpirun while still retaining occasional reference to the orted_globals. Remove all dependence on orted_globals from the comm functions. Move those functions back into their own file to make it easier to maintain the separation. Ensure that mpirun ignores any "exit" commands being sent to daemons as it will exit on its own.
This commit was SVN r15562.
2007-07-23 18:36:33 +00:00
Brian Barrett
5b9fa7e998 reapply r15517 and r15520, which were removed in r15527 so that I could get
the RML/OOB merge in slightly easier

This commit was SVN r15530.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
  r15520 --> open-mpi/ompi@9cbc9df1b8
  r15527 --> open-mpi/ompi@2d17dd9516
2007-07-20 02:34:29 +00:00
Brian Barrett
39a6057fc6 A number of improvements / changes to the RML/OOB layers:
* General TCP cleanup for OPAL / ORTE
  * Simplifying the OOB by moving much of the logic into the RML
  * Allowing the OOB RML component to do routing of messages
  * Adding a component framework for handling routing tables
  * Moving the xcast functionality from the OOB base to its own framework

Includes merge from tmp/bwb-oob-rml-merge revisions:

    r15506, r15507, r15508, r15510, r15511, r15512, r15513

This commit was SVN r15528.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15506
  r15507
  r15508
  r15510
  r15511
  r15512
  r15513
2007-07-20 01:34:02 +00:00
Brian Barrett
2d17dd9516 temporarily back our r15517 and 15520 so that I can get the RML / OOB changes
to cleanly apply

This commit was SVN r15527.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
2007-07-20 01:10:34 +00:00
Ralph Castain
41977fcc95 Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but...
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.

Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.

This commit was SVN r15517.
2007-07-19 20:56:46 +00:00
Ralph Castain
2110064a9a Ensure that the LD_LIBRARY_PATH and PATH get properly set for procs locally spawned by mpirun.
This commit was SVN r15516.
2007-07-19 19:00:06 +00:00
Josh Hursey
eeba2cb871 Add a comment to clarify the relationship between
mca_base_cmd_line_process_args() and opal_init_util() so
we do not forget their ordering needs, and subtle relationship.

This commit was SVN r15412.
2007-07-13 19:08:05 +00:00
Ralph Castain
2bded34a1d Fix a problem observed by Brian where processes launched local to mpirun lost their environment except for MCA params.
The problem stemmed from no longer launching a local orted on the same node as mpirun. The orted would save and reuse the base environment. Mpirun didn't do that, and the odls was using the orted's globally saved environment (which wasn't being set).

This fix establishes a globally accessible base launch environment that both the orted and mpirun can utilize. Since we now use that, we don't need to pass it to the odls_launch_proc function, so remove that param from the API (and modify all components to handle the change).

This commit was SVN r15405.
2007-07-13 15:47:57 +00:00
Ralph Castain
bd65f8ba88 Bring in an updated launch system for the orteds. This commit restores the ability to execute singletons and singleton comm_spawn, both in single node and multi-node environments.
Short description: major changes include -

1. singletons now fork/exec a local daemon to manage their operations.

2. the orte daemon code now resides in libopen-rte

3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided.

I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine.

This commit was SVN r15390.
2007-07-12 19:53:18 +00:00
Jeff Squyres
64083570f5 Add support for DDT parallel debugger, which required several things:
* Making some symbols and types be global (vs. static) in orterun
 * Adding a "ddt" entry in the MCA parameter orte_base_user_debugger
   default value
 * Add support for @executable@, @executable_argv@, and @single_app@
   tokens in the orte_base_user_debugger MCA parameter.
 * Added various error checks and corresponding help messages after
   finding a debugger in the PATH

Fixes trac:1081

This commit was SVN r15323.

The following Trac tickets were found above:
  Ticket 1081 --> https://svn.open-mpi.org/trac/ompi/ticket/1081
2007-07-10 12:53:48 +00:00
Jeff Squyres
892bc38ad0 Protect against a bad free: full_line points to the full buffer. But
line may point to a few characters beyond the beginning of the buffer
(if the buffer had some extra white space padding at the beginning).
So if we want to free the buffer, free full_line, not line.

This commit was SVN r15315.
2007-07-09 19:56:16 +00:00
Jeff Squyres
c796d84d61 * Integrate man pages contributed by Dirk Eddelbuettel
* Make orted.1 man page be non-descriptive because it's really an
   internal command.
 * Re-work the opal_wrapper man page logic a bit so that we can have a
   real opal_wrapper.1 installed that says "don't look here -- look at
   mpicc (etc.)"

This commit was SVN r15264.
2007-07-02 15:27:39 +00:00
Josh Hursey
f88aa6c273 This commit cleans up the AMCA parameter implementation a bit.
* Remove the 'opal_mca_base_param_use_amca_sets' global variable
* Harness the fact that you can (read should) call the cmd_line functions
  before initializing opal_init_util(). This pushes the MCA/GMCA/AMCA
  command line options into the environment before OPAL inits and starts
  to use these values. By putting the cmd_line parse before opal_init_util
  in orterun and orted we only parse the *MCA parameter files once, and 
  correctly (alleviating the need to 'recache' the files on init.)
* Small bits of cleanup.

This commit was SVN r15219.
2007-06-27 01:03:31 +00:00
Rainer Keller
15c03e8acc - Apply patch 31_manpages_lintian.dpatch
Thanks to Dirk Eddelbuettel <edd@debian.org>

This commit was SVN r15215.
2007-06-26 21:13:10 +00:00
Josh Hursey
84f102c343 Fix/Cleanup the Checkpoint Error propagation through the Snapc Full component.
This commit was SVN r15175.
2007-06-22 16:14:25 +00:00
Josh Hursey
78df098aee If we can not checkpoint, then make sure we return an error
This commit was SVN r15151.
2007-06-20 21:05:19 +00:00
Josh Hursey
dd021e7121 Remove some leftover debugging that must have been accidentally left in
r15142.

This commit was SVN r15145.

The following SVN revision numbers were found above:
  r15142 --> open-mpi/ompi@a3998a1676
2007-06-20 14:06:13 +00:00