Commit graph

143 commits

Author SHA1 Message Date
Jeff Squyres
ef1abb71d3 Check to see if the MCA param already exists before registering it.
If you register a parameter a second time, it overwrites the default
value (this was causing a problem with mpirun not being marked as orte
infrastructure, and therefore thinking that it was a singleton, and
therefore always adding the localhost into the node list).

This commit was SVN r6789.
2005-08-09 21:03:33 +00:00
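The fix described in this commit is a check-before-register guard. A minimal sketch of the idea in Python follows; the `ParamRegistry` class, the `register_if_absent` helper, and the parameter name `orte_infrastructure` are hypothetical stand-ins for illustration, not the actual MCA parameter API.

```python
class ParamRegistry:
    """Toy parameter registry (hypothetical; not the real MCA API)."""

    def __init__(self):
        self._values = {}

    def register(self, name, default):
        # Unconditional registration resets the value to the default,
        # mirroring the overwrite behavior the commit message describes.
        self._values[name] = default

    def exists(self, name):
        return name in self._values

    def set(self, name, value):
        self._values[name] = value

    def get(self, name):
        return self._values[name]


def register_if_absent(registry, name, default):
    """Register the parameter only when it is not already present."""
    if not registry.exists(name):
        registry.register(name, default)


registry = ParamRegistry()
registry.register("orte_infrastructure", 0)   # hypothetical parameter name
registry.set("orte_infrastructure", 1)        # e.g. a launcher marks itself

# Guarded second registration keeps the value that was set.
register_if_absent(registry, "orte_infrastructure", 0)
value_after_guarded = registry.get("orte_infrastructure")

# Unguarded second registration silently restores the default.
registry.register("orte_infrastructure", 0)
value_after_unguarded = registry.get("orte_infrastructure")
```

In this toy model the guarded call leaves the value at 1, while the unguarded call resets it to 0, which is the kind of lost setting the commit message attributes the singleton misdetection to.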
Tim Prins
24dc319237 - added more documentation
- converted some things to new MCA param API
- renamed the pls_bproc_seed component struct so its name isn't the same as
  the pls_bproc component's struct
- minor bugfixes

This commit was SVN r6774.
2005-08-08 22:17:22 +00:00
Jeff Squyres
f8fa8f4935 Fix a problem found by Tim Prins (patch also supplied by Tim P). From
his e-mail:

I ran into a small bug in rmaps_rr.c: map_app_by_slot which was
triggered by using multiple app contexts. Basically, if not all the
slots we allocated on a node were used by an app, we would
automatically move onto the next node. This caused a problem with
multiple app contexts when the first app takes a partial allocation of
a node, the second app would not be able to access these slots because
we had already moved past the node, and the byslot routine does not
wrap back around the list.

This commit was SVN r6766.
2005-08-08 18:56:17 +00:00
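The by-slot mapping problem quoted in this commit can be illustrated with a simplified model. The sketch below is not the code in rmaps_rr.c; the function name `map_by_slot`, the `advance_after_app` switch, and the node and app names are assumptions made for illustration. Setting `advance_after_app=True` reproduces the reported behavior (the cursor moves to the next node whenever an app finishes, even if slots remain), and `False` models the behavior after the fix.

```python
def map_by_slot(nodes, apps, advance_after_app):
    """Place processes on nodes slot by slot.

    nodes: list of (node_name, slot_count)
    apps:  list of (app_name, process_count)
    Returns a list of (app_name, node_name) placements.
    """
    free = [slots for _, slots in nodes]
    placements = []
    cursor = 0
    for app_name, num_procs in apps:
        for _ in range(num_procs):
            # Skip full nodes; the cursor never wraps back around the list.
            while cursor < len(nodes) and free[cursor] == 0:
                cursor += 1
            if cursor == len(nodes):
                raise RuntimeError("out of slots for " + app_name)
            free[cursor] -= 1
            placements.append((app_name, nodes[cursor][0]))
        if advance_after_app and cursor < len(nodes):
            # Reported behavior: leave the node even if slots remain on it.
            cursor += 1
    return placements


nodes = [("node0", 4), ("node1", 2)]
apps = [("first", 2), ("second", 4)]

# After the fix, the second app reuses the two slots left on node0.
fixed_placements = map_by_slot(nodes, apps, advance_after_app=False)
```

With `advance_after_app=True` the same input raises `RuntimeError`, because the two unused slots on `node0` are never revisited.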
Jeff Squyres
32e71e5c6c Fix a problem where orterun itself would not receive MCA parameters
that were set on the command line.  This was technically exactly the
way the code was designed, but it certainly violated the Law of Least
Astonishment (even to its designer ;-) ).  So now if you execute
something like this:

   mpirun -mca pls_rsh_debug 1 -np 4 hello

You'll see debugging output from the rsh pls component, as you would
expect (this was not previously the case -- the MCA pls_rsh_debug
param would be set to 1 in the 4 spawned hello processes, but *not*
in the orterun process).

More specifically, MCA parameters will be set in the orterun process
in the following cases:

- The new command line switch "--gmca" (or "-gmca") is used,
  indicating that the MCA parameter is "global".  --gmca also means
  that that MCA parameter will be applied to all context app's.  For
  example:

      mpirun -gmca foo bar -np 1 hello : -np 2 goodbye

  The foo MCA param will be set in both the hello and goodbye
  processes.

- If there is only one context app.  For example:

      mpirun -mca pls_rsh_debug 1 -np 4 hello

  will set pls_rsh_debug to 1 in both the orterun process and the 4
  spawned hello processes.

Also added a few more comments inside orterun to document a somewhat
confusing use of a state variable in a recursive case.

This commit was SVN r6764.
2005-08-08 16:42:28 +00:00
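The two rules in this commit message (global `-gmca` parameters always reach orterun; plain `-mca` parameters reach it only when there is a single app context) can be modeled roughly as below. This is a sketch, not orterun's actual parser: the function `split_mca_params` is hypothetical, it only recognizes the `-mca`/`--mca` and `-gmca`/`--gmca` switches, and it assumes a plain `-mca` applies only to the app context it appears in.

```python
def split_mca_params(argv):
    """Return (orterun_params, per_app_params) for a simplified command line.

    argv excludes the program name; ':' separates app contexts.
    """
    contexts = [[]]
    for token in argv:
        if token == ":":
            contexts.append([])
        else:
            contexts[-1].append(token)

    global_params = {}
    local_params = []
    for tokens in contexts:
        local = {}
        i = 0
        while i < len(tokens):
            has_pair = i + 2 < len(tokens)
            if tokens[i] in ("-mca", "--mca") and has_pair:
                local[tokens[i + 1]] = tokens[i + 2]
                i += 3
            elif tokens[i] in ("-gmca", "--gmca") and has_pair:
                global_params[tokens[i + 1]] = tokens[i + 2]
                i += 3
            else:
                i += 1
        local_params.append(local)

    # Every app context sees the global params plus its own local ones.
    per_app = [{**global_params, **local} for local in local_params]

    # orterun sees global params, and local ones only with one app context.
    orterun = dict(global_params)
    if len(contexts) == 1:
        orterun.update(local_params[0])
    return orterun, per_app


single = split_mca_params("-mca pls_rsh_debug 1 -np 4 hello".split())
multi = split_mca_params("-gmca foo bar -np 1 hello : -np 2 goodbye".split())
```

Here `single` carries `pls_rsh_debug` for both orterun and the one app context, and `multi` carries `foo` for orterun and for both the hello and goodbye contexts.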
Ralph Castain
3c13d699f8 Remove an old file.
This commit was SVN r6762.
2005-08-08 13:41:53 +00:00
Ralph Castain
e583f6a97f Add a couple of new functions to the schema framework to check if a trigger is a "standard" trigger or not, and to extract a jobid from a standard trigger. Both functions will be used in a later commit.
Ensure that the seed set_my_name function sets all the right initial info in the name services' structures.

This commit was SVN r6760.
2005-08-07 13:26:49 +00:00
Ralph Castain
c530521a8e Add several new interface functions to the name services:
1. dump_xxx - analogous to the registry's dump commands, allows you to examine the contents of the name services' structures

2. get_job_peers - get an array of process names for all processes in the specified job

This commit was SVN r6759.
2005-08-07 13:21:52 +00:00
Ralph Castain
1438009dbd Properly set the MCA parameter to indicate these functions are infrastructure so that the singleton flag does not get set.
Somehow, in changing over to the new MCA interfaces, the "set" part of that logic got lost, so the singleton flag was always being set. This should repair some of the anomalous behavior seen recently where the local host was always being used for an application process.

This commit was SVN r6757.
2005-08-07 04:17:10 +00:00
Josh Hursey
3b187c4db3 Fix the 'delete container' logic in gpr to prevent recursive delete of all
containers when one is requested.

Fix a bug in gpr_replica_del_index_api which doesn't preset num_tokens and
num_keys, but assumes they are 0.

Fix orte_ras_base_node_delete() function to operate properly to delete the
appropriate container in the 'orte-node' segment when requested.

This commit was SVN r6756.
2005-08-05 23:37:39 +00:00
George Bosilca
14ffc85379 I want to have it compiled too.
This commit was SVN r6754.
2005-08-05 18:47:12 +00:00
Jeff Squyres
d0a0434172 Investigating an MCA param problem -- converted over orterun to new
MCA param API in the process.

This commit was SVN r6739.
2005-08-04 18:15:47 +00:00
Josh Hursey
12031db535 Added missing help file.
This commit was SVN r6737.
2005-08-04 17:40:22 +00:00
Jeff Squyres
aa9bdcfec5 Make some fixes and add some features to the rsh pls:
- convert MCA params to the new API
- some style and indenting fixes
- look at local shell, and if [new] MCA param
  pls_rsh_assume_same_shell is 1, then assume that the remote shell is
  the same as the local shell.  If pls_rsh_assume_same_shell is 0, do
  a probe to figure out what the remote shell is (NOT CURRENTLY
  IMPLEMENTED! you'll get a run-time warning if you set this MCA param
  to 0).
- if the remote shell is not csh and not bash, then prefix the remote
  command with "( ! [ -e ./.profile ] || . ./.profile;" (and suffix it
  with ")") so that we run the .profile on the remote side in order to
  set PATHs and the like.  See the LAM FAQ for details (will someday
  be on the Open MPI FAQ:
  http://www.lam-mpi.org/faq/category4.php3#question8)
- add a bunch of debugging output if the MCA param pls_rsh_debug is
  enabled (or the top-level debug MCA param is enabled)
- add more help messages (and corresponding calls to opal_show_help())
  in help-pls-rsh.txt

This commit was SVN r6731.
2005-08-04 15:09:02 +00:00
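The `.profile` wrapping rule from this commit can be sketched as a small string transformation. The helper name `wrap_remote_command` is hypothetical, and exempting only shells whose basename is exactly `csh` or `bash` is a literal reading of the commit message; the real rsh pls component may classify shells differently.

```python
import os


def wrap_remote_command(remote_shell, command):
    """Wrap a remote command so ./.profile is sourced for other shells."""
    shell_name = os.path.basename(remote_shell)
    if shell_name in ("csh", "bash"):
        # Per the commit message, these shells get the command unchanged.
        return command
    # Prefix and suffix taken from the commit message: source ./.profile
    # if it exists, then run the command inside the same subshell.
    return "( ! [ -e ./.profile ] || . ./.profile; " + command + " )"


wrapped_for_sh = wrap_remote_command("/bin/sh", "orted")
wrapped_for_bash = wrap_remote_command("/bin/bash", "orted")
```

The `! [ -e ./.profile ] || . ./.profile` form keeps the subshell going when no `.profile` exists, so a missing file does not prevent the command from running.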
Tim Prins
2d707f34a0 - make persistent daemons work with bproc
- added documentation
- code cleanups

This commit was SVN r6726.
2005-08-03 20:24:52 +00:00
Tim Prins
aa0525da98 Improvements in bproc support:
- we now properly support multiple application contexts
- much improved error messages, using opal_show_help
- fix some small bugs in the way the processes were discovering their names
- better searching for orted
- use the new mca parameter interface

These changes still need some testing, but they seem stable.

This commit was SVN r6719.
2005-08-02 22:22:55 +00:00
Ralph Castain
4e1837687b Finish simplified interfaces for put and subscribe - more details to come.
This commit was SVN r6713.
2005-08-02 19:43:29 +00:00
Jeff Squyres
ef9e06451c Ensure that --mca is listed in the --help message (thanks for pointing
this out Gleb!)

This commit was SVN r6712.
2005-08-02 18:52:12 +00:00
Brian Barrett
d7bb611a74 * make all the sds components follow the prefix rule. Shame on me.
This commit was SVN r6710.
2005-08-02 18:51:28 +00:00
Josh Hursey
c6a3e67f07 switch from if's to ifdef's to prevent compiler warnings
This commit was SVN r6705.
2005-08-02 17:07:21 +00:00
Ralph Castain
716f465ce3 Fix a typo
This commit was SVN r6704.
2005-08-02 15:54:36 +00:00
Josh Hursey
943a74e266 Checkpoint.
- Add functionality to parse multiple arguments provided in the console
- Cleaned up help function
- Added an option to hide commands from the help menu

Working on launching and reaping of daemons from within the console.

This commit was SVN r6699.
2005-08-01 22:52:48 +00:00
Josh Hursey
0c32f34cb7 Let's clean up the block comment parsing to be a bit more lex-like.
This commit was SVN r6695.
2005-08-01 22:25:47 +00:00
Josh Hursey
50545ef082 Additional Comment possibilities supported for both hostfile and
mca param file.
------
/*
 * Block Quote
 */

// Line Quote

# Shell Style Line Quote
-------

This commit was SVN r6694.
2005-08-01 22:11:26 +00:00
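The three comment styles listed in this commit can be approximated with a single regular expression. This is only a sketch of the idea in Python, not the lex-based parser the commits refer to; `strip_comments` and `parse_hostfile` are hypothetical helper names, and the sketch ignores edge cases such as comment markers inside quoted strings.

```python
import re

# Block comments (/* ... */), then // line comments, then # line comments.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*|#[^\n]*", re.DOTALL)


def strip_comments(text):
    """Remove all three comment styles from the text."""
    return _COMMENT.sub("", text)


def parse_hostfile(text):
    """Return the non-empty lines left after comments are removed."""
    stripped = strip_comments(text)
    return [line.strip() for line in stripped.splitlines() if line.strip()]


example = """/*
 * Block Quote
 */
frogger1
// Line Quote
frogger2
#frogger3
"""

hosts = parse_hostfile(example)
```

For this input, `hosts` contains only `frogger1` and `frogger2`; the commented-out `frogger3` entry is dropped.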
Josh Hursey
9447050dcf Added comments to lex parser. Now you can have a hostfile that looks like:
#
# this is my hostfile
#
frogger1
frogger2
#frogger3

and it will only use frogger1 and frogger2.

This commit was SVN r6692.
2005-08-01 21:14:09 +00:00
Ralph Castain
8c6c78c47a Add a few new functions that were requested last week - not tested yet, so please don't use them! I will test them this afternoon on a different computer. For now, they won't cause any problems since they aren't being called.
This commit was SVN r6689.
2005-08-01 16:38:15 +00:00
Tim Prins
40bf905e8e - minor bug fixes
- better error message if the daemon dies

This commit was SVN r6687.
2005-07-29 20:02:56 +00:00
Josh Hursey
835dad20d5 forgot to add orteconsole.h to the distro in the makefile
This commit was SVN r6686.
2005-07-29 17:57:35 +00:00
Ralph Castain
4e79a51395 Add a job_info segment to the system that holds a container for each job. Within each container is a keyval indicating the job state (i.e., all procs at stage1, finalized, etc.). This provides a rough state-of-health for the job.
This required a little fiddling with a number of areas. Biggest problem was that it uncovered a potential for an infinite loop to be created in the registry. If a callback function modified the registry, the registry checked the triggers to see if anything had fired. Well, if the original callback was due to a trigger firing, that condition hadn't changed - so the trigger fired again....which caused the callback to be called, which modified the registry, which checked the triggers, etc. etc.

Triggers are now checked and then "flagged" as being "in process" so that the registry will NOT recheck that trigger until all callbacks have been processed. Tried doing this with subscriptions as well, but that caused a problem - when we release processes from a stagegate, they (at the moment) immediately place data on the registry that should cause a subscription to fire. Unfortunately, the system will just hang if that subscription doesn't get processed. So, I have left the subscription system alone - any callback function that modifies the registry in a fashion that will fire a subscription will indeed fire that subscription. We'll have to see if this causes problems - it shouldn't, but a careless user could lock things up if the callback generates a callback to itself.

Also fixed the code that placed a process' RML contact info on the registry to eliminate the leading '/' from the string.

This commit was SVN r6684.
2005-07-29 14:11:19 +00:00
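The "in process" flag described in this commit is a re-entrancy guard. The Python sketch below models it with a toy `Registry` class; the class, its methods, and the key names are hypothetical and far simpler than the real registry, but they show why a callback that writes to the registry no longer re-fires its own trigger.

```python
class Registry:
    """Toy registry with triggers (hypothetical; not the ORTE GPR API)."""

    def __init__(self):
        self.data = {}
        self.triggers = []

    def add_trigger(self, condition, callback):
        self.triggers.append(
            {"condition": condition, "callback": callback, "in_process": False}
        )

    def put(self, key, value):
        self.data[key] = value
        self._check_triggers()

    def _check_triggers(self):
        for trigger in self.triggers:
            if trigger["in_process"]:
                # Callbacks for this trigger are still running: do not
                # recheck it, or it would fire again on the same condition.
                continue
            if trigger["condition"](self.data):
                trigger["in_process"] = True
                try:
                    trigger["callback"](self)
                finally:
                    trigger["in_process"] = False


calls = []


def on_stage1(registry):
    calls.append("stage1")
    # The callback itself modifies the registry, which rechecks triggers.
    registry.put("job_state", "all procs at stage1")


registry = Registry()
registry.add_trigger(lambda data: data.get("procs_at_stage1") == 2, on_stage1)
registry.put("procs_at_stage1", 2)
```

Without the `in_process` check, the `put` inside `on_stage1` would find the condition still true and call `on_stage1` again without end; with it, the callback runs once.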
Josh Hursey
9acbd4e21f forgot to take out initializer when I removed the verbose stuff
This commit was SVN r6682.
2005-07-29 00:21:10 +00:00
Josh Hursey
e849f7ba07 Significant clean up of the orteconsole.
- Added user help messages.
- Abstracted the internal commands, and the mechanism for
  parsing and executing them.
- Cleaned up the command line parsing
- Some other misc. cleanup items.

Still much more work to do here, but should provide a more
intuitive interface for extending functionality in the 
system.

This commit was SVN r6676.
2005-07-28 23:48:46 +00:00
Tim Prins
5a4f8a257d - enabled new bproc components
- added support for Scyld bproc and old LANL bproc

This commit was SVN r6674.
2005-07-28 22:28:38 +00:00
Josh Hursey
018c4aa44e remove unnecessary slashes
This commit was SVN r6673.
2005-07-28 21:33:33 +00:00
Josh Hursey
8b56769307 removed the version command line option. Added some more user help messages
This commit was SVN r6672.
2005-07-28 21:17:48 +00:00
Josh Hursey
5ad860fc47 forgot to take out a line in the help message.
This commit was SVN r6671.
2005-07-28 20:51:56 +00:00
Josh Hursey
8deed21e00 Replaced some stderr fprintfs with opal_show_help functions, with
more user friendly error messages.

Removed the "--version" command line option, since they should 
get this from ompi_info [later to be orte_info].

If we find an invalid command line option, print out the help
screen before exiting.

This commit was SVN r6670.
2005-07-28 20:49:17 +00:00
George Bosilca
9fdfbd9934 correct the printf for 64 bits architectures.
This commit was SVN r6667.
2005-07-28 19:54:06 +00:00
Brian Barrett
747f23099e * fix some warnings
This commit was SVN r6661.
2005-07-28 19:25:47 +00:00
Josh Hursey
033b0be417 clean up help msg for orted
This commit was SVN r6657.
2005-07-28 18:38:37 +00:00
Brian Barrett
f8fb43d792 * don't recurse into badness - call the function we want to call
This commit was SVN r6656.
2005-07-28 18:33:55 +00:00
Josh Hursey
707fbb35ce added help message file to orted
This commit was SVN r6655.
2005-07-28 17:18:33 +00:00
Brian Barrett
b0b6ddd078 * add --enable-heterogeneous (default: enabled) to enable heterogeneous
  support in OMPI.  Currently only enables/disables the architecture
  sharing modex in ob1 pml.
* Add sds framework to ompi_info
* Figure out table ids to use for Portals BTL at configure time, since
  we should use 30 & 31 on Red Storm, but the reference implementation
  only supports 0-8.
* Some bug fixes in Portals UTCP sds

This commit was SVN r6650.
2005-07-28 16:16:13 +00:00
Brian Barrett
a474dabab0 * don't assume select has been called during close
* expose sds component list for ompi_info
* forgot to add pipe put into the list of put functions

This commit was SVN r6645.
2005-07-28 15:14:46 +00:00
Jeff Squyres
bbf7da16ff Print a friendly message when the local exec can't find the orted.
This commit was SVN r6643.
2005-07-28 13:00:32 +00:00
Brian Barrett
2852772b32 * add a bunch of svn:ignored files
* Add Portals UTCP reference sds for when we are using the portals
  reference implementation without the ORTE starters (when we want to
  pretend like we're on Red Storm, only with a debugger and valgrind and
  possibly even a printf that actually works...)
* Add super-secret --with flag to cnos rml to enable the cnos rml but
  disable cnos_barrier (for use with portals utcp reference implementation)

This commit was SVN r6642.
2005-07-28 06:23:34 +00:00
Brian Barrett
93ddb4bf73 * some fixups for the cnos components
This commit was SVN r6637.
2005-07-28 00:11:09 +00:00
Brian Barrett
1ce2e26272 Move set_my_name (NDS) functionality from ns_base and universe contact
test from orte_init_stage1 into a new framework, Startup Discovery Service
(sds).  This allows us to have more flexibility with platforms like
Red Storm, which do not have a universe in the usual meaning and don't have
a seed daemon they can contact.

This commit was SVN r6630.
2005-07-27 23:18:16 +00:00
Brian Barrett
6aa464b67e More changes from Red Storm port
- only call sched_yield if it exists
- don't fail out if modex doesn't work in ob1
- bunch of fixes for Portals BTL
- add cnos rml component
- add NULL gpr component (should only be used if replica AND proxy
  fail to load)

This commit was SVN r6629.
2005-07-27 23:07:14 +00:00
Tim Prins
b7ab5f1ec8 only compile the bproc soh component if we are on bproc 4
This commit was SVN r6625.
2005-07-27 22:13:21 +00:00
Tim Prins
384639c5cc - more build system updates for bproc
This commit was SVN r6609.
2005-07-26 22:12:03 +00:00
Tim Prins
dcc81eb598 - fix a bug which made compiles fail when '--with-bproc' is passed
- various bugfixes for bproc components

This commit was SVN r6603.
2005-07-25 22:21:40 +00:00