If you register a parameter a second time, it overwrites the default
value (this was causing a problem with mpirun not being marked as orte
infrastructure, and therefore thinking that it was a singleton, and
therefore always adding the localhost into the node list).
This commit was SVN r6789.
- converted some things to new MCA param API
- renamed the pls_bproc_seed component struct so its name isn't the same as
the pls_bproc component's struct
- minor bugfixes
This commit was SVN r6774.
his e-mail:
I ran into a small bug in rmaps_rr.c: map_app_by_slot which was
triggered by using multiple app contexts. Basically, if not all the
slots we allocated on a node were used by an app, we would
automatically move on to the next node. This caused a problem with
multiple app contexts: when the first app takes a partial allocation of
a node, the second app cannot access the remaining slots because
we have already moved past the node, and the byslot routine does not
wrap back around the list.
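A minimal sketch of the corrected by-slot behavior, written in Python for illustration only. It is not the code in rmaps_rr.c; the names Node, map_by_slot, and the cursor handling are hypothetical. The point it shows is that the cursor advances past a node only once that node is full, so the next app context can claim the leftover slots.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    slots: int      # slots allocated on this node
    used: int = 0   # slots already claimed by earlier app contexts


def map_by_slot(nodes, start, num_procs):
    """Place num_procs processes by slot, beginning at nodes[start].

    Returns (placements, next_start). The cursor moves to the next node
    only when the current node has no free slots, and it never wraps
    around. If the list runs out of slots, fewer placements are returned.
    """
    placements = []
    i = start
    while len(placements) < num_procs and i < len(nodes):
        node = nodes[i]
        if node.used < node.slots:
            node.used += 1
            placements.append(node.name)
        else:
            i += 1  # node is full; only now move on
    return placements, i


# Two app contexts sharing two 4-slot nodes (hypothetical allocation).
nodes = [Node("n0", 4), Node("n1", 4)]
first, cursor = map_by_slot(nodes, 0, 2)
second, cursor = map_by_slot(nodes, cursor, 4)
```

With the old behavior the second app context would have started on n1; here it first fills the two slots left on n0.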
This commit was SVN r6766.
that were set on the command line. This was technically exactly the
way the code was designed, but it certainly violated the Law of Least
Astonishment (even to its designer ;-) ). So now if you execute
something like this:
mpirun -mca pls_rsh_debug 1 -np 4 hello
You'll see debugging output from the rsh pls component, as you would
expect (this was not previously the case -- the MCA pls_rsh_debug
param would be set to 1 in the 4 spawned hello processes, but *not*
in the orterun process).
More specifically, MCA parameters will be set in the orterun process
in the following cases:
- The new command line switch "--gmca" (or "-gmca") is used,
indicating that the MCA parameter is "global". --gmca also means
that the MCA parameter will be applied to all app contexts. For
example:
mpirun -gmca foo bar -np 1 hello : -np 2 goodbye
The foo MCA param will be set in both the hello and goodbye
processes.
- If there is only one app context. For example:
mpirun -mca pls_rsh_debug 1 -np 4 hello
will set pls_rsh_debug to 1 in both the orterun process and the 4
spawned hello processes.
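The two cases above can be sketched as a small decision function. This is an illustrative Python model, not orterun's actual code: the function names and the dict-based representation are hypothetical, and the precedence when the same parameter is given with both --gmca and -mca is an assumption (here the per-context value wins).

```python
def params_for_orterun(global_params, app_contexts):
    """MCA params that would be set in the orterun process itself.

    global_params: dict of params given with --gmca
    app_contexts:  list of dicts, one per app context, given with -mca
    """
    env = dict(global_params)        # --gmca always applies
    if len(app_contexts) == 1:       # single app context: its -mca applies too
        env.update(app_contexts[0])
    return env


def params_for_app(global_params, app_contexts, index):
    """MCA params that would be set in the processes of one app context."""
    env = dict(global_params)
    env.update(app_contexts[index])  # assumed precedence: -mca over --gmca
    return env


# mpirun -mca pls_rsh_debug 1 -np 4 hello
single = params_for_orterun({}, [{"pls_rsh_debug": "1"}])

# mpirun -gmca foo bar -np 1 hello : -np 2 goodbye
multi = params_for_orterun({"foo": "bar"}, [{}, {}])
goodbye = params_for_app({"foo": "bar"}, [{}, {}], 1)
```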
Also added a few more comments inside orterun to document a somewhat
confusing use of a state variable in a recursive case.
This commit was SVN r6764.
1. dump_xxx - analogous to the registry's dump commands, allows you to examine the contents of the name services' structures
2. get_job_peers - get an array of process names for all processes in the specified job
This commit was SVN r6759.
Somehow, in changing over to the new MCA interfaces, the "set" part of that logic got lost, so the singleton flag was always being set. This should repair some of the anomalous behavior seen recently where the local host was always being used for an application process.
This commit was SVN r6757.
containers when one is requested.
Fix a bug in gpr_replica_del_index_api which doesn't preset num_tokens and
num_keys, but assumes they are 0.
Fix orte_ras_base_node_delete() function to operate properly to delete the
appropriate container in the 'orte-node' segment when requested.
This commit was SVN r6756.
- convert MCA params to the new API
- some style and indenting fixes
- look at local shell, and if [new] MCA param
pls_rsh_assume_same_shell is 1, then assume that the remote shell is
the same as the local shell. If pls_rsh_assume_same_shell is 0, do
a probe to figure out what the remote shell is (NOT CURRENTLY
IMPLEMENTED! you'll get a run-time warning if you set this MCA param
to 0).
- if the remote shell is not csh and not bash, then prefix the remote
command with "( ! [ -e ./.profile ] || . ./.profile;" (and suffix it
with ")") so that we run the .profile on the remote side in order to
set PATHs and the like. See the LAM FAQ for details (this will
someday be on the Open MPI FAQ):
http://www.lam-mpi.org/faq/category4.php3#question8
- add a bunch of debugging output if the MCA param pls_rsh_debug is
enabled (or the top-level debug MCA param is enabled)
- add more help messages (and corresponding calls to opal_show_help())
in help-pls-rsh.txt
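The .profile wrapping described in the list above can be sketched as follows. This is a hedged Python illustration, not the rsh pls component's code: wrap_remote_command and its arguments are hypothetical names, and matching the shell by the basename of its path is an assumption. The prefix and suffix strings are the ones quoted above.

```python
import os

PROFILE_PREFIX = "( ! [ -e ./.profile ] || . ./.profile;"
PROFILE_SUFFIX = ")"


def wrap_remote_command(command, remote_shell):
    """Wrap command so that ./.profile is sourced first on the remote side.

    csh and bash are left alone; for any other shell the command is
    wrapped so that .profile runs (if it exists) before the command.
    """
    shell = os.path.basename(remote_shell)  # assumption: compare by basename
    if shell in ("csh", "bash"):
        return command
    return f"{PROFILE_PREFIX} {command} {PROFILE_SUFFIX}"


wrapped = wrap_remote_command("orted", "/bin/sh")
unwrapped = wrap_remote_command("orted", "/bin/bash")
```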
This commit was SVN r6731.
- we now properly support multiple application contexts
- much improved error messages, using opal_show_help
- fix some small bugs in the way the processes were discovering their names
- better searching for orted
- use the new mca parameter interface
These changes still need some testing, but they seem stable.
This commit was SVN r6719.
- Add functionality to parse multiple arguments provided in the console
- Cleaned up help function
- Added an option to hide commands from the help menu
Working on launching and reaping of daemons from within the console.
This commit was SVN r6699.
This required a little fiddling with a number of areas. Biggest problem was that it uncovered a potential for an infinite loop to be created in the registry. If a callback function modified the registry, the registry checked the triggers to see if anything had fired. Well, if the original callback was due to a trigger firing, that condition hadn't changed, so the trigger fired again, which caused the callback to be called, which modified the registry, which checked the triggers, and so on.
Triggers are now checked and then "flagged" as being "in process" so that the registry will NOT recheck that trigger until all callbacks have been processed. Tried doing this with subscriptions as well, but that caused a problem - when we release processes from a stagegate, they (at the moment) immediately place data on the registry that should cause a subscription to fire. Unfortunately, the system will just hang if that subscription doesn't get processed. So, I have left the subscription system alone - any callback function that modifies the registry in a fashion that will fire a subscription will indeed fire that subscription. We'll have to see if this causes problems - it shouldn't, but a careless user could lock things up if the callback generates a callback to itself.
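The "in process" flag on triggers can be illustrated with a small sketch. This is a simplified Python model, not the registry's implementation: the Registry and Trigger classes and their methods are hypothetical, and the flag here is cleared as soon as the trigger's own callback returns, which is a simplification of "until all callbacks have been processed".

```python
class Trigger:
    def __init__(self, condition, callback):
        self.condition = condition   # function(data) -> bool
        self.callback = callback     # function(registry) -> None
        self.in_process = False      # set while the callback is running


class Registry:
    def __init__(self):
        self.data = {}
        self.triggers = []

    def add_trigger(self, condition, callback):
        self.triggers.append(Trigger(condition, callback))

    def put(self, key, value):
        self.data[key] = value
        self._check_triggers()

    def _check_triggers(self):
        for trig in self.triggers:
            if trig.in_process:
                continue             # already firing: do not recheck it
            if trig.condition(self.data):
                trig.in_process = True
                try:
                    trig.callback(self)  # may call put() again
                finally:
                    trig.in_process = False


calls = []


def on_fire(registry):
    calls.append("fired")
    registry.put("b", 2)  # modifies the registry from inside the callback


reg = Registry()
reg.add_trigger(lambda data: "a" in data, on_fire)
reg.put("a", 1)
```

Without the in_process check, the put() inside on_fire would re-fire the same trigger and recurse without end.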
Also fixed the code that placed a process' RML contact info on the registry to eliminate the leading '/' from the string.
This commit was SVN r6684.
- Added user help messages.
- Abstracted the internal commands, and the mechanism for
parsing and executing them.
- Cleaned up the command line parsing
- Some other misc. cleanup items.
Still much more work to do here, but should provide a more
intuitive interface for extending functionality in the
system.
This commit was SVN r6676.
more user friendly error messages.
Removed the "--version" command line option, since users should
get this from ompi_info [later to be orte_info].
If we find an invalid command line option, print out the help
screen before exiting.
This commit was SVN r6670.
support in OMPI. Currently only enables/disables the architecture
sharing modex in ob1 pml.
* Add sds framework to ompi_info
* Figure out table ids to use for Portals BTL at configure time, since
we should use 30 & 31 on Red Storm, but the reference implementation
only supports 0-8.
* Some bug fixes in Portals UTCP sds
This commit was SVN r6650.
* Add Portals UTCP reference sds for when we are using the portals
reference implementation without the ORTE starters (when we want to
pretend like we're on Red Storm, only with a debugger and valgrind and
possibly even a printf that actually works...)
* Add super-secret --with flag to cnos rml to enable the cnos rml but
disable cnos_barrier (for use with portals utcp reference implementation)
This commit was SVN r6642.
test from orte_init_stage1 into a new framework, Startup Discovery Service
(sds). This allows us to have more flexibility with platforms like
Red Storm, which do not have a universe in the usual meaning and don't have
a seed daemon they can contact.
This commit was SVN r6630.
- only call sched_yield if it exists
- don't fail out if modex doesn't work in ob1
- bunch of fixes for Portals BTL
- add cnos rml component
- add NULL gpr component (should only be used if replica AND proxy
fail to load)
This commit was SVN r6629.