his e-mail:
I ran into a small bug in rmaps_rr.c: map_app_by_slot which was
triggered by using multiple app contexts. Basically, if not all the
slots we allocated on a node were used by an app, we would
automatically move onto the next node. This caused a problem with
multiple app contexts: when the first app takes a partial allocation of
a node, the second app would not be able to access these slots because
we had already moved past the node, and the byslot routine does not
wrap back around the list.
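
The corrected behavior can be illustrated with a minimal sketch. This is not the actual rmaps_rr.c code: the struct node record and the map_by_slot() helper below are hypothetical simplifications, and the sketch assumes a single cursor shared across app contexts that advances only once the current node is full.

```c
#include <stddef.h>

/* Hypothetical, simplified node record; not the real ORTE structure. */
struct node {
    int slots;       /* slots allocated on this node */
    int slots_used;  /* slots already claimed by earlier app contexts */
};

/*
 * Map nprocs processes by slot, starting at node index *cur.
 * The cursor moves to the next node only when the current node is full,
 * so a later app context can pick up slots that an earlier one left unused.
 * Stores the chosen node index for each process in out[] and returns the
 * number of processes actually mapped.
 */
static int map_by_slot(struct node *nodes, size_t num_nodes, size_t *cur,
                       int nprocs, size_t *out)
{
    int mapped = 0;

    while (mapped < nprocs && *cur < num_nodes) {
        struct node *n = &nodes[*cur];
        if (n->slots_used < n->slots) {
            n->slots_used++;
            out[mapped++] = *cur;
        } else {
            (*cur)++;  /* advance only when the node is full */
        }
    }
    return mapped;
}
```

With two nodes of two slots each, a first app of one process followed by a second app of two processes places the second app's first process on the leftover slot of node 0 rather than skipping to node 1.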
This commit was SVN r6766.
that were set on the command line. This was technically exactly the
way the code was designed, but it certainly violated the Law of Least
Astonishment (even to its designer ;-) ). So now if you execute
something like this:
mpirun -mca pls_rsh_debug 1 -np 4 hello
You'll see debugging output from the rsh pls component, as you would
expect (this was not previously the case -- the MCA pls_rsh_debug
param would be set to 1 in the 4 spawned hello processes, but *not*
in the orterun process).
More specifically, MCA parameters will be set in the orterun process
in the following cases:
- The new command line switch "--gmca" (or "-gmca") is used,
indicating that the MCA parameter is "global". --gmca also means
that the MCA parameter will be applied to all context apps. For
example:
mpirun -gmca foo bar -np 1 hello : -np 2 goodbye
The foo MCA param will be set in both the hello and goodbye
processes.
- If there is only one context app. For example:
mpirun -mca pls_rsh_debug 1 -np 4 hello
will set pls_rsh_debug to 1 in both the orterun process and the 4
spawned hello processes.
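
The two cases above reduce to a simple predicate. The following is only a sketch of that rule, not the orterun implementation; the function name set_in_launcher() and its arguments are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether an MCA param given on the command
 * line should also be set in the launcher (orterun) process itself.
 *
 *   is_global         - the param was given with --gmca / -gmca
 *   num_app_contexts  - number of app contexts on the command line
 */
static bool set_in_launcher(bool is_global, int num_app_contexts)
{
    /* Global params always apply; plain -mca params apply only when
     * there is a single app context, so there is no ambiguity. */
    return is_global || num_app_contexts == 1;
}
```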
Also added a few more comments inside orterun to document a somewhat
confusing use of a state variable in a recursive case.
This commit was SVN r6764.
1. dump_xxx - analogous to the registry's dump commands, allows you to examine the contents of the name services' structures
2. get_job_peers - get an array of process names for all processes in the specified job
This commit was SVN r6759.
Somehow, in changing over to the new MCA interfaces, the "set" part of that logic got lost, so the singleton flag was always being set. This should repair some of the anomalous behavior seen recently where the local host was always being used for an application process.
This commit was SVN r6757.
containers when one is requested.
Fix a bug in gpr_replica_del_index_api, which didn't preset num_tokens and
num_keys but assumed they were 0.
Fix the orte_ras_base_node_delete() function so that it properly deletes the
appropriate container in the 'orte-node' segment when requested.
This commit was SVN r6756.
Change all the places where they are used to fit the new name.
Remove the code to check the remote arch from the PML. We will have a GPR mechanism
in ompi_mpi_initialize to do that.
This commit was SVN r6750.
the message is no longer pending
* Try to push out new messages whenever we finish a send, whether it
worked or not. Means that in the case where the other side has too
many sends pending, we'll constantly retry one (and only one, once the
pending number is reached) message until goodness returns
* Make some warnings only happen in verbose case, as they are mainly
diagnostics
This commit was SVN r6732.
- convert MCA params to the new API
- some style and indenting fixes
- look at local shell, and if [new] MCA param
pls_rsh_assume_same_shell is 1, then assume that the remote shell is
the same as the local shell. If pls_rsh_assume_same_shell is 0, do
a probe to figure out what the remote shell is (NOT CURRENTLY
IMPLEMENTED! you'll get a run-time warning if you set this MCA param
to 0).
- if the remote shell is not csh and not bash, then prefix the remote
command with "( ! [ -e ./.profile ] || . ./.profile;" (and suffix it
with ")") so that we run the .profile on the remote side in order to
set PATHs and the like. See the LAM FAQ for details (will someday
be on the Open MPI FAQ):
http://www.lam-mpi.org/faq/category4.php3#question8
- add a bunch of debugging output if the MCA param pls_rsh_debug is
enabled (or the top-level debug MCA param is enabled)
- add more help messages (and corresponding calls to opal_show_help())
in help-pls-rsh.txt
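
The .profile wrapping described in the list above could look roughly like the following. This is a sketch only, not the pls_rsh code: wrap_remote_command() is a hypothetical helper, and it assumes the remote shell has already been identified by name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the command line to run on the remote side.
 * For shells other than csh and bash, wrap the command so that ./.profile
 * is sourced first (if it exists) to set PATHs and the like.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int wrap_remote_command(const char *shell, const char *cmd,
                               char *buf, size_t len)
{
    int n;

    if (strcmp(shell, "csh") == 0 || strcmp(shell, "bash") == 0) {
        /* These shells are left alone: run the command unchanged. */
        n = snprintf(buf, len, "%s", cmd);
    } else {
        /* Prefix and suffix as given in the commit message. */
        n = snprintf(buf, len,
                     "( ! [ -e ./.profile ] || . ./.profile; %s )", cmd);
    }
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

The "! [ -e ./.profile ] ||" guard keeps the subshell from failing when no .profile exists on the remote side.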
This commit was SVN r6731.
properly. This fixes the random hangs that we were seeing this morning
on Linux that were a result of fixing the thread deadlock yesterday.
(worked great on my OS X box, which uses select() instead of poll()).
This commit was SVN r6730.
to a tty or not. Now you can do something like:
ompi_info -all | grep btl_portals
and get the full line for each btl_portals parameter.
* For the case where stdout is a tty, we have my current nomination for
Today's Useless OMPI Feature. Autodetect the width of the terminal, so
people with really wide terminals will get less wrapping.
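
A tty check plus width autodetection can be sketched on POSIX systems as follows. This is not the ompi_info code: output_width() is a hypothetical helper, and the use of isatty() and the TIOCGWINSZ ioctl is an assumption about how such detection is commonly done.

```c
#include <unistd.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: choose a wrap width for output written to fd.
 * Returns 0 (meaning "do not wrap") when fd is not a tty, so that each
 * parameter stays on one full line for tools like grep. For a tty,
 * returns the terminal width, or fallback if it cannot be determined.
 */
static int output_width(int fd, int fallback)
{
    struct winsize ws;

    if (!isatty(fd)) {
        return 0;
    }
    if (ioctl(fd, TIOCGWINSZ, &ws) == 0 && ws.ws_col > 0) {
        return ws.ws_col;
    }
    return fallback;
}
```

A caller would pass STDOUT_FILENO and a conventional fallback such as 80.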
This commit was SVN r6722.
may appear.
(remove *error.h file from Makefile.am -- a cut-n-paste error that has
propagated to a surprising number of directories ;-) )
This commit was SVN r6721.
- we now properly support multiple application contexts
- much improved error messages, using opal_show_help
- fix some small bugs in the way the processes were discovering their names
- better searching for orted
- use the new mca parameter interface
These changes still need some testing, but they seem stable.
This commit was SVN r6719.