http://www.open-mpi.org/community/lists/devel/2014/05/14822.php
Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root).
This commit was SVN r31916.
This commit is a slightly better workaround to prevent mesages of
the form:
[unset]:_pmi_alps_get_apid:alps_app_lli_put_request failed
[unset]:_pmi_alps_get_appLayout:pmi_alps_get_apid returned with error: Bad file descriptor
It works by completely disabling PMI in the application process when using
mpirun. This should not be an issue for any apps.
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31882.
The HNP can't know the precise reason, of course - all it knows is that the daemon failed. So output a generic error message that provides guidance on probable causes.
Refs trac:4571
This commit was SVN r31589.
The following Trac tickets were found above:
Ticket 4571 --> https://svn.open-mpi.org/trac/ompi/ticket/4571
Prior to r29058, this same logic was in place (i.e., ensure that the
extra fd to /dev/null is closed). It looks like it was accidentally
removed in the ORTE conversion to the state machine in r29058.
This ''might'' have something to do with many hangs that we're seeing
in Cisco MTT with jobs that exhibit failure (e.g., call MPI_ABORT)...?
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31469.
The following SVN revision numbers were found above:
r29058 --> open-mpi/ompi@a200e4f865
newer
This commit adds a workaround for messages printed by the Cray PMI library
when launching using mpirun. We are still talking with Cray to find a
better fix but this will silence the warnings for now.
cmr=v1.8.1:reviewer=manjugv
This commit was SVN r31352.
See the ticket for more details.
cmr=v1.8.1:reviewer=rhc:ticket=4489
This commit was SVN r31351.
The following Trac tickets were found above:
Ticket 4489 --> https://svn.open-mpi.org/trac/ompi/ticket/4489
If we are aborting, then set the flags so the HNP directly sends an exit command to each daemon. Make it the halt_vm command so the remote daemon doesn't try to relay it, but instead just exits without waiting for its routed children to exit first.
cmr=v1.8.1:reviewer=jsquyres:subject=fix hangs due to abort prior to daemon wireup
This commit was SVN r31304.
The problem arises when a hostfile is used, and the user provides host names without specifying the slots= paramater. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default.
In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information.
Also cleanup some a couple of issues in the mapping/binding system:
* ensure we only override the binding directive if we are oversubscribed *and* overload is not allowed
* ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch
* minor cleanup to the warning message when oversubscribed and binding was requested
cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system
This commit was SVN r30909.
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
Thanks to Tim Miller for reporting the regression from the 1.6 series
cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that comm_spawn'd procs get user-specified forwarded envars
This commit was SVN r30012.
includes various fixes all over the C/R code which are
hard to group like the other patches.
Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values
Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)
This commit was SVN r29922.
Noticed these as part of #3694: external libevent's don't cause argv.h
to automatically get included.
Refs trac:3694
This commit was SVN r29897.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694