* default to bind-to core
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above)
Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs
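The default selection listed above can be sketched roughly as follows. This is only an illustration: the type and function names (obj_t, launch_defaults_t, pick_defaults, apply_map_by) are hypothetical and are not ORTE symbols, and treating np == 1 like np == 2 is an assumption, since the list only covers np=2 and np > 2.

```c
#include <assert.h>

/* Hypothetical names; they mirror the command-line options, not ORTE internals. */
typedef enum { OBJ_SLOT, OBJ_CORE, OBJ_SOCKET } obj_t;

typedef struct {
    obj_t map_by;
    obj_t rank_by;
    obj_t bind_to;
} launch_defaults_t;

/* Pick the defaults for a job of np processes. */
static launch_defaults_t pick_defaults(int np)
{
    launch_defaults_t d;
    assert(np > 0);
    /* Assumption: np == 1 is handled like np == 2. */
    d.map_by = (np <= 2) ? OBJ_SLOT : OBJ_SOCKET;
    d.rank_by = d.map_by;   /* ranking follows the mapping object */
    d.bind_to = OBJ_CORE;   /* default binding does not depend on the mapping */
    return d;
}

/* An explicit map-by request also changes ranking, but leaves binding alone. */
static void apply_map_by(launch_defaults_t *d, obj_t obj)
{
    d->map_by = obj;
    d->rank_by = obj;
}
```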
cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values
This commit was SVN r29919.
Noticed these as part of #3694: external libevent installations don't cause argv.h
to automatically get included.
Refs trac:3694
This commit was SVN r29897.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
error: void value not ignored as it ought to be
in the C/R code by ignoring the return value of functions which
no longer return a value (only void).
Signed-off-by: Adrian Reber <adrian.reber@hs-esslingen.de>
This commit was SVN r29816.
This isn't being used yet - just enabling Nathan to do what he needs.
***** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK, as not all RTEs will define this key. *****
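A guarded use of the key might look like the sketch below. Only the OMPI_DB_GLOBAL_RANK macro name comes from this note; sketch_db_fetch and global_rank_or are hypothetical helpers, and passing the key as a string is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for a database lookup; not an OMPI function.
 * Returns 0 on success and stores the value in *out. */
static int sketch_db_fetch(const char *key, int *out)
{
    if (key == NULL || out == NULL) {
        return -1;
    }
    *out = 0;   /* placeholder value */
    return 0;
}

/* Return the global rank if the RTE defines the key, else the fallback. */
static int global_rank_or(int fallback)
{
#ifdef OMPI_DB_GLOBAL_RANK
    int rank;
    /* Compiled only when the RTE provides the key (assumed to be a string). */
    if (sketch_db_fetch(OMPI_DB_GLOBAL_RANK, &rank) == 0) {
        return rank;
    }
    return fallback;
#else
    (void)sketch_db_fetch;   /* unused when the key is not defined */
    return fallback;
#endif
}
```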
This commit was SVN r29708.
Resolves problems in loop_spawn where the timer was incorrectly firing and killing the overall job.
cmr=v1.7.4:reviewer=hjelmn
This commit was SVN r29661.
The data for each remote daemon is added later in the daemon callback function. Only the HNP retains info in the hash table.
If it is desirable to have each daemon retain its own coprocessor info, then this must be done in orte/mca/ess/base/ess_base_std_orted.c.
This commit was SVN r29497.
The following SVN revision numbers were found above:
r29489 --> open-mpi/ompi@2e2794fa15
to the hash table.
Tested and working on a system with 2 Xeon Phi co-processors.
cmr=v1.7.4:ticket=3847:reviewer=ompi-rm1.7
This commit was SVN r29489.
The following Trac tickets were found above:
Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
This change contains a non-mandatory modification
of the MPI-RTE interface. Anyone wishing to support
coprocessors such as the Xeon Phi may wish to add
the required definition and underlying support
****************************************************************
Add locality support for coprocessors such as the Intel Xeon Phi.
Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.
So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:
1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board
2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions
3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.
4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algorithm isn't scalable at the moment - this will hopefully be improved over time.
5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTEs that choose not to provide this support do not have to do anything - the associated code will simply be ignored.
6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.
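The host/board split in steps 1 and 2 can be illustrated with a small bitmask sketch. All SKETCH_* names and bit values here are placeholders chosen for the example; they are not the definitions in the OPAL headers.

```c
#include <stdint.h>

typedef uint16_t sketch_locality_t;

/* Placeholder bit values, one bit per locality level. */
#define SKETCH_ON_CLUSTER 0x0001u
#define SKETCH_ON_CU      0x0002u
#define SKETCH_ON_HOST    0x0004u
#define SKETCH_ON_BOARD   0x0008u

/* Sharing a host: true for a coprocessor and the host it sits in. */
#define SKETCH_ON_LOCAL_HOST(n) (((n) & SKETCH_ON_HOST) != 0)

/* Sharing a node: requires BOTH the host bit and the board bit. */
#define SKETCH_ON_LOCAL_NODE(n) \
    (((n) & (SKETCH_ON_HOST | SKETCH_ON_BOARD)) == (SKETCH_ON_HOST | SKETCH_ON_BOARD))

/* Build a locality mask; being on the same host implies cluster/cu as well. */
static sketch_locality_t sketch_locality(int same_host, int same_board)
{
    sketch_locality_t loc = 0;
    if (same_host) {
        loc |= SKETCH_ON_CLUSTER | SKETCH_ON_CU | SKETCH_ON_HOST;
        if (same_board) {
            loc |= SKETCH_ON_BOARD;
        }
    }
    return loc;
}
```

With this layout, a proc on a coprocessor and a proc on its host share a host but not a node, which is the distinction the transport selection needs.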
cmr:v1.7.4:reviewer=hjelmn
This commit was SVN r29435.
Fix two problems that surfaced when using direct launch under SLURM:
1. locally store our own data because some BTLs want to retrieve
it during add_procs rather than use what they have internally
2. cleanup MPI_Abort so it correctly passes the error status all
the way down to the actual exit. When someone implemented the
"abort_peers" API, they left out the error status. So we lost
it at that point and *always* exited with a status of 1. This
forces a change to the API to include the status.
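The second fix amounts to threading the status through every layer. A minimal sketch, assuming hypothetical function names (sketch_abort_peers and sketch_abort are not the real RTE or MPI entry points):

```c
#include <stddef.h>

/* Records what the peers were told, so the effect is observable here. */
static int last_peer_status = -1;

/* Hypothetical stand-in for the abort_peers call, now taking the status. */
static int sketch_abort_peers(const int *peers, size_t npeers, int status)
{
    (void)peers;
    (void)npeers;
    last_peer_status = status;
    return 0;
}

/* Returns the value the caller would hand to exit(). */
static int sketch_abort(const int *peers, size_t npeers, int errcode)
{
    sketch_abort_peers(peers, npeers, errcode);
    /* Before the fix the status was dropped here and 1 was always used. */
    return errcode;
}
```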
cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch
This commit was SVN r29405.
So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it.
cmr:v1.7.4:reviewer=jsquyres
This commit was SVN r29354.
However, tools such as mpirun don't need it, and definitely shouldn't be using it. Ditto for procs launched by mpirun.
We used to have a way of dealing with this - we had the PMI component check to see if the process was the HNP or was launched by an HNP. Sadly, moving the OPAL db framework removed
that ability as OPAL has no notion of HNPs or proc type.
So add a boolean flag to the db_base_select API that allows us to restrict selection to "local" components. This gives the PMI component the ability to reject itself as required. We then need to pass that param into the ess_base_std_app call so it can pass it all down.
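One possible reading of the restricted selection is sketched below. The struct, its fields, and sketch_db_select are hypothetical and do not reproduce the db framework's API; component names in any usage are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical component descriptor. */
typedef struct {
    const char *name;
    bool local;      /* true if the component keeps data local to the process */
    int priority;
} sketch_db_component_t;

/* Pick the highest-priority component, skipping non-local ones on request. */
static const sketch_db_component_t *
sketch_db_select(const sketch_db_component_t *comps, size_t n, bool restrict_local)
{
    const sketch_db_component_t *best = NULL;
    size_t i;
    for (i = 0; i < n; i++) {
        if (restrict_local && !comps[i].local) {
            continue;   /* e.g. a PMI-backed component rejects itself here */
        }
        if (best == NULL || comps[i].priority > best->priority) {
            best = &comps[i];
        }
    }
    return best;
}
```

A tool or an mpirun-launched proc would pass restrict_local = true, while a direct-launched proc would pass false.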
This commit was SVN r29341.
Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create.
cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data
This commit was SVN r29274.
Does not need to go to 1.7 branch as that ordering is different.
This commit was SVN r29225.
child stdout and falls back to plain pipe if openpty fails. Child uses
the 'usepty' flag to decide whether to treat this descriptor as a pty
or as a pipe.
Set 'usepty' flag to 0 upon openpty failure to inform the child that
it isn't dealing with a pty even though pty has been requested.
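The fallback can be sketched as below. Only the 'usepty' flag name comes from this message; struct io_opts, setup_child_stdout, and the pty_open_fn hook are hypothetical. The hook stands in for openpty() so the failure path can be exercised without a real pty; failing_pty simulates that failure.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical options struct; only the flag name is from the text. */
struct io_opts {
    int usepty;
};

/* Hook standing in for openpty(): returns 0 on success, -1 on failure. */
typedef int (*pty_open_fn)(int *master, int *slave);

/* Simulates a system where no pty can be allocated. */
static int failing_pty(int *master, int *slave)
{
    (void)master;
    (void)slave;
    return -1;
}

/* Fill fds[0] (parent side) and fds[1] (child side) for the child's stdout. */
static int setup_child_stdout(struct io_opts *opts, int fds[2], pty_open_fn open_pty)
{
    if (opts->usepty) {
        if (open_pty != NULL && open_pty(&fds[0], &fds[1]) == 0) {
            return 0;   /* child will treat fds[1] as a pty */
        }
        /* Tell the child it is NOT dealing with a pty, even though one was requested. */
        opts->usepty = 0;
    }
    return pipe(fds);   /* plain pipe fallback */
}
```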
Thanks to Michal Peclo for reporting it and providing a patch.
cmr:v1.7.3:reviewer=jsquyres
cmr:v1.6.6:reviewer=jsquyres
This commit was SVN r29169.