* If < 0, it's an OPAL_ERR_* value
* If >= 0, it's the actual output value of the function
This is problematic for the OPAL_SOS stuff. This commit changes those
functions to always return OPAL_* statuses and send the output value
back through output parameters (like 95% of the rest of the code
base). This avoids the confusion with OPAL_SOS stuff and makes
paffinity work again (e.g., mpirun --bind-to-core ...).
I updated all paffinitiy modules for the new function signatures, and
bumped the paffinity API version up to 2.0.1. I don't think the
version change will matter, though, because we'll be introducing
support for hardware threads soon, which will either bump the
paffinity version again or we'll replace paffinity with
a new framework.
This commit was SVN r23197.
It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.
Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.
This commit was SVN r23106.
anything for non-MPI apps. Oops! (But before you freak out, gentle
reader, note that mpi_paffinity_alone for MPI apps still worked fine)
When we made the switchover somewhere in the 1.3 series to have the
orted's do processor binding, then stuff like:
mpirun --mca mpi_paffinity_alone 1 hostname
should have bound hostname to processor 0. But it didn't because of a
subtle startup ordering issue: the MCA param registration for
opal_paffinity_alone was in the paffinity base (vs. being in
opal/runtime/opal_params.c), but it didn't actually get registered
until after the global variable opal_paffinity_alone was checked to
see if we wanted old-style affinity bindings. Oops.
However, for MPI apps, even though the orted didn't do the binding,
ompi_mpi_init() would notice that opal_paffinity_alone was set, yet
the process didn't seem to be bound. So the MPI process would bind
itself (this was done to support the running-without-orteds
scenarios). Hence, MPI apps still obeyed mpi_paffinity_alone
semantics.
But note that the error described above caused the new mpirun switch
--report-bindings to not work with mpi_paffinity_alone=1, meaning that
the orted would not report the bindings when mpi_paffinity_alone was
set to 1 (it ''did'' correctly report bindings if you used
--bind-to-core or one of the other binding options).
This commit separates out the paffinity base MCA param registration
into a small function that can be called at the Right place during the
startup sequence.
This commit was SVN r22602.
Revamp the affinity detection/set procedure in mpi_init to correctly detect when we have already been bound to processors, given the revised understanding of paffinity_get. Add a new paffinity macro to make checking for already bound a little nicer.
This commit was SVN r21402.
1. replacing mpi_paffinity_alone with opal_paffinity_alone - for back-compatibility, I have aliased mpi_paffinity_alone to the new param name. This caus
es a mild abstraction break in the opal/mca/paffinity framework - per the devel discussion...live with it. :-) I also moved the ompi_xxx global variable
that tracked maffinity setup so it could be properly closed in MPI_Finalize to the opal/mca/maffinity framework to avoid an abstraction break.
2. Added code to the odls/default module to perform paffinity binding and maffinity init between process fork and exec. This has been tested on IU's odi
n cluster and works for both MPI and non-MPI apps.
3. Revise MPI_Init to detect if affinity has already been set, and to attempt to set it if not already done. I have *not* tested this as I haven't yet f
igured out a way to do so - I couldn't get slurm to perform cpu bindings, even though it supposedly does do so.
This has only been lightly tested and would definitely benefit from a wider range of evaluation...
This commit was SVN r21209.
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches, that include headers, being "lost" due to
inclusion in one of the now deleted headers...
In total 817 files are touched.
In ompi/mpi/c/ header files are moved up into the actual c-file,
where necessary (these are the only additional #include),
otherwise it is only deletions of #include (apart from the above
additions required due to notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
In case we use memcmp, strlen, strup and friends include <string.h>
Also several constants.h are not included directly
- Let's have mca_topo_base_cart_create return ompi-errors in
ompi/mca/topo/base/topo_base_cart_create.c
This commit was SVN r20773.
base/paffinity_base_service.c:153: warning: 'phys_core' may be used uninitialized in this function
base/paffinity_base_service.c:153: note: 'phys_core' was declared here
This commit was SVN r19580.
See opal/mca/paffinity/paffinity.h for explanation as to the physical vs logical nature of the params used in the API.
Fixes trac:1435
This commit was SVN r19391.
The following Trac tickets were found above:
Ticket 1435 --> https://svn.open-mpi.org/trac/ompi/ticket/1435
* add "register" function to mca_base_component_t
* converted coll:basic and paffinity:linux and paffinity:solaris to
use this function
* we'll convert the rest over time (I'll file a ticket once all
this is committed)
* add 32 bytes of "reserved" space to the end of mca_base_component_t
and mca_base_component_data_2_0_0_t to make future upgrades
[slightly] easier
* new mca_base_component_t size: 196 bytes
* new mca_base_component_data_2_0_0_t size: 36 bytes
* MCA base version bumped to v2.0
* '''We now refuse to load components that are not MCA v2.0.x'''
* all MCA frameworks versions bumped to v2.0
* be a little more explicit about version numbers in the MCA base
* add big comment in mca.h about versioning philosophy
This commit was SVN r19073.
The following Trac tickets were found above:
Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
For now, hide the OSX component with .ompi_ignore so only I can see it until I can ensure that it doesn't inadvertently interfere with Linux and Solaris support.
This clears the conflict with Windows.
This commit was SVN r18989.
Short version: remove opal_paffinity_alone and restore
mpi_paffinity_alone. ORTE makes various information available for the
MPI layer to decide what it wants to do in terms of processor
affinity.
Details:
* remove opal_paffinity_alone MCA param; restore mpi_paffinity_alone
MCA param
* move opal_paffinity_slot_list param registration to paffinity base
* ompi_mpi_init() calls opal_paffinity_base_slot_list_set(); if that
succeeds use that. If no slot list was set, see if
mpi_paffinity_alone was set. If so, bind this process to its Node
Local Rank (NLR). The NLR is the ORTE-maintained slot ID; if you
COMM_SPAWN to a host in this ORTE universe that already has procs
on it, the NLR for the new job will start at N (not 0). So this is
slightly better than mpi_paffinity_alone in the v1.2 series.
* If a slot list is specified *and* mpi_paffinity_alone is set, we
display an error and abort.
* Remove calls from rmaps/rank_file component to register and lookup
opal_paffinity mca params.
* Remove code in orte/odls that set affinities - instead, have them
just pass a slot_list if it exists.
* Cleanup the orte/odls code that determined
oversubscribed/want_processor as these were just opposites of each
other.
This commit was SVN r18874.
The following Trac tickets were found above:
Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
http://www.open-mpi.org/community/lists/devel/2008/04/3779.php
{{{
svn merge -r 18276:18380 https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play .
}}}
Any components not in the trunk, but in one of the effected frameworks *must* be
updated. Contact the list, look at the RFC, or look at the diff for how to do this.
Sorry for the early commit of this, but I wanted to get it in today (per RFC) and
didn't know if I would have a chance later today.
This commit was SVN r18381.
1. applied prefix rule to functions and variables of RMAPS rank_file component
2. cleaned ompi_mpi_init.c from paffinity code
3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file
4. added opal_paffinity_slot_list mca parameter
This commit was SVN r18019.
map_to_processor_id,
map_to_socket_core,
max_processor_id,
max_socket,
max_core.
In OS other then Linux, those functions will return OPAL_ERR_NOT_SUPPORTED.
--This Line, and those below, will be ignored--
M paffinity/linux/paffinity_linux_module.c
M paffinity/paffinity.h
M paffinity/base/base.h
M paffinity/base/paffinity_base_wrappers.c
M paffinity/windows/paffinity_windows_module.c
M paffinity/solaris/paffinity_solaris_module.c
This commit was SVN r17173.
A subset of this patch needs to be applied to v1.2
Refs trac:928
This commit was SVN r15918.
The following Trac tickets were found above:
Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928
Changes paffinity interface to use a cpu mask for available/preferred cpus
rather than the current coarse grained paffinity that lets the OS choose
which processor.
Macros for setting and clearing masks are provided.
Solaris and windows changes have not been made. Solaris subdirectory has some
suggested changes - however the relevant man pages for the Solaris 10 APIs
have some ambiguity regarding order in which one create and sets a processor
set. As we did not have access to a solaris 10 machine we could not test to
see the correct way to do the work under solaris.
This commit was SVN r14887.
The same treatement will happens on all sub-projects. The .h files
have to be C++ compatibles and all symbols with an external visibility
have to get the {PROJECT}_DECLSPEC in front of the prototype.
This commit was SVN r11340.