added on top of r29991 and:
* Consolidates the _debug variables in opal_datatype_internal.h and
opal_convertor.h
* Puts the DO_DEBUG macros back in the .c files, because they are
slightly different from each other
Refs trac:4004
This commit was SVN r29992.
The following SVN revision numbers were found above:
r29991 --> open-mpi/ompi@a88e143127
The following Trac tickets were found above:
Ticket 4004 --> https://svn.open-mpi.org/trac/ompi/ticket/4004
and extern'ed s an int in another. This caused a SIBGUS on
Solaris/SPARC.
This commit properly moves the extern to a .h file so that it's the
same in all files. It also moves the DO_DEBUG to the header file,
because it was defined to the same thing in multiple .c files.
cmr=v1.7.4:reviewer=bosilca:subject=fix SPARC SIGBUS in opal convertor code
This commit was SVN r29991.
The following Trac tickets were found above:
Ticket 3989 --> https://svn.open-mpi.org/trac/ompi/ticket/3989
and preventing access to potentially unaligned data.
Reviewed by Dave Goodell. Tested by Siegmarr Gross.
cmr=v1.7.4:reviewer=ompi-rm1.7:subject=fix SPARC SIGBUS in opal net code
This commit was SVN r29983.
The following Trac tickets were found above:
Ticket 3990 --> https://svn.open-mpi.org/trac/ompi/ticket/3990
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.
cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings
This commit was SVN r29978.
compiled with ummunotify support (which is the check that r29720 just
recently added).
This commit was SVN r29961.
The following SVN revision numbers were found above:
r29720 --> open-mpi/ompi@ae8c826527
includes various fixes all over the C/R code which are
hard to group like the other patches.
Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values
Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)
This commit was SVN r29922.
should be already set to the right value. This fixes a problem
identified by Guillaume Gouaillardet, where using a single
persistent receive leads to leaking the convertor stack memory.
Refs trac:3956
cmr=v1.7.4:reviewer=jsquyres:subject=Correctly handle the convertor internal stack for persistent receives.
This commit was SVN r29920.
The following Trac tickets were found above:
Ticket 3956 --> https://svn.open-mpi.org/trac/ompi/ticket/3956
* default to bind-to core
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above)
Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs
cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values
This commit was SVN r29919.
got linked together (work on one caused work in the other):
* Clean up a bunch of VAR_SCOPE issues in configure. This includes:
* Using VAR_SCOPE_PUSH and VAR_SCOPE_POP in more places
* Cleaning up the use of some shell variables (e.g., name them better)
* Add support for external libevent via
--with-libevent=<dir-to-libevent-install-tree>, as specifically
asked for by downstream packagers.
* Revamp how wrapper compiler RPATH (and RUNPATH) support is done.
The external libevent work exposed weakenesses in how the original
RPATH/RUNPATH work was done, so we had to re-do it to be a bit more
robust.
This work has not yet been tested on Solaris.
Refs trac:3694
This commit was SVN r29899.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
more:
- Remove OPAL_ENABLE_MULTI_THREADS, since it didn't really do anything
correctly. Opal always has threads enabled at this point.
- Remove OMPI_ENABLE_PROGRESS_THREADS, since this hasn't worked in
8 years and it has performance issues we'll never be able to
overcome. Note that we have plans for re-adding async progress, using
a hybrid protocol of async and sync sends.
- OMPI_ENABLE_THREAD_MULTIPLE now determines whether the thread lock
macros do the check or not.
- Condition variables are ALWAYS polling right now, which fixes the thread
live-lock currently found when THREAD_MULTIPLE is turned on.
This commit was SVN r29891.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).
Refs trac:3694
This commit was SVN r29830.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
* Ensure "cnt" is always initialized
* Ensure we dont' buffer overflow on strncat() -- need to ensure we
account for the terminating \0 character
* hwloc_get_type_depth() returns an int (not unsigned), and
HWLOC_TYPE_DEPTH_UNKNOWN if it's unknown (which is probably <0, but
still, might as well check what the official hwloc docs say to
check for)
cmr=v1.7.4:reviewer=rhc:subject=fix hwloc base compiler warnings
This commit was SVN r29686.
Change static opal_setlimit() function to return its value in an OUT
parameter and return the usual int error code indicating success or
failure.
The OUT param and return code need to be separated because the OUT
param is an unsigned type, but opal_setlimit() was returning -1 upon
failure. Hence, the caller could not know that it had failed because
the return type was previously an unsigned type.
cmr=v1.7.4:reviewer=rhc:subject=Fix opal sys_limits.c signed/unsigned warnings
This commit was SVN r29685.
should have been all along and fix one place that uses the file
Update opal_portable_platform.h with changes to mpi_portable_platform.h made
in r29608.
Make mpi_portable_platform.h a symlink to opal_portable_platform.h, so that
they won't get out of sync. I'd like to remove mpi_portable_platform.h, but
we don't automatically add -I${includedir}/openmpi/ to make that sane from
a header include point of view, so that's future work.
This commit was SVN r29618.
The following SVN revision numbers were found above:
r29608 --> open-mpi/ompi@b71bd51cdd
this change the next time we update libevent.
cmr=v1.7.4:ticket=3882
This commit was SVN r29597.
The following Trac tickets were found above:
Ticket 3882 --> https://svn.open-mpi.org/trac/ompi/ticket/3882
OSX atomic support is disabled by default. Enable with --enable-osx-builtin-atomics.
Fixes trac:2120
This commit was SVN r29568.
The following Trac tickets were found above:
Ticket 2120 --> https://svn.open-mpi.org/trac/ompi/ticket/2120
- Make a copy of enumerator data for default enumerators. This will allow
the caller to free their data once the enumerator has been created. This
is a change from just referencing the values array.
- Make mca_base_pvar_notify check if the pvar is valid before calling the
notify callback. This fixes a segmentation fault when destroying handles
after MPI_Finalize().
cmr=v1.7.4:ticket=trac:3861
This commit was SVN r29512.
The following Trac tickets were found above:
Ticket 3861 --> https://svn.open-mpi.org/trac/ompi/ticket/3861
First cut does not attempt any "cross-check". As we discover compilers
which complain about __noinline__, we will add specific cross checks to
handle those cases.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29488.
Fixes:
- Segmentation fault when using watermark variables.
- Segmentation fault when using a handle bound to a no longer valid
performance variable.
- Incorrect return codes from MPI_T_pvar_* functions.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29481.
This change contains a non-mandatory modification
of the MPI-RTE interface. Anyone wishing to support
coprocessors such as the Xeon Phi may wish to add
the required definition and underlying support
****************************************************************
Add locality support for coprocessors such as the Intel Xeon Phi.
Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.
So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:
1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board
2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions
3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.
4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.
5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.
6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.
cmr:v1.7.4:reviewer=hjelmn
This commit was SVN r29435.
Fix two problems that surfaced when using direct launch under SLURM:
1. locally store our own data because some BTLs want to retrieve
it during add_procs rather than use what they have internally
2. cleanup MPI_Abort so it correctly passes the error status all
the way down to the actual exit. When someone implemented the
"abort_peers" API, they left out the error status. So we lost
it at that point and *always* exited with a status of 1. This
forces a change to the API to include the status.
cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch
This commit was SVN r29405.
However, tools such as mpirun don't need it, and definitely shouldn't be using it. Ditto for procs launched by mpirun.
We used to have a way of dealing with this - we had the PMI component check to see if the process was the HNP or was launched by an HNP. Sadly, moving the OPAL db framework removed
that ability as OPAL has no notion of HNPs or proc type.
So add a boolean flag to the db_base_select API that allows us to restrict selection to "local" components. This gives the PMI component the ability to reject itself as required. W
e then need to pass that param into the ess_base_std_app call so it can pass it all down.
This commit was SVN r29341.
convertor accepted to be set to a position in the middle of a
predefined datatype. Once set there is was unable to provide the
second part of the datatype. This fix force the convertor to be
aligned on predefined datatypes boundaries for any non-contiguous
send convertor.
This commit was SVN r29285.
Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create.
cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data
This commit was SVN r29274.