- Make a copy of enumerator data for default enumerators. This will allow
the caller to free their data once the enumerator has been created. This
is a change from just referencing the values array.
- Make mca_base_pvar_notify check if the pvar is valid before calling the
notify callback. This fixes a segmentation fault when destroying handles
after MPI_Finalize().
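The copy semantics in the first item can be sketched as follows. This is a minimal illustration, not the actual enumerator code: the types `enum_value_t` and `enumerator_t` and the functions `enumerator_create` and `enumerator_destroy` are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value/name pair; not the real Open MPI enumerator type. */
typedef struct {
    int value;
    const char *name;
} enum_value_t;

/* Hypothetical enumerator that owns a private copy of its values. */
typedef struct {
    int count;
    enum_value_t *values;
} enumerator_t;

/* Duplicate a string with malloc (strdup is POSIX, not ISO C). */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Release the enumerator and every string it copied. */
void enumerator_destroy(enumerator_t *e)
{
    if (e == NULL) {
        return;
    }
    for (int i = 0; i < e->count; ++i) {
        free((char *) e->values[i].name);
    }
    free(e->values);
    free(e);
}

/* Deep-copy the caller's array instead of keeping a pointer to it. */
enumerator_t *enumerator_create(const enum_value_t *values, int count)
{
    assert(values != NULL && count > 0);

    enumerator_t *e = calloc(1, sizeof(*e));
    if (e == NULL) {
        return NULL;
    }
    e->values = calloc((size_t) count, sizeof(*e->values));
    if (e->values == NULL) {
        free(e);
        return NULL;
    }
    for (int i = 0; i < count; ++i) {
        e->values[i].value = values[i].value;
        e->values[i].name = dup_string(values[i].name);
        if (e->values[i].name == NULL) {
            /* e->count covers only the names copied so far. */
            enumerator_destroy(e);
            return NULL;
        }
        e->count = i + 1;
    }
    return e;
}
```

With this layout the caller may overwrite or free its own array as soon as `enumerator_create` returns; the price is one extra allocation per name.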
cmr=v1.7.4:ticket=trac:3861
This commit was SVN r29512.
The following Trac tickets were found above:
Ticket 3861 --> https://svn.open-mpi.org/trac/ompi/ticket/3861
The data for each remote daemon is added later in the daemon callback function. Only the HNP retains info in the hash table.
If it is desirable to have each daemon retain its own coprocessor info, then this must be done in orte/mca/ess/base/ess_base_std_orted.c.
This commit was SVN r29497.
The following SVN revision numbers were found above:
r29489 --> open-mpi/ompi@2e2794fa15
This file exists to help map usernames to proper names and email
addresses in the Open MPI github mirror of the canonical SVN repository.
The github mirror can be found here:
https://github.com/open-mpi/ompi-svn-mirror
I've seeded the file with the names of Cisco contributors. In order to
avoid exposing anyone's email address without their permission, we are
using an opt-in model for adding real email addresses.
This commit was SVN r29494.
Apologies for the breakage, I did my test build in the wrong window...
No reviewer.
cmr=v1.7.4:ticket=3865
This commit was SVN r29492.
The following SVN revision numbers were found above:
r29488 --> open-mpi/ompi@25dd719d4d
The following Trac tickets were found above:
Ticket 3865 --> https://svn.open-mpi.org/trac/ompi/ticket/3865
to the hash table.
Tested and working on a system with 2 Xeon Phi co-processors.
cmr=v1.7.4:ticket=3847:reviewer=ompi-rm1.7
This commit was SVN r29489.
The following Trac tickets were found above:
Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
First cut does not attempt any "cross-check". As we discover compilers
which complain about __noinline__, we will add specific cross checks to
handle those cases.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29488.
Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29485.
- some free lists simply were not being OBJ_DESTRUCTed, so they never
freed their internal memory
- channel->recv_segs.ctx was being assigned in a way that got clobbered
by ompi_free_list_init_new, so the cleanup code that relied on it
being set never ran
- numerous other ".ctx" assignments were similarly ineffectual and were
not being consumed, so I deleted them
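The ".ctx" clobbering described above follows a general pattern that can be sketched in isolation. The sketch assumes an init routine that overwrites the field; `toy_free_list_t`, `toy_free_list_init`, and `toy_cleanup_would_run` are hypothetical names and do not reflect the real signature or behavior of ompi_free_list_init_new.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a free list with an opaque context pointer. */
typedef struct {
    void *ctx;          /* consumed later by cleanup code */
    size_t num_items;
} toy_free_list_t;

/* Hypothetical init routine: it sets every field, including ctx, so any
 * value the caller stored in ctx beforehand is lost. */
void toy_free_list_init(toy_free_list_t *fl, size_t num_items, void *ctx)
{
    assert(fl != NULL);
    fl->num_items = num_items;
    fl->ctx = ctx;
}

/* Cleanup that depends on ctx runs only when ctx survived init. */
int toy_cleanup_would_run(const toy_free_list_t *fl)
{
    return fl->ctx != NULL;
}
```

In this toy version, storing `ctx` before calling the init routine has no effect; the value has to be passed through the init call (or assigned afterwards) for the cleanup path to see it.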
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29484.
This new routine can be called in exceptional situations, either
conditionally in BTL code or from a debugger, to help with debugging in
cases where MSGDEBUG1/2 or stats logging are impractical but more detail
is needed.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29483.
Pull the bulk of the functionality out into a new routine,
ompi_btl_usnic_print_stats, which can be used in other debugging
contexts. This also lets us eliminate the module->final_stats state
tracking.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29482.
Fixes:
- Segmentation fault when using watermark variables.
- Segmentation fault when using a handle bound to a no longer valid
performance variable.
- Incorrect return codes from MPI_T_pvar_* functions.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29481.
Thanks to Charles Gerlach for identifying the issue.
Oddly, this issue exists in trunk and v1.7, but ''not'' in the v1.6
tree (!).
cmr=v1.7.4:reviewer=hjelmn
This commit was SVN r29463.
Follow the convention established by the ompi/mca/common/sm tree and
prefix both the "install" and "no install" versions of the build with
"lib" so that Automake doesn't complain. Differentiate the two by
adding a "_noinst" suffix to the "no install" version.
This commit was SVN r29462.
- removed a potential double '/' in CUPTIDIR, which causes trouble for rpmbuild's debugedit program (fixes trac:3854)
This commit was SVN r29461.
The following Trac tickets were found above:
Ticket 3854 --> https://svn.open-mpi.org/trac/ompi/ticket/3854
This change contains a non-mandatory modification
of the MPI-RTE interface. Anyone wishing to support
coprocessors such as the Xeon Phi may want to add
the required definition and underlying support.
****************************************************************
Add locality support for coprocessors such as the Intel Xeon Phi.
Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.
So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:
1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board
2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions.
3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.
4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algorithm isn't scalable at the moment - this will hopefully be improved over time.
5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTEs that choose not to provide this support do not have to do anything - the associated code will simply be ignored.
6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.
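The host/board split in items 1, 2 and 6 can be illustrated with a small bitmask sketch. The `SKETCH_*` names, the bit values, and the `sketch_locality` helper are assumptions made for this illustration; the real OPAL_PROC_ON_* definitions may use different values and a different layout.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t locality_t;

/* Illustrative bit values only; the real OPAL flags may differ. */
#define SKETCH_PROC_ON_CLUSTER 0x0001
#define SKETCH_PROC_ON_CU      0x0002
#define SKETCH_PROC_ON_HOST    0x0004
#define SKETCH_PROC_ON_BOARD   0x0008

/* "On node" means sharing both the host and the board. */
#define SKETCH_PROC_ON_NODE (SKETCH_PROC_ON_HOST | SKETCH_PROC_ON_BOARD)

#define SKETCH_PROC_ON_LOCAL_HOST(n) (((n) & SKETCH_PROC_ON_HOST) != 0)

/* Both bits must be set, so compare against the full mask. */
#define SKETCH_PROC_ON_LOCAL_NODE(n) \
    (((n) & SKETCH_PROC_ON_NODE) == SKETCH_PROC_ON_NODE)

/* Hypothetical helper: build a locality mask from two observations. */
locality_t sketch_locality(int same_host, int same_board)
{
    /* A shared board without a shared host is not modeled here. */
    assert(same_host || !same_board);

    locality_t loc = 0;
    if (same_host) {
        loc |= SKETCH_PROC_ON_HOST;
    }
    if (same_host && same_board) {
        /* Same node implies same cluster/cu, so set those flags too. */
        loc |= SKETCH_PROC_ON_BOARD | SKETCH_PROC_ON_CLUSTER
             | SKETCH_PROC_ON_CU;
    }
    return loc;
}
```

Under these assumptions, a proc on a coprocessor and a proc on its host share the host bit but not the board bit, so the "local node" check fails for them while the "local host" check succeeds.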
cmr:v1.7.4:reviewer=hjelmn
This commit was SVN r29435.