openmpi/opal/mca
Ralph Castain 24c811805f
****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" for programmatically detecting that "we are on a coprocessor running its own OS", so the ORTE daemon simply thinks it is on another node. However, to properly use the Phi's public interface for MPI transport, the daemon must detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add an OPAL_PROC_ON_HOST flag to indicate that we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate that we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions.

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check whether they are on a coprocessor. The former is done via hwloc, but hwloc does not yet support the latter. So the code for detecting that we are on a coprocessor is currently Xeon Phi-specific; hopefully, we will find more generic methods in the future.

4. modify the orted and HNP startup so that each checks for coprocessors and whether it is itself running on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found, and identify which coprocessors are on which hosts. Note that this algorithm isn't scalable at the moment; it will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTEs that choose not to provide this support do not have to do anything; the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
backtrace Per RFC, remove darwin backtrace, since OS X since 10.5 has supported the 2013-07-05 19:06:27 +00:00
base MCA/base: When encountering a duplicate file value don't free the filename. 2013-09-21 18:53:36 +00:00
common add missing include 2013-07-21 20:18:17 +00:00
compress Update OPAL frameworks to use the MCA framework system. 2013-03-27 21:11:47 +00:00
crs Update OPAL frameworks to use the MCA framework system. 2013-03-27 21:11:47 +00:00
db ***** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE ***** 2013-10-08 18:37:59 +00:00
event Modify libevent to support cygwin - patch will be pushed upstream 2013-10-06 23:53:31 +00:00
hwloc **************************************************************** 2013-10-14 16:52:58 +00:00
if As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: 2013-08-22 16:37:40 +00:00
installdirs Fixes trac:376: by default the wrapper compilers will enable rpath support 2013-05-11 00:49:17 +00:00
memchecker Fix bug introduced by r28236: make declaration and instantiation agree 2013-04-10 14:10:47 +00:00
memcpy Update OPAL frameworks to use the MCA framework system. 2013-03-27 21:11:47 +00:00
memory Also check for /dev/mic/scif when deciding whether to enable the Linux 2013-08-27 19:40:02 +00:00
pstat Update struct member name - this is why we put such things in the trunk before moving them to a branch, especially when coming from outside :-) 2013-10-07 15:43:43 +00:00
shmem Update OPAL frameworks to use the MCA framework system. 2013-03-27 21:11:47 +00:00
timer Update OPAL frameworks to use the MCA framework system. 2013-03-27 21:11:47 +00:00
Makefile.am Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
mca.h Refs trac:3275. 2012-09-11 20:47:24 +00:00