1
1
openmpi/opal/mca/hwloc
Ralph Castain 24c811805f ****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.

4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
..
base **************************************************************** 2013-10-14 16:52:58 +00:00
external Revamp the handling of wrapper compiler flags. The user flags, main configure 2013-01-29 00:00:43 +00:00
hwloc172 Per Brice, silence warning on old Linux kernels 2013-09-16 15:43:33 +00:00
configure.m4 Don't mandate PCI support, because this will make builds on platforms 2013-03-19 16:20:08 +00:00
hwloc.h **************************************************************** 2013-10-14 16:52:58 +00:00
Makefile.am Per the RFC from Jeff, move hwloc from opal/mca/common to its own static framework ala libevent. Have ORTE daemons collect the topology info at startup and, if --enable-hwloc-xml is set, send that info back to the HNP for later use. The HNP only retains unique topology "templates" to reduce memory footprint. Have the daemon include the local topology info in the nidmap buffer sent to each app so the apps don't all hammer the local system to discover it for themselves. 2011-09-11 19:02:24 +00:00
README.txt Add some notes about maintaining the hwloc framework. 2011-09-12 19:40:18 +00:00

12 Sep 2011

Notes for hwloc component maintainers:

1. There can only be *1* hwloc version component at a time.
   Specifically: if there are multiple hwlocXYZ components (i.e.,
   different versions of hwloc), then they must all be .ompi_ignore'd
   except for 1.  This is because we currently m4_include all of the
   underlying hwloc's .m4 files -- if there are multiple hwlocXYZ
   components, I don't know if m4 will barf at the multiple,
   conflicting AC_DEFUNs, or whether it'll just do something
   completely undefined.

1a. As a consequence, if you're adding a new hwloc version component,
   you'll need to .ompi_ignore all others while you're testing the new
   one. 

2. If someone wants to fix #1 someday, we might be able to do what we
   do for libevent: OMPI_CONFIG_SUBDIR (instead of slurping in hwloc's
   .m4 files).