1
1
Граф коммитов

275 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
f6da257477 configury: test external hwloc version is 1.8 or greater
hwloc_topology_dup is only available from hwloc 1.8
2014-12-22 13:42:38 +09:00
Jeff Squyres
40dd4c5b76 configury: manually remove some stamp-h? files
Due to what might be a bug in Automake, we need to remove stamp-h?
files manually.  See
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19418.
2014-12-20 08:32:57 -08:00
Ralph Castain
123fdd603f If we are using hwthread cpus, then default to binding there, letting the user override to whatever they want 2014-12-19 08:04:28 -08:00
Jeff Squyres
140bb3d421 hwloc configure: fix typo -- add missing $
Arrgh!  Missed a "$" in the last commit, making the test always
false.
2014-12-18 10:25:43 -08:00
Jeff Squyres
be6d46490f hwloc: only add CPPFLAGS if hwloc is actually being built
As pointed out by @ggouaillardet, we were adding some unnecessary -I
flags to CPPFLAFGS when --without-hwloc was being used.  This commit
slightly updates the hwloc191 component configury to only add such
things when the component is, in fact, going to be
compiled/installed.
2014-12-18 08:56:49 -08:00
Ralph Castain
0630680f36 Two cleanups required for transfer to 1.8.4:
* Use %d format for the topo signature as some systems apparently have problems with %u
* Use correct variable in show_help message
2014-12-12 17:23:32 -08:00
Ralph Castain
18d9fdfd8d Restore full topology comparison to support inventory monitoring 2014-12-09 01:33:06 -08:00
Ralph Castain
9b2f8cd840 Add the processor architecture to the topology signature 2014-12-09 01:17:00 -08:00
Ralph Castain
bb529ebd8e Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pickup major differences (e.g., where we have different numbers of sockets or assigned core bindings).
Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.

Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
2014-12-08 15:38:14 -08:00
Ralph Castain
cb15cc06e1 Minor changes per Jeff's request on PR for 1.8.4 2014-12-02 19:54:10 -08:00
Ralph Castain
960ef34988 Ensure the LSF ras adds the hosts to the allocation. Correctly handle the semi-colon vs comma situation in hwloc slot_lists 2014-11-30 14:37:37 -08:00
Ralph Castain
3f9d9ae8b6 Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids.
Once validated, a version of this will be backported to the v1.8.4 release.
2014-11-30 11:50:31 -08:00
Ralph Castain
d0704ef118 Restore handling of physical processors in rankfiles. Note that the prior implementation was likely incorrect as it falsely assumed that physical core indices were unique, which isn't always true. Stipulate that physical rankfiles can only include PU numbers, and bind the result to the core that contains that physical PU. Update the mpirun man page to cover the new use-case. 2014-11-10 14:00:40 -08:00
Ralph Castain
2a90788724 Support physical processor ids in rankfile 2014-11-10 14:00:40 -08:00
Ralph Castain
7269dae2da Per patch from Samuel Thibault, silence warning from Clang
This commit was SVN r32720.
2014-09-12 22:22:11 +00:00
Ralph Castain
1f2c5863f0 Revert r32675 in favor of a different solution proposed by Brice
This commit was SVN r32715.

The following SVN revision numbers were found above:
  r32675 --> open-mpi/ompi@916f98a3ee
2014-09-11 21:58:48 +00:00
Ralph Castain
e671620ac7 Per request from Jeff: tune up the help messages for binding options
Refs trac:4898

This commit was SVN r32691.

The following Trac tickets were found above:
  Ticket 4898 --> https://svn.open-mpi.org/trac/ompi/ticket/4898
2014-09-09 22:39:22 +00:00
Ralph Castain
4207b4c4ad Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32687.
2014-09-08 21:03:51 +00:00
Ralph Castain
5649841e26 Provide missing include file - generates errors when used with Intel compilers
This commit was SVN r32685.
2014-09-08 19:04:40 +00:00
Ralph Castain
916f98a3ee Rename an HWLOC member of a union in the diff.h file to avoid a naming conflict with an external library - it isn't that HWLOC did something wrong, but rather that the name being used is so close to a type name that other folks has a tendency to #define it as well. We could argue with those folks that what they are doing is incorrect, but it is just easier to make a slight change and resolve the problem.
This commit was SVN r32675.
2014-09-07 15:42:05 +00:00
Ralph Castain
b372cd02d0 Ensure the hwloc headers get installed when --with-devel-headers is given
This commit was SVN r32663.
2014-09-02 19:58:25 +00:00
Ralph Castain
60eb7124ab Upgrade to hwloc 1.9.1
This commit was SVN r32652.
2014-08-31 03:13:06 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Gilles Gouaillardet
623945466e silence a warning on Solaris
cmr=v1.8.2:reviewer=bgoglin

This commit was SVN r32503.
2014-08-11 11:00:55 +00:00
George Bosilca
daa076995a orte_rmaps_numa_node_t -> opal_rmaps_numa_node_t
This commit was SVN r32380.
2014-07-31 19:58:47 +00:00
Ralph Castain
2aade28259 Protect against NULL return from malloc
This commit was SVN r32056.
2014-06-19 20:57:56 +00:00
Ralph Castain
3f04d50cb0 Per the ticket, resolve our handling of overload conditions to provide a more consistent response. If we are overloaded (i.e., attempting to bind more processes to a location than the number of cpus under that location), then we consider the following conditions:
(a) default binding policy is in effect. In this case, we will emit a
warning and default to not binding unless the user provided the
"oversubscribe" or "overload" modifier to the "bind-to" option.

(b) user-specified binding policy is in effect. In this case, we will
error out unless the user provided the "oversubscribe" or "overload"
modifier to the "bind-to" option as we cannot meet the directive.

Either "bind-to" modifier (oversubscribe or overload) will be accepted for
now - in 1.9, we will deprecate the "overload" term in favor of
"oversubscribe".

Also added the ability to accept a --bind-to modifier without specifying the binding policy itself so a user can specify overload-allowed with the default policy.

Closes trac:4345

cmr=v1.8.2:reviewer=rhc:subject=resolve handling of overload conditions

This commit was SVN r32005.

The following Trac tickets were found above:
  Ticket 4345 --> https://svn.open-mpi.org/trac/ompi/ticket/4345
2014-06-14 15:38:32 +00:00
Ralph Castain
06dbfa3098 Make the cpus-per-proc equivalent a little more intuitive:
* allow users to specify just a modifier for map-by instead of requiring that they also specify a policy. Thus, we now accept --map-by :pe=3 as indicating that we should use the default mapping policy, but bind 3 cpus/proc.

* if users specify a pe's/proc but no policy, default to --map-by NUMA to ensure we have access to multiple cpus for the request. This won't guarantee we have access to enough to meet the request, but gives us a chance. In addition, we know that binding a proc to multiple cpus will work best if those cpus are all in the same NUMA, so this provides some degree of optimized behavior.

Per a request from Jeff, define "oversubscribe" for binding as a synonym for the "overload" modifier.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31967.
2014-06-08 20:26:59 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Jeff Squyres
81afb4e18a hwloc: commit minor bug fix from hwloc git
Bring down 3aa0ed6 from the hwloc v1.7 branch: Stevens says we should
GETFD before we SETFD, so we do

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31683.
2014-05-08 14:29:10 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
2b7a3ae601 Per RFC, continue pecking away at the build system renaming
OMPI_CONFIG_SUBDIR  -> OPAL_CONFIG_SUBDIR
   OMPI_CONFIG_SUBDIR_ARGS  ->  OPAL_CONFIG_SUBDIR_ARGS

This commit was SVN r31647.
2014-05-06 16:27:38 +00:00
Ralph Castain
a1ae20fddb Per RFC: OMPI_CFLAGS_BEFORE_PICKY -> OPAL_CFLAGS_BEFORE_PICKY
- This line, and those below, will be ignored--

M    opal/mca/event/libevent2021/configure.m4
M    opal/mca/hwloc/hwloc172/configure.m4
M    configure.ac
M    config/opal_setup_libltdl.m4
M    config/opal_check_visibility.m4
M    config/opal_setup_cc.m4

This commit was SVN r31637.
2014-05-05 22:22:33 +00:00
Ralph Castain
e20dae536c Last step under current RFC: OMPI_CHECK_WITHDIR -> OPAL_CHECK_WITHDIR
This commit was SVN r31585.
2014-05-01 15:38:07 +00:00
Ralph Castain
3b64c603b4 First stage of RFC to rename OMPI_foo build system support: change OMPI_CHECK_PACKAGE -> OPAL_CHECK_PACKAGE
This commit was SVN r31582.
2014-05-01 14:24:56 +00:00
Jeff Squyres
67bb0c261a hwloc: ensure that an internal fd is marked as close-on-exec
Make sure that an internal, long-lived hwloc fd is marked as
close-on-exec so that children don't inherit it.  This patch is
committed upstream in the hwloc master and v1.9 branches as 7489287
and b654e19, respectively.  The patch applied here is the exact same
logic, but the surrounding code changed slightly since the hwloc v1.7
series, so the patch doesn't apply cleanly.

Refs trac:4550

This commit was SVN r31511.

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-23 21:36:38 +00:00
Jeff Squyres
82e104719a hwloc/rmaps base: Add missing help message.
Also, add missing ORTE_ERROR_LOG in the other case where this error
message is used (i.e., ORTE_ERROR_LOG was used in the one place, so
let's also use it in the other place).

This commit was SVN r31321.
2014-04-07 15:39:54 +00:00
Jeff Squyres
37d4c22912 hwloc base: Remove unused help messages.
These type of help messages are now displayed by the MCA var system
itself (via enumerated values).

This commit was SVN r31320.
2014-04-07 15:39:29 +00:00
Mike Dubman
3e81ee9f0d BUILD: fix "make dist" failure on some linux distro with old csh/tcsh
on some linux distro (sles11sp2) csh fails to parse $LS_COLORS and borks with error:
Unknown colorls variable `mh'.

The workaround is to unset LS_COLORS before calling to csh script

reviewed by Jeff

cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31244.
2014-03-27 06:34:00 +00:00
Ralph Castain
9fca25a8dd Catch one more place where we need to use the actual topology instead of opal_hwloc_topology. Thanks to Tetsuya Mishima for the patch
Reviewed okay. RM-approved

cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r31016.
2014-03-12 00:49:54 +00:00
Ralph Castain
081669b440 When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it
cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings

This commit was SVN r30968.
2014-03-10 15:53:07 +00:00
Ralph Castain
4e32a82638 If we are binding to hwthreads, then we need to treat hwthreads as cpus to get the mapping right
cmr=v1.7.5:reviewer=jsquyres:subject=set hwthreads to cpus when binding to them

This commit was SVN r30648.
2014-02-09 16:14:38 +00:00
Ralph Castain
ca0c806662 Resolve the problem of binding in inverted topologies - check the relative depth of the map and bind objects in the topology, and let that determine whether we bind downward or upwards.
cmr=v1.7.5:reviewer=jsquyres:subject=Resolve the problem of binding in inverted topologies

This commit was SVN r30643.
2014-02-09 05:30:17 +00:00
Ralph Castain
193cceb483 Okay, since a certain other RM out there made a fuss about being able to lock their daemons to specified cores, offer the same option here. The MCA param orte_daemon_cores can be used to specify which core(s) you want the orte daemons to use. This will have no bearing on the application procs - unbound will remain unbound, and binding directives will be applied to the apps.
Yippee skippee...

This commit was SVN r30513.
2014-01-30 23:50:14 +00:00
Jeff Squyres
bc795cd25b Fix typo in comment.
This commit was SVN r30504.
2014-01-30 18:05:34 +00:00
George Bosilca
18ae20022a Don't forget to release the bitmaps.
This commit was SVN r30428.
2014-01-26 17:24:38 +00:00
Jeff Squyres
7768828d2d Addendum to r30298: tweak the wording of the help messages a bit.
Refs trac:4117.  Please use this commit rather than the patch attached to
the ticket; the patch had a few mistakes in the tweaked wording.

This commit was SVN r30362.

The following SVN revision numbers were found above:
  r30298 --> open-mpi/ompi@58479399c3

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-01-22 12:17:14 +00:00
Jeff Squyres
afb33b8de8 Bring down upstream hwloc 438d9ed7457888c63d29778bda56cd27c52a8d51 to
work around buggy NUMA node cpusets (i.e., buggy BIOSs).

Thanks to Jeff Becker for reporting the issue.

Submitted by Brice Goglin, reviewed by Jeff Squyres.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30306.
2014-01-17 13:49:56 +00:00
Ralph Castain
58479399c3 As per RFC and telecon, deprecate cmd line options and their corresponding MCA params for old-style mapping and binding directives
cmr=v1.7.5:reviewer=jsquyres:subject=deprecate old-style mapping and binding directives

This commit was SVN r30298.
2014-01-15 14:48:39 +00:00
Jeff Squyres
962a14cf6d Pull upstream hwloc commit 5198d4c0fd6cae12756fb44aed16a2d4a58b1a25
Change the logic in bind.c to only include <malloc.h> if we don't have posix_memalign.
    
In http://www.open-mpi.org/community/lists/devel/2014/01/13619.php,
Paul Hargrove found a compiler warning on OpenBSD where <malloc.h>
exists, but is not intended to be used (and doesn't error out, so
AC_CHECK_HEADERS says its ok).

Reviewed by Brice Goglin.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30234.
2014-01-10 17:37:00 +00:00
Ralph Castain
fb9e427320 One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do *not* bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound.
Refs trac:4077

This commit was SVN r30200.

The following Trac tickets were found above:
  Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077
2014-01-09 22:39:34 +00:00
Ralph Castain
f179f2086b Do a better job of reporting bindings - if someone gives a spec that binds us to all processors, then we are effectively unbound and should report it clearly instead of outputting a long line of B's.
cmr=v1.7.4:reviewer=jsquyres:subject=Do a better job of reporting bindings

This commit was SVN r30179.
2014-01-09 16:16:16 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Jeff Squyres
cbc9ee5894 Refs trac:4038
Revert r30096 and use better precious variable names.

This commit was SVN r30132.

The following SVN revision numbers were found above:
  r30096 --> open-mpi/ompi@e0f6a4ef47

The following Trac tickets were found above:
  Ticket 4038 --> https://svn.open-mpi.org/trac/ompi/ticket/4038
2014-01-07 15:58:40 +00:00
Ralph Castain
31248c0985 Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.

Refs trac:4003

This commit was SVN r30033.

The following Trac tickets were found above:
  Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
Jeff Squyres
802c89680a Protect hwloc/configure/m4's use of some temporary shell variables
Fix problem reported by Paul Hargrove:
http://www.open-mpi.org/community/lists/devel/2013/12/13519.php

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r30013.
2013-12-20 14:48:40 +00:00
Ralph Castain
55cd65b149 Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along.
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.

cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings

This commit was SVN r29978.
2013-12-19 16:31:45 +00:00
Ralph Castain
8b6d117541 Per the OMPI devel conference that changed our default behaviors:
* default to bind-to core 
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above) 

Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs

cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values

This commit was SVN r29919.
2013-12-15 17:25:54 +00:00
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Jeff Squyres
abeef55a55 Fix a few compiler warnings reported by clang:
* Ensure "cnt" is always initialized
 * Ensure we dont' buffer overflow on strncat() -- need to ensure we
   account for the terminating \0 character
 * hwloc_get_type_depth() returns an int (not unsigned), and
   HWLOC_TYPE_DEPTH_UNKNOWN if it's unknown (which is probably <0, but
   still, might as well check what the official hwloc docs say to
   check for)

cmr=v1.7.4:reviewer=rhc:subject=fix hwloc base compiler warnings

This commit was SVN r29686.
2013-11-13 15:54:01 +00:00
Mike Dubman
840e2cb4a2 mindist: cosmetic, use fallback to byslot if unable to read NUMA info, small fix.
fixed by Elena, reviewed by Ralph/Mike
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29679.
2013-11-13 09:26:40 +00:00
Ralph Castain
6ef7dc1f42 We previously weren't checking all the bits in locality to ensure we had a complete match - instead, we would report "local" to the specified level if only one bit matched. Ensure that a est for locality tests local to the specified level by checking that *all* bits match.
cmr=v1.7.4:reviewer=hjelmn:subject=Ensure locality is properly tested

This commit was SVN r29643.
2013-11-08 04:21:05 +00:00
Ralph Castain
75c306994e Add some debug
This commit was SVN r29523.
2013-10-26 02:26:21 +00:00
Ralph Castain
772a376d73 Correct location of elog file
Refs trac:3847

This commit was SVN r29438.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-14 19:21:45 +00:00
Ralph Castain
24c811805f ****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.

4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
Ralph Castain
e01953b440 Per Brice, silence warning on old Linux kernels
Refs trac:3744

This commit was SVN r29179.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-16 15:43:33 +00:00
Ralph Castain
845e92bc5d Remove the old version of hwloc. Update the new one to reflect the official release dates.
Refs trac:3744

This commit was SVN r29154.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-10 16:30:13 +00:00
Ralph Castain
46ed907003 Correctly handle list of cores specified in the rankfile - i.e., a rankfile entry such as:
rank 0=foo slot=0:0-1;1:0,1

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29152.
2013-09-08 02:04:29 +00:00
Ralph Castain
0d7fb932f1 Remove build product file
Refs trac:3744

This commit was SVN r29120.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-04 16:38:22 +00:00
Ralph Castain
6011a4d29c As per the telecon, update hwloc to v1.7.2 so we can add MIC support. Ignore hwloc1.5.2 component for now until this tests out - will remove it then.
cmr:v1.7.4:reviewer=jsquyres

This commit was SVN r29107.
2013-09-03 16:23:42 +00:00
Ralph Castain
7a7cfdd519 A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29089.
2013-08-29 20:01:06 +00:00
Joshua Ladd
1802aabf1a Add support for autodetecting a MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the key word auto which will automatically select the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this and, at this point, the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to \ncmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping
This commit was SVN r29079.
2013-08-28 16:23:33 +00:00
Ralph Castain
446e33a5d8 There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes.
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.

This commit was SVN r28682.
2013-06-27 03:04:50 +00:00
Joshua Ladd
46362d2761 Stomps compiler warnings in HCA min-dist calculation. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28620.
2013-06-12 16:25:25 +00:00
Jeff Squyres
6d173af329 This commit introduces a new "mindist" ORTE RMAPS mapper, as well as
some relevant updates/new functionality in the opal/mca/hwloc and
orte/mca/rmaps bases.  This work was mainly developed by Mellanox,
with a bunch of advice from Ralph Castain, and some minor advice from
Brice Goglin and Jeff Squyres.

Even though this is mainly Mellanox's work, Jeff is committing only
for logistical reasons (he holds the hg+svn combo tree, and can
therefore commit it directly back to SVN).

-----

Implemented distance-based mapping algorithm as a new "mindist"
component in the rmaps framework.  It allows mapping processes by NUMA
due to PCI locality information as reported by the BIOS - from the
closest to device to furthest.

To use this algorithm, specify:

   {{{mpirun --map-by dist:<device_name>}}}

where <device_name> can be mlx5_0, ib0, etc.

There are two modes provided:

 1. bynode: load-balancing across nodes
 1. byslot: go through slots sequentially (i.e., the first nodes are
     more loaded)

These options are regulated by the optional ''span'' modifier; the
command line parameter looks like:

    {{{mpirun --map-by dist:<device_name>,span}}}

So, for example, if there are 2 nodes, each with 8 cores, and we'd
like to run 10 processes, the mindist algorithm will place 8 processes
to the first node and 2 to the second by default. But if you want to
place 5 processes to each node, you can add a span modifier in your
command line to do that.

If there are two NUMA nodes on the node, each with 4 cores, and we run
6 processes, the mindist algorithm will try to find the NUMA closest
to the specified device, and if successful, it will place 4 processes
on that NUMA but leaving the remaining two to the next NUMA node.

You can also specify the number of cpus per MPI process. This option
is handled so that we map as many processes to the closest NUMA as we
can (number of available processors at the NUMA divided by number of
cpus per rank) and then go on with the next closest NUMA.

The default binding option for this mapping is bind-to-numa. It works
if you don't specify any binding policy. But if you specified binding
level that was "lower" than NUMA (i.e hwthread, core, socket) it would
bind to whatever level you specify.

This commit was SVN r28552.
2013-05-22 13:04:40 +00:00
Jeff Squyres
55382c1bf8 Bring over upstream hwloc trunk commit
https://svn.open-mpi.org/trac/hwloc/changeset/5592 to fix the merging
of groups when they are I/O objects.

This commit was SVN r28551.
2013-05-22 12:34:59 +00:00
Jeff Squyres
4b9b3a81ff Update the list of post-1.5.2 r numbers from hwloc that we have
committed here.

This commit was SVN r28458.
2013-05-07 01:22:06 +00:00
Jeff Squyres
ee0cdf86fd Fix issue raised by Stefan Friedel: remove an extraneous -L that is
added by hwloc's embedding so that it doesn't appear in
libhwloc_embedded.la (and therefore propogate all the way up to
libmpi.la). 

Committed upstream in hwloc SVN r5588.

This commit was SVN r28457.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r5588
2013-05-07 01:21:18 +00:00
Ralph Castain
c081a520a3 Fix --without-hwloc
This commit was SVN r28396.
2013-04-25 19:13:56 +00:00
Jeff Squyres
349ee654c1 Fix some --without-hwloc compile errors. Also remove one
assigned-but-not-used variable assignment.

This commit was SVN r28321.
2013-04-10 15:08:31 +00:00
Ralph Castain
3bfa53eb91 Cleanup (again) the solaris topology code in hwloc...sigh.
This commit was SVN r28294.
2013-04-06 14:45:32 +00:00
Ralph Castain
ec00fa3132 Fix missing variable declaration in hwloc 1.5.2
This commit was SVN r28293.
2013-04-05 17:43:34 +00:00
Nathan Hjelm
365cf48db5 Update OPAL frameworks to use the MCA framework system.
This commit was SVN r28239.
2013-03-27 21:11:47 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Jeff Squyres
1a048d6ee6 Remove a duplicate variable declaration.
This commit was SVN r28224.
2013-03-27 01:15:27 +00:00
Ralph Castain
317915225c Finish the binding cleanup by removing the no-longer-used binding level scheme. This proved to be fallible as there is no guarantee that the hierarchy it used matched physical reality of the machine (e.g., is L3 "above" the socket or not). Still have to complete the ppr update, but get the rest of it correct.
This commit was SVN r28223.
2013-03-26 20:09:49 +00:00
Ralph Castain
6ee32767d4 Restore the cpus-per-proc option for byslot and bynode mapping. Remove the bind_idx (which recorded the index of the hwloc object where the proc was bound) as this would no longer be unique, and just use the bitmap as the standard reference for location. Update the relative locality computation to take bitmaps as its argument.
This commit was SVN r28219.
2013-03-26 18:27:50 +00:00
Jeff Squyres
6c8d0450a3 Update the post-hwloc-1.5.2 patch list.
This commit was SVN r28218.
2013-03-26 16:18:52 +00:00
Jeff Squyres
f79716dfd4 Include <hwloc.h> so that the symbols in this file are subject to the
<hwloc/rename.h> renaming.

This commit was SVN r28215.
2013-03-26 15:49:52 +00:00
Jeff Squyres
6695b5e17a Re-apply r28040 from Eugene: a post-hwloc release fix for Solaris
binding.  This fix was included in the upstream 1.6 series, but not
the upstream 1.5 series, and was therefore missed when we brought
1.5.2 to OMPI.

This commit was SVN r28212.

The following SVN revision numbers were found above:
  r28040 --> open-mpi/ompi@3d44f97572
2013-03-26 13:27:23 +00:00
Ralph Castain
8a79d37ac2 Fix a few bugs in the hwloc integration code. The "set binding policy" macro should flag that the policy was indeed set. Some systems don't report sockets, so the print functions need to check for that condition.
cmr:v1.7

This commit was SVN r28209.
2013-03-25 17:51:45 +00:00
Jeff Squyres
e5838e6121 Don't mandate PCI support, because this will make builds on platforms
that don't have libpciaccess fail (e.g., OS X, or any machine without
libpciaccess).

This commit was SVN r28181.
2013-03-19 16:20:08 +00:00
Jeff Squyres
90802410a8 Update hwloc from 1.5.1 to 1.5.2. Re-enable hwloc PCI support by
default, since it will now use libpciaccess (if available).

This commit was SVN r28178.
2013-03-18 23:02:56 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Brian Barrett
7c3e42a689 Work around issue shown in #3505 by not linking against libpci by default.
This commit was SVN r28076.
2013-02-19 16:19:33 +00:00
Ralph Castain
037918e7b4 Correctly parse the rank file slot_list when given "S:C" - the first position holds the socket, so start looking for cores at posn=1
This commit was SVN r28054.
2013-02-13 13:06:03 +00:00
Eugene Loh
3d44f97572 Fix hwloc get-cpubind routine for Solaris. FIRST, check
processor_bind to see if we're bound to a single core.
If not, THEN check lgroup affinity.  Already CMR'ed to
v1.6 (trac 3507) and fixed upstream in hwloc (r5295).

This commit was SVN r28040.

The following SVN revision numbers were found above:
  r5295 --> open-mpi/ompi@6df8cb0f02
2013-02-10 04:02:19 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept seperate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Ralph Castain
f6b4db0b79 Fix rank_file operations. We changed the syntax to use semi-colons between multiple slot assignments so that we could use the comma to separate specific cores, but somehow the flex definitions didn't get updated to accept that character. We also incorrectly zero'd the bitmap between slot assignment sections, and so multiple slot assignments only wound up making the last one in the list.
This commit was SVN r27908.
2013-01-25 18:33:25 +00:00