WHAT: Merge the PMIx branch into the devel repo, creating a new
OPAL “pmix” framework to abstract PMI support for all RTEs.
Replace the ORTE daemon-level collectives with a new PMIx
server and update the ORTE grpcomm framework to support
server-to-server collectives.
WHY: We’ve had problems dealing with variations in PMI implementations,
and need to extend the existing PMI definitions to meet exascale
requirements.
WHEN: Mon, Aug 25
WHERE: https://github.com/rhc54/ompi-svn-mirror.git
Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.
All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.
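The framework approach replaces those scattered #if's with per-implementation components behind a single module interface. A minimal sketch of the idea (type and function names are illustrative, not the actual OPAL API):

    #include <stddef.h>
    #include <stdbool.h>

    /* one module per PMI flavor: Cray, Slurm PMI-1, Slurm PMI-2, ... */
    typedef struct {
        int (*init)(void);
        int (*put)(const char *key, const void *data, size_t size);
        int (*get)(const char *proc, const char *key,
                   void **data, size_t *size);
        int (*fence)(bool nonblocking);  /* collective data exchange */
        int (*finalize)(void);
    } opal_pmix_module_t;

    /* callers program against the module; component selection at runtime
       picks the right implementation, so no per-vendor #if logic remains */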
Accordingly, we have:
* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.
* replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.
* replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.
* removed the prior OMPI/OPAL modex code
* added new macros for executing modex send/recv to simplify use of the new APIs (see the sketch after this list). The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, and if the active PMIx component supports it, the non-blocking “fence” operation is used; otherwise, we default to the full blocking modex exchange we currently perform.
* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand.
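A hypothetical sketch of the new macros in use (macro names and argument lists modeled on the description above, not the actual signatures; types and key strings hypothetical):

    int rc;
    my_addr_t addr, *peer_addr = NULL;
    size_t len;

    /* publish our transport info; the async flag tells the PMIx layer
       that a non-blocking fence is acceptable for this BTL */
    OMPI_MODEX_SEND(rc, "btl.tcp.addr", &addr, sizeof(addr),
                    /*async=*/true);

    /* retrieve a peer's info; blocks only when async isn't supported */
    OMPI_MODEX_RECV(rc, "btl.tcp.addr", peer_proc,
                    (void**)&peer_addr, &len);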
This commit was SVN r32570.
Resolve the handling of overload conditions when binding processes. Two cases
arise:
(a) a default binding policy is in effect. In this case, we will emit a
warning and default to not binding unless the user provided the
"oversubscribe" or "overload" modifier to the "bind-to" option.
(b) a user-specified binding policy is in effect. In this case, we will
error out unless the user provided the "oversubscribe" or "overload"
modifier to the "bind-to" option, as we cannot meet the directive.
Either "bind-to" modifier (oversubscribe or overload) will be accepted for
now - in 1.9, we will deprecate the "overload" term in favor of
"oversubscribe".
Also added the ability to accept a --bind-to modifier without specifying the binding policy itself, so a user can specify overload-allowed with the default policy.
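For example (invocations illustrative, patterned on the --map-by modifier syntax):

    mpirun --bind-to core:overload-allowed -np 16 ./app   # explicit policy plus modifier
    mpirun --bind-to :overload-allowed -np 16 ./app       # modifier only; default policy applies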
Closes trac:4345
cmr=v1.8.2:reviewer=rhc:subject=resolve handling of overload conditions
This commit was SVN r32005.
The following Trac tickets were found above:
Ticket 4345 --> https://svn.open-mpi.org/trac/ompi/ticket/4345
* allow users to specify just a modifier for map-by instead of requiring that they also specify a policy. Thus, we now accept --map-by :pe=3 as indicating that we should use the default mapping policy, but bind 3 cpus/proc.
* if users specify pe's/proc but no policy, default to --map-by NUMA to ensure we have access to multiple cpus for the request. This won't guarantee we have access to enough to meet the request, but gives us a chance. In addition, we know that binding a proc to multiple cpus works best when those cpus are all in the same NUMA region, so this provides some degree of optimized behavior.
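An illustrative invocation (application name hypothetical):

    mpirun --map-by :pe=3 -np 4 ./app

With no policy named, this falls back to --map-by NUMA per the above and binds 3 cpus to each of the 4 procs.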
Per a request from Jeff, define "oversubscribe" for binding as a synonym for the "overload" modifier.
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31967.
Bring down 3aa0ed6 from the hwloc v1.7 branch: Stevens says we should
GETFD before we SETFD, so we do.
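The pattern, as a minimal C sketch (helper name hypothetical):

    #include <fcntl.h>

    /* mark fd close-on-exec without clobbering other descriptor flags */
    static int set_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);   /* GETFD first... */
        if (flags < 0) {
            return -1;
        }
        return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);   /* ...then SETFD */
    }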
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31683.
top_ompi_srcdir -> OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR
We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.
The only thing left is treating ompilibdir similarly to what we did for srcdir/builddir. Coming soon.
This commit was SVN r31678.
M opal/mca/event/libevent2021/configure.m4
M opal/mca/hwloc/hwloc172/configure.m4
M configure.ac
M config/opal_setup_libltdl.m4
M config/opal_check_visibility.m4
M config/opal_setup_cc.m4
This commit was SVN r31637.
Make sure that an internal, long-lived hwloc fd is marked as
close-on-exec so that children don't inherit it. This patch is
committed upstream in the hwloc master and v1.9 branches as 7489287
and b654e19, respectively. The patch applied here is the exact same
logic, but the surrounding code changed slightly since the hwloc v1.7
series, so the patch doesn't apply cleanly.
Refs trac:4550
This commit was SVN r31511.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
Also, add missing ORTE_ERROR_LOG in the other case where this error
message is used (i.e., ORTE_ERROR_LOG was used in the one place, so
let's also use it in the other place).
This commit was SVN r31321.
On some Linux distros (sles11sp2), csh fails to parse $LS_COLORS and borks with the error:
Unknown colorls variable `mh'.
The workaround is to unset LS_COLORS before calling the csh script.
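A minimal sketch of the idea, assuming the unset happens in C just before the csh script is launched (the actual fix may live in a shell wrapper instead):

    #include <stdlib.h>

    /* csh on sles11sp2 trips over $LS_COLORS, so scrub it from the
       environment before invoking the csh script */
    unsetenv("LS_COLORS");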
reviewed by Jeff
cmr=v1.8:reviewer=ompi-rm1.8
This commit was SVN r31244.
Refs trac:4117. Please use this commit rather than the patch attached to
the ticket; the patch had a few mistakes in the tweaked wording.
This commit was SVN r30362.
The following SVN revision numbers were found above:
r30298 --> open-mpi/ompi@58479399c3
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
Work around buggy NUMA node cpusets (i.e., buggy BIOSes).
Thanks to Jeff Becker for reporting the issue.
Submitted by Brice Goglin, reviewed by Jeff Squyres.
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30306.
Change the logic in bind.c to only include <malloc.h> if we don't have posix_memalign.
In http://www.open-mpi.org/community/lists/devel/2014/01/13619.php,
Paul Hargrove found a compiler warning on OpenBSD where <malloc.h>
exists, but is not intended to be used (and doesn't error out, so
AC_CHECK_HEADERS says it's ok).
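The resulting guard, sketched with conventional Autoconf macro names (assumed, not copied from bind.c):

    /* <malloc.h> exists on OpenBSD but isn't meant to be used; only pull
       it in when we actually need the legacy memalign() path */
    #if defined(HAVE_MALLOC_H) && !defined(HAVE_POSIX_MEMALIGN)
    #include <malloc.h>
    #endif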
Reviewed by Brice Goglin.
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30234.
Instead of overloading pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.
Refs trac:4003
This commit was SVN r30033.
The following Trac tickets were found above:
Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
Reset topology usage for each node as we bind, since multiple nodes may be linked to the same topology object. This will need to be revisited for scale, as it takes some non-zero time to reset the usage on each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.
cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings
This commit was SVN r29978.
Update the default mapping and binding values:
* default to bind-to core
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leaving the default binding as above)
Also fix a bug in the map-by <obj> mapper where we incorrectly computed the #procs to assign when #slots > #procs.
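Illustrative effects of the new defaults (invocations hypothetical):

    mpirun -np 2 ./app             # map-by slot, bind-to core
    mpirun -np 8 ./app             # map-by socket (balanced per node), bind-to core
    mpirun --map-by socket ./app   # also implies rank-by socket; binding unchanged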
cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values
This commit was SVN r29919.
This is helpful in the work for #3694: ensure that the many places that
eventually end up in configure don't overly pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).
Refs trac:3694
This commit was SVN r29830.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
* Ensure "cnt" is always initialized
* Ensure we don't buffer overflow on strncat() -- need to ensure we
account for the terminating \0 character
* hwloc_get_type_depth() returns an int (not unsigned), and
HWLOC_TYPE_DEPTH_UNKNOWN if it's unknown (which is probably <0, but
still, might as well check what the official hwloc docs say to
check for)
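Sketches of the strncat() and depth fixes (buffer and variable names illustrative):

    #include <string.h>
    #include <hwloc.h>

    static void sketch(hwloc_topology_t topology, const char *token)
    {
        char line[256] = "";
        /* leave room for the terminating '\0' */
        strncat(line, token, sizeof(line) - strlen(line) - 1);

        /* the return is a signed int; test the documented sentinel */
        int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET);
        if (HWLOC_TYPE_DEPTH_UNKNOWN == depth) {
            /* skip or fall back for this object type */
        }
    }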
cmr=v1.7.4:reviewer=rhc:subject=fix hwloc base compiler warnings
This commit was SVN r29686.
This change contains a non-mandatory modification
of the MPI-RTE interface. Anyone wishing to support
coprocessors such as the Xeon Phi may wish to add
the required definition and underlying support.
****************************************************************
Add locality support for coprocessors such as the Intel Xeon Phi.
Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.
So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:
1. add the OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board.
2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions.
3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.
4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.
5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.
6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.
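A sketch of how the updated locality check might read (exact macro bodies may differ from the tree; SAME_HOST is a hypothetical helper):

    /* item 1: same physical host, though possibly a different board */
    #define SAME_HOST(loc)  (!!((loc) & OPAL_PROC_ON_HOST))

    /* item 2: "local node" now requires both the same host AND the same
       board, so the macro checks both flags */
    #define OPAL_PROC_ON_LOCAL_NODE(loc) \
        (((loc) & OPAL_PROC_ON_HOST) && ((loc) & OPAL_PROC_ON_NODE))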
cmr=v1.7.4:reviewer=hjelmn
This commit was SVN r29435.