1
1
Граф коммитов

4528 Коммитов

Автор SHA1 Сообщение Дата
Adrian Reber
919260a0d2 fix communication between orte-checkpoint and orterun
Right after starting the communication with orterun the buffer
containing the message is deleted. This patch removes the deletion
of the buffer which is now done by orte_rml_send_callback(). This is
now also the callback function used by orte_rml.send_buffer_nb().
The previous callback hnp_receiver() was introduced by an
earlier patch which only was trying to get the code to compile again.

This commit was SVN r30405.
2014-01-24 17:18:28 +00:00
Adrian Reber
8c93ebffeb orte_snapc_base_select() wants to know if it is an application
The function

   int orte_snapc_base_select(bool seed, bool app);

wants to know if it called by an application or not. Therefore
it expects as second paremeter 'bool app'. It used to be
'!ORTE_PROC_IS_DAEMON' which is not always correct if it is
a tool or a HNP. This patch changes it to ORTE_PROC_IS_APP, which
has the correct information if it is an application.

This commit was SVN r30404.
2014-01-24 17:14:41 +00:00
Ralph Castain
14bf1c9463 Some minor cleanups:
* don't return null if someone wants to print ORTE_SUCCESS

* rename some stale process types

* keep show_help local if we are in standalone operation as there is nobody to send it to

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30400.
2014-01-23 21:35:20 +00:00
Ralph Castain
32996cd705 Add new sensors for chip frequency and power (when permissions allow) Note that we don't support all chipsets at this time, but others are welcome to extend as desired.
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30399.
2014-01-23 21:33:21 +00:00
Ralph Castain
886fee9367 Properly set num_procs when np is not given, but cpus-per-proc is used. Thanks to Tetsuya Mishima for pointing it out
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30389.
2014-01-23 05:01:07 +00:00
Ralph Castain
a01470190d Allow a little more flexibility - if getpwuid fails, just use the return from getuid to define the session directory
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30388.
2014-01-23 05:00:05 +00:00
Ralph Castain
de07a64599 Cleanup the sensor code:
* use the global flags for linux and apple being found instead of re-doing the case statements

* update select procedure to ignore components that measure the same thing (e.g., resusage and sigar), taking the higher priority module

cmr=v1.7.5:reviewer=jsquyres:subject=Cleanup the sensor code

This commit was SVN r30368.
2014-01-22 21:01:09 +00:00
Jeff Squyres
7768828d2d Addendum to r30298: tweak the wording of the help messages a bit.
Refs trac:4117.  Please use this commit rather than the patch attached to
the ticket; the patch had a few mistakes in the tweaked wording.

This commit was SVN r30362.

The following SVN revision numbers were found above:
  r30298 --> open-mpi/ompi@58479399c3

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-01-22 12:17:14 +00:00
Ralph Castain
e0edc29029 Add comment on future work
This commit was SVN r30336.
2014-01-20 19:54:31 +00:00
Ralph Castain
9b2066cfba Add two new sensor modules - one to monitor core temperatures, and the other to monitor resource usage using the sigar library
This commit was SVN r30335.
2014-01-20 19:35:48 +00:00
Ralph Castain
3e9c8497e0 Shift the verbose output a bit
Refs trac:4136

This commit was SVN r30332.

The following Trac tickets were found above:
  Ticket 4136 --> https://svn.open-mpi.org/trac/ompi/ticket/4136
2014-01-20 14:41:37 +00:00
Ralph Castain
5ad9795bd8 Cleanup some potential memory overruns
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30331.
2014-01-19 16:31:26 +00:00
Ralph Castain
9f6fd7b98d A few corrections to hostfile parsing - thanks to Tetsuya Mishima for the review
Refs trac:4136

This commit was SVN r30330.

The following Trac tickets were found above:
  Ticket 4136 --> https://svn.open-mpi.org/trac/ompi/ticket/4136
2014-01-19 16:26:12 +00:00
Ralph Castain
657796f9e0 Revert r30327 - turns out it isn't quite right just yet. :-(
Closes trac:4138

This commit was SVN r30328.

The following SVN revision numbers were found above:
  r30327 --> open-mpi/ompi@87d5f86025

The following Trac tickets were found above:
  Ticket 4138 --> https://svn.open-mpi.org/trac/ompi/ticket/4138
2014-01-18 23:38:39 +00:00
Ralph Castain
87d5f86025 Enable use of unix domain sockets for local OOB communications, thereby removing the requirement for an active network interface when running strictly on a single node. Update the overall OOB system to support cross-transport movement of messages so that the OOB can move a received message to another transport for transmission.
cmr=v1.7.5:reviewer=jsquyres:subject=Enable use of unix domain sockets for local OOB communications

This commit was SVN r30327.
2014-01-18 21:36:49 +00:00
Ralph Castain
fcdd904af4 Simplify and update hostfile handling to correctly support hostfiles that list nodes multiple times, once for each slot, and those that list a host once and include an explicit slot count. Eliminate support for mixing those two modes as this logic became just too complex when attempting to handle all the corner cases.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30325.
2014-01-18 16:08:40 +00:00
Ralph Castain
87f34860fe Protect array against crossing boundaries
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30316.
2014-01-17 21:36:20 +00:00
Mike Dubman
874c4e2558 PMI2: add missing file from prev commit
Refs trac:4119

This commit was SVN r30301.

The following Trac tickets were found above:
  Ticket 4119 --> https://svn.open-mpi.org/trac/ompi/ticket/4119
2014-01-16 13:17:08 +00:00
Mike Dubman
98234b5a69 SLURM/PMI2: Fix parsing of PMI2 process mapping
fixed by AlexM, reviewed by miked
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30300.
2014-01-16 12:05:29 +00:00
Ralph Castain
58479399c3 As per RFC and telecon, deprecate cmd line options and their corresponding MCA params for old-style mapping and binding directives
cmr=v1.7.5:reviewer=jsquyres:subject=deprecate old-style mapping and binding directives

This commit was SVN r30298.
2014-01-15 14:48:39 +00:00
Ralph Castain
590a87c730 You can't pass static buffer definitions to rml.send as it will attempt to release them upon completion - you need to send dynamically allocated buffers
This commit was SVN r30261.
2014-01-11 19:38:11 +00:00
Ralph Castain
286ff6d552 For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun.
NOTE: launch performance will be absolutely awful if you do this with BTLs that aren't configured to modex_recv on first message!

Even with "modex on demand", we still have to do a barrier in place of the modex - we simply don't move any data around, which does reduce the time impact. The barrier is required to ensure that the other proc has in fact registered all its BTL info and therefore is prepared to hand over a complete data package. Otherwise, you may not get the info you need. In addition, the shared memory BTL can fail to properly rendezvous as it expects the barrier to be in place.

This behavior will *only* take effect under the following conditions:

1. launched via mpirun

2. #procs is greater than ompi_hostname_cutoff, which defaults to UINT32_MAX

3. mca param rte_orte_direct_modex is set to 1. At the moment, we are having problems getting this param to register properly, so only the first two conditions are in effect. Still, the bottom line is you have to *want* this behavior to get it.

The planned next evolution of this will be to make the direct modex be non-blocking - this will require two fixes:

1. if the remote proc doesn't have the required info, then let it delay its response until it does. This means we need a way for the MPI layer to tell the RTE "I am done entering modex data".

2. adjust the SM rendezvous logic to loop until the required file has been created

Creating a placeholder to bring this over to 1.7.5 when ready.

cmr=v1.7.5:reviewer=hjelmn:subject=Enable direct modex at scale

This commit was SVN r30259.
2014-01-11 17:36:06 +00:00
Ralph Castain
fb9e427320 One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do *not* bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound.
Refs trac:4077

This commit was SVN r30200.

The following Trac tickets were found above:
  Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077
2014-01-09 22:39:34 +00:00
Ralph Castain
24e990e747 Fix comm_spawn for oversubscribed systems by correctly computing the number of available slots
cmr=v1.7.4:reviewer=jsquyres:subject=Fix comm_spawn for oversubscribed systems

This commit was SVN r30197.
2014-01-09 20:33:48 +00:00
Ralph Castain
9fcb46d85a Correctly detect and handle oversubscription for comm_spawn
cmr=v1.7.4:reviewer=jsquyres:subject=Correctly detect and handle oversubscription for comm_spawn

This commit was SVN r30186.
2014-01-09 18:27:51 +00:00
Ralph Castain
6e5fedeb04 Oops - add verbose output to inform that cannot default bind due to no cores detected
Refs trac:4074

This commit was SVN r30185.

The following Trac tickets were found above:
  Ticket 4074 --> https://svn.open-mpi.org/trac/ompi/ticket/4074
2014-01-09 18:17:14 +00:00
Ralph Castain
4cdc291df1 Ensure slurm properly dies on abnormal termination
cmr=v1.7.4:reviewer=jsquyres:subject=Ensure slurm properly dies on abnormal termination

This commit was SVN r30182.
2014-01-09 16:52:02 +00:00
Jeff Squyres
87e476ebd8 Clean up many references to "rank": usually change to "process" and/or
specifically delineate that we're referring to the process' rank in
MPI_COMM_WORLD.

Refs trac:4068

This commit was SVN r30181.

The following Trac tickets were found above:
  Ticket 4068 --> https://svn.open-mpi.org/trac/ompi/ticket/4068
2014-01-09 16:37:49 +00:00
Ralph Castain
7e4748a0f1 Handle the case of nodes that do not report cores, and thus our default binding policy will fail even though binding is supported by defaulting to not binding on those nodes.
Thanks to Paul Hargrove for reporting the problem on NetBSD.

cmr=v1.7.4:reviewer=jsquyres:subject=Handle the case of nodes that do not report cores

This commit was SVN r30180.
2014-01-09 16:27:58 +00:00
Ralph Castain
f179f2086b Do a better job of reporting bindings - if someone gives a spec that binds us to all processors, then we are effectively unbound and should report it clearly instead of outputting a long line of B's.
cmr=v1.7.4:reviewer=jsquyres:subject=Do a better job of reporting bindings

This commit was SVN r30179.
2014-01-09 16:16:16 +00:00
Ralph Castain
2a0e4b5e62 Update the orterun help messages and man page to reflect new map/rank/bind options and defaults. Thanks to Paul Hargrove for reporting it.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30173.
2014-01-09 04:44:28 +00:00
Ralph Castain
bf453a2575 Reference the correct variable...sigh
Refs trac:4059

This commit was SVN r30163.

The following Trac tickets were found above:
  Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059
2014-01-08 22:36:39 +00:00
Ralph Castain
80497d73cf Need to mark the daemon as alive so that exit commands are properly routed during abnormal terminations. Also, remove stale references to the "selected oob component" as we no longer require only one component be selected
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30162.
2014-01-08 22:35:48 +00:00
Ralph Castain
d5647394d8 Initialize variable so dash-host option gets correctly parsed
cmr=v1.7.4:reviewer=rolfv

This commit was SVN r30159.
2014-01-08 15:17:16 +00:00
Ralph Castain
e724d0d12d Ensure comm_spawn'd jobs get treated the same wrt setting default mapping directives
Refs trac:4059

This commit was SVN r30158.

The following Trac tickets were found above:
  Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059
2014-01-08 15:16:22 +00:00
Ralph Castain
fb650aed0c Fix how we transfer mapping directives to the job, ensuring that directives that can be given outside of a mapping policy (e.g., oversubscribe and no-use-local) are retained.
cmr=v1.7.4:reviewer=jsquyres:subject=Fix how we transfer mapping directives to the job

This commit was SVN r30155.
2014-01-08 04:25:43 +00:00
Ralph Castain
bc75250951 Cleanup the sensor framework close - existing code was using incorrect object type. Don't start sensors if sample rate is zero. Don't add zero-byte data from resusage as it means nothing was measured.
cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r30150.
2014-01-08 02:38:56 +00:00
Jeff Squyres
13b29cff2c This commit compliements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configry macro
names:

 * pkgdatdir ->	ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Mike Dubman
40aadab85f re-enable map-by dist
after last refactoring in rmaps, map-by dist:hca  was disabled.
reverting it back

found/fixed by Elena, reviewed by miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30118.
2014-01-04 20:44:41 +00:00
Ralph Castain
9a855ff58e Update sensor component for new OOB calls
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30117.
2014-01-03 22:37:15 +00:00
Ralph Castain
3f2b3c53ea Ensure that rankfile-provided allocations are correctly handled
Fixes trac:4043

cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that rankfile-provided allocations are correctly handled

This commit was SVN r30106.

The following Trac tickets were found above:
  Ticket 4043 --> https://svn.open-mpi.org/trac/ompi/ticket/4043
2014-01-02 16:07:16 +00:00
Ralph Castain
d5a5caa7e0 Restore the bycore mpirun option for backward compatibility
Refs trac:4044

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30103.

The following Trac tickets were found above:
  Ticket 4044 --> https://svn.open-mpi.org/trac/ompi/ticket/4044
2014-01-02 04:16:43 +00:00
Ralph Castain
a8a91b374e Update component-level selection comments to match latest revisions
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30087.
2013-12-25 19:12:43 +00:00
Ralph Castain
d049731911 Add pubsub pmi component to list of components to avoid when indirect launch used
Refs trac:4032

This commit was SVN r30083.

The following Trac tickets were found above:
  Ticket 4032 --> https://svn.open-mpi.org/trac/ompi/ticket/4032
2013-12-25 16:25:37 +00:00
Ralph Castain
85f2429819 Ensure the ipv6 lists get initialized and finalized
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30081.
2013-12-24 17:24:39 +00:00
Ralph Castain
2e08219cac Silence the valgrind report from the OOB
Refs trac:4033

This commit was SVN r30080.

The following Trac tickets were found above:
  Ticket 4033 --> https://svn.open-mpi.org/trac/ompi/ticket/4033
2013-12-24 17:06:45 +00:00
Ralph Castain
81df8d09ca Avoid use of PMI components when launched via mpirun as this is just unnecessary overhead that can cause confusion.
cmr=v1.7.4:reviewer=miked:subject=Avoid use of PMI components when launched via mpirun

This commit was SVN r30078.
2013-12-24 16:32:31 +00:00
Ralph Castain
01ee5f380b Remove debug - problem has been identified
Refs trac:4026

This commit was SVN r30075.

The following Trac tickets were found above:
  Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-24 15:22:18 +00:00
Jeff Squyres
ce02002a5e Free minor memory leak / squash valgrind still-reachable warning.
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30071.
2013-12-24 11:04:38 +00:00
Ralph Castain
38f46641ce Ensure the recv handler has been initialized
Refs trac:4026

This commit was SVN r30068.

The following Trac tickets were found above:
  Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-24 06:09:45 +00:00
Ralph Castain
bb80625a8a Add missing var initialization
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30063.
2013-12-24 00:02:22 +00:00
Ralph Castain
65228d3571 Don't use "size_t" for the nbytes field in the header - use uint32_t to ensure that ntohl/htonl correctly match it
Refs trac:4026

This commit was SVN r30062.

The following Trac tickets were found above:
  Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-23 21:39:49 +00:00
Ralph Castain
7d8c0459a4 Attempt to debug hang that is hitting some environments. Posting to 1.7.4 as a placeholder for the eventual solution
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30060.
2013-12-23 19:57:05 +00:00
Nathan Hjelm
3be4536d9b Cleanup various leaks in ompi_info reported by valgrind.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30058.
2013-12-23 17:47:43 +00:00
George Bosilca
24879f9def Code cleanup while chasing valgrind complaints.
This commit was SVN r30048.
2013-12-21 23:28:14 +00:00
George Bosilca
38cbaeaa82 Try to impose a little bit of consistency on how we parse lists of
modules by enforcing the use of OPAL list accessors.

This commit was SVN r30045.
2013-12-21 23:23:33 +00:00
Ralph Castain
264150872b Add a bunch of debug output to the OOB connection completion code so we can track down a handshake problem. Available in optimized builds as well as debug ones by setting -mca oob_base_verbose 10
No review will be required as this is just debug code for those helping us debug the 1.7.4 release candidates

cmr-=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30043.
2013-12-21 16:09:26 +00:00
Ralph Castain
9c768df8b8 Resolve an unexpected behavior in hostfile allocations. Now that we filter allocations to determine what will be used for mapping, let the initial global pool be the union of nodes from all sources (default hostfile, hostfiles, and dash-hosts). Each app will filter down to only those specified for it using its own hostfile and dash-host options.
cmr=v1.7.4:reviewer=jsquyres:subject=Resolve an unexpected behavior in hostfile allocations

This commit was SVN r30040.
2013-12-21 01:38:27 +00:00
Adrian Reber
53a70fe87f Trying to get the C/R code to compile again. (send_*_nb)
This patch changes all send/send_buffer occurrences in the C/R code
to send_nb/send_buffer_nb.
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* just replace the blocking calls with the non-blocking calls
* all #ifdef's introduced in V1 are gone
* send_* returns error code or ORTE_SUCCESS (not the number of bytes)

This commit was SVN r30036.
2013-12-20 21:58:28 +00:00
Adrian Reber
a3813d37c7 Trying to get the C/R code to compile again. (recv_*_nb)
This patch changes all recv/recv_buffer occurrences in the C/R code
to recv_nb/recv_buffer_nb.
The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED).
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* only #ifdef out the code where the behaviour is changed
  (used to be blocking; now non-blocking)

This commit was SVN r30035.
2013-12-20 21:05:40 +00:00
Ralph Castain
31248c0985 Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.

Refs trac:4003

This commit was SVN r30033.

The following Trac tickets were found above:
  Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
Ralph Castain
71b52fe861 Ensure that comm_spawn'd procs get user-specified forwarded envars
Thanks to Tim Miller for reporting the regression from the 1.6 series

cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that comm_spawn'd procs get user-specified forwarded envars

This commit was SVN r30012.
2013-12-20 14:47:35 +00:00
Ralph Castain
d47d2569f3 We stripped the process info packing routine to minimize message size when sending the launch message, but tools still require all the info. So modify the tool-hnp handshake to explicitly add the missing info
Refs trac:3992

This commit was SVN r29989.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 20:42:20 +00:00
Ralph Castain
55cd65b149 Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along.
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.

cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings

This commit was SVN r29978.
2013-12-19 16:31:45 +00:00
Ralph Castain
9b32dacb6c Ensure we don't abort if a tool cannot send a message - the orte/util/comm library used by tools to query mpirun knows how to handle this situation.
Refs trac:3992

This commit was SVN r29975.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 07:10:36 +00:00
Ralph Castain
6239e64f36 Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working.
Refs trac:3992

This commit was SVN r29974.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 03:28:05 +00:00
Ralph Castain
bf5e314f76 Tools require their own errmgr and state components so they can handle any errors that occur in, for example, communication .
Refs trac:3992

This commit was SVN r29972.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 01:49:33 +00:00
Ralph Castain
3aaca16faa Silence warnings that are no longer valid
Refs trac:3992

This commit was SVN r29970.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 00:40:36 +00:00
Ralph Castain
c5956e7b8c Convert debug output to opal_output_verbose
Thanks to Tetsuya Mishima for reporting it

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29969.
2013-12-19 00:36:15 +00:00
Ralph Castain
39957df08e Fixes trac:3963. Fix the tool ess procedure so it opens and selects the OOB framework, and have the OOB TCP module update the route to new connections (the routed modules know what to do).
Thanks to Dave Love and Ashley Pittman for pointing out the problem.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix tool communications with mpirun

This commit was SVN r29959.

The following Trac tickets were found above:
  Ticket 3963 --> https://svn.open-mpi.org/trac/ompi/ticket/3963
2013-12-18 23:13:46 +00:00
Ralph Castain
77553f72be Per this email thread:
http://www.open-mpi.org/community/lists/devel/2013/12/13412.php

fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch

This commit was SVN r29955.
2013-12-18 17:57:37 +00:00
Ralph Castain
ab4636c47b Per email on devel list, change the default rank-by to slot unless map-by <obj> is specified, in which case use rank-by <obj>
Refs trac:3977

This commit was SVN r29945.

The following Trac tickets were found above:
  Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-18 00:48:50 +00:00
Ralph Castain
53cd00fe16 By setting a default mapping/ranking/binding policy that wasn't "none", we introduced a problem for users of the Mac and any other machine where sockets aren't defined and/or binding is not supported. Fix that by checking to see if the user specified the failing policy - if not, then fall back to the old map/rank by slot and no binding.
Refs trac:3977

This commit was SVN r29933.

The following Trac tickets were found above:
  Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-17 14:50:10 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
Ralph Castain
8b6d117541 Per the OMPI devel conference that changed our default behaviors:
* default to bind-to core 
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above) 

Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs

cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values

This commit was SVN r29919.
2013-12-15 17:25:54 +00:00
Jeff Squyres
770bf77149 Fix some minor memory leaks in error code paths.
Many thanks to Tom Fogal for the patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix minor memory leaks in error code paths

This commit was SVN r29905.
2013-12-14 00:41:21 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Jeff Squyres
2e7653e4c2 Add missing argv.h includes.
Noticed these as part of #3694: external libevent's don't cause argv.h
to automatically get included.

Refs trac:3694

This commit was SVN r29897.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:17:36 +00:00
Brian Barrett
121ca26c59 Per discussion at Develoepr's Meeting, remove Solaris threads support. Solaris
will just fall back to pthreads, which should be no problem.

This commit was SVN r29893.
2013-12-13 20:07:11 +00:00
Ralph Castain
0e81959aae Cleanup mindist error messages - already patched in 1.7
This commit was SVN r29869.
2013-12-12 15:30:29 +00:00
Ralph Castain
1ff12362da Cleanup merge conflict that was incorrectly committed
This commit was SVN r29851.
2013-12-09 20:20:14 +00:00
Ralph Castain
83e59e6761 Once again, the Slurm folks have decided to redefine their envars, reversing what they had previously told us to do. So cleanup the Slurm allocation code, and also adjust to a change in srun behavior that now aborts a job if the ntasks-per-node doesn't get specified when ORTE calls it, but the user specified it when getting an allocation. Sigh.
cmr=v1.7.4:reviewer=miked:subject=Update Slurm allocation and launch

This commit was SVN r29849.
2013-12-09 17:58:46 +00:00
Mike Dubman
c208b858e7 improve error messages in mindist
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29846.
2013-12-09 06:34:38 +00:00
Ralph Castain
f2c49c6c19 Fix the map-by object mapper to handle cpus-per-proc by accounting for the request when computing the number of procs to put on each object. This ensures that the binding routine doesn't automatically overload the cores.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29843.
2013-12-08 16:59:25 +00:00
Ralph Castain
9604f36c3b Specify units for the job completion timeout
This commit was SVN r29839.
2013-12-08 04:51:58 +00:00
Ralph Castain
62c9e5c64c Really is better if we output a message indicating that the job was aborted due to hitting the execution time limit
Refs trac:3960

This commit was SVN r29833.

The following Trac tickets were found above:
  Ticket 3960 --> https://svn.open-mpi.org/trac/ompi/ticket/3960
2013-12-07 15:33:56 +00:00
Ralph Castain
d44e4a311f Per request from Dave Goodell, add support for MPIEXEC_TIMEOUT - if set in the environment, terminate the job after the specified number of seconds has passed. Equivalent to MPICH functionality.
cmr=v1.7.4:reviewer=dgoodell:subject=add support for MPIEXEC_TIMEOUT

This commit was SVN r29831.
2013-12-07 01:58:32 +00:00
Jeff Squyres
ed9aba3896 This patch fixes
error: void value not ignored as it ought to be 

in the C/R code by ignoring the return value of functions which 
no longer return a value (only void). 

Signed-off-by: Adrian Reber <adrian.reber@hs-esslingen.de>

This commit was SVN r29816.
2013-12-06 14:40:10 +00:00
Ralph Castain
fb59b6b875 Silence compiler warning when --disable-orte-static-ports
This commit was SVN r29783.
2013-12-03 01:53:31 +00:00
Ralph Castain
617a0edbb8 Fix hostfile parsing for the case where RMs count slots by listing the node multiple times. Thanks to Tetsuya Mishima for rep[orting the problem and providing a patch.
cmf=v1.7.4:reviewer=rhc

This commit was SVN r29748.
2013-11-24 16:17:52 +00:00
Ralph Castain
7c23a5ad65 Fix headers when building with ft enabled. Thanks to Adrian Reber for the patch!
This commit was SVN r29743.
2013-11-23 22:58:32 +00:00
Ralph Castain
7480beb7f0 Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it.
This isn't being used yet - just enabling Nathan to do what he needs.

***** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *****

This commit was SVN r29708.
2013-11-14 17:01:43 +00:00
Ralph Castain
0f420f3676 Add a little debug
cmr:v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29705.
2013-11-14 04:22:59 +00:00
Ralph Castain
561c1830f7 Cleanup the radix MCA params - the max connections has no relationship to the max fd's on a node. It solely is used to improve the performance of small jobs by avoiding unnecessary inter-daemon routing.
Refs trac:3917

This commit was SVN r29704.

The following Trac tickets were found above:
  Ticket 3917 --> https://svn.open-mpi.org/trac/ompi/ticket/3917
2013-11-14 04:19:06 +00:00
Ralph Castain
540d38bc12 Per patch from Jeff, treat the case where someone transfers an archived file of an unknown type. This can't actually happen as this is on the receive end, and the error would have been treated and rejected on the send side. Still, practice defensive programming.
cmr:v1.7.4:reviewer=rhc:subject=Practice defensive programming

This commit was SVN r29699.
2013-11-13 23:49:26 +00:00
Jeff Squyres
f4e647538c mca_routed_radix_component.max_connections is unsigned; it can never be <0
cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r29694.
2013-11-13 19:37:24 +00:00
Jeff Squyres
038116e4b8 orte_iof_base.output_limit is an unsigned type; just initialize it to INT_MAX
cmr=v1.7.4:reviewer=rhc

This commit was SVN r29693.
2013-11-13 19:36:43 +00:00
Mike Dubman
840e2cb4a2 mindist: cosmetic, use fallback to byslot if unable to read NUMA info, small fix.
fixed by Elena, reviewed by Ralph/Mike
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29679.
2013-11-13 09:26:40 +00:00
Ralph Castain
5b38259264 Ouch - remove an extraneous line.
Thanks to Tetsuya Mishima for reporting it

cmr=v1.7.4:reviewer=rhc:subject=Remove extraneous line from OOB

This commit was SVN r29677.
2013-11-13 04:02:05 +00:00
Ralph Castain
f1e510154c Revise the launch timeout detection so we don't mistakenly declare "failed to start". Recognize that timeout is at the per-job level, and define the timeout param as a total value instead of seconds/daemon as it otherwise can get to be an enormous (and useless) number.
Resolves problems in loop_spawn where the timer was incorrectly firing and killing the overall job.

cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r29661.
2013-11-11 23:50:40 +00:00
Ralph Castain
46f633883b Correct the error check on rml.send
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29660.
2013-11-11 23:23:12 +00:00
Ralph Castain
e35ad23176 Correctly compute usage for dynamic spawns when binding is invoked. Ensure we correctly account for existing process usage on each node when computing bindings during dynamic spawns.
cmr=v1.7.4:reviewer=hjelmn:subject=Correctly compute usage for dynamic spawns when binding is invoked

This commit was SVN r29649.
2013-11-10 00:38:01 +00:00
Joshua Ladd
d594ffbfc7 Backing out Elena's patch - abstraction violation
This commit was SVN r29645.
2013-11-08 13:12:07 +00:00
Joshua Ladd
da3e272fdd Adds a check in the mindist mapper for whether or not the user asks for a specific device. This patch was submited by Elena Elkina and reviewed by Josh Ladd and should be added to
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29644.
2013-11-08 04:28:53 +00:00
Brian Barrett
6d7a1fbb82 Move opal_portable_platform.h to opal/include/opal, which is where it really
should have been all along and fix one place that uses the file

Update opal_portable_platform.h with changes to mpi_portable_platform.h made 
in r29608.

Make mpi_portable_platform.h a symlink to opal_portable_platform.h, so that
they won't get out of sync.  I'd like to remove mpi_portable_platform.h, but
we don't automatically add -I${includedir}/openmpi/ to make that sane from
a header include point of view, so that's future work.

This commit was SVN r29618.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-06 17:12:26 +00:00
Ralph Castain
fb0940a9d9 Add a couple of useful tests
This commit was SVN r29539.
2013-10-28 13:24:16 +00:00
Ralph Castain
8c5c7d0db4 Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface.
Refs trac:3696

This commit was SVN r29522.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-10-26 00:47:14 +00:00
Ralph Castain
604970a1a2 Initialize orte_coprocessors hash table to NULL. Delay coprocessor detection on HNP until after node topology final definition in case rmaps changes it. Minor spacing change.
Refs trac:3847

This commit was SVN r29504.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-24 00:08:47 +00:00
Ralph Castain
f5920e9312 Revert r29489. This function only executes in the HNP. In orte/mca/ess/hnp/ess_hnp_module.c, we already check for local coprocessors and add them to the hash table if found. Thus, r29489 simply overwrote what was already present.
The data for each remote daemon is added later in the daemon callback function. Only the HNP retains info in the hash table.

If it is desirable to have each daemon retain its own coprocessor info, then this must be done in orte/mca/ess/base/ess_base_std_orted.c.

This commit was SVN r29497.

The following SVN revision numbers were found above:
  r29489 --> open-mpi/ompi@2e2794fa15
2013-10-23 22:35:24 +00:00
Nathan Hjelm
2e2794fa15 Fix coprocessor detection by always adding the local daemon's co-processors
to the hash table.

Tested and working on a system with 2 Xeon Phi co-processors.

cmr=v1.7.4:ticket=3847:reviewer=ompi-rm1.7

This commit was SVN r29489.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-23 15:56:23 +00:00
Ralph Castain
7c86a843c8 Silence compiler warning
This commit was SVN r29477.
2013-10-23 04:13:36 +00:00
Ralph Castain
960a255e7f Do some cleanup of the --without-hwloc build - no need to work on coprocessors since we can't detect them anyway, cleanup some unused variables in the ppr mapper
This commit was SVN r29476.
2013-10-23 01:45:21 +00:00
Ralph Castain
25a84c7f0a Fix build --without-hwloc
This commit was SVN r29453.
2013-10-19 23:12:33 +00:00
Ralph Castain
b12167abef Per a good suggestion from Jeff, make the coprocessor mapping more scalable by using a hash table to cache the coprocessor list, and then do a single pass thru the nodes at the end to assign hostid's.
Refs trac:3847

This commit was SVN r29439.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-14 22:01:48 +00:00
Ralph Castain
24c811805f ****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.

4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
Ralph Castain
9902748108 ***** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *****
Fix two problems that surfaced when using direct launch under SLURM:

1. locally store our own data because some BTLs want to retrieve 
   it during add_procs rather than use what they have internally

2. cleanup MPI_Abort so it correctly passes the error status all
   the way down to the actual exit. When someone implemented the
   "abort_peers" API, they left out the error status. So we lost
   it at that point and *always* exited with a status of 1. This 
   forces a change to the API to include the status.

cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch

This commit was SVN r29405.
2013-10-08 18:37:59 +00:00
Ralph Castain
9389592e05 Fix --without-hwloc build
This commit was SVN r29399.
2013-10-08 15:02:47 +00:00
Ralph Castain
2bd2284b93 Add a useful test and update another
This commit was SVN r29370.
2013-10-04 15:21:40 +00:00
Ralph Castain
697fb253fa Minor modification to test code
This commit was SVN r29357.
2013-10-04 03:11:31 +00:00
Ralph Castain
f4f2287958 Singletons currently start out by spawning an HNP - this is required solely in the cases where the singleton subsequently calls MPI_Comm_spawn or publishes port info without support from an external orte-server. In all other cases, the HNP is of no value and can actually be a detriment by creating additional overhead on the node. This is particularly concerning for async operations where processes may begin as singletons and then dynamically wireup to perform pt2pt communications.
So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it.

cmr:v1.7.4:reviewer=jsquyres

This commit was SVN r29354.
2013-10-04 02:58:26 +00:00
Nathan Hjelm
11722457ce Fix typo in grpcomm_pmi_module.c that was giving the wrong locality for direct launched jobs. Refs trac:3824
This commit was SVN r29348.

The following Trac tickets were found above:
  Ticket 3824 --> https://svn.open-mpi.org/trac/ompi/ticket/3824
2013-10-03 14:38:45 +00:00
Ralph Castain
2121e9c01b Fix an issue regarding use of PMI when running processes and tools that don't need or want to use it. We build PMI support based on configuration settings and library availability.
However, tools such as mpirun don't need it, and definitely shouldn't be using it. Ditto for procs launched by mpirun.

We used to have a way of dealing with this - we had the PMI component check to see if the process was the HNP or was launched by an HNP. Sadly, moving the OPAL db framework removed
 that ability as OPAL has no notion of HNPs or proc type.

So add a boolean flag to the db_base_select API that allows us to restrict selection to "local" components. This gives the PMI component the ability to reject itself as required. W
e then need to pass that param into the ess_base_std_app call so it can pass it all down.

This commit was SVN r29341.
2013-10-02 19:03:46 +00:00
Ralph Castain
5ec422dbc1 Correctly compute num local peers when launched via mpirun
This commit was SVN r29327.
2013-10-02 01:46:09 +00:00
Ralph Castain
71a24d6e74 Add some debug
This commit was SVN r29326.
2013-10-02 01:37:02 +00:00
Ralph Castain
c6b7d9d027 Fix variable declaration
This commit was SVN r29324.
2013-10-02 01:08:51 +00:00
Ralph Castain
fcb381c2e2 Minor cleanup - should behave the same, but just cleanup the variable names to avoid confusion
This commit was SVN r29323.
2013-10-02 00:10:36 +00:00
Ralph Castain
d565a76814 Do some cleanup of the way we handle modex data. Identify data that needs to be shared with peers in my job vs data that needs to be shared with non-peers - no point in sharing extra data. When we share data with some process(es) from another job, we cannot know in advance what info they have or lack, so we have to share everything just in case. This limits the optimization we can do for things like comm_spawn.
Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create.

cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data

This commit was SVN r29274.
2013-09-27 00:37:49 +00:00
Ralph Castain
6522963b9c Flag that a daemon has been launched when it reports back to the HNP so we avoid re-launching it on spawns against dynamic allocations
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29245.
2013-09-25 16:58:19 +00:00
Ralph Castain
23c8848157 Only connect the first time thru the Torque launch, remove stale code
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29227.
2013-09-22 23:53:57 +00:00
Ralph Castain
400c68ed0f Fix a segfault when a topology file is given to use in place of the one detected by mpirun itself. In that situation, the rmaps framework replaces the opal_hwloc_topology structure - but since that occurs *after* mpirun has set the node->topology field, we lose that definition. So don't set the node->topology field until after the rmaps framework has been opened.
Does not need to go to 1.7 branch as that ordering is different.

-This line, and those below, will be ignored--

M    orte/mca/ess/hnp/ess_hnp_module.c

This commit was SVN r29225.
2013-09-21 19:47:41 +00:00
Jeff Squyres
758cd25fff Move the MCA / MPI_T level of the LAMA component down to 5 (from 9).
This commit was SVN r29214.
2013-09-20 15:23:27 +00:00
George Bosilca
273d66d0f2 The MPI_Intercomm_create test was broken, as the remote peer was
always considered as being 1 (instead of count).

This commit was SVN r29207.
2013-09-18 16:47:54 +00:00
Ralph Castain
865a7028f8 Per patch from George, with a few minor cleanups. Correctly address the complete exchange of required wireup information in Intercomm_create so all procs in the resulting communicator know how to talk to each other.
Refs trac:29166

This commit was SVN r29200.

The following Trac tickets were found above:
  Ticket 29166 --> https://svn.open-mpi.org/trac/ompi/ticket/29166
2013-09-18 02:01:30 +00:00
Ralph Castain
99611ac1d2 Revert r29166 in favor of a better solution from George
This commit was SVN r29199.

The following SVN revision numbers were found above:
  r29166 --> open-mpi/ompi@497c7e6abb
2013-09-18 01:41:26 +00:00
George Bosilca
9e6c3c0646 Save the error code.
This commit was SVN r29196.
2013-09-17 23:50:11 +00:00
Ralph Castain
2680bff88e The function orte_iof_base_setup_prefork attempts to create a pty for
child stdout and falls back to plain pipe if openpty fails. Child uses
the 'usepty' flag to decide whether to treat this descriptor as a pty
or as a pipe.
Set 'usepty' flag to 0 upon openpty failure to inform the child that
it isn't dealing with a pty even though pty has been requested.


Thanks to Michal Peclo for reporting it and providing a patch.

cmr:v1.7.3:reviewer=jsquyres
cmr:v1.6.6:reviewer=jsquyres

This commit was SVN r29169.
2013-09-15 15:33:51 +00:00
Ralph Castain
b64c8dafd8 Cleanup some errors in pubsub - must set the active flag before posting the recv in case the message has already arrived
Refs trac:3696

This commit was SVN r29167.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-15 15:26:32 +00:00
Ralph Castain
497c7e6abb Fixes trac:2904
The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers.

For example, consider the case where:

1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate

2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B

3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail. 

This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs.

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29166.

The following Trac tickets were found above:
  Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904
2013-09-15 15:00:40 +00:00
Ralph Castain
eb132f923b Check for bozo error of negative np for an app as this will cause ORTE to spin forever.
cmr:v1.7.3:reviewer=jsquyres:subject=Check for negative np
cmr:v1.6.6:reviewer=jsquyres:subject=Check for negative np

This commit was SVN r29157.
2013-09-11 19:21:22 +00:00
Ralph Castain
2a116ecdfc Fix a race condition created when two processes attempt to send to each other at the same time. This causes both processes to start connection procedures, resulting in a c
onflict that can cause messages to be lost. Add detection of this condition, and have both processes cancel their connect operations. The process with the higher rank will
 reconnect, while the lower rank process will simply wait for the connection to be created.

Refs trac:3696

This commit was SVN r29139.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-06 05:15:25 +00:00
Ralph Castain
e8697de521 Deal with PGI compilers on the Mac by initializing a global variable.
cmr:v1.6.6:reviewer=jsquyres
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29129.
2013-09-05 21:40:50 +00:00
Ralph Castain
13ae51a91b Protect against possible race conditions and threads by ensuring that rml send always occurs inside an event.
cmr:v1.7.4:reviewer=jsquyres:subject=Protect against race conditions in rml send

This commit was SVN r29128.
2013-09-05 01:16:32 +00:00
Ralph Castain
d32dfc96be Use the rankfile to obtain list of nodes for VM launch if/when rankfile is given.
cmr:v1.7.3:reviewer=jsquyres:subject=Obtain VM nodes from rankfile

This commit was SVN r29119.
2013-09-04 16:37:30 +00:00
Ralph Castain
d9f0505952 Fix the lama verbose outputs so they don't segfault if someone asks for verbose output, but isn't using lama
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29108.
2013-09-03 17:55:35 +00:00
Ralph Castain
2bfa99e945 If a rankfile is given and the number of procs not specified in the mpirun cmd line, then set the number of procs to the number of ranks in the rankfile
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29104.
2013-09-02 15:04:40 +00:00
Ralph Castain
43d1cd92ac Ensure we activate the "daemons launched" state when only the HNP is left or else we will hang.
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29094.
2013-08-29 22:50:51 +00:00
Dave Goodell
d17f104e7a oob: squash some valgrind warnings
These warnings were harmless, but they appeared even for simple programs
like single-process runs of `ring_c`.

This commit was SVN r29093.
2013-08-29 21:08:44 +00:00
Ralph Castain
12d4f45b5e Silence warning:
oob_tcp_connection.c: In function 'mca_oob_tcp_peer_accept':
oob_tcp_connection.c:725:9: warning: variable 'cmpval' set but not used [-Wunused-but-set-variable]

Refs trac:3696

This commit was SVN r29091.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-08-29 20:56:05 +00:00
Ralph Castain
7a7cfdd519 A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29089.
2013-08-29 20:01:06 +00:00
Ralph Castain
c71e760e6c The modex code was unfortunately written solely for PMI1 when updated to minimize calls to PMI_get - add the required PMI2 code
This commit was SVN r29084.
2013-08-28 23:52:32 +00:00
Joshua Ladd
1802aabf1a Add support for autodetecting a MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the key word auto which will automatically select the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this and, at this point, the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to \ncmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping
This commit was SVN r29079.
2013-08-28 16:23:33 +00:00
Ralph Castain
7125143253 Replace missing opal_db open/select that was apparently lost on a prior merge. Thanks to Nathan for pointing it out
This commit was SVN r29072.
2013-08-27 19:42:31 +00:00
George Bosilca
65a362909d Can't see how it works ...
Thanks Thomas and Arm for the patch.

This commit was SVN r29066.
2013-08-27 16:52:24 +00:00
Ralph Castain
c9a25465da Don't need the number of nodes any more for PMI
Refs trac:3729

This commit was SVN r29064.

The following Trac tickets were found above:
  Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729
2013-08-23 18:36:51 +00:00
Ralph Castain
6d24b34940 Extend the dpm framework API to support persistent accept/connect operations:
* paccept - establish a persistent listening port for async connect requests

* pconnect - async connect to remote process that has posted a paccept port. Provides a timeout mechanism, and allows the underlying implementation to retry until timeout 

* pclose - shuts down a prior paccept posting

Includes example programs paccept.c and pconnect.c in orte/test/mpi. New MPI extension interfaces coming...

This commit was SVN r29063.
2013-08-23 18:02:50 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00
Ralph Castain
63d10d2d0d Fix typo
Refs trac:3729

This commit was SVN r29057.

The following Trac tickets were found above:
  Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729
2013-08-22 16:05:58 +00:00
Ralph Castain
16c5b30a1f Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch.
So shift the cutoff param to the MPI layer, and have it solely determine whether or not we call modex_recv on the hostname. If comm_world is of size greater than the cutoff, then we don't automatically retrieve the hostname when we build the ompi_proc_t for a process - instead, we fill the hostname entry on first call to modex_recv for that process.

The param is now "ompi_hostname_cutoff=N", where N=number of procs for cutoff.

Refs trac:3729

This commit was SVN r29056.

The following Trac tickets were found above:
  Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729
2013-08-22 03:40:26 +00:00
Ralph Castain
45e695928f As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit.

* remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL"

* modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded

* removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base

* added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames

This commit was SVN r29052.
2013-08-20 18:59:36 +00:00
Ralph Castain
9aebd7e281 Ensure we register the nidmap verbosity in mpirun, and add some debug
This commit was SVN r29042.
2013-08-18 23:40:32 +00:00
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
Ralph Castain
b2d86e1857 Silence uninitialized var warning
This commit was SVN r29034.
2013-08-16 21:35:51 +00:00
Ralph Castain
b34bff8792 Cleanup warning
This commit was SVN r29032.
2013-08-16 21:14:35 +00:00
Ralph Castain
bebe852057 Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite
This commit was SVN r29028.
2013-08-14 04:21:17 +00:00
Ralph Castain
72b5e867ab Correct shutdown ordering - rml must go last
This commit was SVN r29027.
2013-08-14 04:20:17 +00:00
Ralph Castain
8a4c5f4957 Attempt to plug a few memory leaks by ensuring we finalize all things opened during init. However, we are still leaking memory like a sieve in param registration and hwloc.
This commit was SVN r29026.
2013-08-14 02:03:00 +00:00
Nathan Hjelm
b2e773ece3 Fix debugger support for direct-launched jobs.
The orte rte component checks the orte_standalone_operation to decide
if it should wait for a message from the hnp or wait on the debugger.
This variable needed to be set to true in ess/pmi to enable the
correct path when direct launching.

cmr=v1.7.3:reviewer=rhc
cmr=v1.6.6:reviewer=rhc

This commit was SVN r29013.
2013-08-09 22:39:41 +00:00
Nathan Hjelm
841ed962f6 fix MCA variable and component system leaks
cmr=v1.7.3:reviewer=rhc

This commit was SVN r29011.
2013-08-09 19:50:28 +00:00
Nathan Hjelm
88cadc552d Make opal/db/pmi use as few PMI keys as possible.
This commit reintroduces key compression into the pmi db. This feature
compresses the keys stored into the component into a small number of
PMI keys by serializing the data and base64 encoding the result. This
will avoid issues with Cray PMI which restricts us to ~ 3 PMI keys per
rank.

This commit was SVN r28993.
2013-08-03 01:06:59 +00:00
Ralph Castain
285429a1c6 Remove release of buffer - non-blocking send callback will do it
This commit was SVN r28985.
2013-08-02 03:49:17 +00:00
Ralph Castain
37db1727a2 Refs trac:3710
Simplify the whole stripping of prefix method by consolidating it into a single MCA param. Allow for multiple prefixes to be stripped, each separated in the param by a comma. If no prefix is given, or the specified prefix isn't in the nodename, then just use the hostname itself.

This commit was SVN r28974.

The following Trac tickets were found above:
  Ticket 3710 --> https://svn.open-mpi.org/trac/ompi/ticket/3710
2013-08-01 00:32:10 +00:00
Nathan Hjelm
83a3fc2fd2 Add an option to control which hostnames orte_strip_prefix_from_node_names works
on.

This corrects a problem with Cray systems where the login node's hostname
was being stripped causing the login node to be used as a compute node by
mpirun.

cmr=v1.7.3:reviewer=rhc

This commit was SVN r28970.
2013-07-31 18:42:02 +00:00
Nathan Hjelm
ebbb32120a MCA/base: variable system updates
- Use an enumerator to handle bool values.

 - Fix a leak in the variable enumerator.

 - Fix a leak in an orte parameter.

This commit was SVN r28949.
2013-07-25 15:42:01 +00:00
Ralph Castain
6c1a140e99 Per request from Nathan, add a "commit" API to the opal db framework. This allows him to aggregate keys to work around the Cray's severe PMI limitations
This commit was SVN r28917.
2013-07-22 22:57:16 +00:00
Ralph Castain
5d12ab3873 Ensure we always set num_local_peers for both PMI2 and PMI1
This commit was SVN r28860.
2013-07-19 04:34:58 +00:00
Ralph Castain
b033a6b6d6 One last Cray-inspired fix...
Refs trac:3685

This commit was SVN r28857.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 03:04:00 +00:00
Ralph Castain
92cb93b21e Remove set-but-unused variable
Refs trac:3685

This commit was SVN r28855.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 01:42:35 +00:00
Ralph Castain
bc2586cf3c Refs trac:3685. Check error code returned by PMI2_Info_GetJobAttr.
This commit was SVN r28854.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 01:24:51 +00:00
Ralph Castain
a10546d5c1 Cleanup and rename of platform files
This commit was SVN r28853.
2013-07-19 01:18:41 +00:00
Ralph Castain
e4e678e234 Per the RFC and discussion on the devel list, update the RTE-MPI error handling interface. There are a few differences in the code from the original RFC that came out of the discussion - I've captured those in the following writeup
George and I were talking about ORTE's error handling the other day in regards to the right way to deal with errors in the updated OOB. Specifically, it seemed a bad idea for a library such as ORTE to be aborting the job on its own prerogative. If we lose a connection or cannot send a message, then we really should just report it upwards and let the application and/or upper layers decide what to do about it.

The current code base only allows a single error callback to exist, which seemed unduly limiting. So, based on the conversation, I've modified the errmgr interface to provide a mechanism for registering any number of error handlers (this replaces the current "set_fault_callback" API). When an error occurs, these handlers will be called in order until one responds that the error has been "resolved" - i.e., no further action is required - by returning OMPI_SUCCESS. The default MPI layer error handler is specified to go "last" and calls mpi_abort, so the current "abort" behavior is preserved unless other error handlers are registered.

In the register_callback function, I provide an "order" param so you can specify "this callback must come first" or "this callback must come last". Seemed to me that we will probably have different code areas registering callbacks, and one might require it go first (the default "abort" will always require it go last). So you can append and prepend, or go first. Note that only one registration can declare itself "first" or "last", and since the default "abort" callback automatically takes "last", that one isn't available. :-)

The errhandler callback function passes an opal_pointer_array of structs, each of which contains the name of the proc involved (which can be yourself for internal errors) and the error code. This is a change from the current fault callback which returned an opal_pointer_array of just process names. Rationale is that you might need to see the cause of the error to decide what action to take. I realize that isn't a requirement for remote procs, but remember that we will use the SAME interface to report RTE errors internal to the proc itself. In those cases, you really do need to see the error code. It is legal to pass a NULL for the pointer array (e.g., when reporting an internal failure without error code), so handlers must be prepared for that possibility. If people find that too burdensome, we can remove it.

Should we ever decide to create a separate callback path for internal errors vs remote process failures, or if we decide to do something different based on experience, then we can adjust this API.

This commit was SVN r28852.
2013-07-19 01:08:53 +00:00
Ralph Castain
6c50c8167c Fix pmi-1 compile when no pmi2 is present
This commit was SVN r28849.
2013-07-18 22:45:08 +00:00
Ralph Castain
256034a3dc Sigh - fix a couple of spots I missed
Refs trac:3683

This commit was SVN r28843.

The following Trac tickets were found above:
  Ticket 3683 --> https://svn.open-mpi.org/trac/ompi/ticket/3683
2013-07-18 19:07:16 +00:00
Ralph Castain
fc3b777ef5 Cleanup a variable that isn't used if pmi2 support is available
Refs trac:3683

This commit was SVN r28841.

The following Trac tickets were found above:
  Ticket 3683 --> https://svn.open-mpi.org/trac/ompi/ticket/3683
2013-07-18 17:19:13 +00:00
Ralph Castain
92c6b806b9 Based on a patch submitted by Piotr Lesnicki of Bull, cleanup the PMI2 support. This has not been tested yet on multiple environments (e.g., Cray), so it needs more evaluation prior to moving to the 1.7 branch.
cmr:v1.7.3:reviewer=rhc

This commit was SVN r28837.
2013-07-18 14:46:07 +00:00
Ralph Castain
10ca1c1b04 Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature".
This commit was SVN r28789.
2013-07-14 18:57:20 +00:00
Ralph Castain
563bf60fb8 Fix an ordering problem that crept in due to the change in MCA param system. One MCA param would set a value, and then we did a hard reset of that value before testing another MCA param, thus removing a critical value for proper operation of the first param.
cmr:v1.7.3:reviewer=brbarret

This commit was SVN r28788.
2013-07-14 16:22:13 +00:00
Ralph Castain
7a21661785 Silence a warning when --without-hwloc is used
This commit was SVN r28783.
2013-07-13 17:17:17 +00:00
Dave Goodell
3741d62308 fix --without-hwloc build failure
All builds since r28682 configured with '--without-hwloc' fail at "make"
time without this fix.

Reviewed by rhc@

This commit was SVN r28769.

The following SVN revision numbers were found above:
  r28682 --> open-mpi/ompi@446e33a5d8
2013-07-12 17:21:14 +00:00
Jeff Squyres
baa3182794 Per RFC
(http://www.open-mpi.org/community/lists/devel/2013/07/12534.php),
remove a bunch of dead code.

This commit was SVN r28756.
2013-07-11 17:34:28 +00:00
Ralph Castain
028f5ee7a6 Cleanup some bitrot from moving the db framework to opal and from the new mca param system
This commit was SVN r28741.
2013-07-09 14:37:08 +00:00
Ralph Castain
62378209f0 Even if we don't find the default hostfile, and nothing else was provided, then use all the known nodes.
cmr:v1.7.3:#3653:reviewer=jsquyres
cmr:v1.6.6:#3654:reviewer=jsquyres

This commit was SVN r28718.
2013-07-03 22:31:32 +00:00
Ralph Castain
443a6802b9 If the default hostfile is empty, we need to pickup all the known nodes, not just the head node.
cmr:v1.7.3:reviewer=jsquyres
cmr:v1.6.6:reviewer=jsquyres

This commit was SVN r28717.
2013-07-03 22:25:51 +00:00
Ralph Castain
446e33a5d8 There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes.
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.

This commit was SVN r28682.
2013-06-27 03:04:50 +00:00
Joshua Ladd
0b5c1f2ea8 Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fallback to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
Nathan Hjelm
299d5b3dd7 Fix two debugger attach bugs.
- orte_debugger_init_after_spawn was not being called for debuggers that
   use the MPIR_attach_fifo to co-locate debugger daemons.
 - MPIR_Breakpoint was not getting called if a debugger reattached. Add
   a job state (ORTE_JOB_STATE_DEBUGGER_DETACH) to reset mpir_breakpoint_fired
   to false when a debugger detaches to ensure MPIR_Breakpoint is called if
   another debugger attaches. Tested with STAT 2.0/launchmon 1.0.

cmr:v1.7

This commit was SVN r28665.
2013-06-20 16:18:05 +00:00
Jeff Squyres
b9ca8e3cd1 Tweaked the help message a bit (this is the end result of iterating on
the message in email between Mike, Ralph, Jeff).

Add this to CMR #3642 and #3643.

This commit was SVN r28662.
2013-06-20 13:19:23 +00:00
Ralph Castain
13665bffe8 Per an off-list discussion, it appears possible for a system to report failure when executing getpwuid. There are several reasons for this error to occur, most notably if the system uses a network-based authentication protocol (e.g., NIS) and that sytem gets overwhelmed when we launch on a lot of nodes.
There is no good way to recover from this scenario, and from past experience, using the user's name in the session directory (as opposed to the uid) is very helpful when things go wrong. So print a help message when this happens (it is extremely rare, but has happened at least once now) and return an error.

cmr:v1.7.3,reviewer=jsquyres
cmr:v1.6.5,reviewer=jsquyres

This commit was SVN r28658.
2013-06-20 04:30:42 +00:00
Ralph Castain
a51a0a8c48 Fix uninitialized var
This commit was SVN r28652.
2013-06-18 22:41:47 +00:00
George Bosilca
b4ebc417a1 Correctly register the component MCA parameters.
Few cleanups in the includes.

This commit was SVN r28645.
2013-06-15 16:05:09 +00:00
Nathan Hjelm
518d1fe200 Fix two typos that prevented alps direct launch from working
This commit was SVN r28628.
2013-06-13 17:04:08 +00:00
Joshua Ladd
61ffb47573 Minor fix for the min-dist mapping algorithm: we need to call 'get_nbobjs_by_type' first, before we get the sorted list of nodes - we need to add node objects and fill them in the summary object for the current topology. This patch was submitted by Elena Elkina and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28578.
2013-05-31 15:19:59 +00:00
Jeff Squyres
6d173af329 This commit introduces a new "mindist" ORTE RMAPS mapper, as well as
some relevant updates/new functionality in the opal/mca/hwloc and
orte/mca/rmaps bases.  This work was mainly developed by Mellanox,
with a bunch of advice from Ralph Castain, and some minor advice from
Brice Goglin and Jeff Squyres.

Even though this is mainly Mellanox's work, Jeff is committing only
for logistical reasons (he holds the hg+svn combo tree, and can
therefore commit it directly back to SVN).

-----

Implemented distance-based mapping algorithm as a new "mindist"
component in the rmaps framework.  It allows mapping processes by NUMA
due to PCI locality information as reported by the BIOS - from the
closest to device to furthest.

To use this algorithm, specify:

   {{{mpirun --map-by dist:<device_name>}}}

where <device_name> can be mlx5_0, ib0, etc.

There are two modes provided:

 1. bynode: load-balancing across nodes
 1. byslot: go through slots sequentially (i.e., the first nodes are
     more loaded)

These options are regulated by the optional ''span'' modifier; the
command line parameter looks like:

    {{{mpirun --map-by dist:<device_name>,span}}}

So, for example, if there are 2 nodes, each with 8 cores, and we'd
like to run 10 processes, the mindist algorithm will place 8 processes
to the first node and 2 to the second by default. But if you want to
place 5 processes to each node, you can add a span modifier in your
command line to do that.

If there are two NUMA nodes on the node, each with 4 cores, and we run
6 processes, the mindist algorithm will try to find the NUMA closest
to the specified device, and if successful, it will place 4 processes
on that NUMA but leaving the remaining two to the next NUMA node.

You can also specify the number of cpus per MPI process. This option
is handled so that we map as many processes to the closest NUMA as we
can (number of available processors at the NUMA divided by number of
cpus per rank) and then go on with the next closest NUMA.

The default binding option for this mapping is bind-to-numa. It works
if you don't specify any binding policy. But if you specified binding
level that was "lower" than NUMA (i.e hwthread, core, socket) it would
bind to whatever level you specify.

This commit was SVN r28552.
2013-05-22 13:04:40 +00:00
Jeff Squyres
089c632cce Remove a bunch of dead code: gcc 4.7 warns of set-but-unused
variables.  So get rid of them.

This commit was SVN r28538.
2013-05-17 21:45:49 +00:00
Ralph Castain
e100b8d165 don't need the return value, but should check for error
This commit was SVN r28534.
2013-05-16 15:15:02 +00:00
Jeff Squyres
128cc27417 Minor type fix (they're both enums/ints, so the compiler previously
silently cast them).

This commit was SVN r28532.
2013-05-16 00:47:37 +00:00
Ralph Castain
93ba4247f8 remove extra paren when --without-hwloc
This commit was SVN r28530.
2013-05-15 21:31:45 +00:00
Ralph Castain
3a372a65b8 Mapping policies must be tested as equalities as they are values, not bitmasks
This commit was SVN r28526.
2013-05-15 13:45:00 +00:00
Ralph Castain
29e4b0cc50 Cannot test equality on mapping directives as it is a bitmask
This commit was SVN r28525.
2013-05-15 13:41:49 +00:00
Ralph Castain
04b11accd3 Silience a few warnings
This commit was SVN r28515.
2013-05-14 21:58:40 +00:00
Ralph Castain
5296099ecb Fix the cpus-per-rank when binding to hwthreads. Add cpus-per-rank to diag printout
Thanks to Elena for reporting the problem

This commit was SVN r28508.
2013-05-14 20:17:50 +00:00
Ralph Castain
427b6b0b47 Fix the verbosity of yet another framework...sigh.
This commit was SVN r28481.
2013-05-13 14:36:32 +00:00
Jeff Squyres
456df1c9f7 Remove redundant opal_output() messages from the module; the called
functions will now show_help() their own error messages if something
goes wrong (per r28470).

This commit was SVN r28471.

The following SVN revision numbers were found above:
  r28470 --> open-mpi/ompi@2ff95a7739
2013-05-10 15:12:07 +00:00
Jeff Squyres
2ff95a7739 Proper show_help error messages for LAMA.
This commit was SVN r28470.
2013-05-10 15:06:25 +00:00
Ralph Castain
f15fe5045e Ensure that debugger connect can occur by getting the rml contact info updated before calling init_after_spawn
cmr:v1.7.3,reviewer=jsquyres

This commit was SVN r28455.
2013-05-06 22:00:45 +00:00
Ralph Castain
c52b94af8b Revert r28453 and r28452 - wrong fix
This commit was SVN r28454.

The following SVN revision numbers were found above:
  r28452 --> open-mpi/ompi@756ee4b5e0
  r28453 --> open-mpi/ompi@6da24143a2
2013-05-06 21:52:17 +00:00
Ralph Castain
6da24143a2 Minor performance improvement
This commit was SVN r28453.
2013-05-06 20:27:16 +00:00
Ralph Castain
756ee4b5e0 Update the rml_uri for each proc so debuggers can attach
This commit was SVN r28452.
2013-05-06 20:18:14 +00:00
Ralph Castain
707d0e653a Must use equal and not & comparison for mapping directives
This commit was SVN r28451.
2013-05-06 15:07:12 +00:00
Ralph Castain
a0a6412545 Do a little cleanup on abnormal termination procedure - don't keep submitting forced exit events (one will do), no need to reset the abnormal termination pipe event in orterun, etc.
This commit was SVN r28450.
2013-05-05 17:39:45 +00:00
Ralph Castain
fb2a694587 Fix print
This commit was SVN r28446.
2013-05-04 22:37:34 +00:00
Ralph Castain
27e3e382d5 No need for ORTE tools to use orte progress thread
This commit was SVN r28445.
2013-05-04 21:13:20 +00:00
Jeff Squyres
42a9a4c62c After examining a '''lot''' of MTT output with Ralph, fix the cause of
many, many MTT timeouts when running jobs under SLURM: send the right
command at the end to cause remote orteds to shut down.

This commit was SVN r28438.
2013-05-02 00:23:53 +00:00
Ralph Castain
5d7a93c032 Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed.
Interesting how many includes had to be fixed here and there to fill in missing dependencies :-)

This commit was SVN r28411.
2013-04-29 17:02:37 +00:00
Ralph Castain
3a354c4ea3 Cleanup the verbose output channel name
This commit was SVN r28391.
2013-04-24 23:44:02 +00:00
Ralph Castain
c5e1a7dc65 fix typo
This commit was SVN r28390.
2013-04-24 23:37:59 +00:00
Nathan Hjelm
55decca2b7 routed/debruijn: if the next hop for a message is unknown forward the message to the parent process
This commit was SVN r28384.
2013-04-24 19:05:25 +00:00
Nathan Hjelm
bccf8c657a Per RFC add initial support for the MPI 3.0 tools interface.
Current MPI_T support:
  - Full cvar interface.
  - Full categories interface.
  - No pvar support at this time.

This commit was SVN r28376.
2013-04-24 15:59:23 +00:00
Ralph Castain
2e8946db0a Add some debug output
This commit was SVN r28371.
2013-04-23 23:11:22 +00:00
Ralph Castain
b6377a2138 Properly remove the object from the list prior to releasing it when an error is encountered
This commit was SVN r28370.
2013-04-23 22:44:52 +00:00
Ralph Castain
1a54e52da4 Per conversation with Josh H., remove the stale rsh component from the filem framework. Let the "raw" component do the job.
This commit was SVN r28338.
2013-04-16 20:39:48 +00:00
Ralph Castain
d6ac721e22 Add client/server test
This commit was SVN r28332.
2013-04-15 13:10:42 +00:00
Jeff Squyres
349ee654c1 Fix some --without-hwloc compile errors. Also remove one
assigned-but-not-used variable assignment.

This commit was SVN r28321.
2013-04-10 15:08:31 +00:00
Ralph Castain
45af6cf59e The move of the orte_db framework to opal required that we create an opaque opal_identifier_t type as OPAL cannot know anything about the ORTE process name. However, passing a value down to opal and then having the db components reference it causes alignment issues on Solaris Sparc platforms. So pass the pointer instead and do the old "memcpy" trick to avoid the problem.
This commit was SVN r28308.
2013-04-08 23:34:16 +00:00
Ralph Castain
2040afdcae Daemonize prior to starting any progress threads
This commit was SVN r28303.
2013-04-08 00:42:57 +00:00
Ralph Castain
76426285f0 Cannot retain opal_buffer_t, so use a copy
This commit was SVN r28302.
2013-04-07 23:02:59 +00:00
Ralph Castain
698b4ad6e7 Fix the parameter handling so no-tree-spawn isn't getting reversed
This commit was SVN r28300.
2013-04-07 15:48:25 +00:00
Ralph Castain
1f011bef99 Cleanup the updated sys limits capability. Fix a few copy/paste bugs (my bad). Shift the limit set to the ODLS default module so that we sete the limits for all apps, even those that don't call opal_init. Leave it in opal_init as well to support direct-launch apps, but ensure we only set the limits once by removing the envar after launch by ODLS.
Provide some nice error messages if we fail to set the limits. Since the user had to specifically request we set the limit, treat failure as an error-out situation.

This commit was SVN r28288.
2013-04-04 16:00:17 +00:00
Ralph Castain
252147fba6 Cleanup error message if unknown host is given in -host and -hostfile options
This commit was SVN r28262.
2013-03-28 16:52:10 +00:00
Ralph Castain
e6ae088813 Cleanup error outputs when a daemon fails to start
This commit was SVN r28261.
2013-03-28 16:51:19 +00:00
Ralph Castain
21ee48de57 Add missing static declaration
This commit was SVN r28247.
2013-03-27 21:59:17 +00:00
Ralph Castain
1b5b9afa85 Silence warnings
This commit was SVN r28244.
2013-03-27 21:56:46 +00:00
Nathan Hjelm
17315bf360 Now that the entire codebase has been updated to use the MCA framework
system remove the last calls to the MCA parameter system.

This commit was SVN r28242.
2013-03-27 21:17:53 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
c041156f60 Update ORTE frameworks to use the MCA framework system.
This commit was SVN r28240.
2013-03-27 21:14:43 +00:00
Nathan Hjelm
365cf48db5 Update OPAL frameworks to use the MCA framework system.
This commit was SVN r28239.
2013-03-27 21:11:47 +00:00
Nathan Hjelm
c3b67d0187 Automatically generate a list of installed frameworks in project/include/project/frameworks.h
This commit was SVN r28238.
2013-03-27 21:10:32 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
e7ac6c9bde Don't build rank_file if you can't use it anyway
This commit was SVN r28233.
2013-03-27 15:12:40 +00:00
Ralph Castain
256414121e Protect the cpus-per-rank MCA param registration so that --without-hwloc will build
This commit was SVN r28232.
2013-03-27 14:53:30 +00:00
Ralph Castain
317915225c Finish the binding cleanup by removing the no-longer-used binding level scheme. This proved to be fallible as there is no guarantee that the hierarchy it used matched physical reality of the machine (e.g., is L3 "above" the socket or not). Still have to complete the ppr update, but get the rest of it correct.
This commit was SVN r28223.
2013-03-26 20:09:49 +00:00
Ralph Castain
24b91839aa Ensure the process knows it local cpuset early enough to perform the locality computation
This commit was SVN r28221.
2013-03-26 19:14:23 +00:00
Ralph Castain
6ee32767d4 Restore the cpus-per-proc option for byslot and bynode mapping. Remove the bind_idx (which recorded the index of the hwloc object where the proc was bound) as this would no longer be unique, and just use the bitmap as the standard reference for location. Update the relative locality computation to take bitmaps as its argument.
This commit was SVN r28219.
2013-03-26 18:27:50 +00:00
Ralph Castain
9c68e60965 Add test for comm_spawn with info keys
This commit was SVN r28207.
2013-03-24 14:38:33 +00:00
Ralph Castain
2f43989d22 Add debug and handle the use-case where someone (a) uses a hostfile while in a managed allocation to sub-allocate runs, and (b) includes the HNP's node in one of those hostfiles.
cmr:v1.7

This commit was SVN r28203.
2013-03-22 00:53:33 +00:00
Jeff Squyres
63d17ce901 Fix CID 968581: ensure that the string read from the socket is always
\0-terminated so that strlen() and strstr() can be used without fear.
Also fix some insignificant mem leaks (which is somewhat moot, because
as soon as we leave those error conditions, the process will be
terminating, but what the heck, might as well fix these while I was in
the file for the \0-termination issue...).

This commit was SVN r28199.
2013-03-21 16:05:50 +00:00
Jeff Squyres
562db0dd11 Fix CID 741328: remove some dead code
This commit was SVN r28192.
2013-03-21 11:15:06 +00:00
Ralph Castain
fa13d27238 Avoid double-release in error path
This commit was SVN r28190.
2013-03-20 21:00:59 +00:00
Ralph Castain
147c6ff9e7 Clean out the cruft leftover from the use_common_ports experiment
cmr:v1.7

This commit was SVN r28184.
2013-03-20 15:07:43 +00:00
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00
Ralph Castain
63727aa714 Use a non-blocking send in show_help as it could be called from inside an event
This commit was SVN r28135.
2013-02-28 17:19:18 +00:00
Ralph Castain
cf9796accd Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes
This commit was SVN r28134.
2013-02-28 01:35:55 +00:00
Ralph Castain
347df93cd4 Handle the case of someone specifying a directory for the application. Ensure we get a non-zero exit status and clarify the error message.
cmr:v1.7

This commit was SVN r28119.
2013-02-27 01:36:21 +00:00
Ralph Castain
f36312ee6f Continue cleanup - this time, start working on the "without full support" flags in ORTE. Remove no-longer-needed configure.m4 files from the ess and errmgr. In the former case, since all priorities are now the same (given the removal of the cnos component), configure priorities are no longer required.
This commit was SVN r28118.
2013-02-26 21:27:48 +00:00
Ralph Castain
74a3ece313 Remove unused component
This commit was SVN r28117.
2013-02-26 20:58:43 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Ralph Castain
bd9265c560 Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data.
A few changes were required to support this move:

1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).

2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.

3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you seperately specify the "store" and "fetch" ordering.

4. the ORTE grpcomm and ess/pmi components, and the nidmap code,  were updated to work with the new db framework and to specify internal/global storage options.

No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.

This commit was SVN r28112.
2013-02-26 17:50:04 +00:00
Jeff Squyres
9bd4b814db Fix one more nroff macro issue
This commit was SVN r28090.
2013-02-21 17:38:06 +00:00
Jeff Squyres
76fcd42bc3 Fix minor nroff macro issues.
This commit was SVN r28088.
2013-02-21 17:35:36 +00:00
Jeff Squyres
12e047e594 Update documentation for rankfiles in orterun.1:
* Add a little more description of what rankfiles are
 * Update that we use logical numbering for socket:core notation
 * Mention +nX notation

This commit was SVN r28067.
2013-02-16 17:52:30 +00:00
Ralph Castain
c0b670bea8 I guess some profiling tools and debuggers require that the argv[0] of each rank be unique so they can create a filename based on that value. For those obscure cases, provide an mpirun cmd line option that indexes each argv[0] by rank
This commit was SVN r28064.
2013-02-15 20:20:49 +00:00
Ralph Castain
b9897267ef Cleanup report-bindings so it always reports the actual binding instead of what was requested. Ensure we don't report twice if it is an MPI process being launched.
This commit was SVN r28057.
2013-02-14 17:24:28 +00:00
Ralph Castain
744ed49b2d Begin cleanup of the thread_lock calls in ORTE. We'll ignore the ones in the rml/oob for now as that code block is being rewritten anyway.
This commit was SVN r28053.
2013-02-13 01:53:12 +00:00
Brian Barrett
504a6d036f * Rather than use the extra_includes directive, add the extra includes (which is really just -I${includedir}/openmpi/ for devel headers) to CPPFLAGS, since all the other necessary -Is for devel headers (like libevent and hwloc) are added to CPPFLAGS.
* Clean up ${includedir} and ${libdir} for script wrapper compilers
* Update script wrapper compilers to work like the C wrapper compilers w.r.t static and dynamic linking
* Remove the ORTE script wrapper compilers since they didn't support the ${includedir} stuff and Ralph said they weren't used anymore.

This commit was SVN r28052.
2013-02-13 00:33:05 +00:00
Joshua Ladd
70ad711337 Backing out the Open SHMEM project
This commit was SVN r28050.
2013-02-12 17:45:27 +00:00
Mike Dubman
ff384daab4 Added new project: oshmem.
This commit was SVN r28048.
2013-02-12 15:33:21 +00:00
Ralph Castain
b360156a37 Extend print coverage to all types
This commit was SVN r28012.
2013-02-01 14:21:06 +00:00
Ralph Castain
afb0db5b6f Okay, Jeff - just for you...flow the show help thru the orte functions so help messages will be aggregated
This commit was SVN r28007.
2013-02-01 00:35:48 +00:00
Ralph Castain
53e0ed71b0 Disqualify slurm module even if slurm support was configured into the build if we don't have an allocation and haven't enabled dynamic allocations
This commit was SVN r27995.
2013-01-31 18:15:47 +00:00
Ralph Castain
166f512924 Add some useful debug to the heartbeat sensor
This commit was SVN r27994.
2013-01-31 18:01:13 +00:00
Ralph Castain
ca9605773b If sensors are enabled, then the daemons need to have their proc->node field linked to their local node object
This commit was SVN r27991.
2013-01-31 16:38:57 +00:00
Ralph Castain
c87fa68f9b Cleanup the resource usage sensor, letting the db handle any printing requests.
This commit was SVN r27990.
2013-01-31 15:20:56 +00:00
Ralph Castain
9625757a71 Add new database component for printing "add_log" info
This commit was SVN r27989.
2013-01-31 15:19:39 +00:00
Ralph Castain
8e8e95ca6b Silence error report - just because someone only defines ipv4 static ports doesn't make a fatal error
This commit was SVN r27976.
2013-01-29 23:48:22 +00:00
Jeff Squyres
8e25b927ab Clean some minor warnings: remove variables that were set but never
used.

This commit was SVN r27974.
2013-01-29 23:35:42 +00:00
Ralph Castain
112f8eedb1 Handle the case where rankfile is providing the allocation
This commit was SVN r27971.
2013-01-29 20:37:58 +00:00
Nathan Hjelm
666bd826dc fix alps configury
This commit was SVN r27962.
2013-01-29 15:44:30 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept seperate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Ralph Castain
cfaefb3286 Remove the only place where PMI was used outside a component, and relocate that code to common/pmi.
This commit was SVN r27944.
2013-01-28 20:14:51 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Ralph Castain
6eaf601ae6 Good ol' Cray changed the way node/cpu allocation is handled in their latest release of ALPS, and so our allocator is broken. Adjust for the revised method, but preserve the older method for those Cray users who have not updated their system.
cmr:v1.7

This commit was SVN r27911.
2013-01-25 21:53:31 +00:00
Ralph Castain
f6b4db0b79 Fix rank_file operations. We changed the syntax to use semi-colons between multiple slot assignments so that we could use the comma to separate specific cores, but somehow the flex definitions didn't get updated to accept that character. We also incorrectly zero'd the bitmap between slot assignment sections, and so multiple slot assignments only wound up making the last one in the list.
This commit was SVN r27908.
2013-01-25 18:33:25 +00:00
Ralph Castain
2504da1ac9 Remove stale code - message arrival time doesn't really mean much anymore.
This commit was SVN r27905.
2013-01-24 23:02:02 +00:00
Brian Barrett
0e799a93c3 Automake will ship the .in file whether or not the conditional is taken,
so don't install orte_wrapper_script when it's not used

This commit was SVN r27902.
2013-01-24 21:36:25 +00:00
Ralph Castain
9bfb2b989b Silence warning
This commit was SVN r27901.
2013-01-24 19:38:51 +00:00
Ralph Castain
4b310473a1 Correct the computation of the daemon vpid
cmr:v1.7

This commit was SVN r27899.
2013-01-24 18:04:53 +00:00
Ralph Castain
b403ca5bd8 Silence warning
This commit was SVN r27897.
2013-01-23 22:17:08 +00:00
Ralph Castain
4d34d30a97 Silence warning
This commit was SVN r27896.
2013-01-23 22:16:48 +00:00
Ralph Castain
6e2cabb87f Remove duplicate code
This commit was SVN r27889.
2013-01-23 02:07:06 +00:00
Ralph Castain
a591fbf06f Add initial support for dynamic allocations. At this time, only Slurm supports the new capability, which will be included in an upcoming release.
Add hooks for supporting dynamic allocation and deallocation to support application-driven requests and fault recovery operations.

This commit was SVN r27879.
2013-01-20 00:33:42 +00:00
Ralph Castain
e4673f3283 Add new job state
This commit was SVN r27878.
2013-01-20 00:30:27 +00:00
Ralph Castain
7aa80b984d Add new test program
This commit was SVN r27877.
2013-01-20 00:29:45 +00:00
Ralph Castain
7102d7c5f7 ick - brain is fried. take that test out as it isnt needed on a regular basis
This commit was SVN r27875.
2013-01-19 14:48:31 +00:00
Ralph Castain
38786457cb Add new test
This commit was SVN r27874.
2013-01-19 14:46:23 +00:00
Ralph Castain
73387e50e2 Add missing variable def - thanks to Paul Hargrove for spotting.
This commit was SVN r27865.
2013-01-18 14:32:53 +00:00
George Bosilca
e69dc00460 Dont duplicate headers nor global variables.
This commit was SVN r27864.
2013-01-18 11:51:56 +00:00
Ralph Castain
c96cc2d5a0 In order to properly connect to debuggers like STAT, we need to get the hostname in its unstripped version for the MPIR_proctab. Unfortunately, we need a stripped version for Cray's alps launcher. So when we are stripping the hostname prefix, retain alias hostnames and add the ability to specify an alias to use in the proctab.
This commit was SVN r27863.
2013-01-18 05:00:05 +00:00
Ralph Castain
54266837e9 Remove use of param_find function as that function will be disappearing
This commit was SVN r27831.
2013-01-15 19:50:38 +00:00
Ralph Castain
5b8de0b9f4 Ouch - opal_progress calls event_loop with a NO_BLOCK flag. So when run without progress threads, the ORTE tools were not blocking in the event lib as they should be. Avoid calling opal_progress inside ORTE by directly using the event_loop call instead of ORTE_WAIT_FOR_COMPLETION as parts of the OMPI layer are using that macro.
Thanks to George for spotting the problem.

This commit was SVN r27815.
2013-01-14 23:06:42 +00:00
Ralph Castain
97ee683275 Track parentage of procs
This commit was SVN r27758.
2013-01-08 04:41:12 +00:00
Ralph Castain
aea6787918 Add new routed component with self-healing connections - based on radix component - for use in monitoring system
This commit was SVN r27757.
2013-01-08 04:40:35 +00:00
Ralph Castain
c9a596b487 Remove unused var
This commit was SVN r27756.
2013-01-08 04:39:30 +00:00
Ralph Castain
beddf3b379 Add required rml tag
This commit was SVN r27751.
2013-01-05 06:32:20 +00:00
Ralph Castain
bee8bf5d8f Update the sensor framework to report stats back to the HNP if requested by including the data in heartbeats.
This commit was SVN r27748.
2013-01-05 06:30:20 +00:00
Ralph Castain
c71e119bbb Extend the db framework to add support for logging data to databases without duplicating all the modex-related storage.
This commit was SVN r27746.
2013-01-05 06:28:09 +00:00
George Bosilca
34eecb8956 Be more explicit about the operation (store or update). complain loudly
if something goes wrong.

This commit was SVN r27743.
2013-01-04 20:47:25 +00:00
Ralph Castain
cc29f8ff95 Attempt to fix the stupid Cray PMI problem
This commit was SVN r27742.
2013-01-04 02:53:42 +00:00
Nathan Hjelm
6a9ab9b221 Change orte_startup_timeout to be in seconds and remove the 10 second maximum
This commit was SVN r27741.
2013-01-03 23:56:34 +00:00
Ralph Castain
81a8e21939 Need to have the event thread running during init/finalize, but we still have a problem with cleanup - so comment out the event_base_free for now.
This commit was SVN r27738.
2013-01-03 02:16:57 +00:00
Ralph Castain
c65de32218 Cleanup the PMI subsystems to support Sam's "rml-less" shared memory wireup. Only retrieve keys that are specifically requested, and only when they are requested. Let string values be segmented across multiple keys, but don't do it for anything else.
This commit was SVN r27737.
2013-01-03 02:16:10 +00:00
Ralph Castain
c1690f403e Remove non-existent file
This commit was SVN r27730.
2012-12-29 02:21:50 +00:00
Ralph Castain
68329b516c Cleanup stale test codes
This commit was SVN r27729.
2012-12-28 16:52:51 +00:00
Ralph Castain
d1163ebbf2 Ensure we cleanup DFS worker threads during finalize to avoid segfaulting in MCA param cleanup
This commit was SVN r27723.
2012-12-25 21:17:35 +00:00
Ralph Castain
64da742d5f Remove the orte_finalize_event variable - no longer needed
This commit was SVN r27722.
2012-12-25 19:33:20 +00:00
Ralph Castain
cada035f38 Fix the segfault problem in the orteds - turns out it only occurred with progress threads enabled. Ensure the thread gets started at the right time (at the end of init), although the event base gets created earlier. Remove the finalize event as we can instead use the loopbreak call to exit the event loop.
This commit was SVN r27721.
2012-12-25 19:30:18 +00:00
Ralph Castain
c8e34813b6 THIS IS A TEMPORARY FIX - do not finalize opal as the parameter system has been broken and will segfault when finalized.
THIS PATCH MUST BE REMOVED WHEN THE PARAMETER SYSTEM HAS BEEN FIXED.

This commit was SVN r27720.
2012-12-24 18:42:19 +00:00
Ralph Castain
72bea688f1 Fix typo
This commit was SVN r27717.
2012-12-23 18:13:39 +00:00
Ralph Castain
852a709c0e Add libopen-pal to the libraries as all these tools directly reference OPAL functions, and the list of OS's that don't support indirect linking grows (Mac and Ubuntu, for now).
This commit was SVN r27716.
2012-12-23 15:54:05 +00:00
Jeff Squyres
b29b852281 Consolidate all the opal/orte/ompi .m4 files back to the top-level
config/ directory.  We split them apart a while ago in the hopes that
it would simplify things, but it didn't really (e.g., because there
were still some ompi/opal .m4 files in the top-level config/
directory, resulting in developer confusion where any given m4 macro
was defined).

So this commit consolidates them back into the top-level directory for
simplicity.  

There's still (at least) two changes that would be nice to make:

 1. Split any generated .m4 file (e.g., autogen-generated .m4 files)
    into a separate directory somewhere so that a top-level -Iconfig/
    will only get our explicitly defined macros, not the autogen stuff
    (e.g., with libevent2019 needing to get the visibility macro, but
    NOT all the autogen-generated inclusion of component configure.m4
    files).
 1. Change configure to be of the form:
{{{
# ...a small amount of preamble/setup...
OPAL_SETUP
m4_ifdef([project_orte], [ORTE_SETUP])
m4_ifdef([project_ompi], [OMPI_SETUP])
# ...a small amount of finishing stuff...
}}}

I doubt we'll ever get anything as clean as that, but that would be
the goal to shoot for.

This commit was SVN r27704.
2012-12-19 00:00:36 +00:00
Ralph Castain
ab73d11368 Oops - push missing definitions
This commit was SVN r27688.
2012-12-18 16:43:03 +00:00
Ralph Castain
c5ba59ba67 Remove stale component
This commit was SVN r27684.
2012-12-18 04:01:16 +00:00
Ralph Castain
0427a478b2 Remove stale component
This commit was SVN r27683.
2012-12-18 04:00:51 +00:00
Ralph Castain
82f1ba0ea8 Fix static port usage, ensure that both ipv4 and ipv6 are given if ipv6 was enabled
This commit was SVN r27682.
2012-12-18 03:59:49 +00:00
Ralph Castain
2fdd367aa9 Refs trac:3429
Fix bug reported by FreyGuy19713: in cases where HNP node has multiple entries in a hostfile or other allocation, we need to track the total slots allocated to that node.

This commit was SVN r27673.

The following Trac tickets were found above:
  Ticket 3429 --> https://svn.open-mpi.org/trac/ompi/ticket/3429
2012-12-14 17:00:44 +00:00
Jeff Squyres
c5b0bcd9f7 Refs trac:3422
* Add some comments in the *-wrapper-data-txt.in files just so that
   someone doesn't forget in the future why we link in what we do in
   the MPI and ORTE wrapper compilers.
 * Update ompi_wrapper_script.in to match the new behavior.
 * Update orte_wrapper_script.in to support --openmpi:linkall (which
   is a no-op in this case)

This commit was SVN r27672.

The following Trac tickets were found above:
  Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422
2012-12-14 16:34:20 +00:00
Jeff Squyres
f779b1ded9 Put back the static-library-detection stuff from r27668, with some
additional functionality.  Rationale (refs trac:3422):

 * Normal MPI applications only ever use the MPI API. Hence, -lmpi is
   sufficient (they'll never directly call ORTE or OPAL
   functions). This is arguably the most common case.
 * That being said, we do have some test programs (e.g., those in
   orte/test/mpi) that call MPI functions but also call ORTE/OPAL
   functions. I've also written the occasional MPI test program that
   calls opal_output, for example (there even might be a few tests in
   the IBM test suite that directly call ORTE/OPAL functions).
   * Even though this is not a common case, these applications should
     also compile/link with mpicc.
   * So we should add a --openmpi:linkall option that will also link
     in whatever is necessary to call ORTE/OPAL functions
   * Yes, we could hard-code "-lopen-rte -lopen-pal" in Makefiles, but
     we do reserve the right to change those library names and/or add
     others someday, so it's better to abstract out the names and let
     the wrapper supply whatever is necessary.
 * ORTE programs, however, are different. They almost always call OPAL
   functions (e.g., if they want to send a message, they must use the
   OPAL DSS). As such, it seems like the ORTE programs should always
   link in OPAL.

Therefore:

 * Add undocumented --openmpi:linkall flag to the wrapper compilers.
   See the comment in opal_wrapper.c for an explanation of what it
   does.  This flag is only intended for Open MPI developers -- not
   end users.  That's why it's undocumented.
 * Update orte/test/mpi/Makefile.am to add --openmpi:linkall
 * Make ortecc/ortec++'s wrapper data text files always explicitly
   link in libopen-pal

This commit was SVN r27670.

The following SVN revision numbers were found above:
  r27668 --> open-mpi/ompi@cf845897aa

The following Trac tickets were found above:
  Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422
2012-12-13 22:31:37 +00:00
Jeff Squyres
cf845897aa Temporarily revert r27662 and r27667 because something wonky is
happening on OS X.  Grumble...

This commit was SVN r27668.

The following SVN revision numbers were found above:
  r27662 --> open-mpi/ompi@97cc916007
  r27667 --> open-mpi/ompi@529f6244ca
2012-12-11 23:08:14 +00:00
Jeff Squyres
97cc916007 Per discussion at the Open MPI developer meeting last week:
1. Restore libopen-pal.la, libopen-rte.la, and libmpi.la to be
    separate entities (i.e., don't have libopen-rte.la include
    libopen-pal.la, and don't have libmpi.la include libopen-pal.la).
    Yay!
 1. Consequently, make the wrapper compilers look for flags indicating
    that the user wants to compile statically (currently: -static,
    !--static, -Bstatic, and "-Wl," in front of all of those).  If it
    is, follow a 6-way matrix for determinining which libraries to
    list on the underlying command line.
 1. To support that, add the name of a token static and dynamic
    library to look for in each of the wrapper compiler data files.
 1. Fix a long-standing typo in the opalcc wrapper data file.

This commit was SVN r27662.
2012-12-11 01:46:59 +00:00
Ralph Castain
1e92aa2b66 Enable multiple worker threads for processing DFS requests
This commit was SVN r27659.
2012-12-09 02:54:19 +00:00
Ralph Castain
c26ed7dcdd Fix comm_spawn when ORTE progress thread is enabled by ensuring that all operations on the global list of active collectives are done in events to avoid conflicts.
This commit was SVN r27658.
2012-12-09 02:53:20 +00:00
Nathan Hjelm
3e1b13b13a Re-add support for old flex (2.5.4a and earlier) while still cleaning up properly in new flex.
This commit was SVN r27657.
2012-12-07 00:12:43 +00:00
Ralph Castain
1237f8db57 Extend the ras module interface to include the orte_job_t being allocated so that dynamic allocations can be supported
This commit was SVN r27627.
2012-11-23 13:50:10 +00:00
George Bosilca
994d1aba50 Nothing.
This commit was SVN r27626.
2012-11-21 20:07:20 +00:00
Nathan Hjelm
a427a7e727 do not include c99 flag in compiler wrappers
This commit was SVN r27625.
2012-11-20 19:33:14 +00:00
Ralph Castain
7a5f6b584c Have orte-info show thread support as well
This commit was SVN r27624.
2012-11-18 18:15:22 +00:00
Ralph Castain
43f883cb42 Add some more detailed error output to the db_hash component and nidmap code. Ensure the local nodename is included in the HNP's aliases
This commit was SVN r27622.
2012-11-18 17:57:19 +00:00
Ralph Castain
f2ec35536e Fix a bug that prevented MCA params from being forwarded to daemons upon launch
cmr:v1.7

This commit was SVN r27621.
2012-11-18 17:55:26 +00:00
Ralph Castain
e11f32038a Add an MCA param to retain all aliases based on IP addrs for node names so that procs can look them up by interface, if desired. If the param is set, pass aliases around to all daemons and procs for local use
This commit was SVN r27619.
2012-11-16 04:04:29 +00:00
Ralph Castain
da6428a822 Add an MCA param to help debug the ORTE progress thread
This commit was SVN r27614.
2012-11-15 15:54:38 +00:00
Ralph Castain
5241925b62 Add the dfs to ompi_info
This commit was SVN r27613.
2012-11-15 15:54:07 +00:00
Ralph Castain
3cecc1569b Fix segfault if no file_maps were pushed
This commit was SVN r27612.
2012-11-15 15:39:17 +00:00
Ralph Castain
fe6dfad625 Update DFS to support multi-node operations
This commit was SVN r27594.
2012-11-12 02:54:53 +00:00
Ralph Castain
fefec03e78 Enable all ORTE tools to use progress threads if they are enabled
This commit was SVN r27593.
2012-11-12 02:54:09 +00:00
Ralph Castain
a6325e4546 Silence compiler warning
This commit was SVN r27590.
2012-11-12 02:51:29 +00:00
Ralph Castain
26f1cd0909 Fix compiler warnings
This commit was SVN r27588.
2012-11-12 02:50:45 +00:00
Ralph Castain
bd887f7f56 Add a new "test" component to the DFS that treats all files as remote in order to test the app-to-daemon interactions on a single machine. Set a global param to indicate we are using staged execution. Add a param to indicate it is okay for non-MPI processes to execute without finalizing. Cleanup file map load and fetch operations.
This commit was SVN r27587.
2012-11-10 14:09:12 +00:00
Ralph Castain
81d0b06842 Strip the domain info from the hostname if that option is specified, protecting IP address-based names
This commit was SVN r27586.
2012-11-10 14:05:27 +00:00
Ralph Castain
615cc66b44 Protect the HNP cleanup in cases where no session dirs are created
This commit was SVN r27585.
2012-11-10 14:03:07 +00:00
Ralph Castain
fd632147df Per patch from Nathan, with a few fixes, cleanup the orte-info tool
This commit was SVN r27581.
2012-11-10 04:11:40 +00:00
Nathan Hjelm
e0f5137e46 add prototypes for lex destroy functions
This commit was SVN r27580.
2012-11-09 22:00:27 +00:00
Nathan Hjelm
a754674fd7 Per the specification for putenv (http://pubs.opengroup.org/onlinepubs/009604599/functions/putenv.html) the string given to putenv becomes part of the environment. The string must not be changed or freed.
cmr:v1.7

This commit was SVN r27578.
2012-11-09 16:33:14 +00:00
Nathan Hjelm
8658bbc902 instead of relying on yyterminate to clean up the lex context call the destroy functions directly (after closing the file)
This commit was SVN r27577.
2012-11-09 16:10:55 +00:00
Nathan Hjelm
842caae4c7 Fix a small leak in orte/util/name_fns.c
cmr:v1.7

This commit was SVN r27576.
2012-11-07 23:59:49 +00:00
Ralph Castain
9b729794f2 A prior commit apparently broke the trunk when something was inadvertently left behind - so remove a reference to a no-longer-existing function
This commit was SVN r27574.
2012-11-07 11:11:05 +00:00
Nathan Hjelm
7fb5caea92 Remove the finish_parsing function from various .l files. The function is incomplete (doesn't clean up the lex state) and should be replaced by *_yylex_destroy which correctly cleans up the state.
Checked with the flex 2.5.35. Verified with valgrind that this fixes several "still reachable" leaks.

cmr:v1.7

This commit was SVN r27571.
2012-11-06 19:26:14 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Ralph Castain
27b41a7db4 If the nodename is an IP address, we need to retain the full name (even if keep_fqdn is false) so that the ssh tree spawn can proceed.
cmr:v1.7

This commit was SVN r27561.
2012-11-05 16:59:53 +00:00
Ralph Castain
3812979315 Remove stale file - we removed the size functions awhile back
This commit was SVN r27551.
2012-11-01 03:36:51 +00:00
Brian Barrett
e61c00212d Add files found in svn but not tarball
This commit was SVN r27549.
2012-11-01 02:27:03 +00:00
Brian Barrett
dd907e8d4c Remove unused macro
This commit was SVN r27547.
2012-11-01 02:04:57 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Nathan Hjelm
df9bd0ed59 fix bug in plm/rsh that could add extraneous mca options to the orted argv
cmr:v1.7

This commit was SVN r27526.
2012-10-30 19:40:04 +00:00
Ralph Castain
7c3a2c3c92 Allow PMI support to find subdirs named lib64 instead of lib - thanks to Guillaume.Papaure of Bull for the patch
This commit was SVN r27519.
2012-10-30 17:38:57 +00:00
Ralph Castain
a080de188f Enable orterun to directly support staged execution, treating each app as a separate job. Support transfer of file maps when support exists.
This commit was SVN r27516.
2012-10-29 23:11:30 +00:00
Ralph Castain
e5e72c3137 Expand the dfs API to support retrieval, loading and purging of file maps.
This commit was SVN r27515.
2012-10-29 23:05:45 +00:00
Ralph Castain
4e52a15e70 Provide for sync on seek and close DFS operations. Eliminate an unnecessary wake-up timer when using ORTE progress thread
This commit was SVN r27500.
2012-10-26 15:49:04 +00:00
Ralph Castain
4ef30c016b Remove stale windows references
This commit was SVN r27491.
2012-10-26 01:19:14 +00:00
Ralph Castain
35e5e5b512 Set the orte_event_base to the opal_event_base in ompi_info - we aren't doing anything with progress threads anyway
This commit was SVN r27488.
2012-10-25 22:36:08 +00:00
Ralph Castain
df642f1508 Add an API to get a remote file's size. Separate dfs cmds from returned data messages so daemons don't get confused.
This commit was SVN r27487.
2012-10-25 22:23:08 +00:00
Ralph Castain
79e36413c2 There was some discussion of this at an earlier time, but we never got around to doing it - so make orte behave more like a regular library, counting the number of times init is called, and executing finalize when all those are exhausted.
This commit was SVN r27484.
2012-10-25 18:39:37 +00:00
Ralph Castain
094d6f3143 Add a new "distributed file system" capability to support file access operations across nodes that do not have a network file system attached to them.
Add a set of URI create/parse utilities

This commit was SVN r27483.
2012-10-25 17:15:17 +00:00
Ralph Castain
32c185f730 Set a priority for output of forwarded IO so it can effectively compete against inbound messages
This commit was SVN r27480.
2012-10-24 23:34:50 +00:00
Ralph Castain
e06c330635 Add the ability to set a backlog limit on forwarded output waiting at mpirun - helps to avoid crashing systems during debug. Note that we default to "unlimited" to maintain current behavior.
This commit was SVN r27479.
2012-10-24 23:21:40 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Ralph Castain
7574d6673b If someone provides the launch_agent cmd, then don't prefix it
cmr:v1.7

This commit was SVN r27473.
2012-10-24 16:14:04 +00:00
Ralph Castain
5c0534a7ad Ensure that comm_spawn launches procs on the nodes specified by add-host and add-hostfile
This commit was SVN r27452.
2012-10-18 00:40:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Ralph Castain
4028ce7a5d Silence warnings by making types match
This commit was SVN r27446.
2012-10-14 03:45:28 +00:00
Ralph Castain
285a3b168d Add an ability to specify the max number of simultaneous procs/node for an application when operating in staged mode. Change some debug statements from OPAL_OUTPUT_VERBOSE to opal_output_verbose so they are available in optimized builds.
This commit was SVN r27445.
2012-10-14 03:31:32 +00:00
Ralph Castain
04304c186f Remove the setup_hadoop configure script as it is no longer required - the hadoop support components can build without accessing hadoop itself.
This commit was SVN r27385.
2012-09-29 18:30:35 +00:00
Ralph Castain
9daaa001d9 Remove tools that are no longer required
This commit was SVN r27383.
2012-09-29 17:33:16 +00:00
Ralph Castain
54db4c35eb Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two compnents not to build if hwloc is disabled:
ompi/mca/sbgp/basesmsocket
orte/mca/rmaps/lama

Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files.

This commit was SVN r27377.
2012-09-26 23:24:27 +00:00
Samuel Gutierrez
42280e2af5 Temporarily make routed binomial the default. We are experiencing issues with
debruijn when launching fewer processes than are actually available within an
allocation. When this is fixed, please revert this change.

This commit was SVN r27376.
2012-09-26 16:08:12 +00:00
Jeff Squyres
cb65a44c6c Fix the component priority assignment. Thanks to Alex Margolin for
the patch.

This commit was SVN r27363.
2012-09-25 07:13:23 +00:00
George Bosilca
6ec41400b3 Fix the error message in case a daemon does not succeed at killing the
local offspring.

This commit was SVN r27362.
2012-09-24 15:25:21 +00:00
Ralph Castain
d5279b0dc8 Make an attempt to protect hwloc cset2str from segfaulting in weird scenario
This commit was SVN r27361.
2012-09-23 16:51:51 +00:00
Ralph Castain
1ddb334a52 Put the specified JDK path first so we find that one
This commit was SVN r27360.
2012-09-22 19:08:52 +00:00
Ralph Castain
d95025f53a Ensure we clear the usage numbers when binding on multiple nodes so we don't "carry over" info from one node to the next. Use the same tracking mechanism for binding upwards and in-place to avoid doing a bunch of mallocs.
Refs trac:3322

This commit was SVN r27356.

The following Trac tickets were found above:
  Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322
2012-09-20 15:16:06 +00:00
Ralph Castain
90d7b5fdca Update test
This commit was SVN r27354.
2012-09-20 02:51:27 +00:00
Ralph Castain
445161cd2e Correctly count the total number of allocated slots
This commit was SVN r27353.
2012-09-20 02:50:14 +00:00
Ralph Castain
f592967685 Add missing retain to maintain correct accounting on nodes
This commit was SVN r27352.
2012-09-20 02:30:53 +00:00
Ralph Castain
e309db0be9 Ensure file descriptors are closed upon completion of transfer
This commit was SVN r27349.
2012-09-18 18:39:29 +00:00
Ralph Castain
11305109e1 Track positioned files so we avoid re-positioning them across jobs
This commit was SVN r27347.
2012-09-18 15:56:21 +00:00
Ralph Castain
a3060cdd15 Fix the bind_downward code - it was incorrectly looking across the entire node instead of only looking below the locale to which the proc had been assigned. In other words, if the proc was mapped to a core, then the only hwthreads that should be considered for binding are those directly below that core. The binding algo was incorrectly looking at ALL hwthreads in that scenario, causing the proc to be bound to an HT outside of the mapped location.
This now results in the procs being bound within their assigned location. It also causes us to use only the 0th HT on a core unless --use-hwthread-cpus has been specified (in which case, we use all the HTs in a core). Bind to core binds you to all HTs regardless - the --use-hwthread-cpus only impacts the oversubscribed determination and when binding to HT.

cmr:v1.7

This commit was SVN r27342.
2012-09-14 22:01:19 +00:00
Ralph Castain
9057e84ec1 Correct test statement
This commit was SVN r27321.
2012-09-12 14:30:03 +00:00
Ralph Castain
c4fd3df2df Remove unused variables
This commit was SVN r27319.
2012-09-12 12:03:24 +00:00
Ralph Castain
c82cfecc1c Cleanup comm_spawn for the multi-node case where at least one new process isn't spawned on every node. Avoid the complexities of trying to execute a daemon collective across the dynamic spawn as it becomes too hard to ensure that all daemons participate or are accounted for - instead, use a less scalable but workable solution of sending the data directly between the participating procs. Ensure that singletons get their collectives properly defined at startup so the spawned "HNP" is ready for them.
As a secondary cleanup, the HNP doesn't need to update its nidmap during an xcast as it already has an up-to-date picture of the situation. So just dump that data and move along.

This commit was SVN r27318.
2012-09-12 11:31:36 +00:00
Ralph Castain
6b5f9d7767 Some cleanups for staged execution
This commit was SVN r27317.
2012-09-12 09:15:33 +00:00
Ralph Castain
5f7a5c4793 Update test to include all keys
This commit was SVN r27311.
2012-09-12 05:02:51 +00:00
Jeff Squyres
fb2e543a57 Refs trac:3275.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't).  If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!".  Kaboom.

The only problem is that it didn't give you any indication of where
this value was being set.  Quite maddening, from a user perspective.

So we changed the ompi_info handles this case.  If any framework open
function return OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions return OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received and error, and then dump out the parameters
that it has received so far in the framework that had a problem.

At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).  

We updated ompi_info to check for O???_ERR_BAD_PARAM from each from
the framework opens.  Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior.  And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes.  Why?  I think we used them in exactly one
place in the code base (mca_base_components_open.c).  So we deleted
those and just used the normal OPAL_* codes instead.

While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).

This commit was SVN r27306.

The following Trac tickets were found above:
  Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
2012-09-11 20:47:24 +00:00
Ralph Castain
a0ffeb205a Add an orted component for staged operations and rename the staged component to "staged_hnp".
This commit was SVN r27305.
2012-09-11 20:35:46 +00:00
Ralph Castain
387f657fc2 Nuts - forgot to include this with the MPI Ticket 313 stuff. Set some of the envars needed for MPI_INFO_ENV
This commit was SVN r27304.
2012-09-11 20:35:09 +00:00
Ralph Castain
cd8aff675b Update test
This commit was SVN r27303.
2012-09-11 20:32:43 +00:00
Ralph Castain
e8ecd67d53 Once again, bloody SLURM changes the envars and breaks things. Try and track their changes so we get a correct allocation.
This commit was SVN r27302.
2012-09-11 20:31:33 +00:00
Jeff Squyres
a8f8064d8b Add a missing free(). Refs trac:3292.
This commit was SVN r27298.

The following Trac tickets were found above:
  Ticket 3292 --> https://svn.open-mpi.org/trac/ompi/ticket/3292
2012-09-11 17:59:40 +00:00
Josh Hursey
40132f1874 Protect a potentially uninitialized variable (orte_sstore_base_global_snapshot_ref).
If orte_sstore_base_global_snapshot_ref is null, then it will default appropriately when it is used. When prelaunching we always specify this parameter, but if we are not prelaunching it is possible to allow this to be null and it will initialize when used. However we setup the prelaunching variable in both situtations and in the latter that would result in a NULL reference. This patch protects that code segment.

This commit was SVN r27289.
2012-09-11 15:14:28 +00:00
Ralph Castain
ca40cb5f1c Fix comm_spawn by mpirun
This commit was SVN r27285.
2012-09-10 17:09:25 +00:00
Jeff Squyres
8585920e49 Add a note in the default hostfile that it is not used in managed
environments. 

This commit was SVN r27264.
2012-09-07 14:41:19 +00:00
Ralph Castain
4ca495c7e3 In managed allocations, we need to ensure that all nodes are flagged as having their slots "given" in case the user incorrectly asks us to change them. Also need to update the HNP node's "slots_given" flag as we don't replace the orte_node_t object when inserting the node info for that node.
This commit was SVN r27258.
2012-09-07 04:08:17 +00:00
Ralph Castain
2110fb7f95 Add some debug
This commit was SVN r27257.
2012-09-07 04:06:37 +00:00
Ralph Castain
78ccb097f0 Fix vm setup in unmanaged environments - needs to construct a node list in the same way we now do for mapping
This commit was SVN r27256.
2012-09-07 01:53:19 +00:00
Ralph Castain
876d78f36a If JAVA_HOME is present on a Linux system, use it to find Java support
This commit was SVN r27255.
2012-09-07 00:41:14 +00:00
Ralph Castain
36acbe4ca6 Multiple apps might want the same files, so instead of using the app_idx to determine who gets what, use the actual file names as they are sent anyway
This commit was SVN r27254.
2012-09-06 22:02:05 +00:00
Ralph Castain
e9e52fc78f Gain some efficiency in the staged mapper - if soft locations are in use and get_nodes returns busy, then no need to continue cycling thru the remaining apps as all nodes are occupied
This commit was SVN r27253.
2012-09-06 22:01:18 +00:00
Ralph Castain
67f34c3be6 Record the bind_level recvd by the daemon for each job so it can be correctly sent to the procs. Add test in get_relative_locality to avoid descending into an infinite loop if the level is NODE (==0).
This commit was SVN r27252.
2012-09-06 20:50:07 +00:00
Ralph Castain
efa50346c8 Error out if we are filtering a hostfile and encounter a node that is not in the resource-managed allocation, giving an error message identifying the file and the node. Don't filter managed allocations thru a default hostfile as this can lead to "hidden" errors.
Don't use dash-host info on managed allocations if we using soft locations

This commit was SVN r27245.
2012-09-05 19:42:00 +00:00
Ralph Castain
d772e0fc3d Add an option to treat dash-host specifications as "requested, but not required". So-called "soft" location requests can allow an application to execute even if the ideal allocation isn't available.
This commit was SVN r27242.
2012-09-05 18:42:09 +00:00
Ralph Castain
6d29cecce1 Fix the help message warning of multiple prefixes so it correctly prints out the info, and fix a typo.
cmr:v1.7

This commit was SVN r27241.
2012-09-05 16:28:36 +00:00
Ralph Castain
64ccf789f2 Ensure the final output is printed
cmr:v1.7

This commit was SVN r27240.
2012-09-05 16:25:15 +00:00
Ralph Castain
fde83a44ab This confusion has been around for awhile, caused by a long-ago decision to track slots allocated to a specific job as opposed to allocated to the overall mpirun instance. We eliminated that quite a while ago, but never consolidated the "slots_alloc" and "slots" fields in orte_node_t. As a result, confusion has grown in the code base as to which field to look at and/or update.
So (finally) consolidate these two fields into one "slots" field. Add a field in orte_job_t to indicate when all the procs for a job will be launched together, so that staged operations can know when MPI operations are allowed.

This commit was SVN r27239.
2012-09-05 01:30:39 +00:00
Ralph Castain
bae5dab916 If (and only if) a user requests, set the default number of slots on any node to the number of objects of the specified type. This *only* takes effect in an unmanaged environment - i.e., if an external resource manager assigns us a number of slots, then that is what we use. However, if we are using a hostfile, then the user may or may not have given us a value for the number of slots on each node.
For those nodes (and *only* those nodes) where the user does *not* specify a slot count, we will set the number of slots according to their direction: either to the number of cores, numas, sockets, or hwthreads. Otherwise, the slot count is set to 1.

Note that the default behavior remains unchanged: in the absence of any value for #slots, and in the absence of any directive to set #slots, we will set #slots=1.

This commit was SVN r27236.
2012-09-04 20:58:26 +00:00
Ralph Castain
18d2f75b56 Ensure we don't re-link files when staging execution as we may be executing more members of the same app. Allow the user to ask that directory trees be "flattened" so that all files appear in the proc's session directory itself.
This commit was SVN r27232.
2012-09-04 17:52:12 +00:00
Ralph Castain
86a16edd5a Move the debug output to the right place
This commit was SVN r27231.
2012-09-04 17:22:17 +00:00
Ralph Castain
11de735e8a Complete the revamp of hostfile support in non-managed environments. Working at the app level, ensure that we utilize only those nodes specified for that app, but fall back to the default hostfile (if available) for those with no specification, further falling back to the local host if the default hostfile is not present or is empty.
This commit was SVN r27230.
2012-09-04 16:34:05 +00:00
Jeff Squyres
b23a6b8eda Shiqing removed this file in r27217 (but neglected to remove it from
the Makefile.am).

This commit was SVN r27226.

The following SVN revision numbers were found above:
  r27217 --> open-mpi/ompi@ddbd542732
2012-09-04 13:06:39 +00:00
Ralph Castain
42d58f17bf Provide a more user-expected way of handling hostfile and dash-host allocations in unmanaged environments. First look for an RM-managed allocation. If nothing is found, or no active module is alive, proceed to look for hostfile and dash-host assignments. These are provided on a per-app basis, so we have to cycle across the apps using the following algorithm:
1. if a hostfile is given, then add the nodes found in that hostfile to our list - i.e., the resulting allocation contains the UNION of all nodes specified in hostfiles from across all apps.

2. any app that has no hostfile but has a dash-host, will have those nodes added to the list

3. any app that fails to have a hostfile or a dash-host will be given the default hostfile, if we have it

Each app will subsequently be filtered using their hostfile and/or dash-host data to ensure that the app only has access to the hosts it specified

Note that any relative node syntax found in the hostfiles or dash-host data will generate an error in this scenario, so only non-relative syntax can be present

This commit was SVN r27223.
2012-09-04 11:50:52 +00:00
Ralph Castain
3894179e2f Add missing file
This commit was SVN r27222.
2012-09-04 01:16:58 +00:00
Ralph Castain
fa6a18f05a If the orted HNP is spawned by a singleton, then it needs to harvest all the MCA params from its environment to ensure they are passed on to any subsequently spawned daemons. Otherwise, the singleton could be directed to select options that other apps miss.
This commit was SVN r27221.
2012-09-04 01:10:26 +00:00
Ralph Castain
b5e26a90ea Singletons should insist that the spawned orted HNP use the novm state machine so that any subsequent comm_spawn occurs only on required nodes
This commit was SVN r27220.
2012-09-04 01:09:01 +00:00
Shiqing Fan
ddbd542732 Remove one .windows file.
Add a macro definition for isblank function.

This commit was SVN r27217.
2012-09-03 09:51:44 +00:00
Ralph Castain
66c3f5d18d When getting target nodes for mapping, there is a difference between not finding any nodes that match the required constraints (either in hostfile or dash-host filtering) and finding at least one such node, but all its slots are busy. Make the return code reflect this difference so the caller can take appropriate action.
This commit was SVN r27213.
2012-09-01 10:30:40 +00:00
Ralph Castain
95019cc310 Fix a few places where we weren't completely identifying hostfile-based operations against "localhost" entries. Tell the mapper base to be silent when we don't want errors announced because nodes aren't available for mapping (something it is okay if they are fully used). Fix an infinite loop in the file prepositioning code.
This commit was SVN r27210.
2012-08-31 21:28:49 +00:00
Jeff Squyres
da00d281e6 Oops -- we want the priority to be low, not high (for now). :-)
This commit was SVN r27208.
2012-08-31 21:08:35 +00:00
Jeff Squyres
fcc1c7e33c = Overview =
First revision of the Locatation Aware Mapping Algorithm (LAMA) RMAPS
component.  This component is used to effect many different types of
regular of process/processor affinity patterns.  Although quite
flexible in the patterns that it provides, it is ''not'' a
fully-arbitrary, rankfile-like solution for process/processor
affinity.  

Inspiried by !BlueGene-like network specifications, LAMA has a core
algorithm that is quite good at specifying regular patterns in
multiple "dimensions" (where "dimensions" are expressed in terms of
different hardware elements: processor hardware threads, cores,
sockets, ...etc.).  The LAMA core algorithm is described here:

  http://www.open-mpi.org/papers/cluster-2011-lama/

= LAMA Usage Levels =

LAMA allows specifying affinity multiple different ways:

 1. None: Speciying no affinity options to mpirun results in exactly
    the same behavior as today: no affinity is used.
 1. Simple: Using the mpirun options "--bind-to <WIDTH>" and "--map-to
    <LEVEL>" to indicate how "wide" each process should be bound
    (i.e., bind to a processor core, or to a processor socket, etc.)
    and how to lay out the processes (i.e., round robin by cores,
    sockets, etc.).  
 1. Expert: Using four new MCA parameters to effect process mapping
    and binding to processors.  These options are a bit complex, and
    are not for the faint at heart, but offer a high degree of
    (regular pattern) flexibility (each of these are described more
    fully below): 
    * rmaps_lama_map: a sequence of characters describing how to lay
      out processes
    * rmaps_lama_bind: a sequence of characters describing the
      resources to bind to each process
    * rmaps_lama_mppr: a sequence of characters describing the maximum
      number of processes to allow per resource (i.e., a specific
      definition of "oversubscription")
    * rmaps_lama_ordering: once all processes are in place, how to
      order the ranks in MPI_COMM_WORLD

We anticipate that most users will utilize the "None" and "Simple"
levels of affinity, and they continue to work just as they do with the
v1.6 series and SVN trunk.  

The Expert level was designed for two purposes:

 1. To provide a precise definition for the "Simple" level (i.e.,
 every
    --bind-to/--map-by option in the "Simple" level has a
 corresponding
    precise specification in the "Expert" level)
 1. As modern computing platforms become more complex, we simply
    cannot predict what application developers will need in terms of
    processor affinity.  LAMA is an attempt to provide a highly
    flexible mechanism that allows applications to utilize a variety
    of complex, unique affinity patterns beyond the common "bind to
    core" and "bind to socket" patterns.

= LAMA Simple Level =

The "Simple" level is pretty much the same as what Open MPI has
offered for years.  It supports the same --bind-to and --map-by
options that Open MPI has supported for a while, but expands their
scope a bit.

Specifically, the following options are available for both --bind-to
and --map-by:

 * slot
 * hwthread
 * core
 * l1cache
 * l2cache
 * l3cache
 * socket
 * numa
 * board
 * node

= LAMA Expert Level =

The "Expert" level requires some explanation.  I'll repeat my
disclaimer here: the LAMA Expert level is not for the meek.  It is
flexible, but complex.  '''Most users won't need the Expert level.'''

LAMA works in three phases: mapping, binding, and ordering.  Each is
described below.

== Expert: Mapping ==

Processes are paired with sets of resources.  For example, each
process may be paired with a single processor core.  Or each process
may be paired with an entire processor socket.  LAMA performs this
mapping, obeying the Max Processes Per Resource ("MPPR", pronounced
"mipper") limits.  More on MPPR, below.

Mapping can be performed across multiple hardware levels:

 * h: Hardware thread
 * c: Processor core
 * s: Processor socket
 * L1: L1 cache
 * L2: L2 cache
 * L3: L3 cache
 * N: NUMA node
 * b: Processor board
 * n: Server node

If the act of mapping is that of pairing MPI processes to the
resources that have been allocated to a job, one can easily imagine
looping through all the resources and assigning processes to them.

But to effect different process process layout patterns across those
resources, one may want to loop over those resources ''in a different
order.''  That is, if the above-mentioned nine hardware resources
(hardware thread, processor core, etc.) can be thought of as an
nine-dimensional space, you can imagine nine nested loops to traverse
all of them.  And you can imagine that changing the order of nesting
would change the traversal pattern.

LAMA accepts a sequence of tokens representing the above-mentioned
nine hardware resources to specify the order of looping when mapping
resources to processes.

For example, consider a "simple" traversal: csL1L2L3Nbnh.  Reading
that sequence of letters from left-to-right, it specifies mapping by
processor core, processor socket, L1 cache, L2 cache, L3 cache, NUMA
node, processor board, server node, and finally hardware thread.

Wait... what?  That string specifies resources from "smallest" to
"largest" -- with the exception of hardware threads.  Why are they
tacked on to the end?

In short, this string of letters means "map by round robin by core" --
(indeed, it exactly corresponds to the Simple level "--map-by core").
Specifically, LAMA traverses the string from left-to-right and maps
processes to all the resources indicated by that token (e.g., "c" for
processor core).  When there are no more resources indicated by that
token, it goes on to the next token.

Hence, in this case, LAMA will map the first process to the first
core, then it will map the second process to the second core, and so
on.

Once all the cores are exhausted, LAMA effectively ignores all the
other letters until "h" (because all the other resources are made up
of cores; when cores are exhausted, those resources are exhausted,
too).  

If there are still more processes to be mapped, LAMA will then
traverse all the hyperthreads -- meaning that the next process will be
mapped to the second hyperthread on the first core.  And the next
process will be mapped to the second hyperthread on the second core.
And so on.

Keep in mind that the cores involved may span many server nodes; we're
not just talking about the cores (etc.) in a single machine.

As another example, the sequence "sL1L2L3Nbnch" is exactly equivalent
to "--map-by socket" (i.e., LAMA maps the first process to the first
socket, the second process to the second socket, and so on).

The sequence of letter can be combined in many, many different ways to
produce many different regular mapping patterns.

=== Max Processes Per Resource (MPPR) ===

The MPPR is an expression that precisely defines the maximum number of
processes that can be mapped to any single resource.  In effect, it
defines the concept of "oversubscription."  Specifically, traditional
HPC wisdom is that "oversubscription" is when there is more than one
MPI process per processor core.

This conventional defintion is expressed in a MPPR string of "1:c"
(one process per core).  

But what if your MPI processes are multi-threaded, and they need
multiple processes per core?  You'd need a different description of
"oversubscription" in this case.  Perhaps you want to have one MPI
process per socket.  This would be expressed in a MPPR string of
"1:s".

The general form of an individual MPPR specification is an integer
follow by a colon, followed by any of the tokens from mapping can be
used in the MPPR specification.  For example "1:c" is pronounced "one
process per core."

Multiple MPPR specifications can be strung together into a
comma-delimited list, too.  All of these MPPR values and then taken
into account when mapping.  Here's some examples:

 * 1:c -- allow, at most, one process per processor core (i.e., don't
   schedule by hyperthread)
 * 1:s -- allow, at most, one process per processor socket (e.g., 
   that process may be multithreaded, or wants exclusive use of the
   socket's caches)
 * 1:s,2:n -- only allow one process per processor socket, but, at
   most, two processes per server node (e.g., if the two MPI processes
   will consume all the RAM on the server node, even if there are more
   processor cores available)

If mapping all processes to resources would exceed a MPPR limit, this
job is ruled to be oversubscribed.  If --oversubscribe was specified
on the mpirun command line, the job continues.  Otherwise, LAMA will
abort the job.

Additionally, if --oversubscribe is specified, LAMA will endlessly
cycle through the mapping token string untill all processes have been
mapped.

== Expert: Binding == 

Once processes have been paired with resources during the Mapping
stage, they are optionally bound to a (potentially different) set of
resources.  For example, processes may be mapped round robin by
processor socket, but bound to an individual processor core.

To be clear: if binding is not used, then mapping is effectively
reduced to "counting how many processes end up on each server node."
Without binding, there's no enforcement that a process will stay where
LAMA thinks it was placed.  

With binding, however, processes are bound to a set of hardware
threads.  The number of threads to which the process is bound is
sometimes referred to as the "binding width".  For example, if a
process is bound to all the hardware threads in a processor socket,
its "width" is the processor socket.

(note that we specifically do not say that the hardware threads are
sequential, even if they are all within a single resource such as a
processor core or socket.  BIOS ordering of hardware threads can be
wonky; so we only refer to "sets of hardware threads")

Bindings are expressed as an integer and a token from the mapping
string.  For example "1s" means "bind each process to one processor
socket" (there is no ":" in the binding string because the ":" is
pronounced as "per" when reading the MPPR string).

Note that it only makes sense to bind processes to a single resource
specification (unlike the MPPR specification, where multiple limits
can be specified).

== Expert: Ordering ==

Finally, processes are assigned a rank in MPI_COMM_WORLD.  LAMA
currently offers two ordering modes: sequential or natural:

 * Sequential: if you laid out all the hardware resources in a single
   line, and then overlaid all the MPI processes on top of them, they
   are ordered from 0 to (N-1) from left-to-right.
 * Natural: the ordering of ranks follows the mapping ordering.  For
   example, consider a server node with two processor sockets, each
   containing four cores.  The command line "mpirun -np 8 --bind-to
   core --map-by socket --order n a.out" would result in MCW ranks
   that look like this: [0 2 4 6] [1 3 5 7].

= Execution =

At this point, the job is fully mapped, optionally bound, and its
ranks in MPI_COMM_WORLD are ordered.  It now starts its execution.

= Final Notes =

Note that at this point, lama is not the default mapper.  It must be
activiated with "--mca rmaps lama".  We'll continue to do further
testing and comparitive analysis with the current set of ORTE mappers.

Also, note that the LAMA algorithm can handle heterogeneity between
hardware resources (e.g., an MPI job spanning server nodes with
differing numbers of processor sockets).  For lack of a longer
explanation (this commit message already long enough!), LAMA considers
each server node individually during mapping and binding.

See the LAMA paper for more details:
http://www.open-mpi.org/papers/cluster-2011-lama/

This commit was SVN r27206.
2012-08-31 19:57:53 +00:00
Jeff Squyres
ca5bd85364 Add some help messages to the RAS simulator for PEBKAC issues.
This commit was SVN r27203.
2012-08-31 16:33:29 +00:00
Ralph Castain
38ce23db43 Add some protection to allow NULL bytes in byte objects and NULL strings to be handled cleanly in nidmaps and modex entries. Ensure there is a valid nidmap available for the HNP to pass down to any local procs when it is operating alone.
This commit was SVN r27188.
2012-08-31 01:07:36 +00:00
Ralph Castain
efc4a40c8a It is okay for a key not to be found
This commit was SVN r27187.
2012-08-30 15:12:23 +00:00
Ralph Castain
6dbb7a8493 Just because a peer didn't post a particular pmi key, that doesn't mean it is an immediate irrecoverable error - could be they just don't have a matching interface. Let the upper layer decide what to do about it.
This commit was SVN r27186.
2012-08-30 14:12:09 +00:00
Ralph Castain
05c0464dcb Add missing protections
This commit was SVN r27183.
2012-08-30 12:17:29 +00:00
Ralph Castain
1b659de132 Get staged execution working on multi-node setups. Improve efficiency by only remapping if all procs not yet mapped in the job.
This commit was SVN r27181.
2012-08-29 20:35:52 +00:00