1
1
Граф коммитов

2977 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
446e33a5d8 There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes.
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.

This commit was SVN r28682.
2013-06-27 03:04:50 +00:00
Joshua Ladd
0b5c1f2ea8 Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fallback to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
Nathan Hjelm
299d5b3dd7 Fix two debugger attach bugs.
- orte_debugger_init_after_spawn was not being called for debuggers that
   use the MPIR_attach_fifo to co-locate debugger daemons.
 - MPIR_Breakpoint was not getting called if a debugger reattached. Add
   a job state (ORTE_JOB_STATE_DEBUGGER_DETACH) to reset mpir_breakpoint_fired
   to false when a debugger detaches to ensure MPIR_Breakpoint is called if
   another debugger attaches. Tested with STAT 2.0/launchmon 1.0.

cmr:v1.7

This commit was SVN r28665.
2013-06-20 16:18:05 +00:00
Ralph Castain
a51a0a8c48 Fix uninitialized var
This commit was SVN r28652.
2013-06-18 22:41:47 +00:00
George Bosilca
b4ebc417a1 Correctly register the component MCA parameters.
Few cleanups in the includes.

This commit was SVN r28645.
2013-06-15 16:05:09 +00:00
Nathan Hjelm
518d1fe200 Fix two typos that prevented alps direct launch from working
This commit was SVN r28628.
2013-06-13 17:04:08 +00:00
Joshua Ladd
61ffb47573 Minor fix for the min-dist mapping algorithm: we need to call 'get_nbobjs_by_type' first, before we get the sorted list of nodes - we need to add node objects and fill them in the summary object for the current topology. This patch was submitted by Elena Elkina and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28578.
2013-05-31 15:19:59 +00:00
Jeff Squyres
6d173af329 This commit introduces a new "mindist" ORTE RMAPS mapper, as well as
some relevant updates/new functionality in the opal/mca/hwloc and
orte/mca/rmaps bases.  This work was mainly developed by Mellanox,
with a bunch of advice from Ralph Castain, and some minor advice from
Brice Goglin and Jeff Squyres.

Even though this is mainly Mellanox's work, Jeff is committing only
for logistical reasons (he holds the hg+svn combo tree, and can
therefore commit it directly back to SVN).

-----

Implemented distance-based mapping algorithm as a new "mindist"
component in the rmaps framework.  It allows mapping processes by NUMA
due to PCI locality information as reported by the BIOS - from the
closest to device to furthest.

To use this algorithm, specify:

   {{{mpirun --map-by dist:<device_name>}}}

where <device_name> can be mlx5_0, ib0, etc.

There are two modes provided:

 1. bynode: load-balancing across nodes
 1. byslot: go through slots sequentially (i.e., the first nodes are
     more loaded)

These options are regulated by the optional ''span'' modifier; the
command line parameter looks like:

    {{{mpirun --map-by dist:<device_name>,span}}}

So, for example, if there are 2 nodes, each with 8 cores, and we'd
like to run 10 processes, the mindist algorithm will place 8 processes
to the first node and 2 to the second by default. But if you want to
place 5 processes to each node, you can add a span modifier in your
command line to do that.

If there are two NUMA nodes on the node, each with 4 cores, and we run
6 processes, the mindist algorithm will try to find the NUMA closest
to the specified device, and if successful, it will place 4 processes
on that NUMA but leaving the remaining two to the next NUMA node.

You can also specify the number of cpus per MPI process. This option
is handled so that we map as many processes to the closest NUMA as we
can (number of available processors at the NUMA divided by number of
cpus per rank) and then go on with the next closest NUMA.

The default binding option for this mapping is bind-to-numa. It works
if you don't specify any binding policy. But if you specified binding
level that was "lower" than NUMA (i.e hwthread, core, socket) it would
bind to whatever level you specify.

This commit was SVN r28552.
2013-05-22 13:04:40 +00:00
Jeff Squyres
089c632cce Remove a bunch of dead code: gcc 4.7 warns of set-but-unused
variables.  So get rid of them.

This commit was SVN r28538.
2013-05-17 21:45:49 +00:00
Ralph Castain
e100b8d165 don't need the return value, but should check for error
This commit was SVN r28534.
2013-05-16 15:15:02 +00:00
Jeff Squyres
128cc27417 Minor type fix (they're both enums/ints, so the compiler previously
silently cast them).

This commit was SVN r28532.
2013-05-16 00:47:37 +00:00
Ralph Castain
3a372a65b8 Mapping policies must be tested as equalities as they are values, not bitmasks
This commit was SVN r28526.
2013-05-15 13:45:00 +00:00
Ralph Castain
29e4b0cc50 Cannot test equality on mapping directives as it is a bitmask
This commit was SVN r28525.
2013-05-15 13:41:49 +00:00
Ralph Castain
04b11accd3 Silience a few warnings
This commit was SVN r28515.
2013-05-14 21:58:40 +00:00
Ralph Castain
5296099ecb Fix the cpus-per-rank when binding to hwthreads. Add cpus-per-rank to diag printout
Thanks to Elena for reporting the problem

This commit was SVN r28508.
2013-05-14 20:17:50 +00:00
Ralph Castain
427b6b0b47 Fix the verbosity of yet another framework...sigh.
This commit was SVN r28481.
2013-05-13 14:36:32 +00:00
Jeff Squyres
456df1c9f7 Remove redundant opal_output() messages from the module; the called
functions will now show_help() their own error messages if something
goes wrong (per r28470).

This commit was SVN r28471.

The following SVN revision numbers were found above:
  r28470 --> open-mpi/ompi@2ff95a7739
2013-05-10 15:12:07 +00:00
Jeff Squyres
2ff95a7739 Proper show_help error messages for LAMA.
This commit was SVN r28470.
2013-05-10 15:06:25 +00:00
Ralph Castain
f15fe5045e Ensure that debugger connect can occur by getting the rml contact info updated before calling init_after_spawn
cmr:v1.7.3,reviewer=jsquyres

This commit was SVN r28455.
2013-05-06 22:00:45 +00:00
Ralph Castain
c52b94af8b Revert r28453 and r28452 - wrong fix
This commit was SVN r28454.

The following SVN revision numbers were found above:
  r28452 --> open-mpi/ompi@756ee4b5e0
  r28453 --> open-mpi/ompi@6da24143a2
2013-05-06 21:52:17 +00:00
Ralph Castain
6da24143a2 Minor performance improvement
This commit was SVN r28453.
2013-05-06 20:27:16 +00:00
Ralph Castain
756ee4b5e0 Update the rml_uri for each proc so debuggers can attach
This commit was SVN r28452.
2013-05-06 20:18:14 +00:00
Ralph Castain
707d0e653a Must use equal and not & comparison for mapping directives
This commit was SVN r28451.
2013-05-06 15:07:12 +00:00
Ralph Castain
a0a6412545 Do a little cleanup on abnormal termination procedure - don't keep submitting forced exit events (one will do), no need to reset the abnormal termination pipe event in orterun, etc.
This commit was SVN r28450.
2013-05-05 17:39:45 +00:00
Jeff Squyres
42a9a4c62c After examining a '''lot''' of MTT output with Ralph, fix the cause of
many, many MTT timeouts when running jobs under SLURM: send the right
command at the end to cause remote orteds to shut down.

This commit was SVN r28438.
2013-05-02 00:23:53 +00:00
Ralph Castain
5d7a93c032 Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed.
Interesting how many includes had to be fixed here and there to fill in missing dependencies :-)

This commit was SVN r28411.
2013-04-29 17:02:37 +00:00
Ralph Castain
3a354c4ea3 Cleanup the verbose output channel name
This commit was SVN r28391.
2013-04-24 23:44:02 +00:00
Ralph Castain
c5e1a7dc65 fix typo
This commit was SVN r28390.
2013-04-24 23:37:59 +00:00
Nathan Hjelm
55decca2b7 routed/debruijn: if the next hop for a message is unknown forward the message to the parent process
This commit was SVN r28384.
2013-04-24 19:05:25 +00:00
Ralph Castain
2e8946db0a Add some debug output
This commit was SVN r28371.
2013-04-23 23:11:22 +00:00
Ralph Castain
b6377a2138 Properly remove the object from the list prior to releasing it when an error is encountered
This commit was SVN r28370.
2013-04-23 22:44:52 +00:00
Ralph Castain
1a54e52da4 Per conversation with Josh H., remove the stale rsh component from the filem framework. Let the "raw" component do the job.
This commit was SVN r28338.
2013-04-16 20:39:48 +00:00
Jeff Squyres
349ee654c1 Fix some --without-hwloc compile errors. Also remove one
assigned-but-not-used variable assignment.

This commit was SVN r28321.
2013-04-10 15:08:31 +00:00
Ralph Castain
45af6cf59e The move of the orte_db framework to opal required that we create an opaque opal_identifier_t type as OPAL cannot know anything about the ORTE process name. However, passing a value down to opal and then having the db components reference it causes alignment issues on Solaris Sparc platforms. So pass the pointer instead and do the old "memcpy" trick to avoid the problem.
This commit was SVN r28308.
2013-04-08 23:34:16 +00:00
Ralph Castain
76426285f0 Cannot retain opal_buffer_t, so use a copy
This commit was SVN r28302.
2013-04-07 23:02:59 +00:00
Ralph Castain
698b4ad6e7 Fix the parameter handling so no-tree-spawn isn't getting reversed
This commit was SVN r28300.
2013-04-07 15:48:25 +00:00
Ralph Castain
1f011bef99 Cleanup the updated sys limits capability. Fix a few copy/paste bugs (my bad). Shift the limit set to the ODLS default module so that we sete the limits for all apps, even those that don't call opal_init. Leave it in opal_init as well to support direct-launch apps, but ensure we only set the limits once by removing the envar after launch by ODLS.
Provide some nice error messages if we fail to set the limits. Since the user had to specifically request we set the limit, treat failure as an error-out situation.

This commit was SVN r28288.
2013-04-04 16:00:17 +00:00
Ralph Castain
252147fba6 Cleanup error message if unknown host is given in -host and -hostfile options
This commit was SVN r28262.
2013-03-28 16:52:10 +00:00
Ralph Castain
e6ae088813 Cleanup error outputs when a daemon fails to start
This commit was SVN r28261.
2013-03-28 16:51:19 +00:00
Ralph Castain
21ee48de57 Add missing static declaration
This commit was SVN r28247.
2013-03-27 21:59:17 +00:00
Ralph Castain
1b5b9afa85 Silence warnings
This commit was SVN r28244.
2013-03-27 21:56:46 +00:00
Nathan Hjelm
17315bf360 Now that the entire codebase has been updated to use the MCA framework
system remove the last calls to the MCA parameter system.

This commit was SVN r28242.
2013-03-27 21:17:53 +00:00
Nathan Hjelm
c041156f60 Update ORTE frameworks to use the MCA framework system.
This commit was SVN r28240.
2013-03-27 21:14:43 +00:00
Nathan Hjelm
365cf48db5 Update OPAL frameworks to use the MCA framework system.
This commit was SVN r28239.
2013-03-27 21:11:47 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
e7ac6c9bde Don't build rank_file if you can't use it anyway
This commit was SVN r28233.
2013-03-27 15:12:40 +00:00
Ralph Castain
256414121e Protect the cpus-per-rank MCA param registration so that --without-hwloc will build
This commit was SVN r28232.
2013-03-27 14:53:30 +00:00
Ralph Castain
317915225c Finish the binding cleanup by removing the no-longer-used binding level scheme. This proved to be fallible as there is no guarantee that the hierarchy it used matched physical reality of the machine (e.g., is L3 "above" the socket or not). Still have to complete the ppr update, but get the rest of it correct.
This commit was SVN r28223.
2013-03-26 20:09:49 +00:00
Ralph Castain
24b91839aa Ensure the process knows it local cpuset early enough to perform the locality computation
This commit was SVN r28221.
2013-03-26 19:14:23 +00:00
Ralph Castain
6ee32767d4 Restore the cpus-per-proc option for byslot and bynode mapping. Remove the bind_idx (which recorded the index of the hwloc object where the proc was bound) as this would no longer be unique, and just use the bitmap as the standard reference for location. Update the relative locality computation to take bitmaps as its argument.
This commit was SVN r28219.
2013-03-26 18:27:50 +00:00