1
1
Граф коммитов

3752 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
4ef30c016b Remove stale windows references
This commit was SVN r27491.
2012-10-26 01:19:14 +00:00
Ralph Castain
35e5e5b512 Set the orte_event_base to the opal_event_base in ompi_info - we aren't doing anything with progress threads anyway
This commit was SVN r27488.
2012-10-25 22:36:08 +00:00
Ralph Castain
df642f1508 Add an API to get a remote file's size. Separate dfs cmds from returned data messages so daemons don't get confused.
This commit was SVN r27487.
2012-10-25 22:23:08 +00:00
Ralph Castain
79e36413c2 There was some discussion of this at an earlier time, but we never got around to doing it - so make orte behave more like a regular library, counting the number of times init is called, and executing finalize when all those are exhausted.
This commit was SVN r27484.
2012-10-25 18:39:37 +00:00
Ralph Castain
094d6f3143 Add a new "distributed file system" capability to support file access operations across nodes that do not have a network file system attached to them.
Add a set of URI create/parse utilities

This commit was SVN r27483.
2012-10-25 17:15:17 +00:00
Ralph Castain
32c185f730 Set a priority for output of forwarded IO so it can effectively compete against inbound messages
This commit was SVN r27480.
2012-10-24 23:34:50 +00:00
Ralph Castain
e06c330635 Add the ability to set a backlog limit on forwarded output waiting at mpirun - helps to avoid crashing systems during debug. Note that we default to "unlimited" to maintain current behavior.
This commit was SVN r27479.
2012-10-24 23:21:40 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Ralph Castain
7574d6673b If someone provides the launch_agent cmd, then don't prefix it
cmr:v1.7

This commit was SVN r27473.
2012-10-24 16:14:04 +00:00
Ralph Castain
5c0534a7ad Ensure that comm_spawn launches procs on the nodes specified by add-host and add-hostfile
This commit was SVN r27452.
2012-10-18 00:40:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Ralph Castain
4028ce7a5d Silence warnings by making types match
This commit was SVN r27446.
2012-10-14 03:45:28 +00:00
Ralph Castain
285a3b168d Add an ability to specify the max number of simultaneous procs/node for an application when operating in staged mode. Change some debug statements from OPAL_OUTPUT_VERBOSE to opal_output_verbose so they are available in optimized builds.
This commit was SVN r27445.
2012-10-14 03:31:32 +00:00
Ralph Castain
04304c186f Remove the setup_hadoop configure script as it is no longer required - the hadoop support components can build without accessing hadoop itself.
This commit was SVN r27385.
2012-09-29 18:30:35 +00:00
Ralph Castain
9daaa001d9 Remove tools that are no longer required
This commit was SVN r27383.
2012-09-29 17:33:16 +00:00
Ralph Castain
54db4c35eb Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two compnents not to build if hwloc is disabled:
ompi/mca/sbgp/basesmsocket
orte/mca/rmaps/lama

Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files.

This commit was SVN r27377.
2012-09-26 23:24:27 +00:00
Samuel Gutierrez
42280e2af5 Temporarily make routed binomial the default. We are experiencing issues with
debruijn when launching fewer processes than are actually available within an
allocation. When this is fixed, please revert this change.

This commit was SVN r27376.
2012-09-26 16:08:12 +00:00
Jeff Squyres
cb65a44c6c Fix the component priority assignment. Thanks to Alex Margolin for
the patch.

This commit was SVN r27363.
2012-09-25 07:13:23 +00:00
George Bosilca
6ec41400b3 Fix the error message in case a daemon does not succeed at killing the
local offspring.

This commit was SVN r27362.
2012-09-24 15:25:21 +00:00
Ralph Castain
d5279b0dc8 Make an attempt to protect hwloc cset2str from segfaulting in weird scenario
This commit was SVN r27361.
2012-09-23 16:51:51 +00:00
Ralph Castain
1ddb334a52 Put the specified JDK path first so we find that one
This commit was SVN r27360.
2012-09-22 19:08:52 +00:00
Ralph Castain
d95025f53a Ensure we clear the usage numbers when binding on multiple nodes so we don't "carry over" info from one node to the next. Use the same tracking mechanism for binding upwards and in-place to avoid doing a bunch of mallocs.
Refs trac:3322

This commit was SVN r27356.

The following Trac tickets were found above:
  Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322
2012-09-20 15:16:06 +00:00
Ralph Castain
90d7b5fdca Update test
This commit was SVN r27354.
2012-09-20 02:51:27 +00:00
Ralph Castain
445161cd2e Correctly count the total number of allocated slots
This commit was SVN r27353.
2012-09-20 02:50:14 +00:00
Ralph Castain
f592967685 Add missing retain to maintain correct accounting on nodes
This commit was SVN r27352.
2012-09-20 02:30:53 +00:00
Ralph Castain
e309db0be9 Ensure file descriptors are closed upon completion of transfer
This commit was SVN r27349.
2012-09-18 18:39:29 +00:00
Ralph Castain
11305109e1 Track positioned files so we avoid re-positioning them across jobs
This commit was SVN r27347.
2012-09-18 15:56:21 +00:00
Ralph Castain
a3060cdd15 Fix the bind_downward code - it was incorrectly looking across the entire node instead of only looking below the locale to which the proc had been assigned. In other words, if the proc was mapped to a core, then the only hwthreads that should be considered for binding are those directly below that core. The binding algo was incorrectly looking at ALL hwthreads in that scenario, causing the proc to be bound to an HT outside of the mapped location.
This now results in the procs being bound within their assigned location. It also causes us to use only the 0th HT on a core unless --use-hwthread-cpus has been specified (in which case, we use all the HTs in a core). Bind to core binds you to all HTs regardless - the --use-hwthread-cpus only impacts the oversubscribed determination and when binding to HT.

cmr:v1.7

This commit was SVN r27342.
2012-09-14 22:01:19 +00:00
Ralph Castain
9057e84ec1 Correct test statement
This commit was SVN r27321.
2012-09-12 14:30:03 +00:00
Ralph Castain
c4fd3df2df Remove unused variables
This commit was SVN r27319.
2012-09-12 12:03:24 +00:00
Ralph Castain
c82cfecc1c Cleanup comm_spawn for the multi-node case where at least one new process isn't spawned on every node. Avoid the complexities of trying to execute a daemon collective across the dynamic spawn as it becomes too hard to ensure that all daemons participate or are accounted for - instead, use a less scalable but workable solution of sending the data directly between the participating procs. Ensure that singletons get their collectives properly defined at startup so the spawned "HNP" is ready for them.
As a secondary cleanup, the HNP doesn't need to update its nidmap during an xcast as it already has an up-to-date picture of the situation. So just dump that data and move along.

This commit was SVN r27318.
2012-09-12 11:31:36 +00:00
Ralph Castain
6b5f9d7767 Some cleanups for staged execution
This commit was SVN r27317.
2012-09-12 09:15:33 +00:00
Ralph Castain
5f7a5c4793 Update test to include all keys
This commit was SVN r27311.
2012-09-12 05:02:51 +00:00
Jeff Squyres
fb2e543a57 Refs trac:3275.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't).  If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!".  Kaboom.

The only problem is that it didn't give you any indication of where
this value was being set.  Quite maddening, from a user perspective.

So we changed the ompi_info handles this case.  If any framework open
function return OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions return OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received and error, and then dump out the parameters
that it has received so far in the framework that had a problem.

At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).  

We updated ompi_info to check for O???_ERR_BAD_PARAM from each from
the framework opens.  Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior.  And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes.  Why?  I think we used them in exactly one
place in the code base (mca_base_components_open.c).  So we deleted
those and just used the normal OPAL_* codes instead.

While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).

This commit was SVN r27306.

The following Trac tickets were found above:
  Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
2012-09-11 20:47:24 +00:00
Ralph Castain
a0ffeb205a Add an orted component for staged operations and rename the staged component to "staged_hnp".
This commit was SVN r27305.
2012-09-11 20:35:46 +00:00
Ralph Castain
387f657fc2 Nuts - forgot to include this with the MPI Ticket 313 stuff. Set some of the envars needed for MPI_INFO_ENV
This commit was SVN r27304.
2012-09-11 20:35:09 +00:00
Ralph Castain
cd8aff675b Update test
This commit was SVN r27303.
2012-09-11 20:32:43 +00:00
Ralph Castain
e8ecd67d53 Once again, bloody SLURM changes the envars and breaks things. Try and track their changes so we get a correct allocation.
This commit was SVN r27302.
2012-09-11 20:31:33 +00:00
Jeff Squyres
a8f8064d8b Add a missing free(). Refs trac:3292.
This commit was SVN r27298.

The following Trac tickets were found above:
  Ticket 3292 --> https://svn.open-mpi.org/trac/ompi/ticket/3292
2012-09-11 17:59:40 +00:00
Josh Hursey
40132f1874 Protect a potentially uninitialized variable (orte_sstore_base_global_snapshot_ref).
If orte_sstore_base_global_snapshot_ref is null, then it will default appropriately when it is used. When prelaunching we always specify this parameter, but if we are not prelaunching it is possible to allow this to be null and it will initialize when used. However we setup the prelaunching variable in both situtations and in the latter that would result in a NULL reference. This patch protects that code segment.

This commit was SVN r27289.
2012-09-11 15:14:28 +00:00
Ralph Castain
ca40cb5f1c Fix comm_spawn by mpirun
This commit was SVN r27285.
2012-09-10 17:09:25 +00:00
Jeff Squyres
8585920e49 Add a note in the default hostfile that it is not used in managed
environments. 

This commit was SVN r27264.
2012-09-07 14:41:19 +00:00
Ralph Castain
4ca495c7e3 In managed allocations, we need to ensure that all nodes are flagged as having their slots "given" in case the user incorrectly asks us to change them. Also need to update the HNP node's "slots_given" flag as we don't replace the orte_node_t object when inserting the node info for that node.
This commit was SVN r27258.
2012-09-07 04:08:17 +00:00
Ralph Castain
2110fb7f95 Add some debug
This commit was SVN r27257.
2012-09-07 04:06:37 +00:00
Ralph Castain
78ccb097f0 Fix vm setup in unmanaged environments - needs to construct a node list in the same way we now do for mapping
This commit was SVN r27256.
2012-09-07 01:53:19 +00:00
Ralph Castain
876d78f36a If JAVA_HOME is present on a Linux system, use it to find Java support
This commit was SVN r27255.
2012-09-07 00:41:14 +00:00
Ralph Castain
36acbe4ca6 Multiple apps might want the same files, so instead of using the app_idx to determine who gets what, use the actual file names as they are sent anyway
This commit was SVN r27254.
2012-09-06 22:02:05 +00:00
Ralph Castain
e9e52fc78f Gain some efficiency in the staged mapper - if soft locations are in use and get_nodes returns busy, then no need to continue cycling thru the remaining apps as all nodes are occupied
This commit was SVN r27253.
2012-09-06 22:01:18 +00:00
Ralph Castain
67f34c3be6 Record the bind_level recvd by the daemon for each job so it can be correctly sent to the procs. Add test in get_relative_locality to avoid descending into an infinite loop if the level is NODE (==0).
This commit was SVN r27252.
2012-09-06 20:50:07 +00:00
Ralph Castain
efa50346c8 Error out if we are filtering a hostfile and encounter a node that is not in the resource-managed allocation, giving an error message identifying the file and the node. Don't filter managed allocations thru a default hostfile as this can lead to "hidden" errors.
Don't use dash-host info on managed allocations if we using soft locations

This commit was SVN r27245.
2012-09-05 19:42:00 +00:00