Ralph Castain
35e5e5b512
Set the orte_event_base to the opal_event_base in ompi_info - we aren't doing anything with progress threads anyway
This commit was SVN r27488.
2012-10-25 22:36:08 +00:00
Ralph Castain
df642f1508
Add an API to get a remote file's size. Separate dfs cmds from returned data messages so daemons don't get confused.
This commit was SVN r27487.
2012-10-25 22:23:08 +00:00
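A minimal sketch of why commands and returned data travel on separate tags, with a daemon answering a remote file-size query. The tag values, struct sketch_msg, and sketch_dfs_handle() are hypothetical illustrations, not the ORTE dfs framework's actual API. The real messages are packed into buffers and sent between processes rather than passed around as structs.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical tags: requests and replies use different tags so a daemon's
 * command handler never mistakes returned data for a new command. */
enum sketch_tag { SKETCH_TAG_DFS_CMD = 1, SKETCH_TAG_DFS_DATA = 2 };
enum sketch_cmd { SKETCH_CMD_GET_SIZE = 1 };

struct sketch_msg {
    enum sketch_tag tag;
    int cmd;           /* meaningful only on the command tag */
    const char *path;  /* file the request refers to */
    int64_t size;      /* filled in on the reply; -1 if unknown */
};

/* Daemon-side handler. Returns 1 and fills *reply for a command it handled,
 * or 0 for anything arriving on the data tag (which is not a request). */
int sketch_dfs_handle(const struct sketch_msg *in, struct sketch_msg *reply)
{
    struct stat st;

    if (in->tag != SKETCH_TAG_DFS_CMD) {
        return 0;
    }
    reply->tag = SKETCH_TAG_DFS_DATA;   /* replies always go on the data tag */
    reply->cmd = in->cmd;
    reply->path = in->path;
    reply->size = -1;
    if (in->cmd == SKETCH_CMD_GET_SIZE && stat(in->path, &st) == 0) {
        reply->size = (int64_t)st.st_size;
    }
    return 1;
}
```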
Ralph Castain
79e36413c2
There was some discussion of this at an earlier time, but we never got around to doing it - so make orte behave more like a regular library, counting the number of times init is called, and executing finalize when all those are exhausted.
This commit was SVN r27484.
2012-10-25 18:39:37 +00:00
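The counting behavior described above can be sketched as follows. sketch_init() and sketch_finalize() are hypothetical stand-ins, not orte_init()/orte_finalize(), and this version is not thread-safe; it only shows that setup runs on the first init and teardown runs on the finalize that balances it.

```c
#include <stdbool.h>

static int sketch_init_count = 0;
static bool sketch_initialized = false;

/* Every call is counted; only the first one does the real setup. */
int sketch_init(void)
{
    if (sketch_init_count++ > 0) {
        return 0;                  /* already initialized: just count */
    }
    sketch_initialized = true;     /* real setup would happen here */
    return 0;
}

/* Teardown happens only when all init calls have been matched. */
int sketch_finalize(void)
{
    if (sketch_init_count <= 0) {
        return -1;                 /* finalize without a matching init */
    }
    if (--sketch_init_count > 0) {
        return 0;                  /* other users are still active */
    }
    sketch_initialized = false;    /* real teardown would happen here */
    return 0;
}
```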
Ralph Castain
094d6f3143
Add a new "distributed file system" capability to support file access operations across nodes that do not have a network file system attached to them.
Add a set of URI create/parse utilities
This commit was SVN r27483.
2012-10-25 17:15:17 +00:00
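A minimal sketch of URI create/parse helpers, assuming a file://host/path form. The function names and the format are illustrative assumptions; the utilities added in this commit may use different names, schemes, or escaping rules (percent-encoding is not handled here).

```c
#include <stdio.h>
#include <string.h>

/* Build "file://<host><abs-path>". Returns 0 on success, -1 if truncated. */
int sketch_uri_create(char *out, size_t len, const char *host, const char *path)
{
    int n = snprintf(out, len, "file://%s%s", host, path);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Split "file://<host>/<path>" into host and path (path keeps its '/').
 * Returns 0 on success, -1 on a bad scheme, missing path, or short buffer. */
int sketch_uri_parse(const char *uri, char *host, size_t hlen,
                     char *path, size_t plen)
{
    const char *prefix = "file://";
    size_t plen_prefix = strlen(prefix);
    const char *h, *slash;
    size_t n;

    if (strncmp(uri, prefix, plen_prefix) != 0) {
        return -1;
    }
    h = uri + plen_prefix;
    slash = strchr(h, '/');
    if (slash == NULL) {
        return -1;
    }
    n = (size_t)(slash - h);
    if (n >= hlen || strlen(slash) >= plen) {
        return -1;
    }
    memcpy(host, h, n);
    host[n] = '\0';
    memcpy(path, slash, strlen(slash) + 1);
    return 0;
}
```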
Ralph Castain
32c185f730
Set a priority for output of forwarded IO so it can effectively compete against inbound messages
This commit was SVN r27480.
2012-10-24 23:34:50 +00:00
Ralph Castain
e06c330635
Add the ability to set a backlog limit on forwarded output waiting at mpirun - helps to avoid crashing systems during debug. Note that we default to "unlimited" to maintain current behavior.
This commit was SVN r27479.
2012-10-24 23:21:40 +00:00
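A minimal sketch of a backlog limit where 0 means "unlimited", matching the default described above. The struct and functions are hypothetical, and the commit does not say what happens to output once the limit is reached; this sketch simply drops it and counts the drops.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backlog tracker; max_backlog == 0 means unlimited. */
struct sketch_backlog {
    size_t max_backlog;
    size_t pending;    /* chunks queued but not yet written */
    size_t dropped;    /* chunks refused because the backlog was full */
};

/* Returns true if another output chunk may be queued. */
bool sketch_backlog_admit(struct sketch_backlog *b)
{
    if (b->max_backlog != 0 && b->pending >= b->max_backlog) {
        b->dropped++;
        return false;
    }
    b->pending++;
    return true;
}

/* Called once a queued chunk has actually been written out. */
void sketch_backlog_written(struct sketch_backlog *b)
{
    if (b->pending > 0) {
        b->pending--;
    }
}
```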
Ralph Castain
e6014bf2e1
Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.
The following SVN revision numbers were found above:
r27451 --> open-mpi/ompi@d59034e6ef
r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Ralph Castain
7574d6673b
If someone provides the launch_agent cmd, then don't prefix it
cmr:v1.7
This commit was SVN r27473.
2012-10-24 16:14:04 +00:00
Ralph Castain
5c0534a7ad
Ensure that comm_spawn launches procs on the nodes specified by add-host and add-hostfile
This commit was SVN r27452.
2012-10-18 00:40:44 +00:00
Nathan Hjelm
d59034e6ef
MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7
This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Ralph Castain
4028ce7a5d
Silence warnings by making types match
This commit was SVN r27446.
2012-10-14 03:45:28 +00:00
Ralph Castain
285a3b168d
Add an ability to specify the max number of simultaneous procs/node for an application when operating in staged mode. Change some debug statements from OPAL_OUTPUT_VERBOSE to opal_output_verbose so they are available in optimized builds.
This commit was SVN r27445.
2012-10-14 03:31:32 +00:00
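A sketch of how a per-node cap might be applied when choosing where to start the next proc in staged mode. The node struct, sketch_pick_node(), and the convention that a cap of 0 or less means "no cap" are assumptions for illustration, not the staged mapper's actual code.

```c
#include <stddef.h>

/* Hypothetical per-node state during a staged launch. */
struct sketch_node {
    int slots;     /* slots available on the node */
    int running;   /* procs from this app currently running there */
};

/* Return the index of the first node that can accept another proc while
 * honoring the per-node cap (max_per_node <= 0: no cap), or -1 if none. */
int sketch_pick_node(const struct sketch_node *nodes, size_t n, int max_per_node)
{
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].running >= nodes[i].slots) {
            continue;   /* node is full */
        }
        if (max_per_node > 0 && nodes[i].running >= max_per_node) {
            continue;   /* cap reached; wait for a proc here to finish */
        }
        return (int)i;
    }
    return -1;
}
```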
Ralph Castain
04304c186f
Remove the setup_hadoop configure script as it is no longer required - the hadoop support components can build without accessing hadoop itself.
This commit was SVN r27385.
2012-09-29 18:30:35 +00:00
Ralph Castain
9daaa001d9
Remove tools that are no longer required
This commit was SVN r27383.
2012-09-29 17:33:16 +00:00
Ralph Castain
54db4c35eb
Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two components not to build if hwloc is disabled:
ompi/mca/sbgp/basesmsocket
orte/mca/rmaps/lama
Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files.
This commit was SVN r27377.
2012-09-26 23:24:27 +00:00
Samuel Gutierrez
42280e2af5
Temporarily make routed binomial the default. We are experiencing issues with
debruijn when launching fewer processes than are actually available within an
allocation. When this is fixed, please revert this change.
This commit was SVN r27376.
2012-09-26 16:08:12 +00:00
Jeff Squyres
cb65a44c6c
Fix the component priority assignment. Thanks to Alex Margolin for
the patch.
This commit was SVN r27363.
2012-09-25 07:13:23 +00:00
George Bosilca
6ec41400b3
Fix the error message in case a daemon does not succeed at killing the
local offspring.
This commit was SVN r27362.
2012-09-24 15:25:21 +00:00
Ralph Castain
d5279b0dc8
Make an attempt to protect hwloc cset2str from segfaulting in a weird scenario
This commit was SVN r27361.
2012-09-23 16:51:51 +00:00
Ralph Castain
1ddb334a52
Put the specified JDK path first so we find that one
This commit was SVN r27360.
2012-09-22 19:08:52 +00:00
Ralph Castain
d95025f53a
Ensure we clear the usage numbers when binding on multiple nodes so we don't "carry over" info from one node to the next. Use the same tracking mechanism for binding upwards and in-place to avoid doing a bunch of mallocs.
Refs trac:3322
This commit was SVN r27356.
The following Trac tickets were found above:
Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322
2012-09-20 15:16:06 +00:00
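A sketch of the "carry over" problem and its fix: one usage tracker is reused across nodes (avoiding a malloc per node) but must be reset before binding on each new node. The fixed-size array, the least-used selection rule, and all names are hypothetical simplifications of the real hwloc-based binding code.

```c
#include <string.h>

#define SKETCH_MAX_CPUS 64

/* Hypothetical tracker reused across nodes instead of allocating per node. */
struct sketch_usage {
    int ncpus;
    int num_bound[SKETCH_MAX_CPUS];
};

/* Call before binding on each new node; otherwise counts left over from
 * the previous node skew the least-used choice on this one. */
void sketch_usage_reset(struct sketch_usage *u, int ncpus)
{
    if (ncpus > SKETCH_MAX_CPUS) {
        ncpus = SKETCH_MAX_CPUS;
    }
    u->ncpus = ncpus;
    memset(u->num_bound, 0, sizeof(u->num_bound));
}

/* Bind to the least-used cpu on the current node and record the binding. */
int sketch_bind_least_used(struct sketch_usage *u)
{
    int best = 0;
    for (int i = 1; i < u->ncpus; i++) {
        if (u->num_bound[i] < u->num_bound[best]) {
            best = i;
        }
    }
    u->num_bound[best]++;
    return best;
}
```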
Ralph Castain
90d7b5fdca
Update test
This commit was SVN r27354.
2012-09-20 02:51:27 +00:00
Ralph Castain
445161cd2e
Correctly count the total number of allocated slots
This commit was SVN r27353.
2012-09-20 02:50:14 +00:00
Ralph Castain
f592967685
Add missing retain to maintain correct accounting on nodes
This commit was SVN r27352.
2012-09-20 02:30:53 +00:00
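A minimal sketch of the accounting issue, in the spirit of OPAL's OBJ_RETAIN/OBJ_RELEASE but not their implementation: every holder of a node pointer needs its own reference, otherwise the first release frees an object that another list still points to. The struct and functions are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical reference-counted node object. */
struct sketch_node_obj {
    int refcount;
    int num_procs;
};

struct sketch_node_obj *sketch_node_new(void)
{
    struct sketch_node_obj *n = calloc(1, sizeof(*n));
    if (n != NULL) {
        n->refcount = 1;   /* the creator holds the first reference */
    }
    return n;
}

/* Take an extra reference, e.g. when adding the node to a second list. */
void sketch_retain(struct sketch_node_obj *n)
{
    n->refcount++;
}

/* Drop a reference; returns 1 if this was the last one and n was freed. */
int sketch_release(struct sketch_node_obj *n)
{
    if (--n->refcount == 0) {
        free(n);
        return 1;
    }
    return 0;
}
```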
Ralph Castain
e309db0be9
Ensure file descriptors are closed upon completion of transfer
This commit was SVN r27349.
2012-09-18 18:39:29 +00:00
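A minimal POSIX sketch of a transfer routine that closes both descriptors on every exit path, including errors. sketch_transfer() is hypothetical and written as a simple blocking loop; it is not the actual ORTE file-transfer code.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd, then close both descriptors so a
 * finished (or failed) transfer never leaks them.
 * Returns 0 on success, -1 on a read or write error. */
int sketch_transfer(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n;
    int rc = 0;

    while ((n = read(in_fd, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            if (errno == EINTR) {
                continue;          /* interrupted: retry the read */
            }
            rc = -1;
            goto done;
        }
        ssize_t off = 0;
        while (off < n) {          /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                rc = -1;
                goto done;
            }
            off += w;
        }
    }
done:
    close(in_fd);
    close(out_fd);
    return rc;
}
```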
Ralph Castain
11305109e1
Track positioned files so we avoid re-positioning them across jobs
This commit was SVN r27347.
2012-09-18 15:56:21 +00:00
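A sketch of tracking which files have already been positioned on a node so that a later job does not re-position them. The registry struct and sketch_check_and_record() are hypothetical, and matching on file name alone is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry of files already positioned on this node. */
struct sketch_positioned {
    char **names;
    size_t count;
    size_t cap;
};

/* Returns 1 if the file was already positioned (skip it), 0 if it was newly
 * recorded (position it now), or -1 on allocation failure. */
int sketch_check_and_record(struct sketch_positioned *p, const char *name)
{
    for (size_t i = 0; i < p->count; i++) {
        if (strcmp(p->names[i], name) == 0) {
            return 1;
        }
    }
    if (p->count == p->cap) {
        size_t ncap = p->cap ? 2 * p->cap : 8;
        char **tmp = realloc(p->names, ncap * sizeof(*tmp));
        if (tmp == NULL) {
            return -1;
        }
        p->names = tmp;
        p->cap = ncap;
    }
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, name, len);
    p->names[p->count++] = copy;
    return 0;
}
```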
Ralph Castain
a3060cdd15
Fix the bind_downward code - it was incorrectly looking across the entire node instead of only looking below the locale to which the proc had been assigned. In other words, if the proc was mapped to a core, then the only hwthreads that should be considered for binding are those directly below that core. The binding algo was incorrectly looking at ALL hwthreads in that scenario, causing the proc to be bound to an HT outside of the mapped location.
This now results in the procs being bound within their assigned location. It also causes us to use only the 0th HT on a core unless --use-hwthread-cpus has been specified (in which case, we use all the HTs in a core). Bind to core binds you to all HTs regardless - the --use-hwthread-cpus only impacts the oversubscribed determination and when binding to HT.
cmr:v1.7
This commit was SVN r27342.
2012-09-14 22:01:19 +00:00
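The corrected rule can be sketched with a bitmask over a hypothetical flat topology in which core c owns a contiguous run of hwthreads. The real code walks hwloc objects; the function and layout here are illustrative only. For bind-to-core the returned mask is the binding itself; for bind-to-hwthread it is the set of HTs eligible to be chosen.

```c
#include <stdint.h>

/* Core c owns hwthreads [c * hts_per_core, (c + 1) * hts_per_core).
 * Preconditions: 1 <= hts_per_core < 64 and (core + 1) * hts_per_core <= 64. */
uint64_t sketch_bind_candidates(int core, int hts_per_core,
                                int bind_to_core, int use_hwthread_cpus)
{
    /* Only hwthreads below the core the proc was mapped to are considered;
     * the rest of the node is never searched. */
    uint64_t core_mask = ((UINT64_C(1) << hts_per_core) - 1)
                         << (core * hts_per_core);

    if (bind_to_core || use_hwthread_cpus) {
        return core_mask;                          /* all HTs of the core */
    }
    return UINT64_C(1) << (core * hts_per_core);   /* 0th HT only */
}
```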
Ralph Castain
9057e84ec1
Correct test statement
This commit was SVN r27321.
2012-09-12 14:30:03 +00:00
Ralph Castain
c4fd3df2df
Remove unused variables
This commit was SVN r27319.
2012-09-12 12:03:24 +00:00
Ralph Castain
c82cfecc1c
Cleanup comm_spawn for the multi-node case where at least one new process isn't spawned on every node. Avoid the complexities of trying to execute a daemon collective across the dynamic spawn as it becomes too hard to ensure that all daemons participate or are accounted for - instead, use a less scalable but workable solution of sending the data directly between the participating procs. Ensure that singletons get their collectives properly defined at startup so the spawned "HNP" is ready for them.
As a secondary cleanup, the HNP doesn't need to update its nidmap during an xcast as it already has an up-to-date picture of the situation. So just dump that data and move along.
This commit was SVN r27318.
2012-09-12 11:31:36 +00:00
Ralph Castain
6b5f9d7767
Some cleanups for staged execution
This commit was SVN r27317.
2012-09-12 09:15:33 +00:00
Ralph Castain
5f7a5c4793
Update test to include all keys
This commit was SVN r27311.
2012-09-12 05:02:51 +00:00
Jeff Squyres
fb2e543a57
Refs trac:3275.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't). If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!". Kaboom.
The only problem is that it didn't give you any indication of where
this value was being set. Quite maddening, from a user perspective.
So we changed how ompi_info handles this case. If any framework open
function returns OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its components' register/open
functions returns OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received an error, and then dump out the parameters
that it has received so far in the framework that had a problem.
At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and where it was set from (so that
they can go fix it).
We updated ompi_info to check for O???_ERR_BAD_PARAM from each of
the framework opens. Also updated the doxygen docs in mca.h for this
O???_ERR_BAD_PARAM behavior. And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes. Why? I think we used them in exactly one
place in the code base (mca_base_components_open.c). So we deleted
those and just used the normal OPAL_* codes instead.
While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).
This commit was SVN r27306.
The following Trac tickets were found above:
Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
2012-09-11 20:47:24 +00:00
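A sketch of the new ompi_info control flow described above: open frameworks in order, and on a bad-param error stop and dump the parameters seen so far in that framework along with where each was set. The return codes, structs, and function names are hypothetical, not the OPAL/ORTE/OMPI definitions; stopping on other errors without the dump is also an assumption.

```c
#include <stdio.h>

/* Hypothetical return codes; the real OPAL/ORTE/OMPI values differ. */
#define SKETCH_SUCCESS        0
#define SKETCH_ERR_BAD_PARAM  (-5)

struct sketch_param {
    const char *name;
    const char *value;
    const char *source;   /* where the value was set, e.g. a params file */
};

struct sketch_framework {
    const char *name;
    int (*open)(void);
    const struct sketch_param *params;   /* params registered so far */
    int nparams;
};

/* Returns the index of the failing framework, or -1 if all opened. */
int sketch_open_all(const struct sketch_framework *fw, int n, FILE *out)
{
    for (int i = 0; i < n; i++) {
        int rc = fw[i].open();
        if (rc == SKETCH_SUCCESS) {
            continue;
        }
        if (rc == SKETCH_ERR_BAD_PARAM) {
            fprintf(out, "Error opening framework %s: bad MCA parameter\n",
                    fw[i].name);
            /* The offending parameter is usually the last one listed. */
            for (int j = 0; j < fw[i].nparams; j++) {
                fprintf(out, "  %s = %s (set in %s)\n", fw[i].params[j].name,
                        fw[i].params[j].value, fw[i].params[j].source);
            }
        }
        return i;
    }
    return -1;
}

/* Stand-in open functions used only for demonstration. */
int sketch_open_ok(void) { return SKETCH_SUCCESS; }
int sketch_open_bad(void) { return SKETCH_ERR_BAD_PARAM; }
```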
Ralph Castain
a0ffeb205a
Add an orted component for staged operations and rename the staged component to "staged_hnp".
This commit was SVN r27305.
2012-09-11 20:35:46 +00:00
Ralph Castain
387f657fc2
Nuts - forgot to include this with the MPI Ticket 313 stuff. Set some of the envars needed for MPI_INFO_ENV
This commit was SVN r27304.
2012-09-11 20:35:09 +00:00
Ralph Castain
cd8aff675b
Update test
This commit was SVN r27303.
2012-09-11 20:32:43 +00:00
Ralph Castain
e8ecd67d53
Once again, bloody SLURM changes the envars and breaks things. Try and track their changes so we get a correct allocation.
This commit was SVN r27302.
2012-09-11 20:31:33 +00:00
Jeff Squyres
a8f8064d8b
Add a missing free(). Refs trac:3292.
This commit was SVN r27298.
The following Trac tickets were found above:
Ticket 3292 --> https://svn.open-mpi.org/trac/ompi/ticket/3292
2012-09-11 17:59:40 +00:00
Josh Hursey
40132f1874
Protect a potentially uninitialized variable (orte_sstore_base_global_snapshot_ref).
If orte_sstore_base_global_snapshot_ref is null, then it will default appropriately when it is used. When prelaunching we always specify this parameter, but if we are not prelaunching it is possible to allow this to be null and it will initialize when used. However, we set up the prelaunching variable in both situations, and in the latter that would result in a NULL reference. This patch protects that code segment.
This commit was SVN r27289.
2012-09-11 15:14:28 +00:00
Ralph Castain
ca40cb5f1c
Fix comm_spawn by mpirun
This commit was SVN r27285.
2012-09-10 17:09:25 +00:00
Jeff Squyres
8585920e49
Add a note in the default hostfile that it is not used in managed
environments.
This commit was SVN r27264.
2012-09-07 14:41:19 +00:00
Ralph Castain
4ca495c7e3
In managed allocations, we need to ensure that all nodes are flagged as having their slots "given" in case the user incorrectly asks us to change them. Also need to update the HNP node's "slots_given" flag as we don't replace the orte_node_t object when inserting the node info for that node.
This commit was SVN r27258.
2012-09-07 04:08:17 +00:00
Ralph Castain
2110fb7f95
Add some debug
This commit was SVN r27257.
2012-09-07 04:06:37 +00:00
Ralph Castain
78ccb097f0
Fix vm setup in unmanaged environments - needs to construct a node list in the same way we now do for mapping
This commit was SVN r27256.
2012-09-07 01:53:19 +00:00
Ralph Castain
876d78f36a
If JAVA_HOME is present on a Linux system, use it to find Java support
This commit was SVN r27255.
2012-09-07 00:41:14 +00:00
Ralph Castain
36acbe4ca6
Multiple apps might want the same files, so instead of using the app_idx to determine who gets what, use the actual file names as they are sent anyway
This commit was SVN r27254.
2012-09-06 22:02:05 +00:00
Ralph Castain
e9e52fc78f
Gain some efficiency in the staged mapper - if soft locations are in use and get_nodes returns busy, then no need to continue cycling thru the remaining apps as all nodes are occupied
This commit was SVN r27253.
2012-09-06 22:01:18 +00:00
Ralph Castain
67f34c3be6
Record the bind_level recvd by the daemon for each job so it can be correctly sent to the procs. Add test in get_relative_locality to avoid descending into an infinite loop if the level is NODE (==0).
This commit was SVN r27252.
2012-09-06 20:50:07 +00:00
Ralph Castain
efa50346c8
Error out if we are filtering a hostfile and encounter a node that is not in the resource-managed allocation, giving an error message identifying the file and the node. Don't filter managed allocations thru a default hostfile as this can lead to "hidden" errors.
Don't use dash-host info on managed allocations if we are using soft locations
This commit was SVN r27245.
2012-09-05 19:42:00 +00:00
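A sketch of the hostfile filtering check: every node listed in the hostfile must be in the resource-managed allocation, and the error names both the file and the node. The function names and the message text are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if name appears in list[0..n). */
static int sketch_in_list(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, list[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Check every hostfile entry against the allocation. On the first node that
 * is not allocated, report the file and the node and return -1. */
int sketch_filter_hostfile(const char *hostfile,
                           const char *const *entries, size_t nentries,
                           const char *const *alloc, size_t nalloc)
{
    for (size_t i = 0; i < nentries; i++) {
        if (!sketch_in_list(entries[i], alloc, nalloc)) {
            fprintf(stderr,
                    "Error: hostfile %s contains node %s, which is not in "
                    "the allocation\n", hostfile, entries[i]);
            return -1;
        }
    }
    return 0;
}
```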
Ralph Castain
d772e0fc3d
Add an option to treat dash-host specifications as "requested, but not required". So-called "soft" location requests can allow an application to execute even if the ideal allocation isn't available.
This commit was SVN r27242.
2012-09-05 18:42:09 +00:00
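A sketch contrasting required and "soft" host requests. The function is hypothetical, and the fallback to the whole available set when none of the requested hosts is available is an assumed policy, not necessarily what ORTE does.

```c
#include <stddef.h>
#include <string.h>

/* Hard request: fail (-1) unless every requested host is available.
 * Soft request: use whichever requested hosts are available; if none are,
 * fall back to all available hosts (assumed policy).
 * out[] must hold at least max(nreq, navail) entries.
 * Returns the number of hosts selected, or -1. */
int sketch_select_hosts(const char *const *req, size_t nreq,
                        const char *const *avail, size_t navail,
                        int soft, const char **out)
{
    size_t n = 0;

    for (size_t i = 0; i < nreq; i++) {
        int found = 0;
        for (size_t j = 0; j < navail && !found; j++) {
            found = strcmp(req[i], avail[j]) == 0;
        }
        if (found) {
            out[n++] = req[i];
        } else if (!soft) {
            return -1;   /* a required host is missing */
        }
    }
    if (n == 0 && soft) {
        for (size_t j = 0; j < navail; j++) {
            out[n++] = avail[j];
        }
    }
    return (int)n;
}
```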