Ralph Castain
1e92aa2b66
Enable multiple worker threads for processing DFS requests
This commit was SVN r27659.
2012-12-09 02:54:19 +00:00
Ralph Castain
c26ed7dcdd
Fix comm_spawn when ORTE progress thread is enabled by ensuring that all operations on the global list of active collectives are done in events to avoid conflicts.
This commit was SVN r27658.
2012-12-09 02:53:20 +00:00
Nathan Hjelm
3e1b13b13a
Re-add support for old flex (2.5.4a and earlier) while still cleaning up properly in new flex.
This commit was SVN r27657.
2012-12-07 00:12:43 +00:00
Ralph Castain
1237f8db57
Extend the ras module interface to include the orte_job_t being allocated so that dynamic allocations can be supported
This commit was SVN r27627.
2012-11-23 13:50:10 +00:00
George Bosilca
994d1aba50
Nothing.
This commit was SVN r27626.
2012-11-21 20:07:20 +00:00
Ralph Castain
43f883cb42
Add some more detailed error output to the db_hash component and nidmap code. Ensure the local nodename is included in the HNP's aliases
This commit was SVN r27622.
2012-11-18 17:57:19 +00:00
Ralph Castain
f2ec35536e
Fix a bug that prevented MCA params from being forwarded to daemons upon launch
cmr:v1.7
This commit was SVN r27621.
2012-11-18 17:55:26 +00:00
Ralph Castain
e11f32038a
Add an MCA param to retain all aliases based on IP addrs for node names so that procs can look them up by interface, if desired. If the param is set, pass aliases around to all daemons and procs for local use
This commit was SVN r27619.
2012-11-16 04:04:29 +00:00
Ralph Castain
3cecc1569b
Fix segfault if no file_maps were pushed
This commit was SVN r27612.
2012-11-15 15:39:17 +00:00
Ralph Castain
fe6dfad625
Update DFS to support multi-node operations
This commit was SVN r27594.
2012-11-12 02:54:53 +00:00
Ralph Castain
a6325e4546
Silence compiler warning
This commit was SVN r27590.
2012-11-12 02:51:29 +00:00
Ralph Castain
26f1cd0909
Fix compiler warnings
This commit was SVN r27588.
2012-11-12 02:50:45 +00:00
Ralph Castain
bd887f7f56
Add a new "test" component to the DFS that treats all files as remote in order to test the app-to-daemon interactions on a single machine. Set a global param to indicate we are using staged execution. Add a param to indicate it is okay for non-MPI processes to execute without finalizing. Clean up file map load and fetch operations.
This commit was SVN r27587.
2012-11-10 14:09:12 +00:00
Ralph Castain
615cc66b44
Protect the HNP cleanup in cases where no session dirs are created
...
This commit was SVN r27585.
2012-11-10 14:03:07 +00:00
Nathan Hjelm
e0f5137e46
add prototypes for lex destroy functions
This commit was SVN r27580.
2012-11-09 22:00:27 +00:00
Nathan Hjelm
8658bbc902
instead of relying on yyterminate to clean up the lex context call the destroy functions directly (after closing the file)
This commit was SVN r27577.
2012-11-09 16:10:55 +00:00
Ralph Castain
9b729794f2
A prior commit apparently broke the trunk when something was inadvertently left behind - so remove a reference to a no-longer-existing function
This commit was SVN r27574.
2012-11-07 11:11:05 +00:00
Nathan Hjelm
7fb5caea92
Remove the finish_parsing function from various .l files. The function is incomplete (doesn't clean up the lex state) and should be replaced by *_yylex_destroy which correctly cleans up the state.
Checked with flex 2.5.35. Verified with valgrind that this fixes several "still reachable" leaks.
cmr:v1.7
This commit was SVN r27571.
2012-11-06 19:26:14 +00:00
Nathan Hjelm
bdedd8b0d3
Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream, so it is a little unexpected that mca_base_components_close does. In addition, several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function, and others closed their output a second time. This change improves the semantics of mca_base_components_open/close, as they are now symmetric in their functionality.
This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Brian Barrett
e61c00212d
Add files found in svn but not tarball
This commit was SVN r27549.
2012-11-01 02:27:03 +00:00
Nathan Hjelm
2acd0f83de
Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but with the rsh plm. I don't know why this problem was not occurring before the command line parser changes, but it appears to be resolved now.
This commit was SVN r27527.
The following SVN revision numbers were found above:
r27451 --> open-mpi/ompi@d59034e6ef
r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Nathan Hjelm
df9bd0ed59
fix bug in plm/rsh that could add extraneous mca options to the orted argv
...
cmr:v1.7
This commit was SVN r27526.
2012-10-30 19:40:04 +00:00
Ralph Castain
a080de188f
Enable orterun to directly support staged execution, treating each app as a separate job. Support transfer of file maps when support exists.
This commit was SVN r27516.
2012-10-29 23:11:30 +00:00
Ralph Castain
e5e72c3137
Expand the dfs API to support retrieval, loading and purging of file maps.
This commit was SVN r27515.
2012-10-29 23:05:45 +00:00
Ralph Castain
4e52a15e70
Provide for sync on seek and close DFS operations. Eliminate an unnecessary wake-up timer when using ORTE progress thread
This commit was SVN r27500.
2012-10-26 15:49:04 +00:00
Ralph Castain
4ef30c016b
Remove stale windows references
This commit was SVN r27491.
2012-10-26 01:19:14 +00:00
Ralph Castain
df642f1508
Add an API to get a remote file's size. Separate dfs cmds from returned data messages so daemons don't get confused.
This commit was SVN r27487.
2012-10-25 22:23:08 +00:00
Ralph Castain
094d6f3143
Add a new "distributed file system" capability to support file access operations across nodes that do not have a network file system attached to them.
Add a set of URI create/parse utilities
This commit was SVN r27483.
2012-10-25 17:15:17 +00:00
Ralph Castain
32c185f730
Set a priority for output of forwarded IO so it can effectively compete against inbound messages
This commit was SVN r27480.
2012-10-24 23:34:50 +00:00
Ralph Castain
e06c330635
Add the ability to set a backlog limit on forwarded output waiting at mpirun - helps to avoid crashing systems during debug. Note that we default to "unlimited" to maintain current behavior.
This commit was SVN r27479.
2012-10-24 23:21:40 +00:00
Ralph Castain
e6014bf2e1
Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.
The following SVN revision numbers were found above:
r27451 --> open-mpi/ompi@d59034e6ef
r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Ralph Castain
7574d6673b
If someone provides the launch_agent cmd, then don't prefix it
cmr:v1.7
This commit was SVN r27473.
2012-10-24 16:14:04 +00:00
Ralph Castain
5c0534a7ad
Ensure that comm_spawn launches procs on the nodes specified by add-host and add-hostfile
This commit was SVN r27452.
2012-10-18 00:40:44 +00:00
Nathan Hjelm
d59034e6ef
MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7
This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Ralph Castain
4028ce7a5d
Silence warnings by making types match
This commit was SVN r27446.
2012-10-14 03:45:28 +00:00
Ralph Castain
285a3b168d
Add an ability to specify the max number of simultaneous procs/node for an application when operating in staged mode. Change some debug statements from OPAL_OUTPUT_VERBOSE to opal_output_verbose so they are available in optimized builds.
This commit was SVN r27445.
2012-10-14 03:31:32 +00:00
Ralph Castain
04304c186f
Remove the setup_hadoop configure script as it is no longer required - the hadoop support components can build without accessing hadoop itself.
This commit was SVN r27385.
2012-09-29 18:30:35 +00:00
Ralph Castain
54db4c35eb
Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available, as they don't really depend on hwloc's presence. Tell two components not to build if hwloc is disabled:
ompi/mca/sbgp/basesmsocket
orte/mca/rmaps/lama
Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files.
This commit was SVN r27377.
2012-09-26 23:24:27 +00:00
Samuel Gutierrez
42280e2af5
Temporarily make routed binomial the default. We are experiencing issues with
debruijn when launching fewer processes than are actually available within an
allocation. When this is fixed, please revert this change.
This commit was SVN r27376.
2012-09-26 16:08:12 +00:00
Jeff Squyres
cb65a44c6c
Fix the component priority assignment. Thanks to Alex Margolin for
the patch.
This commit was SVN r27363.
2012-09-25 07:13:23 +00:00
George Bosilca
6ec41400b3
Fix the error message in case a daemon does not succeed at killing the
local offspring.
This commit was SVN r27362.
2012-09-24 15:25:21 +00:00
Ralph Castain
d5279b0dc8
Make an attempt to protect hwloc cset2str from segfaulting in weird scenario
This commit was SVN r27361.
2012-09-23 16:51:51 +00:00
Ralph Castain
d95025f53a
Ensure we clear the usage numbers when binding on multiple nodes so we don't "carry over" info from one node to the next. Use the same tracking mechanism for binding upwards and in-place to avoid doing a bunch of mallocs.
Refs trac:3322
This commit was SVN r27356.
The following Trac tickets were found above:
Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322
2012-09-20 15:16:06 +00:00
Ralph Castain
445161cd2e
Correctly count the total number of allocated slots
This commit was SVN r27353.
2012-09-20 02:50:14 +00:00
Ralph Castain
f592967685
Add missing retain to maintain correct accounting on nodes
This commit was SVN r27352.
2012-09-20 02:30:53 +00:00
Ralph Castain
e309db0be9
Ensure file descriptors are closed upon completion of transfer
This commit was SVN r27349.
2012-09-18 18:39:29 +00:00
Ralph Castain
11305109e1
Track positioned files so we avoid re-positioning them across jobs
This commit was SVN r27347.
2012-09-18 15:56:21 +00:00
Ralph Castain
a3060cdd15
Fix the bind_downward code - it was incorrectly looking across the entire node instead of only looking below the locale to which the proc had been assigned. In other words, if the proc was mapped to a core, then the only hwthreads that should be considered for binding are those directly below that core. The binding algo was incorrectly looking at ALL hwthreads in that scenario, causing the proc to be bound to an HT outside of the mapped location.
This now results in the procs being bound within their assigned location. It also causes us to use only the 0th HT on a core unless --use-hwthread-cpus has been specified (in which case, we use all the HTs in a core). Bind to core binds you to all HTs regardless - the --use-hwthread-cpus only impacts the oversubscribed determination and when binding to HT.
cmr:v1.7
This commit was SVN r27342.
2012-09-14 22:01:19 +00:00
Ralph Castain
c4fd3df2df
Remove unused variables
This commit was SVN r27319.
2012-09-12 12:03:24 +00:00
Ralph Castain
c82cfecc1c
Clean up comm_spawn for the multi-node case where at least one new process isn't spawned on every node. Avoid the complexities of trying to execute a daemon collective across the dynamic spawn, as it becomes too hard to ensure that all daemons participate or are accounted for. Instead, use a less scalable but workable solution of sending the data directly between the participating procs. Ensure that singletons get their collectives properly defined at startup so the spawned "HNP" is ready for them.
As a secondary cleanup, the HNP doesn't need to update its nidmap during an xcast as it already has an up-to-date picture of the situation. So just dump that data and move along.
This commit was SVN r27318.
2012-09-12 11:31:36 +00:00