Jeff Squyres
87e17a41da
Ensure that the com_rules[] array entries are initialized to NULL in
case individual entries aren't used, but dynamic rules are enabled
(i.e., at least one of them is not NULL, meaning that they'll
all be assumed to be either NULL or a valid value).
This commit was SVN r23361.
2010-07-07 14:04:18 +00:00
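The initialization described in the commit above can be sketched roughly as follows. This is an illustration only, not the actual Open MPI code: `rule_t`, `NUM_COLLECTIVES`, `rules_init`, and `rules_any_set` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Open MPI types. */
#define NUM_COLLECTIVES 4

typedef struct rule {
    int alg;
} rule_t;

/* Set every slot to NULL so that an unused entry can never be
 * mistaken for a valid pointer later on. */
static void rules_init(rule_t *rules[], size_t n)
{
    assert(NULL != rules);
    for (size_t i = 0; i < n; ++i) {
        rules[i] = NULL;
    }
}

/* Return 1 if at least one entry is non-NULL, 0 otherwise.  This is
 * only safe to call if every entry was initialized beforehand. */
static int rules_any_set(rule_t *const rules[], size_t n)
{
    assert(NULL != rules);
    for (size_t i = 0; i < n; ++i) {
        if (NULL != rules[i]) {
            return 1;
        }
    }
    return 0;
}
```

Without the explicit initialization, an automatic array would hold indeterminate pointers, and a check like `rules_any_set` could report entries that were never filled in.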
Ralph Castain
31295e8dc2
As discussed on today's telecon, reorganize the debugger attachment code in orte to better support efforts within the tool community aimed at exploring alternative methods. Move the debugger attachment code from the orterun directory to a new debugger framework. Organize the existing standard support code into an "mpir" component. Organize the current extensions for co-spawning debugger daemons into a separate "mpirx" component.
Since the MPIR symbols are now included in the ORTE library, remove duplicate declarations in OMPI and replace them with extern references to their ORTE instantiations.
This commit was SVN r23360.
2010-07-06 23:35:42 +00:00
Jeff Squyres
a3aba8f2b7
Missed two places to rename libtrace -> libompitrace.
This commit was SVN r23359.
2010-07-06 22:26:08 +00:00
Jeff Squyres
e7c71582fe
Started to update the NEWS file for v1.5. Might need a few more
tweaks.
This commit was SVN r23358.
2010-07-06 21:58:57 +00:00
Jeff Squyres
1802325a39
Rename "libtrace" to be "libompitrace" so as not to conflict with an
already-existing "libtrace" on some BSD distros.
This commit was SVN r23357.
2010-07-06 21:48:15 +00:00
Jeff Squyres
3a87183b57
Fix the help message to show which values are legal for
--enable-contrib-no-build. Thanks to Kevin Buckley for pointing out
the issue.
This commit was SVN r23356.
2010-07-06 14:57:05 +00:00
Jeff Squyres
c8bb7537e7
Remove include/opal/sys/cache.h -- its only purpose in life was to
#define CACHE_LINE_SIZE to 128. This name has a conflict on NetBSD,
and it seems kinda odd to have a header file that ''only'' defines a
single value. Also, we'll soon be raising hwloc to be a first-class
item, so having this file around seemed kinda weird.
Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an
int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the
rationale that we can fill this in at runtime with hwloc info (trunk
and v1.5/beyond, only). The only place we ''needed'' a compile-time
CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a
new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value
(128). That use isn't suitable for run-time hwloc information,
anyway.
This commit was SVN r23349.
2010-07-06 14:33:36 +00:00
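The split described in the commit above, a run-time variable for general use plus a compile-time constant where struct padding needs it, might look roughly like this. It is a sketch under stated assumptions: `EXAMPLE_SM_PAD`, `example_cache_line_size`, its default of 128, and `example_set_cache_line_size` are hypothetical names and values, not the ones used in the Open MPI tree.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time constant (hypothetical name).  Struct layout is fixed
 * when the code is compiled, so padding cannot use a run-time value. */
#define EXAMPLE_SM_PAD 128

/* Run-time value (hypothetical name and default).  It can be
 * overwritten once the real cache line size has been detected. */
static int example_cache_line_size = 128;

/* A counter padded out to EXAMPLE_SM_PAD bytes so that two adjacent
 * counters do not share a cache line of that size or smaller. */
typedef struct padded_counter {
    volatile int value;
    char pad[EXAMPLE_SM_PAD - sizeof(int)];
} padded_counter_t;

/* Record a detected cache line size; the caller must pass a
 * positive value. */
static void example_set_cache_line_size(int detected)
{
    assert(detected > 0);
    example_cache_line_size = detected;
}
```

The padded struct keeps its size even if the detected value differs from the compile-time constant, which is why the commit treats the two uses separately.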
Jeff Squyres
6d77118254
Fixes for FT code that came from recent shared memory updates.
This commit was SVN r23348.
2010-07-06 12:58:48 +00:00
Ralph Castain
98e7bc94f5
This belongs more properly in the ORCM repo now that we have MCA capability over there
This commit was SVN r23346.
2010-07-05 18:25:03 +00:00
Ralph Castain
dac1b9976e
Make the default group channels bidirectional
This commit was SVN r23345.
2010-07-05 14:45:57 +00:00
Shiqing Fan
f4a5f7c7d6
Update ORTE source files.
This commit was SVN r23344.
2010-07-05 08:45:41 +00:00
Jeff Squyres
5bebdb97fa
Need these header files for NetBSD. Thanks for the heads-up from
Aleksej Saushev.
This commit was SVN r23343.
2010-07-02 17:38:57 +00:00
Jeff Squyres
8fef296b8a
Updates about thread support levels.
This commit was SVN r23341.
2010-07-02 13:14:09 +00:00
Jeff Squyres
af33e5cd1a
Fix typo.
This commit was SVN r23340.
2010-07-02 12:37:48 +00:00
Ralph Castain
ee4564c13b
Add some useful debug output
This commit was SVN r23339.
2010-07-02 03:35:47 +00:00
Ralph Castain
b4422e012c
Fix a typo that breaks ompi_info if --enable-sensors is given
This commit was SVN r23338.
2010-07-02 02:38:55 +00:00
Jeff Squyres
10185343a7
Ensure that we're actually checking for *linux*. Thanks to Aleksej
Saushev for the patch.
This commit was SVN r23336.
2010-07-01 23:26:49 +00:00
Ralph Castain
f3d90dfb8d
Fully restore fault recovery, both at the individual process and daemon level.
NOTE: MPI fault recovery remains unavailable pending merge from Josh. This only covers ORTE-level processes.
This commit was SVN r23335.
2010-07-01 19:45:43 +00:00
Ralph Castain
7190415977
Fix JEFF's mistake - we cannot use orte_show_help if execv fails because we already closed all the file descriptors!
This commit was SVN r23334.
2010-07-01 19:41:26 +00:00
Ralph Castain
510ade9503
Do not use nodes that are flagged as down or do-not-use for this map. Modify the error output to reflect possible reasons why no nodes would be available.
This commit was SVN r23333.
2010-07-01 19:39:31 +00:00
Ralph Castain
81a65f2c67
Define a new node state
This commit was SVN r23332.
2010-07-01 19:38:23 +00:00
Ralph Castain
f5548b8e0f
Remove a potential locking conflict, and let emacs go ahead and reformat the function (sigh)
This commit was SVN r23331.
2010-07-01 19:37:53 +00:00
Ralph Castain
d463aec2f6
Don't try to send to dead daemons, keep accounting straight so we don't hang
This commit was SVN r23330.
2010-07-01 19:37:02 +00:00
Ralph Castain
dd85689560
Cleanup pointer array addressing
This commit was SVN r23329.
2010-07-01 19:33:10 +00:00
Ralph Castain
26fbae447e
Don't try to forward input when we already ordered shutdown. Check return codes on sends
This commit was SVN r23328.
2010-07-01 19:32:08 +00:00
Ralph Castain
628936a99f
Provide a convenience option to disable fault recovery (as opposed to setting three separate, long-named mca params)
This commit was SVN r23327.
2010-07-01 19:31:11 +00:00
Ralph Castain
09acea1ccc
Update platform file
This commit was SVN r23326.
2010-07-01 19:30:15 +00:00
Jeff Squyres
222c4c8dd8
Reformat the verbatim sections of these man pages for narrower (80
char) displays.
This commit was SVN r23325.
2010-07-01 18:52:45 +00:00
Ralph Castain
1102f0c171
Replace old platform file with newer ones
This commit was SVN r23322.
2010-06-29 15:00:10 +00:00
Ralph Castain
73eabc83d6
Add new platform files
This commit was SVN r23321.
2010-06-29 14:58:40 +00:00
Jeff Squyres
ad95e00b42
Remove an extraneous/misleading comment.
This commit was SVN r23320.
2010-06-29 14:42:03 +00:00
Jeff Squyres
9ac56c8674
Add "-j4" into the flags passed when we "make distcheck" (these flags
don't help when just running "make dist"). On my (somewhat older)
machines, it cut the wall clock time of make_dist_tarball down from
~55 minutes to ~40 minutes.
This commit was SVN r23318.
2010-06-29 14:32:20 +00:00
Jeff Squyres
e82e7f896e
These compile warnings have been around forever; I finally got inspired to
fix them.
This commit was SVN r23316.
2010-06-28 17:26:38 +00:00
Ralph Castain
3237b9ec87
Print a nice error message when a daemon fails, and exit with a non-zero status
This commit was SVN r23314.
2010-06-28 16:38:54 +00:00
Jeff Squyres
1fad51776d
Also add <stdlib.h> for exit().
This commit was SVN r23308.
2010-06-28 15:17:42 +00:00
Jeff Squyres
f9d4426c19
OS X / Absoft needs <string.h>
This commit was SVN r23307.
2010-06-28 15:15:06 +00:00
Ralph Castain
a1ea6bc130
Ignore debugger daemon termination status - we don't care how they died.
This commit was SVN r23306.
2010-06-26 03:08:50 +00:00
Jeff Squyres
6d07a1cc0b
Per comments in this commit, hwloc isn't able to find cores on all
platforms (e.g., PPC64 running RHEL 5.4) -- sometimes it only finds
PUs. So in that case, just run the same calculation, but with PUs
instead of cores.
This commit was SVN r23305.
2010-06-25 21:36:53 +00:00
Ralph Castain
f325ac030a
Add a function to prepend a string to the beginning of an argv array - useful when building app_contexts from user input
This commit was SVN r23303.
2010-06-24 15:52:36 +00:00
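A prepend operation on a NULL-terminated argv array, as described in the commit above, could be sketched like this. `argv_prepend` is a hypothetical helper written for this illustration; it is not the function the commit added, and its name, signature, and error codes are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: insert a copy of str at the front of the
 * NULL-terminated array *argv (which may itself be NULL).
 * Returns 0 on success, -1 on failure. */
static int argv_prepend(char ***argv, const char *str)
{
    size_t n = 0;
    char **old;
    char **grown;
    char *copy;

    assert(NULL != argv);
    if (NULL == str) {
        return -1;
    }

    /* Count the existing entries, not including the terminator. */
    old = *argv;
    if (NULL != old) {
        while (NULL != old[n]) {
            ++n;
        }
    }

    copy = malloc(strlen(str) + 1);
    if (NULL == copy) {
        return -1;
    }
    strcpy(copy, str);

    /* Room for the new entry, the n old entries, and the terminator. */
    grown = realloc(old, (n + 2) * sizeof(char *));
    if (NULL == grown) {
        free(copy);
        return -1;
    }

    /* Shift the old entries up by one slot, then fill in the front
     * and re-terminate the array. */
    if (n > 0) {
        memmove(grown + 1, grown, n * sizeof(char *));
    }
    grown[0] = copy;
    grown[n + 1] = NULL;
    *argv = grown;
    return 0;
}
```

Calling `argv_prepend(&argv, "b")` and then `argv_prepend(&argv, "a")` on an initially NULL `argv` yields the array `{"a", "b", NULL}`; the caller owns the array and each string and is responsible for freeing them.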
Nadia Derbey
c22e6b3613
openib btl unsafe in case of extremely low srq settings
This commit was SVN r23301.
2010-06-24 09:59:45 +00:00
Ralph Castain
099c3aad97
Fix a major foopah that broke debugger attach. With the revisions in updating proc state, we dropped the recording of each proc's pid. Thus, attaching debuggers would find a proctable whose pids all equal 0.
This required modification of the errmgr.update_state API so the pid could be passed in to the function that could update the proper data record(s). All calls to that API have been updated as well, but I obviously couldn't test them all.
Thanks to Dong Ahn (LLNL) for catching this problem!
Also fixed debugger daemon cospawn, both for initial launch and attach-while-running modes. Tested and verified on rsh and slurm.
This commit was SVN r23300.
2010-06-24 05:13:53 +00:00
Ralph Castain
e9f4c84d7e
Add another name field to the job object
This commit was SVN r23299.
2010-06-24 01:57:27 +00:00
Jeff Squyres
5cdd79ef13
Oops -- set the bits one at a time via _set. Using _cpu effectively
zeroed out the cpuset before setting the bit (i.e., we always had a
cpuset of 1).
This commit was SVN r23298.
2010-06-23 20:56:59 +00:00
Shiqing Fan
681df0089b
Add a few new files into the tarball.
This commit was SVN r23297.
2010-06-22 16:45:56 +00:00
Ralph Castain
8b2a682fba
Return a silent error when -do-not-launch is given
This commit was SVN r23291.
2010-06-22 01:06:10 +00:00
Shiqing Fan
2e5e9f0a03
Fix a wrong Windows path in hnp_contact, which causes problems when looking up in the session directories. Add two more ess modules for Windows.
This commit was SVN r23286.
2010-06-21 09:47:33 +00:00
Ralph Castain
ae746a390f
Debugger daemons spawned upon attachment to a running job need to be treated just like a regular job - they are not "piggybacking" onto an existing launch, and so the orte daemons need to report them just like a regular job launch in order to release from spawn.
Modify the debugger control flag to include "do not monitor" so orterun will not take debugger daemon termination into account when deciding that all jobs are done.
This commit was SVN r23282.
2010-06-19 15:22:36 +00:00
Jeff Squyres
ea05c73cfc
Use the right number of characters for the strncmp. Thanks to Brad
for catching that!
This commit was SVN r23281.
2010-06-18 15:45:38 +00:00
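A common way to avoid passing the wrong character count to strncmp, the class of bug fixed in the commit above, is to derive the count from the prefix itself. The helper below is a hypothetical sketch for illustration; `has_prefix` is not part of the Open MPI code base.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if s begins with prefix, else 0.
 * The length comes from the prefix, so it cannot drift out of sync
 * with a hand-written count. */
static int has_prefix(const char *s, const char *prefix)
{
    assert(NULL != s && NULL != prefix);
    return 0 == strncmp(s, prefix, strlen(prefix));
}
```

With a hand-written count that is too small, `strncmp("foobar", "food", 3)` compares only "foo" against "foo" and reports a match, whereas `has_prefix("foobar", "food")` compares all four characters and correctly reports no match.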
Jeff Squyres
cdc5541cb0
Search for "dlname", not "dlopen". This value will be filled in if
there is a DSO to open.
This commit was SVN r23280.
2010-06-18 15:13:34 +00:00
Shiqing Fan
e32159d118
Updates and fixes for Fortran bindings on Windows, including two missing feature tests and CMake script improvements.
This commit was SVN r23279.
2010-06-18 13:03:16 +00:00