Jeff Squyres
c98729694a
Fix a compile error in the xgrid plm. This does nothing to make it
work; it just now compiles on OS X.
This commit was SVN r23387.
2010-07-13 11:56:20 +00:00
Rolf vandeVaart
fb19872806
Two new flag definitions needed by the new PML.
This commit was SVN r23386.
2010-07-13 11:30:43 +00:00
Rolf vandeVaart
19d007a6fc
New PML to support failover between openib BTLs.
openib BTL changes coming soon.
This commit was SVN r23385.
2010-07-13 10:46:20 +00:00
Ralph Castain
570d19106b
Allow singletons to use ompi-server for rendezvous via pubsub as well as comm_spawn without starting their own local daemons
This commit was SVN r23384.
2010-07-13 06:33:07 +00:00
Ralph Castain
eee7541ae7
Don't overwrite a prior setting for create_session_dirs
This commit was SVN r23383.
2010-07-13 06:30:09 +00:00
Ralph Castain
8bb0c16c2f
Add new tag
This commit was SVN r23382.
2010-07-13 06:29:13 +00:00
Rolf vandeVaart
b4af9c0efc
Fix casts so trunk compiles
This commit was SVN r23381.
2010-07-13 01:52:22 +00:00
Ralph Castain
0b4081b162
The --output-filename option currently mixes the output from all processes with the same rank, but different jobids. Thus, the output from comm_spawn'd processes gets intermingled with their parents and each other.
Adding the jobid to the output filename solves the problem.
Thanks to Jody for pointing this out!
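A minimal sketch of the idea, not the actual ORTE code: if the file name is built from the rank alone, two jobs that both have a rank 0 write to the same file, whereas including the jobid keeps them apart. The function name and the "<prefix>.<jobid>.<rank>" layout are assumptions made for illustration; the commit does not state the real format.

```c
#include <stdio.h>

/* Hypothetical helper: builds "<prefix>.<jobid>.<rank>" into buf.
 * The layout is an assumption for illustration only.
 * Returns 0 on success, -1 if the name was truncated or formatting failed. */
int example_output_filename(char *buf, size_t len, const char *prefix,
                            unsigned int jobid, unsigned int rank)
{
    int n = snprintf(buf, len, "%s.%u.%u", prefix, jobid, rank);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With this layout, rank 0 of a parent job and rank 0 of a comm_spawn'd child job get distinct file names because their jobids differ.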
This commit was SVN r23380.
2010-07-12 21:43:49 +00:00
Ralph Castain
4a94ea53d3
Minor cleanup - if any jobid in the remote group is different from the local group, then flag disconnect
This commit was SVN r23379.
2010-07-12 21:39:56 +00:00
Ralph Castain
84d63a46cd
Remove a hard-coded limit of 64 independent jobs that could connect/accept together
This commit was SVN r23378.
2010-07-12 18:34:33 +00:00
Shiqing Fan
8de5654bf9
Add new files into the tarball.
This commit was SVN r23377.
2010-07-12 16:21:46 +00:00
Shiqing Fan
cdc7e0bec9
Mainly type casts.
Get rid of pthread and other unnecessary stuff for Windows.
This commit was SVN r23376.
2010-07-12 16:17:56 +00:00
Shiqing Fan
74120b46c1
Need to check another ofed library.
This commit was SVN r23375.
2010-07-12 16:15:22 +00:00
Ralph Castain
4c5ea3d1ef
Allow the ALPS allocator to run if the BASIL_RESERVATION_ID has been set. Update show_help messages and comments to reflect this new scenario.
Thanks to Jerome Soumagne for the patch!
This commit was SVN r23374.
2010-07-12 15:42:25 +00:00
Shiqing Fan
e3be90ff22
Update CMake modules, adding initial support for openib.
This commit was SVN r23373.
2010-07-12 15:28:37 +00:00
Ralph Castain
2babebf9c3
Revert some of a prior commit - there is no need for another flag to indicate one-sided termination of the orteds.
This commit was SVN r23372.
2010-07-10 03:33:01 +00:00
Ralph Castain
da61b69b15
Ensure we don't incorrectly return a non-zero exit code when normally terminating a slurm job.
Slurm, of course, must always be different...
This commit was SVN r23371.
2010-07-09 19:14:10 +00:00
Ralph Castain
5d2233c950
Add some new tags
This commit was SVN r23369.
2010-07-09 17:51:28 +00:00
Ralph Castain
2193ace464
Ensure the debugger symbols are loaded into orterun prior to orte_init so they are found by attaching debuggers.
This commit was SVN r23368.
2010-07-09 17:51:00 +00:00
Rolf vandeVaart
cc9b768fdb
Fix typo and missing header - needed by Solaris
This commit was SVN r23367.
2010-07-09 14:33:13 +00:00
Jeff Squyres
8670551175
Fix some typos.
This commit was SVN r23366.
2010-07-09 00:11:25 +00:00
Jeff Squyres
1e702565b8
Move knem to the top of the list.
This commit was SVN r23364.
2010-07-08 18:29:03 +00:00
Shiqing Fan
c51c262e67
Relevant Windows fixes for r23360.
This commit was SVN r23363.
The following SVN revision numbers were found above:
r23360 --> open-mpi/ompi@31295e8dc2
2010-07-07 16:58:16 +00:00
Ralph Castain
62a8b73f1a
Correctly handle output sent to the group input channel
This commit was SVN r23362.
2010-07-07 14:17:48 +00:00
Jeff Squyres
87e17a41da
Ensure that the com_rules[] array entries are initialized to NULL in
case individual entries aren't used, but dynamic rules are enabled
(i.e., at least one of them is not NULL, meaning that they'll
all be assumed to be either NULL or a valid value).
This commit was SVN r23361.
2010-07-07 14:04:18 +00:00
Ralph Castain
31295e8dc2
As discussed on today's telecon, reorganize the debugger attachment code in orte to better support efforts within the tool community aimed at exploring alternative methods. Move the debugger attachment code from the orterun directory to a new debugger framework. Organize the existing standard support code into an "mpir" component. Organize the current extensions for co-spawning debugger daemons into a separate "mpirx" component.
Since the MPIR symbols are now included in the ORTE library, remove duplicate declarations in OMPI and replace them with extern references to their ORTE instantiations.
This commit was SVN r23360.
2010-07-06 23:35:42 +00:00
Jeff Squyres
a3aba8f2b7
Missed two places to rename libtrace -> libompitrace.
This commit was SVN r23359.
2010-07-06 22:26:08 +00:00
Jeff Squyres
e7c71582fe
Started to update the NEWS file for v1.5. Might need a few more
tweaks.
This commit was SVN r23358.
2010-07-06 21:58:57 +00:00
Jeff Squyres
1802325a39
Rename "libtrace" to be "libompitrace" so as not to conflict with an
already-existing "libtrace" on some BSD distros.
This commit was SVN r23357.
2010-07-06 21:48:15 +00:00
Jeff Squyres
3a87183b57
Fix the help message to show which values are legal for
--enable-contrib-no-build. Thanks to Kevin Buckley for pointing out
the issue.
This commit was SVN r23356.
2010-07-06 14:57:05 +00:00
Jeff Squyres
c8bb7537e7
Remove include/opal/sys/cache.h -- its only purpose in life was to
#define CACHE_LINE_SIZE to 128. This name has a conflict on NetBSD,
and it seems kinda odd to have a header file that ''only'' defines a
single value. Also, we'll soon be raising hwloc to be a first-class
item, so having this file around seemed kinda weird.
Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an
int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the
rationale that we can fill this in at runtime with hwloc info (trunk
and v1.5/beyond, only). The only place we ''needed'' a compile-time
CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a
new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value
(128). That use isn't suitable for run-time hwloc information,
anyway.
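The split described above can be sketched as follows, assuming a C11 compiler. This is an illustration rather than the committed code: only opal_cache_line_size and the value 128 come from the commit message, while the macro name, the struct, the helper function, and the initial value of the runtime variable are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: the commit only says "a new BTL_SM_ preprocessor macro"
 * carrying the old CACHE_LINE_SIZE value (128). */
#define EXAMPLE_SM_CACHE_LINE_SIZE 128

/* Runtime value that could later be filled in from hwloc.
 * Initializing it to 128 here is an assumption. */
int opal_cache_line_size = 128;

/* Struct padding needs a compile-time constant, so it uses the macro.
 * Each slot fills one cache line so neighbouring slots never share one. */
struct example_sm_slot {
    volatile int flag;
    char pad[EXAMPLE_SM_CACHE_LINE_SIZE - sizeof(int)];
};

static_assert(sizeof(struct example_sm_slot) == EXAMPLE_SM_CACHE_LINE_SIZE,
              "slot must occupy exactly one cache line");

/* Sizes computed at run time can use the variable instead:
 * round a byte count up to a multiple of the cache line size. */
size_t example_round_up_to_cache_line(size_t nbytes)
{
    size_t line = (size_t) opal_cache_line_size;
    return ((nbytes + line - 1) / line) * line;
}
```

The array bound in the struct is why a run-time int cannot replace the macro everywhere: the padding has to be known when the struct is compiled.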
This commit was SVN r23349.
2010-07-06 14:33:36 +00:00
Jeff Squyres
6d77118254
Fixes for FT code that came from recent shared memory updates.
This commit was SVN r23348.
2010-07-06 12:58:48 +00:00
Ralph Castain
98e7bc94f5
This belongs more properly in the ORCM repo now that we have MCA capability over there
This commit was SVN r23346.
2010-07-05 18:25:03 +00:00
Ralph Castain
dac1b9976e
Make the default group channels bidirectional
This commit was SVN r23345.
2010-07-05 14:45:57 +00:00
Shiqing Fan
f4a5f7c7d6
Update ORTE source files.
This commit was SVN r23344.
2010-07-05 08:45:41 +00:00
Jeff Squyres
5bebdb97fa
Need these header files for NetBSD. Thanks for the heads-up from
Aleksej Saushev.
This commit was SVN r23343.
2010-07-02 17:38:57 +00:00
Jeff Squyres
8fef296b8a
Updates about thread support levels.
This commit was SVN r23341.
2010-07-02 13:14:09 +00:00
Jeff Squyres
af33e5cd1a
Fix typo.
This commit was SVN r23340.
2010-07-02 12:37:48 +00:00
Ralph Castain
ee4564c13b
Add some useful debug
This commit was SVN r23339.
2010-07-02 03:35:47 +00:00
Ralph Castain
b4422e012c
Fix a typo that breaks ompi_info if --enable-sensors
This commit was SVN r23338.
2010-07-02 02:38:55 +00:00
Jeff Squyres
10185343a7
Ensure that we're actually checking for *linux*. Thanks to Aleksej
Saushev for the patch.
This commit was SVN r23336.
2010-07-01 23:26:49 +00:00
Ralph Castain
f3d90dfb8d
Fully restore fault recovery, both at the individual process and daemon level.
NOTE: MPI fault recovery remains unavailable pending merge from Josh. This only covers ORTE-level processes.
This commit was SVN r23335.
2010-07-01 19:45:43 +00:00
Ralph Castain
7190415977
Fix JEFF's mistake - we cannot use orte_show_help if execv fails because we already closed all the file descriptors!
This commit was SVN r23334.
2010-07-01 19:41:26 +00:00
Ralph Castain
510ade9503
Do not use nodes that are flagged as down or do-not-use for this map. Modify error output to reflect possible reasons no nodes would be available
This commit was SVN r23333.
2010-07-01 19:39:31 +00:00
Ralph Castain
81a65f2c67
Define a new node state
This commit was SVN r23332.
2010-07-01 19:38:23 +00:00
Ralph Castain
f5548b8e0f
remove a potential locking conflict, and let emacs go ahead and reformat the function (sigh)
This commit was SVN r23331.
2010-07-01 19:37:53 +00:00
Ralph Castain
d463aec2f6
Don't try to send to dead daemons, keep accounting straight so we don't hang
This commit was SVN r23330.
2010-07-01 19:37:02 +00:00
Ralph Castain
dd85689560
Cleanup pointer array addressing
This commit was SVN r23329.
2010-07-01 19:33:10 +00:00
Ralph Castain
26fbae447e
Don't try to forward input when we already ordered shutdown. Check return codes on sends
This commit was SVN r23328.
2010-07-01 19:32:08 +00:00
Ralph Castain
628936a99f
Provide a convenience option to disable fault recovery (as opposed to setting three separate, long-named mca params)
This commit was SVN r23327.
2010-07-01 19:31:11 +00:00