Jeff Squyres
ecd603256a
* Rename opal_hwloc_components to opal_hwloc_base_components
...
* Fix some comments
This commit was SVN r25150.
2011-09-17 11:54:36 +00:00
Ralph Castain
1cd7b02df3
Add a set of default errmgr components that support solely the default "everything dies on error" behavior. Set their priority to be selected by default, but provide params to adjust those priorities to allow other component selection.
...
This commit was SVN r25139.
2011-09-13 22:03:45 +00:00
Ralph Castain
3c4f04f4d9
Ensure opal_hwloc_topology is NULL after being destroyed
...
This commit was SVN r25138.
2011-09-13 19:21:10 +00:00
Nathan Hjelm
079ccdf8b1
fix debugger co-location launching
...
This commit was SVN r25136.
2011-09-13 15:08:03 +00:00
Ralph Castain
ca7638553f
Remove stale code
...
This commit was SVN r25133.
2011-09-12 23:00:41 +00:00
Ralph Castain
556a05566e
Silence warning
...
This commit was SVN r25130.
2011-09-12 16:21:51 +00:00
Shiqing Fan
0aea775837
Set the compiler flags in a better way.
...
This commit was SVN r25125.
2011-09-12 08:24:27 +00:00
Ralph Castain
92c7372e20
Per the RFC from Jeff, move hwloc from opal/mca/common to its own static framework à la libevent. Have ORTE daemons collect the topology info at startup and, if --enable-hwloc-xml is set, send that info back to the HNP for later use. The HNP only retains unique topology "templates" to reduce memory footprint. Have the daemon include the local topology info in the nidmap buffer sent to each app so the apps don't all hammer the local system to discover it for themselves.
...
Remove the sysinfo framework as hwloc replaces that functionality.
This commit was SVN r25124.
2011-09-11 19:02:24 +00:00
Ralph Castain
2091e39bee
Record the file descriptor on the read event when building optimized
...
This commit was SVN r25123.
2011-09-11 18:57:14 +00:00
Rainer Keller
9d5afc58c6
- Fix breakage of the epoch changes with PGI:
...
Don't just include pre-processor macros between two strings ("s1" #if 0 ... "s2")...
Rather print out the epoch as 0 always...
This commit was SVN r25110.
2011-08-31 08:40:31 +00:00
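The PGI fix in the entry above can be illustrated with a small sketch. It assumes the original code put an `#if` block between two adjacent string literals inside a call's argument list; if that call is a macro, the C standard leaves a directive inside its arguments undefined, and some compilers reject it. The names `ENABLE_EPOCH`, `EPOCH_OF`, `name_t`, and `format_name` are hypothetical and are not taken from the Open MPI tree. The approach shown keeps one format string and moves the conditional out to a macro definition, so the epoch prints as 0 when the feature is off.

```c
#include <stdio.h>

/* Hypothetical feature switch; off unless the build defines it. */
#ifndef ENABLE_EPOCH
#define ENABLE_EPOCH 0
#endif

/* Hypothetical process name with an optional epoch field. */
typedef struct {
    unsigned jobid;
    unsigned vpid;
    unsigned epoch;
} name_t;

/* The conditional lives here, outside any call or macro argument list. */
#if ENABLE_EPOCH
#define EPOCH_OF(p) ((p)->epoch)
#else
#define EPOCH_OF(p) 0u   /* feature disabled: always report epoch 0 */
#endif

/* One unconditional format string; no directives between literals. */
static int format_name(char *buf, size_t len, const name_t *p)
{
    return snprintf(buf, len, "[%u,%u] epoch %u",
                    p->jobid, p->vpid, (unsigned)EPOCH_OF(p));
}
```

With the switch left at its default, `format_name` ignores the stored epoch and writes 0 in its place, which mirrors the "print out the epoch as 0 always" approach named in the commit message.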
Wesley Bland
f8740e5478
Correct a typo reported by Pasha.
...
This commit was SVN r25109.
2011-08-30 18:44:52 +00:00
Ralph Castain
03ddf8520b
Resolve not-used warnings
...
This commit was SVN r25101.
2011-08-27 14:27:15 +00:00
Ralph Castain
56ebfa23cc
ORTE configure options belong in orte/config, not opal.
...
This commit was SVN r25100.
2011-08-27 14:23:49 +00:00
Wesley Bland
f542ecd578
Fix a couple of problems with the resil code not compiling.
...
This commit was SVN r25099.
2011-08-27 03:21:00 +00:00
George Bosilca
a4245b8d63
Remove some warnings related to the resilience patch.
...
This commit was SVN r25097.
2011-08-27 00:15:34 +00:00
George Bosilca
67ec5a0556
While doing cleanup remove pending warnings.
...
This commit was SVN r25095.
2011-08-26 23:38:06 +00:00
George Bosilca
fc1184c41f
Remove a warning introduced by the epoch commit.
...
This commit was SVN r25094.
2011-08-26 23:36:52 +00:00
Wesley Bland
4e7ff0bd5e
By popular demand the epoch code is now disabled by default.
...
To enable the epochs and the resilient orte code, use the configure flag:
--enable-resilient-orte
This will define both:
ORTE_ENABLE_EPOCH
ORTE_RESIL_ORTE
This commit was SVN r25093.
2011-08-26 22:16:14 +00:00
Ralph Castain
1c08a4006c
Refactor some code to remove a few API handles from errmgr. Reviewed/tested by Wes.
...
This commit was SVN r25064.
2011-08-18 16:24:45 +00:00
Ralph Castain
e58623cd5b
Bring alps back to full operations by correctly computing daemon names. Unfortunately, alps doesn't assign cnos rank in node-based order - i.e., cnos rank=0 isn't necessarily on the first node of the execution. So adjust when using static ports.
...
Add some debug to nidmap
Ensure that the HNP's node name is not included in the regex when launching via rshbase as that node is automatically included in the daemon map.
This commit was SVN r25063.
2011-08-18 14:59:18 +00:00
Wesley Bland
a2a20c3766
I believe this should fix the race condition that Terry is seeing in the MTT
...
tests. It appears that nothing in the errmgr was using the mutexes to protect
the odls child list.
This commit was SVN r25062.
2011-08-18 14:52:30 +00:00
Shiqing Fan
6d0ab9bd6c
One library was missing for linking orterun on Windows.
...
This commit was SVN r25057.
2011-08-18 09:33:41 +00:00
Ralph Castain
23f47295a8
Add even more debug
...
This commit was SVN r25053.
2011-08-16 16:41:33 +00:00
Ralph Castain
d624d43f69
Add more debug
...
This commit was SVN r25052.
2011-08-16 15:47:37 +00:00
Ralph Castain
3d96497581
Add debug
...
This commit was SVN r25050.
2011-08-16 12:22:05 +00:00
Shiqing Fan
7292ee2387
One .windows file is missing in the tarball.
...
This commit was SVN r25049.
2011-08-15 10:21:25 +00:00
Shiqing Fan
3af7c9f7bb
Complete the MinGW build support on Windows.
...
This commit was SVN r25048.
2011-08-15 09:47:23 +00:00
Shiqing Fan
627f1dd351
Correct several export declarations.
...
This commit was SVN r25047.
2011-08-15 09:45:51 +00:00
Ralph Castain
ca3d29a1e6
Extend regex support to a bigger audience
...
This commit was SVN r25046.
2011-08-12 21:02:48 +00:00
Ralph Castain
ea4e2c2db4
Unused variables
...
This commit was SVN r25045.
2011-08-12 21:02:09 +00:00
Jeff Squyres
1cbfb53801
r24976 wasn't quite right -- you now actually get a warning if you
...
specify btl_tcp_if_include because btl_tcp_if_exclude is defaulted to
the loopback devices.
This commit does a few things:
* Introduce a new OPAL MCA base function:
mca_base_param_check_exclusive_string(). It checks to see that the
''user'' does not set two MCA parameters that are mutually
exclusive by checking the source of those MCA param values.
* Use the above function in many BTLs (and the OOB TCP) to ensure
that <foo>_if_include and <foo>_if_exclude are not both specified
''by the user''.
* Re-arrange many of these BTLs to move their MCA registration code
into a separate component_register() function (vs. the
component_open() function).
This code has been nominally reviewed and checked by Ralph, George,
Terry, and Shiqing.
This commit was SVN r25043.
The following SVN revision numbers were found above:
r24976 --> open-mpi/ompi@8f4ac54336
2011-08-10 17:24:36 +00:00
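The idea behind the exclusivity check in the entry above can be sketched as follows. This is not the signature or implementation of mca_base_param_check_exclusive_string(); the types `param_source_t` and `param_t` and the function `check_exclusive` are hypothetical stand-ins. The sketch only shows why the source of a value matters: a parameter that merely carries its built-in default (such as an exclude list defaulting to the loopback devices) should not count as "set by the user".

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical: where a parameter's current value came from. */
typedef enum {
    SRC_DEFAULT,  /* built-in default, not chosen by the user */
    SRC_ENV,      /* set through the environment */
    SRC_FILE      /* set through a parameter file */
} param_source_t;

/* Hypothetical parameter record. */
typedef struct {
    const char    *name;
    const char    *value;
    param_source_t source;
} param_t;

/* A parameter counts as user-set only if it has a value that did not
 * come from the built-in default. */
static int set_by_user(const param_t *p)
{
    return p->value != NULL && p->source != SRC_DEFAULT;
}

/* Returns 0 if at most one of the two parameters was set by the user,
 * -1 (with a message on stderr) if the user set both. */
static int check_exclusive(const param_t *a, const param_t *b)
{
    if (set_by_user(a) && set_by_user(b)) {
        fprintf(stderr, "%s and %s are mutually exclusive\n",
                a->name, b->name);
        return -1;
    }
    return 0;
}
```

Under these assumptions, an include list given by the user together with an exclude list that still holds its default passes the check, while two user-supplied values do not.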
Ralph Castain
b360c98afd
Per request from Pasha, revert r25004 - but modified a touch to reflect the fact that opal_argv_append copies the provided string, so we don't need to print it and then free it.
...
This commit was SVN r25037.
The following SVN revision numbers were found above:
r25004 --> open-mpi/ompi@2418831bea
2011-08-09 22:42:27 +00:00
Nathan Hjelm
aa3d302a05
use persistent rml_recv in iof
...
This commit was SVN r25035.
2011-08-09 21:30:12 +00:00
Ralph Castain
f1951e7ccd
If we are abnormally terminating, then don't wait for orteds to report back. Send them a "halt_vm" command, which instructs them to kill their local procs and immediately terminate, doing their best to clean up on the way out.
...
Also do a little cleanup on debug output in rshbase.
This commit was SVN r25033.
2011-08-09 17:42:19 +00:00
Wesley Bland
67feeb6aca
Move the errmgr code back. This shouldn't cause the svn problems that I
...
apparently caused last time. Sorry about that. This one will just be a big
changelog.
This commit was SVN r25016.
2011-08-08 16:01:08 +00:00
Wesley Bland
09274cd047
Make sure that the epoch is initialized everywhere so we don't get weird output
...
during valgrind. This shouldn't have caused any problems with any actual
execution. Just extra warnings in valgrind.
This commit was SVN r25015.
2011-08-08 15:11:55 +00:00
Ralph Castain
8014e3429e
Don't double-count procs as they are launched
...
This commit was SVN r25011.
2011-08-08 06:05:23 +00:00
Ralph Castain
7b9f958dcf
Add some missing error strings. Update test to show silent errors
...
This commit was SVN r25010.
2011-08-08 04:21:02 +00:00
Ralph Castain
4083dc617f
Fix computation of number of required files and file descriptors - it only depends on the total number of local procs, not on the number of procs in the entire job!
...
This commit was SVN r25008.
2011-08-08 04:09:40 +00:00
Ralph Castain
590ac70e88
Add a simple test program for error string output
...
This commit was SVN r25007.
2011-08-07 21:32:25 +00:00
Ralph Castain
8b3c562b84
Adjust verbosity levels to make it easier to debug at scale
...
This commit was SVN r25006.
2011-08-07 21:14:21 +00:00
Ralph Castain
2418831bea
Pass the nodelist to the aprun command even when using all nodes
...
This commit was SVN r25004.
2011-08-06 04:19:41 +00:00
Ralph Castain
bd8e43a2de
Correct debug output so it doesn't falsely report the module
...
This commit was SVN r25003.
2011-08-05 20:30:34 +00:00
Ralph Castain
d603c79ab4
Fix the FAILED_TO_START scenario so orted doesn't segfault
...
This commit was SVN r25002.
2011-08-05 20:29:50 +00:00
Ralph Castain
c86bfb4e90
Need to copy the string
...
This commit was SVN r25001.
2011-08-05 19:03:28 +00:00
Ralph Castain
7b307d5bf0
Cleanup handling of all-numerical node names
...
This commit was SVN r25000.
2011-08-05 14:59:14 +00:00
Ralph Castain
157bad5435
If we can't compress the name, that's fine - but we still have to move to the next position
...
This commit was SVN r24999.
2011-08-05 14:43:36 +00:00
Ralph Castain
3199663613
Correctly handle the case of mixes of character-based names and all-number names
...
This commit was SVN r24998.
2011-08-05 14:37:36 +00:00
Ralph Castain
066022126e
Sort the nodes to be in numerically increasing order so the regex has a chance of working right.
...
This commit was SVN r24993.
2011-08-05 03:37:13 +00:00
Ralph Castain
5a634caad9
Cleanly handle the case where the node "name" is just a number, and avoid the N-N output when the number is not part of a sequence.
...
This commit was SVN r24992.
2011-08-05 03:36:30 +00:00
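The two entries above (sorting all-numeric node names, and avoiding "N-N" output for a number that is not part of a sequence) can be illustrated with a minimal sketch. The function `compress_ids` and the comma-and-dash output format are assumptions made for illustration; they are not the ORTE regex code or its actual output syntax.

```c
#include <stdio.h>
#include <stdlib.h>

/* Numeric comparison for qsort; avoids overflow from subtraction. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts ids in place, then writes a compressed list such as
 * "1-3,5,7-8" into out. A lone number is written as "5", never "5-5".
 * Returns 0 on success, -1 if out is too small. */
static int compress_ids(int *ids, size_t n, char *out, size_t outlen)
{
    size_t used = 0;
    size_t i = 0;

    if (outlen == 0) {
        return -1;
    }
    out[0] = '\0';

    /* Ranges can only be detected once the ids are in increasing order. */
    qsort(ids, n, sizeof(ids[0]), cmp_int);

    while (i < n) {
        size_t j = i;
        int w;

        /* Extend the run while the next id is exactly one larger. */
        while (j + 1 < n && ids[j + 1] == ids[j] + 1) {
            j++;
        }
        if (j == i) {
            w = snprintf(out + used, outlen - used, "%s%d",
                         used ? "," : "", ids[i]);
        } else {
            w = snprintf(out + used, outlen - used, "%s%d-%d",
                         used ? "," : "", ids[i], ids[j]);
        }
        if (w < 0 || (size_t)w >= outlen - used) {
            return -1;  /* output truncated */
        }
        used += (size_t)w;
        i = j + 1;
    }
    return 0;
}
```

Without the sort step, the input 7, 3, 1, 2, 8, 5 would yield only the short run 1-2; sorting first lets the same ids collapse to 1-3,5,7-8.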