Jeff Squyres
73bcc4a36b
Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake always created a set of directories under installdir, even if you did not configure --with-devel-headers. The directories were empty, but they should not be there at all. This commit fixes that: the directories are not created at all unless you configure --with-devel-headers.
...
This commit was SVN r23801.
The following SVN revision numbers were found above:
r23764 --> open-mpi/ompi@40a2bfa238
2010-09-24 22:53:28 +00:00
Ralph Castain
40a2bfa238
WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
...
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.
This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Josh Hursey
ba7e94dd89
Some relatively minor C/R related cleanup
...
* Fix a configure warning for checking --enable-ft-thread
* In hnp and orted ErrMgr components check to see if other components have already recovered this process before trying to recover it again.
* Fix 'npernode' for restarting using the resilient rmaps component
* export ompi_info_set, so that internal functionality can use it.
This commit was SVN r23535.
2010-07-30 18:59:34 +00:00
Ralph Castain
ad5eaee4c6
Protect against NULL and provide additional resource check/error report
...
This commit was SVN r23432.
2010-07-19 18:33:32 +00:00
Ralph Castain
510ade9503
Do not use nodes that are flagged as down or do-not-use for this map. Modify error output to reflect possible reasons no nodes would be available
...
This commit was SVN r23333.
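The node filtering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `demo_node_t` type, its state values, and `demo_filter_nodes()` are hypothetical names invented for this sketch and are not the ORTE data structures or mapper code.

```c
#include <stddef.h>

/* Hypothetical node states for this sketch (not the ORTE definitions). */
typedef enum {
    DEMO_NODE_UP,
    DEMO_NODE_DOWN,
    DEMO_NODE_DO_NOT_USE
} demo_node_state_t;

typedef struct {
    const char *name;
    demo_node_state_t state;
} demo_node_t;

/*
 * Copy pointers to the usable nodes into `out` (which must have room for
 * `n` entries) and return how many were kept. Nodes flagged as down or
 * do-not-use are skipped.
 */
static size_t demo_filter_nodes(const demo_node_t *nodes, size_t n,
                                const demo_node_t **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i) {
        if (nodes[i].state != DEMO_NODE_UP) {
            continue; /* down or do-not-use: never map onto this node */
        }
        out[kept++] = &nodes[i];
    }
    return kept;
}
```

A caller that gets back a count of zero could then report that no nodes are available and list the possible reasons (all nodes down, all flagged do-not-use), rather than a generic failure.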
2010-07-01 19:39:31 +00:00
Ethan Mallove
57eee4d75c
* Can't put var declarations in the middle of code
...
* Use OBJ_RELEASE on data that was OBJ_NEW'd
* Limit single-line char width
* Use ORTE_ERR_BAD_PARAM on a rankfile typo, not ORTE_ERR_SILENT
* Add copyright
This commit was SVN r23196.
2010-05-21 15:30:38 +00:00
Ralph Castain
aaaeea6f17
Once again, fix the blasted rank_file mapper. I can't guarantee that I fixed it correctly, but at least now it compiles!
...
This commit was SVN r23190.
2010-05-21 09:46:42 +00:00
Ethan Mallove
e751f3c21c
Add a check for a duplicate rank assignment in the rankfile parser (Fixes trac:2414)
...
This commit was SVN r23186.
The following Trac tickets were found above:
Ticket 2414 --> https://svn.open-mpi.org/trac/ompi/ticket/2414
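A duplicate-rank check of the kind this commit describes might look like the sketch below. It is not the rankfile parser's code: `demo_check_ranks()`, `DEMO_MAX_RANKS`, and the `DEMO_*` return codes are hypothetical and exist only to illustrate rejecting a rank that is assigned twice.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_RANKS     1024  /* assumed upper bound for this sketch */
#define DEMO_OK            0
#define DEMO_ERR_BAD_PARAM (-1)  /* stand-in for a bad-parameter error */

/*
 * Return DEMO_OK if every rank in `ranks` is within range and appears
 * exactly once; return DEMO_ERR_BAD_PARAM on an out-of-range or
 * duplicated rank.
 */
static int demo_check_ranks(const int *ranks, size_t n)
{
    bool seen[DEMO_MAX_RANKS] = { false };

    for (size_t i = 0; i < n; ++i) {
        int r = ranks[i];
        if (r < 0 || r >= DEMO_MAX_RANKS) {
            return DEMO_ERR_BAD_PARAM;
        }
        if (seen[r]) {
            return DEMO_ERR_BAD_PARAM; /* rank assigned more than once */
        }
        seen[r] = true;
    }
    return DEMO_OK;
}
```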
2010-05-20 18:38:03 +00:00
Abhishek Kulkarni
afbe3e99c6
* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes a SOS error and returns
the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
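The two rules above can be illustrated with a small sketch. It does not reproduce the OPAL SOS implementation: the `DEMO_*` macros, the bit layout (native code in the low 8 bits, extra reporter info above it), and the error values are assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical native codes (stand-ins, not the real OPAL values). */
#define DEMO_SUCCESS       0
#define DEMO_ERR_NOT_FOUND 13

/* Assumed encoding: success stays 0; errors carry extra info above bit 8. */
#define DEMO_SOS_ENCODE(code, info) \
    ((code) == DEMO_SUCCESS ? DEMO_SUCCESS : (((info) << 8) | (code)))

/* Strip the extra info and recover the native error code. */
#define DEMO_SOS_GET_ERR_CODE(ret) ((ret) & 0xFF)

/* A function that may return an encoded error. */
static int demo_lookup(int key)
{
    return (key == 42) ? DEMO_SUCCESS
                       : DEMO_SOS_ENCODE(DEMO_ERR_NOT_FOUND, 7);
}

/* Rule 1: decode before comparing against a specific native error. */
static bool demo_is_not_found(int ret)
{
    return DEMO_ERR_NOT_FOUND == DEMO_SOS_GET_ERR_CODE(ret);
}

/* Rule 2: success is preserved by the encoding, so no decode is needed. */
static bool demo_failed(int ret)
{
    return DEMO_SUCCESS != ret;
}
```

With this layout, a direct test such as `DEMO_ERR_NOT_FOUND == demo_lookup(1)` is false even though the lookup failed with that error, which is the class of bug the wrapped checks avoid.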
2010-05-17 23:08:56 +00:00
Ralph Castain
871f445848
Ignore nodes that are "down" when generating maps
...
This commit was SVN r23119.
2010-05-12 18:08:40 +00:00
Ralph Castain
8da781af84
Continue developing support for distributed virtual machines - minor changes to ensure correct jobid gets used and that dvm's can communicate with tools
...
This commit was SVN r22958.
2010-04-12 22:33:09 +00:00
Ralph Castain
d3ed4e68b7
Utilize an unused mapping policy bit to define a policy that uses only existing alive daemons to support virtual machines and restarting processes on already-active nodes
...
This commit was SVN r22951.
2010-04-10 05:02:47 +00:00
Ralph Castain
a1e82e9d05
Per discussion with Josh, clean up the errmgr API by creating separate modules for the public vs internal APIs. This mirrors the architecture used in other frameworks that had similar requirements.
...
Remove the orcm errmgr module - moving to the orcm code base so it can utilize orcm communications and not interfere with ompi-related operations.
This commit was SVN r22931.
2010-04-05 22:59:21 +00:00
Ralph Castain
1caba7af2f
Fix a bunch of compiler warnings reported by Jeff
...
This commit was SVN r22930.
2010-04-03 00:20:19 +00:00
Ralph Castain
84c7973df8
Update the #procs in the job prior to assigning vpids for each app_context.
...
This commit was SVN r22929.
2010-04-03 00:03:35 +00:00
Ralph Castain
6b43b76f9d
Some updates required for generating a LAM-style virtual machine. Retain the local node if requested. Properly set up the daemon job map for a VM launch.
...
This commit was SVN r22928.
2010-04-03 00:03:01 +00:00
Josh Hursey
e4f2d03d28
ErrMgr Framework redesign to better support fault tolerance development activities.
...
Explained in more detail in the following RFC:
http://www.open-mpi.org/community/lists/devel/2010/03/7589.php
This commit was SVN r22872.
2010-03-23 21:28:02 +00:00
Ralph Castain
7ebf72b4aa
Trivial cleanup
...
This commit was SVN r22813.
2010-03-10 18:24:38 +00:00
Ralph Castain
7fd7b7a8cc
Fix the load_balance mapper so that it sets the #procs in the job before attempting to compute vpids
...
This commit was SVN r22812.
2010-03-10 17:52:19 +00:00
Ralph Castain
4355134991
Let the vm launcher specify the mapping policy
...
This commit was SVN r22797.
2010-03-08 19:13:21 +00:00
Ralph Castain
bfa39d7f7e
Update the seq mapper to support lists from -host. Reorg the dash_host code to provide an ordered list as required by the seq mapper
...
This commit was SVN r22795.
2010-03-08 09:54:49 +00:00
Ralph Castain
69fe5ca69b
Correctly compute bynode mapping, even in the presence of a $#$%#@^$ rankfile
...
This commit was SVN r22748.
2010-03-02 05:21:42 +00:00
Ralph Castain
5514d9c673
Fix the stupid rankfile mapper again, hopefully not breaking everything else to accommodate it. Looks like the round-robin mappers still work, at least...
...
This commit was SVN r22746.
2010-03-01 20:40:47 +00:00
Ralph Castain
359dc5cad3
Complete the app_idx change by cleaning up warnings in mappers
...
This commit was SVN r22728.
2010-02-27 18:14:27 +00:00
Josh Hursey
a3583b8f57
Fix the --bynode option so that it remembers, for subsequent jobs, where it left off last time.
...
Add a 'map_bynode' info key to determine if the job to be started by comm_spawn* should be mapped by node or by slot. Default is to map according to the default policy set when the parent job was started.
cmr:v1.5.1
This commit was SVN r22564.
2010-02-05 15:37:49 +00:00
Shiqing Fan
bdc13dacb1
A type cast.
...
This commit was SVN r22520.
2010-01-31 20:22:22 +00:00
Ralph Castain
7badff9d2d
Okay to return no available nodes for mapping when launching daemons - just means there is nothing to do
...
This commit was SVN r22509.
2010-01-28 22:58:28 +00:00
Ralph Castain
f66b6cae23
Enable the boot of an orted "virtual machine". Modify the mapper framework to allow mapping of only daemons. Remove the cm ras module as no longer required. Modify the orted code to always send back node arch info. Remove the "--enable-bootstrap" configure option as this feature will now always be available.
...
This commit was SVN r22480.
2010-01-25 22:25:13 +00:00
Shiqing Fan
872a4047ba
Fix the bug caused by ADD_DEPENDENCIES() behaving differently across CMake versions.
...
In CMake 2.6 and earlier, this function adds dependencies for targets and also links the target libraries automatically. In CMake 2.8 this behavior changed: it only adds the dependencies and does not link, which causes linking errors at build time.
This commit was SVN r22405.
2010-01-14 18:10:20 +00:00
Ralph Castain
cec840f6b9
The ability to add procs to a running job was unfortunately borked when we added the detection of a proc exiting before calling init. Re-enable it here, ensuring that procs that are being restarted and/or added to a job do -not- call barrier during orte_init.
...
This commit was SVN r22404.
2010-01-14 17:59:42 +00:00
Ralph Castain
5e031d9ded
Let a restarted process have access to all known nodes instead of only those already in its prior job map
...
This commit was SVN r22225.
2009-11-19 19:45:11 +00:00
Ralph Castain
f1f156d57b
Make rmaps base open function play nicely with ompi_info
...
This commit was SVN r22111.
2009-10-20 07:28:23 +00:00
Ralph Castain
d8d80d6f1a
Closes trac:2054. Check if a user specifies more cpus-per-rank than there are cpus in a socket - if so, politely tell them "you are stupid" and abort.
...
This commit was SVN r22091.
The following Trac tickets were found above:
Ticket 2054 --> https://svn.open-mpi.org/trac/ompi/ticket/2054
2009-10-13 04:19:07 +00:00
Ralph Castain
1475d34c13
Ensure we default to byslot mapping
...
This commit was SVN r22090.
2009-10-11 23:50:42 +00:00
Ralph Castain
40e2299fa7
Test to ensure that num_procs was provided for the resilient mapper - it cannot be used with options like npernode.
...
Clean up the show_help text file
This commit was SVN r22082.
2009-10-09 15:26:23 +00:00
Ralph Castain
dcab61ad83
Restore the prior default rank assignment scheme for round-robin mappers. Ensure that each app_context has sequential vpids.
...
This commit was SVN r22048.
2009-10-02 03:16:18 +00:00
Ralph Castain
a15c58c583
Fix the proc assignment into the job data object during assignment of vpids as comm_spawned procs were being overwritten by their parents with the same vpid.
...
Add a little debug output when updating proc state
This commit was SVN r22042.
2009-10-01 13:44:34 +00:00
Ralph Castain
51f64aaf96
Add a new ras module to support bootstrap operations. Additional functionality may eventually be required in the component, but for now all it does is provide a mechanism for ensuring that other allocations don't confuse the system.
...
Only active if specifically directed to use it
This commit was SVN r22040.
2009-09-30 23:30:24 +00:00
Ralph Castain
dff0d01673
Yet another paffinity cleanup...sigh.
...
1. ensure that orte_rmaps_base_schedule_policy does not override cmd line settings
2. when you try to bind to more cores than we have, generate a not-enough-processors error message
3. allow npersocket -bind-to-core combination - because, yes, somebody actually wants to do it.
This commit was SVN r21996.
2009-09-22 18:44:53 +00:00
Ralph Castain
8da3aa8d5c
Some (hopefully final!) adjustments and corrections to the paffinity support:
...
1. default -npersocket to force -bind-to-socket
2. if we cannot get a value for cores/socket, try using #logical cpus. otherwise, default to 1 core
3. add missing error message for not-enough-processors
4. since we no longer loop through orte_register_params twice, put the auto-detect of
topology info in the rte_init for hnp and std_orted
5. fix bind-to-core, bysocket combination
This commit was SVN r21992.
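The fallback order in item 2 above can be written as a short sketch. The function name `demo_cores_per_socket()` and the convention that a value of zero or less means "could not be detected" are assumptions for this illustration, not the paffinity API.

```c
/*
 * Pick a cores-per-socket value using the fallback order described in
 * item 2. A value <= 0 is assumed to mean "could not be detected".
 */
static int demo_cores_per_socket(int detected_cores, int logical_cpus)
{
    if (detected_cores > 0) {
        return detected_cores;   /* preferred: real cores/socket value */
    }
    if (logical_cpus > 0) {
        return logical_cpus;     /* fallback: number of logical cpus */
    }
    return 1;                    /* last resort: assume a single core */
}
```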
2009-09-22 15:41:03 +00:00
Ralph Castain
98a4450df6
Fix the seq mapper by initializing the proc object to NULL before claiming a slot for it
...
This commit was SVN r21969.
2009-09-17 05:18:37 +00:00
Ralph Castain
142036f2c0
Issue an error message and abort if the user requests a number of processes that conflicts with nperxxx directives when evaluated against available resources
...
This commit was SVN r21949.
2009-09-07 03:36:10 +00:00
Jeff Squyres
e1fe03ad44
Minor grammar fixes, and use "#" for separating lines, not blank lines.
...
This commit was SVN r21931.
2009-09-03 07:02:21 +00:00
Ralph Castain
0421a49844
Update the xml support to allow -xml-file foo whereby we redirect all xml formatted output (and ONLY xml formatted output) to a specified file
...
This commit was SVN r21930.
2009-09-02 18:03:10 +00:00
Lenny Verkhovsky
2a594fec6c
Added a help message to the rankfile mapper for failures caused by using an alias instead of the full hostname
...
This commit was SVN r21919.
2009-09-01 11:17:32 +00:00
Ralph Castain
0394a4884d
Set up cpus-per-proc and cpus-per-rank as synonyms, both in mca params and on mpirun cmd line
...
This commit was SVN r21914.
2009-08-30 14:30:36 +00:00
Ralph Castain
2d27bc9824
Default npersocket to bind-to-socket unless otherwise directed
...
This commit was SVN r21904.
2009-08-27 13:21:14 +00:00
Ralph Castain
5e710928a5
Revise the new binding system slightly:
...
1. finalize the logic for properly respecting externally assigned bindings. Thanks to Chris Samuel for his help with this. Still needs some acid testing, but appears to now work.
2. remove the double-logic of requiring opal_paffinity_alone AND bind-to-foo. If the user specifies bind-to-foo, trust her and just do it.
This commit was SVN r21885.
2009-08-26 02:01:49 +00:00
Ralph Castain
2016a3180b
Silence compiler warnings about uninitialized variables
...
This commit was SVN r21883.
2009-08-26 01:56:39 +00:00
Ralph Castain
9ad33a4688
Silence compiler warning about uninitialized variable
...
This commit was SVN r21882.
2009-08-26 01:56:11 +00:00