Ralph Castain
bd045468e5
Let apps use the ess cm module too...
...
This commit was SVN r23246.
2010-06-07 14:16:34 +00:00
Abhishek Kulkarni
afbe3e99c6
* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes an SOS error and returns
the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all checks of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
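The two check patterns described in the SOS commit above can be illustrated with a minimal, self-contained sketch. Everything prefixed `demo_` or `DEMO_` is a hypothetical stand-in: the bit layout, the numeric error values, and the helper names are assumptions made for illustration and are not Open MPI's actual SOS encoding or macros. Only the shape of the comparisons follows the commit message.

```c
#include <assert.h>

/* Hypothetical stand-ins, NOT Open MPI's real values or encoding. */
#define DEMO_SUCCESS         0
#define DEMO_ERROR          -1
#define DEMO_ERR_NOT_FOUND -13

/* Assumed encoding: keep the native code in the low 8 bits of the
 * magnitude and store an extra context index above it.
 * Success is left untouched, mirroring "OPAL_SUCCESS is preserved". */
static int demo_sos_encode(int native, int index)
{
    if (DEMO_SUCCESS == native) {
        return DEMO_SUCCESS;
    }
    assert(native < 0 && native > -256 && index >= 0 && index < 1024);
    return -((index << 8) | -native);
}

/* Stand-in for the role OPAL_SOS_GET_ERR_CODE() plays: recover the
 * native code from a possibly encoded return value. */
static int demo_sos_get_err_code(int ret)
{
    if (DEMO_SUCCESS == ret) {
        return DEMO_SUCCESS;
    }
    return -((-ret) & 0xFF);
}

/* Old pattern: compares the raw value, so it misses encoded errors. */
static int demo_is_not_found_direct(int ret)
{
    return DEMO_ERR_NOT_FOUND == ret;
}

/* New pattern: decode first, then compare against the native code. */
static int demo_is_not_found_decoded(int ret)
{
    return DEMO_ERR_NOT_FOUND == demo_sos_get_err_code(ret);
}

/* Generic failure test: no decoding needed because success is preserved. */
static int demo_failed(int ret)
{
    return DEMO_SUCCESS != ret;
}
```

Under these assumptions, a not-found error encoded with a nonzero context index no longer equals the native constant, so the direct comparison silently fails while the decoded comparison and the `DEMO_SUCCESS != ret` test both still catch it.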
Ralph Castain
4308922f59
Ensure that any application-specific selection of ess module doesn't get overridden by what is given to the orted or orterun
...
Cleanup tool name determination for CM
This commit was SVN r22980.
2010-04-15 18:10:50 +00:00
Ralph Castain
8da781af84
Continue developing support for distributed virtual machines - minor changes to ensure the correct jobid gets used and that DVMs can communicate with tools
...
This commit was SVN r22958.
2010-04-12 22:33:09 +00:00
Ralph Castain
75e99e6118
Do a better job of selecting cm ess component, handle tool and daemon issues
...
This commit was SVN r22942.
2010-04-07 18:59:21 +00:00
Ralph Castain
8e29a6858a
Properly handle the case when a daemon is given both parts of its name
...
This commit was SVN r22935.
2010-04-06 22:41:18 +00:00
Josh Hursey
e4f2d03d28
ErrMgr Framework redesign to better support fault tolerance development activities.
...
Explained in more detail in the following RFC:
http://www.open-mpi.org/community/lists/devel/2010/03/7589.php
This commit was SVN r22872.
2010-03-23 21:28:02 +00:00
Ralph Castain
0b9552cd4e
Expand the ESS framework's API to include a new function "query_sys_info" that allows the caller to retrieve key-value pairs of info on the local system capabilities (e.g., cpu type/model). Have each daemon and the HNP "sense" that information and provide it to their local procs to avoid having every proc querying the system directly.
...
This commit was SVN r22870.
2010-03-23 20:47:41 +00:00
Ralph Castain
d49f93b743
Cleanup the initialization handshake for multicast apps
...
This commit was SVN r22855.
2010-03-19 20:15:01 +00:00
Ralph Castain
abbdc2b527
Pass the job family to tools that need to connect to specific HNPs
...
This commit was SVN r22853.
2010-03-19 04:01:33 +00:00
Ralph Castain
9a5fdbb622
Continue development of reliable multicast
...
This commit was SVN r22616.
2010-02-14 19:20:56 +00:00
Ralph Castain
16b7bc7a82
Sigh...get the order right to match unpack
...
This commit was SVN r22539.
2010-02-03 15:50:43 +00:00
Ralph Castain
e88627a7ca
Ensure we don't go through rml open/select more than once.
...
Open the rml to get the uri when bootstrapping daemons
This commit was SVN r22538.
2010-02-03 15:38:32 +00:00
Ralph Castain
cb1007b5a9
Pass back the number of daemons in the system
...
This commit was SVN r22537.
2010-02-03 14:31:16 +00:00
Ralph Castain
f66b6cae23
Enable the boot of an orted "virtual machine". Modify the mapper framework to allow mapping of only daemons. Remove the cm ras module as no longer required. Modify the orted code to always send back node arch info. Remove the "--enable-bootstrap" configure option as this feature will now always be available.
...
This commit was SVN r22480.
2010-01-25 22:25:13 +00:00
Ralph Castain
b35486d945
The CM ess module needs to open the sysinfo framework and select modules before others need it. Thus, set up a flag to avoid multiple open/select within that framework.
...
This commit was SVN r22393.
2010-01-12 22:03:49 +00:00
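The flag mentioned in the sysinfo commit above is a common "open once" guard. The sketch below shows the general idea only; the function and variable names are hypothetical and are not taken from the Open MPI source, and the counter exists purely to make the behavior observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; a sketch of the guard, not the framework's code. */
static bool demo_sysinfo_opened = false;
static int  demo_real_opens = 0;   /* how many times the real work ran */

/* Safe to call from several places: only the first call does the work,
 * later calls return success immediately. */
static int demo_sysinfo_open(void)
{
    if (demo_sysinfo_opened) {
        return 0;
    }
    demo_sysinfo_opened = true;
    demo_real_opens++;
    /* Invariant of the guard: the expensive path runs exactly once. */
    assert(1 == demo_real_opens);
    return 0;
}
```

With this shape, an early caller such as an ess module and a later generic initialization path can both call the open routine without the framework being opened twice.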
Ralph Castain
9acec283af
Add a new TCP module to the reliable multicast framework. This module uses ORTE's grpcomm.xcast functionality to "fake" multicasts for environments where regular multicast isn't reliable.
...
Modify the startup logic to allow for this use-case.
This commit was SVN r22310.
2009-12-15 01:18:27 +00:00
George Bosilca
501d1cc4ad
Set default values to avoid using these variables uninitialized.
...
This commit was SVN r22279.
2009-12-08 18:42:22 +00:00
Ralph Castain
4a82dd9a45
Add message sequence numbers to multicast messages, tracked by channel
...
This commit was SVN r22262.
2009-12-04 04:17:44 +00:00
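Per-channel sequence numbering, as named in the commit above, can be sketched as one counter per channel on each side. This is an illustrative sketch under stated assumptions: the fixed channel count, the struct, and the function names are all hypothetical, counter wraparound is ignored, and none of it reflects the actual rmcast data structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_CHANNELS 8   /* hypothetical fixed channel count */

typedef struct {
    uint32_t next_send[DEMO_MAX_CHANNELS];      /* next number to stamp   */
    uint32_t next_expected[DEMO_MAX_CHANNELS];  /* next number to receive */
} demo_seq_tracker_t;

static void demo_seq_init(demo_seq_tracker_t *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Returns the sequence number to stamp on an outgoing message,
 * or -1 if the channel is out of range. */
static int64_t demo_seq_next_send(demo_seq_tracker_t *t, unsigned channel)
{
    if (channel >= DEMO_MAX_CHANNELS) {
        return -1;
    }
    return (int64_t)t->next_send[channel]++;
}

/* Returns 0 for an in-order message, the number of missed messages
 * when there is a gap, or -1 for a stale/duplicate number or a bad
 * channel. Each channel is tracked independently. */
static int64_t demo_seq_on_recv(demo_seq_tracker_t *t, unsigned channel,
                                uint32_t seq)
{
    if (channel >= DEMO_MAX_CHANNELS || seq < t->next_expected[channel]) {
        return -1;
    }
    int64_t missed = (int64_t)seq - (int64_t)t->next_expected[channel];
    t->next_expected[channel] = seq + 1;
    return missed;
}
```

Keeping the counters per channel means a lost message on one channel shows up as a gap there without disturbing the ordering checks on any other channel.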
Ralph Castain
a0d5c80ce0
Add a new framework for discovering local resource information such as cpu type/model, #cpus, available physical memory, etc. Two initial components (darwin and linux) are provided. This is needed to support bootstrap operations where daemons are started at node boot, and applications where initial knowledge of cpu identification is needed to guide framework component selection.
...
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.
If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.
Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).
Adjust some platform files to enable these capabilities.
This commit was SVN r22244.
2009-11-30 23:11:25 +00:00
Ralph Castain
8dc08e304f
No longer require name passed separately
...
This commit was SVN r22221.
2009-11-19 19:41:41 +00:00
Ralph Castain
840766a894
Update the rmcast APIs to include tag params and reorder them to look like their rml cousins
...
This commit was SVN r22218.
2009-11-17 15:58:59 +00:00
Ralph Castain
6496ce7212
Expand the reliable multicast APIs to support sending/receiving of iovecs
...
This commit was SVN r22213.
2009-11-11 22:10:35 +00:00
Ralph Castain
e501589b3b
Cleanup the bootstrap procedure for multiple daemons starting up
...
This commit was SVN r22094.
2009-10-13 15:14:54 +00:00
Ralph Castain
84cc847be8
Next phase of auto-wireup using multicast. Enable use of multicast groups to separate comm from different application groups. Have the orted bootstrap message go to a different rml tag so the node can be added to the pool.
...
This commit was SVN r22083.
2009-10-10 01:19:56 +00:00
Ralph Castain
1d7ab97c84
Update the multicast framework to allow specification of different message scopes per various RFCs. Redefine the API a little to utilize channel numbers without worrying about the specifics of their addressing
...
This commit was SVN r22037.
2009-09-30 14:40:43 +00:00
Ralph Castain
709b36efb4
Cleanup auto-wireup and enable tools to "discover" the HNP via multicast
...
This commit was SVN r22012.
2009-09-25 01:00:09 +00:00
Ralph Castain
3167f0a0a0
Complete the next round of the multicast framework development. Needs further polish and an upgrade to handle message fragmentation, but good enough for auto-bootstrap of orteds.
...
Teach the ess cm module to bootstrap orted launch
This commit was SVN r22006.
2009-09-23 20:57:49 +00:00
Ralph Castain
12613352eb
Add missing header file
...
This commit was SVN r21990.
2009-09-22 13:07:57 +00:00
Ralph Castain
2210989e2d
Update the cm ess module to support orted bootstrap. Continue work towards bootstrap capability.
...
This commit was SVN r21989.
2009-09-22 02:16:40 +00:00
Ralph Castain
60edbc7220
Fix hetero operations and comm_spawn (to a point).
...
Remove all architecture references from ORTE and put them back in the modex using modex_send/recv calls.
Hetero operations are now fully supported again. Comm_spawn now works up to the point where it segfaults due to an error in the CID code - which now allows Edgar to dig further! :-)
This commit was SVN r21655.
2009-07-13 20:03:41 +00:00
George Bosilca
6a00481285
We know what a daemon is; there is no need to dig into the nidmap to find out.
...
This commit was SVN r21503.
2009-06-23 20:43:45 +00:00
Ralph Castain
0a67bcb653
Minor cleanups
...
This commit was SVN r21387.
2009-06-06 15:44:00 +00:00
Ralph Castain
137104b786
Initial support for CMs - needs to be pruned as CM support develops
...
This commit was SVN r21335.
2009-05-30 20:57:23 +00:00