1
1
Граф коммитов

6 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
7c7304466c Add a binomial tree-based launch to ssh, turned "on" only when the plm_rsh_tree_spawned mca param is set to a non-zero value. This probably isn't a very optimized capability, but it does execute a tree-based launch that may scale better than linear at high node counts.
Add the daemon map capability to the ODLS to create and save a map of daemon vpid vs nodename from the launch message.

Cleanup a few places in the base plm launch support where we didn't adequately protect rml recv's from potentially executing sends.

This commit was SVN r18143.
2008-04-14 18:26:08 +00:00
Ralph Castain
57e3e86cda Use the proper exit code for mpirun to indicate an error when something goes wrong during launch (in scenarios where the procs don't report the problem directly themselves)
This commit was SVN r18121.
2008-04-10 09:15:08 +00:00
George Bosilca
2ed6ed37bd Don't forget to cleanup once we're done.
This commit was SVN r17965.
2008-03-25 22:42:24 +00:00
Ralph Castain
edb8e32a7a Add default hostfile parameter plus --default-hostfile command line option.
Fix error message when job setup failed

This commit was SVN r17724.
2008-03-05 04:54:57 +00:00
Ralph Castain
5e6928d710 Cleanup recursions in ORTE caused by processing recv'd messages that can cause the system to take action resulting in receipt of another message.
Basically, the method employed here is to have a recv create a zero-time timer event that causes the event library to execute a function that processes the message once the recv returns. Thus, any action taken as a result of processing the message occur outside of a recv.

Created two new macros to assist:

ORTE_MESSAGE_EVENT: creates the zero-time event, passing info in a new orte_message_event_t object

ORTE_PROGRESSED_WAIT: while waiting for specified conditions, just calls progress so messages can be recv'd.

Also fixed the failed_launch function as we no longer block in the orted callback function. Updated the error messages to reflect revision. No change in API to this function, but PLM "owners" may want to check their internal error messages to avoid duplication and excessive output.

This has been tested on Mac, TM, and SLURM.

This commit was SVN r17647.
2008-02-28 19:58:32 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00