1
1

10 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
bbb60e37c0 Remove stale define
This commit was SVN r14913.
2007-06-06 17:19:43 +00:00
Ralph Castain
d9acc93efa Compute and pass the local_rank and local number of procs (in that proc's job) on the node.
To be precise, given this hypothetical launching pattern:

host1: vpids 0, 2, 4, 6
host2: vpids 1, 3, 5, 7

The local_rank for these procs would be:

host1: vpids 0->local_rank 0, v2->lr1, v4->lr2, v6->lr3
host2: vpids 1->local_rank 0, v3->lr1, v5->lr2, v7->lr3

and the number of local procs on each node would be four. If vpid=0 then does a comm_spawn of one process on host1, the values of the parent job would remain unchanged. The local_rank of the child process would be 0 and its num_local_procs would be 1 since it is in a separate jobid.

I have verified this functionality for the rsh case - need to verify that slurm and other cases also get the right values. Some consolidation of common code is probably going to occur in the SDS components to make this simpler and more maintainable in the future.

This commit was SVN r14706.
2007-05-21 14:30:10 +00:00
Ralph Castain
ef71055cf8 Teach the odls to properly test for and report failed-to-start for application processes.
Test for system limits (where known) prior to doing things like fork and pipe since some systems aren't very nice about it when we try to exceed such limits.

This commit was SVN r14494.
2007-04-24 18:54:45 +00:00
Sven Stork
a86deb460e - export required symbols
This commit was SVN r13810.
2007-02-27 09:43:32 +00:00
Ralph Castain
4e50cdae52 This commit accomplishes two things:
1. Fix the "hang" condition when an application isn't found. It turned out that the ODLS had some difficulty with the process actually not having been started - hence, it never called the waitpid callback. As a result, the "terminated" trigger didn't fire, and so mpirun didn't wake up. With this change, the HNP's errmgr forces the issue by causing the trigger to fire itself when an abort condition occurs.

2. Shift the recording of the pid and the nodename from mpi_init to the orted launcher. This allows programs such as Eclipse PTP to get the pids even for non-MPI applications. In the case of bproc, the pls handles this chore since we don't use orteds in that system.

This commit was SVN r12558.
2006-11-11 04:03:45 +00:00
George Bosilca
7982a23bde ORTE_DECLSPEC should be ...
This commit was SVN r12206.
2006-10-20 02:23:54 +00:00
Ralph Castain
ab196c3121 Okay, this fixes the problem of MCA params spreading too far. Sorry for the multiple corrections.
This commit was SVN r12201.
2006-10-19 22:51:02 +00:00
George Bosilca
b7579b09c7 Correctly handle the ORTE_DECLSPEC attribute.
This commit was SVN r12041.
2006-10-06 07:08:17 +00:00
George Bosilca
fd76e56279 One protection against C++ compilers is more than enough.
This commit was SVN r11998.
2006-10-05 05:24:43 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00