aa70a35fea
This commit was SVN r4978. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r4977
68 lines
2.1 KiB
Plaintext
Undecided timing:
-----------------

- if an MPI process fails (e.g., it seg faults), it causes orterun to
  hang. This is with the rsh pls.
  --> Looks like the problem is with what happens when you set the
      state of the process in the soh to ORTE_PROC_STATE_ABORTED.
  --> Ralph is looking at this

- if the daemon is not found or fails to start, orterun will hang. No
  indication is given to the users that something went wrong.
  --> Brian thinks he fixed this, but since he sets the state to
      ORTE_PROC_STATE_ABORTED, it won't be really clear until the
      above issue is fixed. But it at least tells you what went
      wrong.

- $prefix/etc/hosts vs. $prefix/etc/openmpi-default-hostfile
  --> Brian temporarily added symlink in $prefix/etc/ for
      openmpi-default-hostfile -> hosts if there isn't already
      a hosts file so that he doesn't have to create one every
      time he does "rm -rf $prefix && make install". Will file
      bug so that this can be fixed (and will fix in the trunk)

Pre-milestone:
--------------

- singleton mpi doesn't work

- Ralph: Populate orte_finalize()

Post-milestone:
---------------

- ras_base_alloc: doesn't allow for oversubscribing like this:

    eddie: cpu=2
    vogon: cpu=2 max-slots=4
    mpirun -np 6 uptime

  It barfs because it tries to evenly divide the remaining unallocated
  procs across all nodes (i.e., 1 each on eddie/vogon) rather than
  seeing that vogon can take the remaining 2.

- Jeff: TM needs to be re-written to use daemons (can't hog TM
  connection forever)

- Jeff: make the mapper be able to handle app->map_data

- Jeff: add function callback in cmd_line_t stuff

- Jeff: does cmd_line_t need to *get* MCA params if a command line
  param is not taken but an MCA param is available?
  - consider empty string problem...

- ?: Friendlier error messages (e.g., if no nodes -- need something
  meaningful to tell the user)

- Ralph: compare and set function in GPR

- Jeff: collapse MCA params from 3 names to 1 name

- ?: Apply LANL copyright to trunk (post all merging activity)

- Probably during/after OMPI/ORTE split:
  - re-merge [orte|ompi]_pointer_array and [orte|ompi]_value_array
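
Sketch for the ras_base_alloc oversubscription item above: one possible
way to hand out the leftover procs so that only nodes with headroom
receive them. This is a rough Python illustration, not the ORTE
allocator. The function name allocate_procs and the (name, cpu,
max_slots) tuple layout are hypothetical, and it assumes that a node
listed without max-slots cannot go above its cpu count, which is how
the note above treats eddie.

```python
def allocate_procs(nodes, nprocs):
    """Return {node_name: proc_count} for nprocs processes.

    nodes is a list of (name, cpu, max_slots) tuples (hypothetical layout).
    max_slots may be None; this sketch ASSUMES that means "cap at cpu".
    """
    alloc = {}
    remaining = nprocs

    # Pass 1: fill each node up to its cpu count, in hostfile order.
    for name, cpu, _max_slots in nodes:
        take = min(cpu, remaining)
        alloc[name] = take
        remaining -= take

    # Pass 2: oversubscribe one proc at a time, but only on nodes that
    # still have headroom, instead of dividing evenly across all nodes.
    while remaining > 0:
        progressed = False
        for name, cpu, max_slots in nodes:
            if remaining == 0:
                break
            cap = cpu if max_slots is None else max_slots
            if alloc[name] < cap:
                alloc[name] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            # Every node is at its cap; report it rather than spin forever.
            raise ValueError(
                "cannot place %d remaining procs: all nodes are full" % remaining
            )
    return alloc


# The case from the note: eddie cpu=2, vogon cpu=2 max-slots=4, -np 6.
example = allocate_procs([("eddie", 2, None), ("vogon", 2, 4)], 6)
```

With the hostfile from the note, pass 1 places 2 procs on each node and
pass 2 gives both leftover procs to vogon, since eddie is already at its
cap. Asking for a 7th proc raises an error because no node has room.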