openmpi/ISSUES
Brian Barrett 30af9a7b90 * More changes from the tim branch. Still has problems with ABORTed procs,
but now tells you when it can't find orted.  Also includes memory leak
  plugs, bproc fixes, and gm repairs.

This commit was SVN r4937.
2005-03-18 23:58:36 +00:00

Undecided timing:
-----------------
- if an MPI process fails (e.g., it seg faults), orterun hangs.  This
  happens with the rsh pls.
  --> The problem appears to be in what happens when the state of the
      process in the soh is set to ORTE_PROC_STATE_ABORTED.
  --> Ralph is looking at this.
- if the daemon is not found or fails to start, orterun hangs.  No
  indication is given to the user that something went wrong.
  --> Brian thinks he fixed this, but since his fix sets the state to
      ORTE_PROC_STATE_ABORTED, it won't be clear whether it works
      until the issue above is fixed.  It does at least tell you
      what went wrong.
- $prefix/etc/hosts vs. $prefix/etc/openmpi-default-hostfile
  --> Brian temporarily added a symlink in $prefix/etc/ for
      openmpi-default-hostfile -> hosts if there isn't already
      a hosts file, so that he doesn't have to create one every
      time he does "rm -rf $prefix && make install".  Will file a
      bug so that this can be fixed (and will fix in the trunk).

Pre-milestone:
--------------
- singleton MPI doesn't work
- Ralph: Populate orte_finalize()

Post-milestone:
---------------
- ras_base_alloc: doesn't allow for oversubscribing like this:
      eddie: cpu=2
      vogon: cpu=2 max-slots=4
      mpirun -np 6 uptime
  It fails because it tries to divide the remaining unallocated
  procs evenly across all nodes (i.e., 1 each on eddie/vogon) rather
  than seeing that vogon can take the remaining 2.
- Jeff: TM needs to be re-written to use daemons (can't hog TM
  connection forever)
- Jeff: make the mapper be able to handle app->map_data
- Jeff: add function callback in cmd_line_t stuff
- Jeff: does cmd_line_t need to *get* MCA params if a command line
  param is not taken but an MCA param is available?
- consider empty string problem...
- ?: Friendlier error messages (e.g., if no nodes -- need something
  meaningful to tell the user)
- Ralph: compare and set function in GPR
- Jeff: collapse MCA params from 3 names to 1 name
- ?: Apply LANL copyright to trunk (post all merging activity)
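
Sketch for the ras_base_alloc oversubscription item above.  This is
Python for illustration only, not Open MPI code, and it is one possible
way to get the behavior the item asks for, not the planned fix.  The
names Node and allocate are hypothetical.  It assumes that a node with
no max-slots value cannot be oversubscribed (which is what the
eddie/vogon example implies).  Pass 1 fills each node up to its cpu
count; pass 2 hands leftover procs only to nodes still below max-slots.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one hostfile line."""
    name: str
    slots: int                       # the "cpu=" value
    max_slots: Optional[int] = None  # the "max-slots=" value, if given

    def limit(self) -> int:
        # Assumption: without max-slots, the node takes no more than
        # its cpu count.
        return self.slots if self.max_slots is None else self.max_slots


def allocate(nodes: List[Node], nprocs: int) -> Dict[str, int]:
    """Return {node name: number of procs} for nprocs processes."""
    alloc = {n.name: 0 for n in nodes}
    remaining = nprocs

    # Pass 1: fill each node up to its cpu count.
    for n in nodes:
        take = min(n.slots, remaining)
        alloc[n.name] = take
        remaining -= take

    # Pass 2: oversubscribe one proc at a time, skipping nodes that
    # have already reached their limit.
    while remaining > 0:
        progressed = False
        for n in nodes:
            if remaining == 0:
                break
            if alloc[n.name] >= n.limit():
                continue
            alloc[n.name] += 1
            remaining -= 1
            progressed = True
        if not progressed:
            raise ValueError("not enough slots for %d procs" % nprocs)

    return alloc


# The case from the item above: mpirun -np 6 uptime
print(allocate([Node("eddie", 2), Node("vogon", 2, 4)], 6))
# prints {'eddie': 2, 'vogon': 4}
```

With -np 7 the same hostfile raises ValueError, since eddie and vogon
together can take at most 6 procs under the assumption above.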