but now tells you when it can't find orted. Also includes memory leak plugs, bproc fixes, and gm repairs. This commit was SVN r4937.
65 lines
2.0 KiB
Plaintext
Undecided timing:
-----------------

- if an MPI process fails (e.g., it seg faults), it causes orterun to
  hang.  This is with the rsh pls.
  --> Looks like the problem is with what happens when you set the
      state of the process in the soh to ORTE_PROC_STATE_ABORTED.
  --> Ralph is looking at this

- if the daemon is not found or fails to start, orterun will hang.  No
  indication is given to the users that something went wrong.
  --> Brian thinks he fixed this, but since he sets the state to
      ORTE_PROC_STATE_ABORTED, it won't be really clear until the
      above issue is fixed.  But it at least tells you what went
      wrong.

- $prefix/etc/hosts vs. $prefix/etc/openmpi-default-hostfile
  --> Brian temporarily added symlink in $prefix/etc/ for
      openmpi-default-hostfile -> hosts if there isn't already
      a hosts file so that he doesn't have to create one every
      time he does "rm -rf $prefix && make install".  Will file
      bug so that this can be fixed (and will fix in the trunk)

Pre-milestone:
--------------

- singleton MPI doesn't work

- Ralph: Populate orte_finalize()

Post-milestone:
---------------

- ras_base_alloc: doesn't allow for oversubscribing like this:

    eddie: cpu=2
    vogon: cpu=2 max-slots=4
    mpirun -np 6 uptime

  It barfs because it tries to evenly divide the remaining unallocated
  procs across all nodes (i.e., 1 each on eddie/vogon) rather than
  seeing that vogon can take the remaining 2.
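
The behavior the ras_base_alloc item asks for can be sketched as a
two-pass allocation. This is only an illustration, not the actual
ras_base_alloc code: the Node class and allocate() function are
hypothetical names, and the sketch assumes that a node with no
max-slots value is capped at its cpu count (which is how the eddie
entry above is read here).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: int        # "cpu=" value from the hostfile
    max_slots: int   # hard cap; ASSUMED equal to cpus when not given


def allocate(nodes, nprocs):
    """Return {node name: proc count}, or raise if nprocs cannot fit."""
    # Pass 1: fill each node up to its cpu count.
    alloc = {}
    remaining = nprocs
    for n in nodes:
        take = min(n.cpus, remaining)
        alloc[n.name] = take
        remaining -= take

    # Pass 2: oversubscribe round-robin, skipping nodes already at
    # max_slots, instead of dividing the remainder evenly.
    while remaining > 0:
        placed = False
        for n in nodes:
            if remaining == 0:
                break
            if alloc[n.name] < n.max_slots:
                alloc[n.name] += 1
                remaining -= 1
                placed = True
        if not placed:
            raise RuntimeError(f"not enough slots: {remaining} procs left over")
    return alloc


# The hostfile from the item above, with eddie's cap assumed to be 2.
nodes = [Node("eddie", cpus=2, max_slots=2),
         Node("vogon", cpus=2, max_slots=4)]
print(allocate(nodes, 6))   # {'eddie': 2, 'vogon': 4}
```

With -np 6 the second pass skips eddie (already at its cap) and puts
both remaining procs on vogon; asking for 7 raises an error because no
node has room left.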

- Jeff: TM needs to be re-written to use daemons (can't hog TM
  connection forever)

- Jeff: make the mapper be able to handle app->map_data

- Jeff: add function callback in cmd_line_t stuff

- Jeff: does cmd_line_t need to *get* MCA params if a command line
  param is not taken but an MCA param is available?
  - consider empty string problem...
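
One possible reading of the cmd_line_t question, sketched under
assumptions: the command line wins when a value was given, the MCA
param is the fallback, and an empty string counts as a value that was
given. resolve_param() and both dictionaries are hypothetical and do
not correspond to the real cmd_line_t or MCA interfaces.

```python
def resolve_param(cmd_line, mca_params, name):
    """Return the command-line value if given, else the MCA value, else None."""
    # Use an explicit None check: an empty string is a real value and
    # must not fall through to the MCA parameter.
    value = cmd_line.get(name)
    if value is not None:
        return value
    return mca_params.get(name)


# Hypothetical inputs: the user passed an empty string for "prefix"
# and said nothing about "hostfile".
cmd_line = {"prefix": ""}
mca_params = {"prefix": "/opt/openmpi", "hostfile": "hosts"}

print(repr(resolve_param(cmd_line, mca_params, "prefix")))    # ''
print(repr(resolve_param(cmd_line, mca_params, "hostfile")))  # 'hosts'
```

A truthiness test (value or fallback) would silently replace the empty
string with the MCA value, which is one way the empty string problem
could show up.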

- ?: Friendlier error messages (e.g., if no nodes -- need something
  meaningful to tell the user)

- Ralph: compare and set function in GPR

- Jeff: collapse MCA params from 3 names to 1 name

- ?: Apply LANL copyright to trunk (post all merging activity)