Undecided timing:
-----------------

- if an MPI process fails (e.g., it seg faults), it causes orterun to
  hang.  This is with the rsh pls.
  --> Looks like the problem is with what happens when you set the
      state of the process in the soh to ORTE_PROC_STATE_ABORTED.
  --> Ralph is looking at this

- if the daemon is not found or fails to start, orterun will hang.  No
  indication is given to the users that something went wrong.
  --> Brian thinks he fixed this, but since he sets the state to
      ORTE_PROC_STATE_ABORTED, it won't be really clear until the
      above issue is fixed.  But it at least tells you what went
      wrong.

- $prefix/etc/hosts vs. $prefix/etc/openmpi-default-hostfile
  --> Brian temporarily added symlink in $prefix/etc/ for
      openmpi-default-hostfile -> hosts if there isn't already a hosts
      file so that he doesn't have to create one every time he does
      "rm -rf $prefix && make install".  Will file bug so that this
      can be fixed (and will fix in the trunk)

Pre-milestone:
--------------

- singleton mpi doesn't work

- Ralph: Populate orte_finalize()

Post-milestone:
---------------

- ras_base_alloc: doesn't allow for oversubscribing like this:

    eddie: cpu=2
    vogon: cpu=2 max-slots=4

    mpirun -np 6 uptime

  It barfs because it tries to evenly divide the remaining unallocated
  procs across all nodes (i.e., 1 each on eddie/vogon) rather than
  seeing that vogon can take the remaining 2.

- Jeff: TM needs to be re-written to use daemons (can't hog TM
  connection forever)

- Jeff: make the mapper be able to handle app->map_data

- Jeff: add function callback in cmd_line_t stuff

- Jeff: does cmd_line_t need to *get* MCA params if a command line
  param is not taken but an MCA param is available?
  - consider empty string problem...

- ?: Friendlier error messages (e.g., if no nodes -- need something
  meaningful to tell the user)

- Ralph: compare and set function in GPR

- Jeff: collapse MCA params from 3 names to 1 name

- ?: Apply LANL copyright to trunk (post all merging activity)

- Probably during/after OMPI/ORTE split:
  - re-merge [orte|ompi]_pointer_array and [orte|ompi]_value_array
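
Sketch for the ras_base_alloc oversubscription item above.  This is
only an illustration in Python, not the ORTE code: the function name
allocate() and the (name, slots, max_slots) tuple layout are made up
for the example.  It also assumes that a node listed without max-slots
is capped at its cpu count, which is what the eddie/vogon description
implies.  The idea is to fill the regular slots first and then hand
out the leftover procs one at a time, skipping nodes that are already
at their cap, instead of dividing the remainder evenly:

```python
def allocate(nodes, nprocs):
    """Place nprocs on nodes; nodes is a list of (name, slots, max_slots).

    Hypothetical helper for illustration only (not an ORTE function).
    """
    alloc = {}
    remaining = nprocs

    # First pass: fill each node's regular slots, in order.
    for name, slots, _max_slots in nodes:
        take = min(slots, remaining)
        alloc[name] = take
        remaining -= take

    # Second pass: oversubscribe one proc at a time, skipping any node
    # that has already reached its max_slots cap.
    while remaining > 0:
        placed = False
        for name, _slots, max_slots in nodes:
            if remaining == 0:
                break
            if alloc[name] < max_slots:
                alloc[name] += 1
                remaining -= 1
                placed = True
        if not placed:
            # No node has headroom left: report it instead of hanging.
            raise ValueError("not enough slots for %d procs" % nprocs)

    return alloc


# eddie: cpu=2 (assumed cap of 2), vogon: cpu=2 max-slots=4
hosts = [("eddie", 2, 2), ("vogon", 2, 4)]
print(allocate(hosts, 6))
```

With the hostfile from the item above, "-np 6" lands 2 procs on eddie
and 4 on vogon; asking for 7 raises an error because neither node has
headroom left.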