orte_session_dir_finalize doesn't clean up the right directories, and
neither does orte_session_dir_cleanup.
This patch fixes several issues:
1. orte_session_dir_cleanup():
   1. when jobid is not a wildcard, jobid is used to build the job
      session dir (instead of ORTE_LOCAL_JOBID).
   2. ORTE_SUCCESS is unconditionally returned (instead of rc, which
      might have been set to another value earlier).
2. orte_session_dir_finalize():
   1. convert_jobid_to_string is not the right call to get the job
      session dir.
   2. in some places orte_process_info.top_session_dir is used
      directly, without being prefixed with the base directory.
Factored the code sections that build the job_session_dir into a
single orte_build_job_session_dir() function that is now called by
both orte_session_dir_finalize() and orte_session_dir_cleanup().
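A minimal sketch of what a single shared job-session-dir builder might look like. The names build_job_session_dir and session_dir_cleanup, the SKETCH_* constants, and the "<base>/<top>/<jobid>" path layout are illustrative assumptions, not the real ORTE signatures or directory format. The sketch only demonstrates two of the fixes listed above: the top session dir is always prefixed with the base directory, and cleanup returns rc instead of an unconditional success code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ORTE's jobid type and return codes. */
typedef uint32_t sketch_jobid_t;
#define SKETCH_JOBID_WILDCARD UINT32_MAX
#define SKETCH_SUCCESS 0
#define SKETCH_ERROR (-1)
#define SKETCH_ERR_OUT_OF_RESOURCE (-2)

/*
 * Hypothetical shared helper: build "<base>/<top>/<jobid>" into a newly
 * allocated string. The top session dir is always prefixed with the base
 * directory; a wildcard jobid yields only "<base>/<top>".
 */
static int build_job_session_dir(const char *base, const char *top,
                                 sketch_jobid_t jobid, char **out)
{
    char jobdir[16] = "";  /* "/" plus at most 10 digits plus NUL */

    if (jobid != SKETCH_JOBID_WILDCARD) {
        snprintf(jobdir, sizeof(jobdir), "/%lu", (unsigned long)jobid);
    }
    int n = snprintf(NULL, 0, "%s/%s%s", base, top, jobdir);
    if (n < 0) {
        return SKETCH_ERROR;
    }
    *out = malloc((size_t)n + 1);
    if (*out == NULL) {
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    snprintf(*out, (size_t)n + 1, "%s/%s%s", base, top, jobdir);
    return SKETCH_SUCCESS;
}

/* Propagate rc to the caller instead of unconditionally returning success. */
static int session_dir_cleanup(const char *base, const char *top,
                               sketch_jobid_t jobid)
{
    char *job_dir = NULL;
    int rc = build_job_session_dir(base, top, jobid, &job_dir);

    if (rc != SKETCH_SUCCESS) {
        return rc;
    }
    /* Directory removal would go here; it may also update rc. */
    free(job_dir);
    return rc;
}
```

Having both the finalize and cleanup paths call one builder means the base-directory prefix can no longer be forgotten in one of them.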
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
This commit was SVN r21498.
The open/select of the PLM is done in orte/mca/ess/base/ess_base_std_orted.c, and only when the PLM MCA param is set to direct that a specific PLM be selected. The function orte_plm_base_orted_append_basic_args strips from the params passed to the daemon any PLM selection that was passed to the HNP. Each PLM then adds a PLM directive if-and-only-if backend PLM support is desired. At present, Torque, SLURM, and rsh all request this support and direct the backend orted to open the "rsh" PLM.
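The argument handling described above can be sketched as follows. build_daemon_plm_args is a hypothetical helper, not the real orte_plm_base_orted_append_basic_args, and it assumes MCA params appear on the daemon command line as "-mca <key> <value>" triples. It drops any inherited "-mca plm <name>" selection and appends a backend directive only when the launching PLM asks for one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the NULL-terminated args in `in` to `out`,
 * dropping any "-mca plm <name>" triple inherited from the HNP. If
 * want_backend_plm is true, append "-mca plm <backend>" instead.
 * `out` needs room for the copied args plus 3 entries and a NULL.
 * Returns the number of entries written, excluding the NULL terminator.
 */
static size_t build_daemon_plm_args(const char *const *in, bool want_backend_plm,
                                    const char *backend, const char **out)
{
    size_t n = 0;

    for (size_t i = 0; in[i] != NULL; i++) {
        if (strcmp(in[i], "-mca") == 0 && in[i + 1] != NULL &&
            strcmp(in[i + 1], "plm") == 0 && in[i + 2] != NULL) {
            i += 2;  /* skip "-mca plm <name>" from the HNP */
            continue;
        }
        out[n++] = in[i];
    }
    if (want_backend_plm) {
        out[n++] = "-mca";
        out[n++] = "plm";
        out[n++] = backend;
    }
    out[n] = NULL;
    return n;
}
```

For example, a daemon launched under SLURM with backend support would receive "-mca plm rsh" in place of whatever PLM the HNP itself was told to use.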
This commit was SVN r21488.
The following SVN revision numbers were found above:
r21480 --> open-mpi/ompi@ed585bce8a
If we do not initialize the PLM, non-HNP daemons will not be able to use its functions. For example, the rsh PLM needs it when tree_spawn mode is enabled: daemons call the orte_plm.remote_spawn() function to spawn their children in the deployment tree.
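To illustrate why every daemon needs an initialized PLM in tree_spawn mode, here is a hypothetical sketch. It assumes a simple k-ary tree over daemon vpids (vpid 0 being the HNP) and a module struct holding only a remote_spawn hook; ORTE's actual routing tree and PLM module layout may differ. A daemon that has children but no initialized PLM has nothing to call.

```c
#include <stddef.h>

/* Hypothetical, minimal PLM-like module: only the hook used here. */
typedef struct {
    int (*remote_spawn)(unsigned parent, const unsigned *children, size_t nchildren);
} sketch_plm_module_t;

/* Children of `vpid` in a k-ary tree over vpids 0..ndaemons-1 (assumed layout). */
static size_t tree_children(unsigned vpid, unsigned radix, unsigned ndaemons,
                            unsigned *children, size_t max)
{
    size_t n = 0;

    for (unsigned i = 1; i <= radix && n < max; i++) {
        unsigned long long child = (unsigned long long)vpid * radix + i;
        if (child >= ndaemons) {
            break;
        }
        children[n++] = (unsigned)child;
    }
    return n;
}

/* Returns 0 for a leaf, -1 if the PLM was never initialized, otherwise
   whatever the module's remote_spawn hook returns. */
static int daemon_spawn_children(const sketch_plm_module_t *plm, unsigned vpid,
                                 unsigned radix, unsigned ndaemons)
{
    unsigned kids[64];
    size_t n = tree_children(vpid, radix, ndaemons, kids, 64);

    if (n == 0) {
        return 0;  /* leaf daemon: nothing to spawn */
    }
    if (plm == NULL || plm->remote_spawn == NULL) {
        return -1;  /* PLM not opened/selected on this daemon */
    }
    return plm->remote_spawn(vpid, kids, n);
}
```

Only interior daemons of the tree call remote_spawn, but since any non-HNP daemon may be interior, all of them need the PLM available.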
This commit was SVN r21480.
Update the loop_spawn test to remove a sleep so that it runs at full speed, letting the new code detect when we overrun ourselves and wait for room to be cleared before the next comm_spawn.
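The message does not say how the new code detects an overrun. One plausible shape, sketched here purely as an assumption, is a fixed table of in-flight spawn slots: a new comm_spawn must claim a free slot, and when none is free the caller waits until a completed spawn releases one. SKETCH_MAX_INFLIGHT, spawn_slots_t, slot_acquire, and slot_release are hypothetical names.

```c
#include <stdbool.h>

#define SKETCH_MAX_INFLIGHT 4  /* hypothetical cap on concurrent comm_spawns */

typedef struct {
    bool busy[SKETCH_MAX_INFLIGHT];
} spawn_slots_t;

/* Claim a free slot for a new comm_spawn. Returns the slot index, or -1
   when every slot is in use: the caller has overrun the limit and must
   wait for slot_release() before issuing the next spawn. */
static int slot_acquire(spawn_slots_t *s)
{
    for (int i = 0; i < SKETCH_MAX_INFLIGHT; i++) {
        if (!s->busy[i]) {
            s->busy[i] = true;
            return i;
        }
    }
    return -1;
}

/* Called when a spawn completes, making room for the next one. */
static void slot_release(spawn_slots_t *s, int idx)
{
    if (idx >= 0 && idx < SKETCH_MAX_INFLIGHT) {
        s->busy[idx] = false;
    }
}
```

A test that spawns in a tight loop, with no sleep, is exactly what exercises the "table full" path.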
This commit was SVN r21390.
in this directory and it's causing "make dist" to break.
Shiqing -- is there a missing file in this directory? If so, please
add it and restore the EXTRA_DIST line I just removed. Thanks!
This commit was SVN r21340.
way to have no abort message is to pass NULL (the errmanager is smart
enough to handle this case and not emit any extra message).
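A hypothetical sketch of the NULL-message convention described above. format_abort_message is not the errmgr API; it only illustrates treating a NULL format as "emit nothing" while formatting any other message normally.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format an abort message into buf.
 * A NULL fmt means "no abort message": buf is left empty and 0 is returned.
 * Otherwise returns what vsnprintf returns (the untruncated length).
 */
static int format_abort_message(char *buf, size_t len, const char *fmt, ...)
{
    if (len > 0) {
        buf[0] = '\0';
    }
    if (fmt == NULL) {
        return 0;
    }
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```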
This commit was SVN r21311.
Emit a more informative error message when the file descriptor limit is
reached during an accept() call. Also, abort when accept() fails, to
avoid an infinite loop.
Emit a more informative error message when the help file can't be opened.
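A sketch of how accept() failures might be classified, assuming a POSIX listening socket driven by an event loop. EMFILE and ENFILE are the standard errno values for the per-process and system-wide descriptor limits. With a level-triggered loop the pending connection stays readable, so simply retrying on those errors would spin forever; the caller should therefore abort on ACCEPT_FD_LIMIT and ACCEPT_FATAL. The enum and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of an accept() failure. */
typedef enum {
    ACCEPT_RETRY,     /* transient: go back to the event loop */
    ACCEPT_FD_LIMIT,  /* out of descriptors: report clearly, then abort */
    ACCEPT_FATAL      /* anything else: report, then abort */
} accept_action_t;

static accept_action_t classify_accept_error(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case ECONNABORTED:
        return ACCEPT_RETRY;
    case EMFILE:  /* per-process descriptor limit reached */
    case ENFILE:  /* system-wide file table is full */
        return ACCEPT_FD_LIMIT;
    default:
        return ACCEPT_FATAL;
    }
}

/* Print one informative message for a failed accept(). */
static void report_accept_error(int err)
{
    if (classify_accept_error(err) == ACCEPT_FD_LIMIT) {
        fprintf(stderr,
                "accept() failed: file descriptor limit reached (%s); "
                "consider raising the limit (for example with ulimit -n)\n",
                strerror(err));
    } else {
        fprintf(stderr, "accept() failed: %s\n", strerror(err));
    }
}
```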
This commit was SVN r21271.
The following Trac tickets were found above:
Ticket 1930 --> https://svn.open-mpi.org/trac/ompi/ticket/1930
messages when delivering a signal (like STOP or CONT)
to a non-existent process. This fixes trac:1929.
Also, only print one error message in the other cases.
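The signal-delivery behavior can be sketched with POSIX kill(), which fails with errno ESRCH when the target process does not exist. deliver_signal is a hypothetical helper, not the ORTE code path: it stays silent on ESRCH and prints exactly one message for any other failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send `sig` to `pid`.
 * Returns 0 on success, 1 if the process does not exist (ESRCH, reported
 * silently), or -1 on any other error after printing exactly one message.
 */
static int deliver_signal(pid_t pid, int sig)
{
    if (kill(pid, sig) == 0) {
        return 0;
    }
    int err = errno;  /* save before any other call can change errno */
    if (err == ESRCH) {
        return 1;
    }
    fprintf(stderr, "failed to send signal %d to pid %ld: %s\n",
            sig, (long)pid, strerror(err));
    return -1;
}
```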
This commit was SVN r21263.
The following Trac tickets were found above:
Ticket 1929 --> https://svn.open-mpi.org/trac/ompi/ticket/1929
Add a new tm ess module that exploits this capability.
Update the various PLM modules to enable it; this is just a minor change reflecting a parameter added to a PLM base function.
Additional fixes included:
1. remove an erroneous cleanup of session directories in the tool finalize procedure - tools don't create session directories to begin with!
2. fix a duplicate free when attempting to execute a non-existent app
3. clean up a typo in the comm utilities
4. fix comm_spawn, which was broken by the changes to the pack/unpack of orte_job_t made to properly support orte-ps
Tested on SLURM and TM machines using all tests in orte/test/mpi. Large jobs may run into command-line length limits because node info is now included to support static ports; this will be fixed next by adding a regexp generator to compress that info.
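The regexp generator mentioned above is future work, so the following is only an assumption about the general idea: a sketch that compresses node names sharing one prefix and a fixed-width numeric suffix into a bracketed range expression such as node[001-003,007]. compress_nodes, split_name, and in_group are hypothetical names, and the output syntax is illustrative rather than Open MPI's format.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split "node012" into prefix length 4, value 12 and suffix width 3.
   Returns -1 if the name has no numeric suffix. */
static int split_name(const char *name, size_t *plen, long *val, int *width)
{
    size_t len = strlen(name), i = len;

    while (i > 0 && isdigit((unsigned char)name[i - 1])) {
        i--;
    }
    if (i == len) {
        return -1;
    }
    *plen = i;
    *width = (int)(len - i);
    *val = strtol(name + i, NULL, 10);
    return 0;
}

/* True if `name` shares the first plen chars of `ref` and the suffix width w;
   stores its numeric suffix in *val. */
static int in_group(const char *name, const char *ref, size_t plen, int w, long *val)
{
    size_t pl;
    int wi;

    return split_name(name, &pl, val, &wi) == 0 && pl == plen && wi == w &&
           strncmp(name, ref, plen) == 0;
}

/* Compress names into "prefix[a-b,c,...]". Returns 0 on success, -1 if the
   names do not share one prefix and suffix width, or if `out` is too small. */
static int compress_nodes(const char **names, size_t n, char *out, size_t outlen)
{
    size_t plen, used, i = 0;
    long first;
    int w, k;

    if (n == 0 || split_name(names[0], &plen, &first, &w) != 0) {
        return -1;
    }
    k = snprintf(out, outlen, "%.*s[", (int)plen, names[0]);
    if (k < 0 || (size_t)k >= outlen) {
        return -1;
    }
    used = (size_t)k;
    while (i < n) {
        long start, prev, next;

        if (!in_group(names[i], names[0], plen, w, &start)) {
            return -1;
        }
        /* Extend the run while the next suffix is exactly prev + 1. */
        prev = start;
        i++;
        while (i < n && in_group(names[i], names[0], plen, w, &next) && next == prev + 1) {
            prev = next;
            i++;
        }
        const char *sep = (used > plen + 1) ? "," : "";
        if (start == prev) {
            k = snprintf(out + used, outlen - used, "%s%0*ld", sep, w, start);
        } else {
            k = snprintf(out + used, outlen - used, "%s%0*ld-%0*ld", sep, w, start, w, prev);
        }
        if (k < 0 || (size_t)k >= outlen - used) {
            return -1;
        }
        used += (size_t)k;
    }
    if (used + 1 >= outlen) {
        return -1;
    }
    out[used] = ']';
    out[used + 1] = '\0';
    return 0;
}
```

Run-length compression of this kind keeps the string short for the common case of contiguously numbered nodes, which is where command-line length becomes a problem.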
This commit was SVN r21248.