PMIx_server_register_nspace() is an asynchronous operation, so
the pmix glue wait for it completes before returning.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
the argc field from the opal_pmix_app_t struct was removed,
so adjust the pmix/ext11 glue accordingly.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.
Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.
Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
* MCA option to control how long we wait for stack traces:
- orte_timeout_for_stack_trace INTEGER
Default: 30
Setting to <= 0 will cause it to wait forever
* Useful when gathering stack traces from large jobs which might take
a long time.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
- New MCA option: opal_stacktrace_output
- Specifies where the stack trace output stream goes.
- Accepts: none, stdout, stderr, file[:filename]
- Default filename 'stacktrace'
- Filename will be `stacktrace.PID`, or if VPID is available,
then the filename will be `stacktrace.VPID.PID`
- Update util/stacktrace to allow for different output avenues
including files. Previously this was hardcoded to 'stderr'.
- Since opal_backtrace_print needs to be signal safe, passing it a
FILE object that actually represents a file stream is difficult. This
is because we cannot open the file in the signal handler using
`fopen` (not safe), but have to use `open` (safe). Additionally, we
cannot use `fdopen` to convert the `int fd` to a `FILE *fh` since it
is also not signal safe.
- I did not want to break the backtrace.h API so I introduced a new
rule (documented in `backtrace.c`) that if the `FILE *file`
argument is `NULL` then look for the `opal_stacktrace_output_fileno`
variable to tell you which file descriptor to use for output.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
The hook is called from memheap when memory range
is going to be allocated by smalloc(), realloc() and others.
ucx spml uses this hook to call ucp_mem_advise in order to speedup
non blocking memory mapping.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
Given that we only set OOB contact info from inside of events, or before we begin threaded operations (e.g., in the ess), allow set_contact_info to directly update the oob/base framework globals.
Correct the nidmap regex decompression routine.
Ensure that rank=1 daemon always sends back its topology as this is the most common use-case.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>