This is so when a debugger attaches using MPIR, it can step out of this stack back into main.
This cannot be done with certain aggressive optimisations and missing debug information.
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Co-authored-by: Jeff Squyres <jsquyres@cisco.com>
* When we moved to allowing dual rml/oob transports, we added a bunch of
stuff that is no longer needed. Remove it so as to simplify the
messaging system.
* Fix the routed/radix component so it correctly returns the parent's
vpid
Signed-off-by: Ralph Castain <rhc@pmix.org>
We've been fighting the battle of trying to create a regex generator and
parser that can handle arbitrary hostname schemes - without long-term
success. The worst of it is that there is no way of checking to see if
the computed regex is correct short of parsing it and doing a
character-by-character comparison with the original string. Ugh...there
has to be a better solution.
One option is to investigate using 3rd-party regex libraries as
those are coming from communities whose sole focus is resolving that
problem. However, someone would need to spend the time to investigate
it, and we'd have to find a license-friendly implementation.
Another option is to quit beating our heads against the wall and just
compress the information. It won't be as much of a reduction, but we
also won't keep hitting scenarios where things break. In this case, it
seems that "perfection" is definitely the enemy of "good enough".
This PR implements the compression option while retaining the
possibility of people adding regex-generating components. The
compression code used in ORTE is consolidated into the opal/compress
framework. That framework currently held bzip and gzip components for
use in compressing checkpoint files - since we no longer support C/R, I
have .opal_ignore'd those components.
However, I have left the original framework APIs alone in case someone
ever decides to redo C/R. The APIs of interest here are added to the
framework - specifically, the "compress_block" and "decompress_block"
functions. I then moved the ORTE zlib compression code into a new
component in this framework.
Unfortunately, the framework currently is a single-select one - i.e.,
only one active component at a time. Since I .opal_ignore'd the other
two and made the priority of zlib high, this isn't a problem. However,
if someone wants to re-enable bzip/gzip or add another component, they
might need to transition opal/compress to a multi-select framework.
Included changes:
* Consolidate the compression code into the opal/compress framework
* Move the ORTE zlib compression code into a new opal/compress/zlib
component
* Ignore the bzip and gzip components in opal/compress framework
* Add a "compress_base_limit" MCA param to set the threshold above which
we compress data - defaults to 4096 bytes
* Delete stale brucks and rcd components from orte/grpcomm framework
* Delete the orte/regx framework
* Update the launch system to use opal/compress instead of string regex
* Provide a default module if no zlib is available
* Fix some misc multi-node issues
* Properly generate the nidmap in response to a "connection warmup"
message so the remote daemon knows the children it needs to launch.
* Remove stale references to orte_node_regex
* opal_byte_object_t's are not OPAL objects - properly release allocated
memory.
* Set the topology
* Currently only handling homogeneous case
* Update the compress framework files to conform
* Consolidate open/close into one "frame" file. Ensure we open/close the
framework
Signed-off-by: Ralph Castain <rhc@pmix.org>
be more careful about closing framewworks as part of
orte_finalize. Owing to recent restructuring in opal to handle
finalize in a more general fashion, the missing framework
closes were causing meltdowns as the mca vars subsystem
was cleaning itself up.
This problem was recently reported by Siegmar:
https://www.mail-archive.com/users@lists.open-mpi.org//msg32946.html
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This commit fixes an compilation error when configured
with `--enable-timing`.
Procedures in the function `orte_ess_base_app_setup`
in `orte/mca/ess/base/ess_base_std_app.c` are moved
to `orte/mca/ess/pmi/ess_pmi_module.c`
and `orte/mca/ess/singleton/ess_singleton_module.c`
in the recent commit 57f6b94fa5.
In `ess_pmi_module.c`, the first argument of the
`OPAL_TIMING_ENV_NEXT` macro should have been adapted
to the destination function but was not.
In `ess_singleton_module.c`, `OPAL_TIMING_ENV_INIT`
was not used in the destination function originally.
So `OPAL_TIMING_ENV_NEXT` cannot be used in the function.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
When there is no alias for a given node, do not set the
ORTE_NODE_ALIAS attribute to an empty string any more.
Thanks Erico for reporting this issue.
Thanks Ralph for the guidance.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Resolve a race condition between registering for a file to be removed upon termination and actual creation of that file by providing attributes that identify whether the path is a file or directory. This removes the need for PMIx to detect the difference.
Refs #4686
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Handle the need for different regex generator/parsers by moving the
orte/util/nidmap and orte/util/regex code into a new "regx" framework.
Use the original code to complete a "fwd" component, and create a
scaffold for IBM's "reverse" component.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
If available, have apps use registration capability to cleanup their session directories. Setup capability for vader to register its shared memory file location - let someone familiar with that code do so.
Final cleanup to track uid/gid, update the opal/pmix API to pass flags for ignore and leave top directory alone
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Still in the "needs to be done" category:
* mapping/ranking/binding options aren't correctly supported
* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
* Resolves#3705
* Components should link against the project level library to better
support `dlopen` with `RTLD_LOCAL`.
* Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
with the appropriate project level library:
```
MCA components in ompi/
$(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
$(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
$(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
$(top_builddir)/oshmem/liboshmem.la"
```
Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
passed to make it all flow thru the opal/pmix "put/get" operations. Update the PMIx code to latest master to pickup some required behaviors.
Remove the no-longer-required get_contact_info and set_contact_info from the RML layer.
Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing wi
ll be beneficial at large scales. Leave it "off" by default.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Remove the opal_ignore from the RML/OFI component, but disable that component unless the user specifically requests it via the "rml_ofi_desired=1" MCA param. This will let us test compile in various environments without interfering with operations while we continue to debug
Fix an error when computing the number of infos during server init
Signed-off-by: Ralph Castain <rhc@open-mpi.org>