Consolidate the ompi_process_info and opal_process_info structs to
remove duplicate storage and conversion issues. Unwind some interweaving
of include files using opal.h. Silence a couple of warnings.
For now, set the arch to local if PMIX_ARCH is not found.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Do some code cleanup in the connect/accept code. Ensure that the OMPI
layer has access to the PMIx identifier for the process. Add macros for
converting PMIx names to/from strings. Cleanup a few of the simple test
programs. Add a little more info to a btl/tcp error message.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Add a framework to support different types of threading models including
user space thread packages such as Qthreads and argobot:
https://github.com/pmodels/argobotshttps://github.com/Qthreads/qthreads
The default threading model is pthreads. Alternate thread models are
specificed at configure time using the --with-threads=X option.
The framework is static. The theading model to use is selected at
Open MPI configure/build time.
mca/threads: implement Argobots threading layer
config: fix thread configury
- Add double quotations
- Change Argobot to Argobots
config: implement Argobots check
If the poll time is too long, MPI hangs.
This quick fix just sets it to 0, but it is not good for the
Pthreads version. Need to find a good way to abstract it.
Note that even 1 (= 1 millisecond) causes disastrous performance
degradation.
rework threads MCA framework configury
It now works more like the ompi/mca/rte configury,
modulo some edge items that are special for threading package
linking, etc.
qthreads module
some argobots cleanup
Signed-off-by: Noah Evans <noah.evans@gmail.com>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Deprecate --am and --amca options
Avoid default param files on backend nodes
Any parameters in the PRRTE default or user param files will have been
picked up by prte and included in the environment sent to the prted, so
don't open those files on the backend.
Avoid picking up MCA param file info on backend
Avoid the scaling problem at PRRTE startup by only reading the system
and user param files on the frontend.
Complete revisions to cmd line parser for OMPI
Per specification, enforce following precedence order:
1. system-level default parameter file
1. user-level default parameter file
1. Anything found in the environment
1. "--tune" files. Note that "--amca" goes away and becomes equivalent to "--tune". Okay if it is provided more than once on a cmd line (we will aggregate the list of files, retaining order), but an error if a parameter is referenced in more than one file with a different value
1. "--mca" options. Again, error if the same option appears more than once with a different value. Allowed to override a parameter referenced in a "tune" file
1. "-x" options. Allowed to overwrite options given in a "tune" file, but cannot conflict with an explicit "--mca" option
1. all other options
Fix special handling of "-np"
Get agreement on jobid across the layers
Need all three pieces (PRRTE, PMIx, and OPAL) to agree on the nspace
conversion to jobid method
Ensure prte show_help messages get output
Print abnormal termination messages
Cleanup error reporting in persistent operations
Signed-off-by: Ralph Castain <rhc@pmix.org>
dd
Signed-off-by: Ralph Castain <rhc@pmix.org>
Extend the PMIx modex recv macros to cover the full set of
immediate/optional combinations. If PMIx_Init cannot reach a server,
then declare the MPI proc to be a singleton.
Provide full support for info values via PMIx
Catch all the values used in the "info" area of OMPI using data
available from PMIx instead of via envars. Update PMIx and PRRTE to sync
with their capabilities.
PMIx
- ensure cleanup of fork/exec children
- fix bug in gds/hash that left app info off of list
PRRTE
- fix multi-app bugs
- port setup_child logic from orte
- OMPI env changes
- set app->first_rank
- ensure common hostname across prun, prte, and pmix
- Fix "nolocal" support
Silence a warning from btl/vader
Signed-off-by: Ralph Castain <rhc@pmix.org>
Temporary solution for the PML inconsistency issue discussed in #7475.
This patch address 2 things: first it make the PMIx key optional so that
if we are not in a full modex mode we don't do a direct modex, and
second it get the PML info from the vpid 0 instead of from the local
rank.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Restrict the search to the "immediate" range so at worst we check with
our local server and don't go up to the host daemon.
Signed-off-by: Ralph Castain <rhc@pmix.org>
If someone gives us a namespace that doesn't easily translate to an
integer, we have to create a mechanism for working around the
disconnect. PRRTE has been updated to give us a flag so we know we were
"natively" launched. If we don't see it, then fall back to generating a
hash of the nspace as our jobid. We then have to translate back/forth
between nspace and jobid using a lookup table.
Probably not the right long-term solution, but hopefully helps get us
thru for a bit.
Includes update of PRRTE pointer
Signed-off-by: Ralph Castain <rhc@pmix.org>
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.
remove orte: misc fixes
- UCX fixes
- VPATH issue
- oshmem fixes
- remove useless definition
- Add PRRTE submodule
- Get autogen.pl to traverse PRRTE submodule
- Remove stale orcm reference
- Configure embedded PRRTE
- Correctly pass the prefix to PRRTE
- Correctly set the OMPI_WANT_PRRTE am_conditional
- Move prrte configuration to the end of OMPI's configure.ac
- Make mpirun a symlink to prun, when available
- Fix makedist with --no-orte/--no-prrte option
- Add a `--no-prrte` option which is the same as the legacy
`--no-orte` option.
- Remove embedded PMIx tarball. Replace it with new submodule
pointing to OpenPMIx master repo's master branch
- Some cleanup in PRRTE integration and add config summary entry
- Correctly set the hostname
- Fix locality
- Fix singleton operations
- Fix support for "tune" and "am" options
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>