Deprecate --am and --amca options
Avoid default param files on backend nodes
Any parameters in the PRRTE default or user param files will have been
picked up by prte and included in the environment sent to the prted, so
don't open those files on the backend.
Avoid picking up MCA param file info on backend
Avoid the scaling problem at PRRTE startup by only reading the system
and user param files on the frontend.
Complete revisions to cmd line parser for OMPI
Per specification, enforce following precedence order:
1. system-level default parameter file
2. user-level default parameter file
3. Anything found in the environment
4. "--tune" files. Note that "--amca" goes away and becomes equivalent to "--tune". Okay if it is provided more than once on a cmd line (we will aggregate the list of files, retaining order), but an error if a parameter is referenced in more than one file with a different value
5. "--mca" options. Again, error if the same option appears more than once with a different value. Allowed to override a parameter referenced in a "tune" file
6. "-x" options. Allowed to overwrite options given in a "tune" file, but cannot conflict with an explicit "--mca" option
7. all other options
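As a rough illustration of the precedence rules above, here is a minimal Python sketch. It is not the OMPI/PRRTE parser: the names `resolve_params`, `merge_layer`, and `ParamConflict` are hypothetical, the layers are assumed to be listed from lowest to highest precedence, and "-x" values are assumed to share one namespace with "--mca" names for simplicity.

```python
class ParamConflict(Exception):
    """Raised when the same parameter is given two different values."""


def merge_layer(pairs, layer_name):
    """Collapse (name, value) pairs from one layer.

    Repeating a parameter with the same value is allowed; repeating it
    with a different value is an error.
    """
    merged = {}
    for name, value in pairs:
        if name in merged and merged[name] != value:
            raise ParamConflict(
                f"{layer_name}: {name} given as {merged[name]!r} and {value!r}"
            )
        merged[name] = value
    return merged


def resolve_params(system_file, user_file, environ, tune_files, mca_opts, x_opts):
    """Return the final name -> value mapping (hypothetical helper).

    system_file, user_file, environ: dicts of name -> value
    tune_files: list of dicts, one per "--tune" file, in cmd line order
    mca_opts, x_opts: lists of (name, value) pairs in cmd line order
    """
    # Aggregate all tune files, retaining order, and reject conflicts.
    tune = merge_layer(
        [item for f in tune_files for item in f.items()], "--tune"
    )
    mca = merge_layer(mca_opts, "--mca")
    x = merge_layer(x_opts, "-x")

    # "-x" may overwrite a tune-file value but must not contradict "--mca".
    for name, value in x.items():
        if name in mca and mca[name] != value:
            raise ParamConflict(f"-x {name}={value!r} conflicts with --mca")

    # Apply layers in order; later layers override earlier ones.
    resolved = {}
    for layer in (system_file, user_file, environ, tune, mca, x):
        resolved.update(layer)
    return resolved
```

For example, a parameter set in the system file, again in a tune file, and again via "--mca" ends up with the "--mca" value, while two tune files that disagree on a value raise `ParamConflict`.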
Fix special handling of "-np"
Get agreement on jobid across the layers
Need all three pieces (PRRTE, PMIx, and OPAL) to agree on the nspace
conversion to jobid method
Ensure prte show_help messages get output
Print abnormal termination messages
Clean up error reporting in persistent operations
Signed-off-by: Ralph Castain <rhc@pmix.org>
Clarify exactly what compiler atomics support is needed to build Open
MPI.
For example, the following configuration now fails to configure on
Solaris:
- Studio 12.5 Sun C 5.14 SunOS_sparc 2016/05/31
- Also tried with GCC 4.0.2
with the following error:
configure: error: No atomic primitives available for sparc-sun-solaris2.11
Thanks to @jdelsign for the report.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
OMPI can only support PMIx v3 and above. PRRTE requires at least PMIx
v4, so protect against the case where OMPI is built against an external
PMIx v3.
Fix check of PMIx_Init return code for singleton operations.
Ensure that the PMIx framework gets properly opened.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Per the devel mailing list, we discussed the need/desirability of this change. Here is the logic behind not including it:
If you call "hwloc_topology_load", then hwloc merrily does its discovery and slams many-core systems. If you call "opal_hwloc_get_topology", then that is fine - it checks if we already have it, tries to get it from PMIx (using shared mem for hwloc 2.x), and only does the discovery if no other method is available.
We previously decided to let those who need the topology call "opal_hwloc_get_topology" to ensure the topo was available so that we don't load it unless someone actually needs it - in the case where it isn't available via PMIx, this avoids paying the startup time and memory footprint penalties for no reason. The function is protected so it will simply return SUCCESS if the topology is already defined.
After discussion, it was decided to stick with that "only set up the topology if someone actually needs it" approach. Hence, we will not blanket-init the topology, and the mtl/ofi component will call opal_hwloc_get_topology to ensure the topo has been defined prior to using it.
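The "load only on demand, and make repeated calls cheap" behavior described above can be modeled roughly as follows. This is a Python sketch of the pattern, not the OPAL implementation: `TopologyCache`, `fetch_from_server`, and `discover` are hypothetical names standing in for the PMIx lookup and the hwloc discovery.

```python
SUCCESS = 0


class TopologyCache:
    """Illustrative model of an idempotent, lazy topology getter."""

    def __init__(self, fetch_from_server, discover):
        # fetch_from_server: hypothetical callable, returns a topology or None
        # discover: hypothetical callable, performs the expensive local discovery
        self._fetch = fetch_from_server
        self._discover = discover
        self.topology = None

    def get_topology(self):
        # Already defined: return immediately, so repeated calls are harmless.
        if self.topology is not None:
            return SUCCESS
        # Prefer the copy offered by the server (e.g. via shared memory).
        topo = self._fetch()
        if topo is None:
            # Only fall back to local discovery when nothing else is available.
            topo = self._discover()
        self.topology = topo
        return SUCCESS
```

A component that needs the topology calls `get_topology()` right before first use; components that never need it never trigger discovery.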
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
Use "prte" instead of "prun" for proxy execution of cmds like mpirun.
This avoids the fork/exec-rendezvous complexities and should result in
more reliable operation.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Restore missing call to get_topology - others were doing it in their
components as repeated calls just return success, but let's ensure it is
always present.
Signed-off-by: Ralph Castain <rhc@pmix.org>
CPPFLAGS, LDFLAGS, and LIBS were only being saved conditionally, but
restored unconditionally. This could result in wiping out
CPPFLAGS/LDFLAGS/LIBS.
Make sure to *always* save these flags so that when they are restored,
they are restored to their proper value.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Extend the PMIx modex recv macros to cover the full set of
immediate/optional combinations. If PMIx_Init cannot reach a server,
then declare the MPI proc to be a singleton.
Provide full support for info values via PMIx
Catch all the values used in the "info" area of OMPI using data
available from PMIx instead of via envars. Update PMIx and PRRTE to sync
with their capabilities.
PMIx
- ensure cleanup of fork/exec children
- fix bug in gds/hash that left app info off of list
PRRTE
- fix multi-app bugs
- port setup_child logic from orte
- OMPI env changes
- set app->first_rank
- ensure common hostname across prun, prte, and pmix
- Fix "nolocal" support
Silence a warning from btl/vader
Signed-off-by: Ralph Castain <rhc@pmix.org>
The fence logic in MPI_Init was broken such that we were always
executing a fence, which is not desirable. The logic is supposed
to be:
* if async fence is requested and we are not collecting data, then do
not fence at all
* if async fence is requested and we are collecting data, then execute
the fence in the background - wait for completion at the end of MPI_Init.
* if async fence is not requested, then execute a fence at that point
and wait for it to complete, collecting data as directed. Note that
this cannot be a truly blocking call: we need to keep cycling the event
library via opal_progress while waiting, as the PMIx progress thread is
tied to the OMPI event base.
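The three cases above can be summarized in a small decision function. The following Python sketch only models the control flow; `plan_fence`, `FencePlan`, and `wait_with_progress` are hypothetical names, and the `progress` callable stands in for opal_progress.

```python
from enum import Enum


class FencePlan(Enum):
    SKIP = "skip"              # no fence at all
    BACKGROUND = "background"  # start now, wait at the end of init
    WAIT_NOW = "wait_now"      # start now, wait here while cycling progress


def plan_fence(async_requested, collect_data):
    """Pick the fence behavior for the given settings (illustrative only)."""
    if async_requested and not collect_data:
        return FencePlan.SKIP
    if async_requested:
        return FencePlan.BACKGROUND
    return FencePlan.WAIT_NOW


def wait_with_progress(is_done, progress):
    """Wait for completion without blocking the event loop.

    Instead of a truly blocking call, keep invoking progress() until
    is_done() reports completion. Returns the number of progress cycles.
    """
    cycles = 0
    while not is_done():
        progress()
        cycles += 1
    return cycles
```

In the `WAIT_NOW` case the caller would start a non-blocking fence and then spin in `wait_with_progress`; in the `BACKGROUND` case the same wait is deferred to the end of initialization.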
Signed-off-by: Ralph Castain <rhc@pmix.org>
orte is gone, and we don't want to require other
RMs to either use a PRRTE-specific env or
set their own.
OMPI_MCA_orte_ess_num_procs -> OMPI_MCA_num_procs
OMPI_MCA_orte_cpu_type -> OMPI_MCA_cpu_type
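For a launcher or script that still sets the old names, the rename amounts to a simple key translation. The helper below is a hypothetical Python sketch (`migrate_env` is not part of OMPI or PRRTE), and the choice to let an already-present new name win is an assumption made for the example.

```python
# Old name -> new name, as listed above.
RENAMED = {
    "OMPI_MCA_orte_ess_num_procs": "OMPI_MCA_num_procs",
    "OMPI_MCA_orte_cpu_type": "OMPI_MCA_cpu_type",
}


def migrate_env(env):
    """Return a copy of env with the old variable names replaced."""
    out = dict(env)
    for old, new in RENAMED.items():
        if old in out:
            value = out.pop(old)
            # Assumption: a value already set under the new name wins.
            out.setdefault(new, value)
    return out
```

For instance, `migrate_env({"OMPI_MCA_orte_ess_num_procs": "4"})` yields a mapping that contains only `OMPI_MCA_num_procs`.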
See PRRTE PRs:
https://github.com/openpmix/prrte/pull/443
https://github.com/openpmix/prrte/pull/440
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
- Avoid modifying single-dash options of applications
- Fix fetch of node/app-level info
- Return correct status code
Signed-off-by: Ralph Castain <rhc@pmix.org>
Per suggestion of @awlauria, added some comments about
the need to free ep before resources it points to.
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
PMIx:
- Ensure that launchers open all required frameworks
- Pass back the tool's ID
- Fix race condition in IOF
PRRTE:
- Begin conversion to use of nspace in place of numeric jobid
- Restore support for:
    --report-bindings
    --display-map
    --display-devel-map
    --display-topo
    --do-not-launch
    --xml-output
    --display-allocation
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Port memchecker call from a1d502c.
- Remove unused memcheck macro variables.
- Some code readability improvements.
- Remove some stray +1's in dynamic comm cleanup.
- Re-add OPAL_ENABLE_DEBUG macro to osc header.
- Clean up some printfs and includes.
- Refactor cleanup of dpm_disconnect_objs.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>