PR open-mpi/ompi#2432 introduced a regression where configure
and build with --disable-dlopn caused build failure owing
to unresolved alps lli symbols in the libopal-pal shared library.
This commit fixes this problem.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
there is no need for a configure option as well - so remove the
--enable-orte-static-ports configure option. When decoding the daemon
nidmap, mark new daemons as ALIVE by default - we will discover dead
ones as we go.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
display the hop node used to send a message
(if the message is sent directly, then the hop is the destination)
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* By default, make sure that we are using the short hostnames and not
the fully qualified hostnames when running under LSF.
* Related to commit open-mpi/ompi@d26dd2c20e
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
This commit updates the alps ras component to allow the use of
hyperthreads on compute nodes. In this case we need to use the cpuCnt
value from the node structure instead of numPEs.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/base/rml_base_stubs.c
deleted: ../orte/mca/rml/ofi/.opal_ignore
modified: ../orte/mca/rml/ofi/Makefile.am
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
modified: ../orte/test/system/ofi_conduit_stress.c
Removed stale include directive
modified: ../orte/mca/rml/ofi/Makefile.am
The ofi plugin supports multiple providers, and identifies them
by ofi_prov_id, changed the previous name conduit_id to ofi_prov_id
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_request.h
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/base/rml_base_stubs.c
deleted: ../orte/mca/rml/ofi/.opal_ignore
modified: ../orte/mca/rml/ofi/Makefile.am
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
modified: ../orte/test/system/ofi_conduit_stress.c
Removed stale include directive
modified: ../orte/mca/rml/ofi/Makefile.am
The ofi plugin supports multiple providers, and identifies them
by ofi_prov_id, changed the previous name conduit_id to ofi_prov_id
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_request.h
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Fixed merge issues, and minor pull-request comments
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/base/rml_base_stubs.c
deleted: ../orte/mca/rml/ofi/.opal_ignore
modified: ../orte/mca/rml/ofi/Makefile.am
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
modified: ../orte/test/system/ofi_conduit_stress.c
Removed stale include directive
modified: ../orte/mca/rml/ofi/Makefile.am
The ofi plugin supports multiple providers, and identifies them
by ofi_prov_id, changed the previous name conduit_id to ofi_prov_id
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_request.h
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/base/rml_base_stubs.c
deleted: ../orte/mca/rml/ofi/.opal_ignore
modified: ../orte/mca/rml/ofi/Makefile.am
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
modified: ../orte/test/system/ofi_conduit_stress.c
Removed stale include directive
modified: ../orte/mca/rml/ofi/Makefile.am
Fixed merge issues, and minor pull-request comments
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Removed trailing space
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Cleaned up test- ofi_conduit_stress.c
modified: ../orte/test/system/ofi_conduit_stress.c
cleaned up printing the provider info during initialisation
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Fixing warnings
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
minor cleanup
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
more cleanup
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Sending the ethernet address only in the get_contact_info, rest will be sent through modex
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Adding error logging on failures
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Handling the OPAL_MODEX_SEND/RECV generically for all ofi providers.
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Adding to build ofi for limited people
new file: ../orte/mca/rml/ofi/.opal_ignore
new file: ../orte/mca/rml/ofi/.opal_unignore
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Removign the error logging for now
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
add the option to pass an alternate port to plm
for example
node0 port=2222
directs the plm (via the ORTE_NODE_PORT) attribute to use
the non default port 2222 (e.g. ssh -p 2222 node0 ...)
Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted.
While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress
It is possible that one or more procs could get thru PMIx_Init, and thus be marked as in state "registered", before all local procs have been started. If that happens, then we would report some of the procs in state "running", and the others in state "registered" - which means that the HNP would miss the "running" stage of the state machine.
Thanks to Jingchao Zhang for his patience in tracking this down on the 2.0 branch
* Expand the use of the `orte_keep_fqdn_hostnames` MCA parameter when
it is set to false.
* If that parameter is set to false (default) then short hostnames
(e.g., `node01`) will match with the long hostnames (e.g.,
`node01.mycluster.org`). This allows a user (or resource manager)
to mix the use of short and long hostnames.
- Note that this mechanism does _not_ perform a DNS lookup, but
instead strips off the FQDN by truncating the hostname string at
the first `.` character (when not an IP address).
- By default (`false`) the following is true:
`node01 == node01.mycluster.org == node01.bogus.com`
since we use `node01` as the hostname.
Oversubscribe stipulates that we allow more procs on the node than assigned slots - it has nothing to do with the number of available pe's. Let overload directives handle the pe situation.
Each process occupies a "slot". We automatically set #slots = #cpus if nothing else is told to us. If you want to run more procs and slots, you must tell us to allow oversubscription.
A process can utilize multiple pe's if that option is given. If you try to bind more than one proc to a given pe, then we will error out unless you tell us to allow overloading.
We need to list all major project libraries in the private libraries
line to enable static linking to work properly.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Don't strcmp against the default value -- the default value may change
over time. Instead, check to see if the MCA var source is not
DEFAULT.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Split process name variable "name" to
- "wildcard_rank" for the cases where wildcard is used.
- "pname" for the case where reference to particular process is needed.
Clang 5.1 on my mac was a sad panda compiling a couple
of files, complaining about uninitialized stack variables.
This commit makes clang a happier panda (or at least not so sad).
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
configury command line is quoted and made available via the OPAL_CONFIGURE_CLI macro.
it can be retrieved via {orte-info,ompi_info,oshmem_info} -c, or
{orte-info,ompi_info,oshmem_info} --all --parseable | grep ^config:cli:
This seems like an obvious typo: insert a missing "break" statement so
that we don't fall through to the next case.
Fixes CIDs 1362756 and 1362764.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component
Add missing file
Touchup a typo in the Makefile.am
Update the pmix ext114 component
Minor cleanups and resync to master
Update to latest PMIx 2.x
Update to the PMIx event notification branch latest changes
Add descriptions for the new --report-state-on-timeout and
--get-stack-traces options.
Also add --timeout, and cross-reference MPIEXEC_TIMEOUT with it.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Note that this cannot be used for MPI performance testing. It is really only useful for ORTE scaling tests. It also only works with the rsh/ssh launcher.