to choose the first available non-socket provider.
modified: orte/mca/rml/ofi/rml_ofi_component.c
modified: orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi Jayakumar <anandhi.s.jayakumar@intel.com>
* Resolves #3705
* Components should link against the project level library to better
support `dlopen` with `RTLD_LOCAL`.
* Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
with the appropriate project level library:
```
MCA components in ompi/
$(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
$(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
$(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
$(top_builddir)/oshmem/liboshmem.la
```
Note: The changes in this commit were automated by the
`libadd_mca_comp_update.py` script, which was added in the commit
that precedes this one. Some components were not included in this
change because they are statically built only.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
passed to make it all flow through the opal/pmix "put/get" operations. Update the PMIx code to latest master to pick up some required behaviors.
Remove the no-longer-required get_contact_info and set_contact_info from the RML layer.
Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing will be beneficial at large scales. Leave it "off" by default.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
When opening a conduit, the transport preference is checked in the following order:
(1) The rml_ofi_transports MCA parameter. This parameter should contain the list of transports (currently "ethernet" and "fabric" are valid);
"fabric" has higher priority if provided.
(2) The ORTE_RML_TRANSPORT_TYPE key, with the value "ethernet" or "fabric". "fabric" has higher priority.
If a specific provider is required, use the ORTE_RML_OFI_PROV_NAME key with the value "socket", "OPA", or any other provider supported on the system.
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
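The transport preference described for opening a conduit can be sketched roughly as below. This is a minimal illustration, not the component's code: the enum, `list_contains`, `pick_transport`, and the comma-separated parsing are assumptions made for the example. Only the transport names "ethernet" and "fabric" and the fabric-first priority come from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical transport identifiers, for illustration only. */
typedef enum {
    TRANSPORT_NONE = 0,
    TRANSPORT_ETHERNET,
    TRANSPORT_FABRIC
} transport_t;

/* Return 1 if `name` appears as a whole comma-separated token in `list`. */
static int list_contains(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);
        if (len == n && strncmp(p, name, n) == 0) {
            return 1;
        }
        p = (end != NULL) ? end + 1 : NULL;
    }
    return 0;
}

/* Pick a transport from a user-supplied list such as "ethernet,fabric".
 * "fabric" wins whenever it is listed, regardless of its position. */
transport_t pick_transport(const char *list)
{
    if (list == NULL) {
        return TRANSPORT_NONE;
    }
    if (list_contains(list, "fabric")) {
        return TRANSPORT_FABRIC;
    }
    if (list_contains(list, "ethernet")) {
        return TRANSPORT_ETHERNET;
    }
    return TRANSPORT_NONE;
}
```

With this sketch, `pick_transport("ethernet,fabric")` yields the fabric transport, while an unrecognized name yields `TRANSPORT_NONE`, which the caller would treat as "no preference given".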
In send_msg, the provider used on the local node and the peer is chosen according to the following rules:
1. If the user specified the transport for this conduit (even by giving us a prioritized list of candidates), then the one we selected is the _only_ one we will use. If the remote peer has a matching endpoint, we use it; otherwise, we error out.
2. If the user didn't specify a transport, we look for matches against _all_ of our available transports, starting with fabric and then going to Ethernet, and take the first one that matches.
3. If we can't find any match, we error out.
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
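The three send_msg selection rules can be expressed compactly with bitmasks. The sketch below is only an illustration under stated assumptions: the `HAS_*` flags, the `sel_t` type, and `select_for_send` are hypothetical names, and `user_choice` is assumed to hold exactly one flag (the transport chosen when the conduit was opened) or 0 if the user expressed no preference.

```c
/* Hypothetical availability flags for the two transport classes. */
#define HAS_ETHERNET 0x1u
#define HAS_FABRIC   0x2u

/* Result of the selection; SEL_ERROR means "no usable match". */
typedef enum {
    SEL_ERROR    = 0,
    SEL_ETHERNET = HAS_ETHERNET,
    SEL_FABRIC   = HAS_FABRIC
} sel_t;

/* user_choice: single flag selected at conduit-open time, or 0 if none.
 * local_mask:  transports available on the local node.
 * peer_mask:   transports the peer has published endpoints for. */
sel_t select_for_send(unsigned user_choice, unsigned local_mask,
                      unsigned peer_mask)
{
    unsigned common;

    if (user_choice != 0) {
        /* Rule 1: the user's transport is the only candidate. */
        return (peer_mask & user_choice) ? (sel_t)user_choice : SEL_ERROR;
    }

    /* Rule 2: try everything we have in common, fabric first. */
    common = local_mask & peer_mask;
    if (common & HAS_FABRIC) {
        return SEL_FABRIC;
    }
    if (common & HAS_ETHERNET) {
        return SEL_ETHERNET;
    }

    /* Rule 3: nothing matched. */
    return SEL_ERROR;
}
```

Note the asymmetry this produces: a peer that only offers Ethernet is reachable when no transport was requested, but not when the user pinned the conduit to fabric.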
send_msg(): Fixed the case where the local provider chosen when the conduit was opened
is not present on the peer (destination) node
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi Jayakumar <anandhi.s.jayakumar@intel.com>
Remove the opal_ignore from the RML/OFI component, but disable that component unless the user specifically requests it via the "rml_ofi_desired=1" MCA param. This will let us test compile in various environments without interfering with operations while we continue to debug
Fix an error when computing the number of infos during server init
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Cleanup the way we look for matching OFI addresses by using the opal_net_samenetwork helper function. This now works for multi-network environments, but only using the socket provider
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
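For readers unfamiliar with same-network matching, the idea behind the address comparison mentioned above can be sketched as follows. This is not the implementation of opal_net_samenetwork; it is a standalone, IPv4-only illustration, and `same_ipv4_network` is a hypothetical name. Addresses are assumed to be in host byte order.

```c
#include <stdint.h>

/* Return 1 if two IPv4 addresses (host byte order) share the same
 * network prefix of `prefixlen` bits, 0 otherwise. */
int same_ipv4_network(uint32_t a, uint32_t b, unsigned prefixlen)
{
    uint32_t mask;

    if (prefixlen == 0) {
        return 1;               /* an empty prefix matches everything */
    }
    if (prefixlen > 32) {
        prefixlen = 32;         /* clamp to the address width */
    }
    /* Build a mask with the top `prefixlen` bits set. The 32-bit case is
     * handled separately to avoid shifting by the full type width. */
    mask = (prefixlen == 32) ? 0xFFFFFFFFu : ~(0xFFFFFFFFu >> prefixlen);

    return (a & mask) == (b & mask);
}
```

For example, 192.168.1.10 and 192.168.1.200 match under a /24 prefix, while 192.168.1.10 and 192.168.2.10 match only under /16 or shorter.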
This PR renames the common library for OFI libfabric from
libfabric to ofi. There are a number of reasons this
is good to do:
1) it's shorter, replacing nine characters with three in
function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
the term "ofi" internally and also in their configure options
relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)
There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.
This PR also adds synonyms for the --with-libfabric(-libdir)
configury options: --with-ofi and --with-ofi-libdir.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.
Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/base/rml_base_stubs.c
deleted: ../orte/mca/rml/ofi/.opal_ignore
modified: ../orte/mca/rml/ofi/Makefile.am
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
modified: ../orte/test/system/ofi_conduit_stress.c
Removed stale include directive
modified: ../orte/mca/rml/ofi/Makefile.am
The ofi plugin supports multiple providers and identifies them
by ofi_prov_id; renamed the previous conduit_id to ofi_prov_id
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_request.h
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Fixed merge issues, and minor pull-request comments
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Removed trailing space
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Cleaned up the test ofi_conduit_stress.c
modified: ../orte/test/system/ofi_conduit_stress.c
Cleaned up printing of the provider info during initialisation
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Fixing warnings
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
minor cleanup
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
more cleanup
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Sending only the ethernet address in get_contact_info; the rest will be sent through modex
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Adding error logging on failures
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Handling the OPAL_MODEX_SEND/RECV generically for all ofi providers.
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Limiting the ofi build to a restricted set of developers
new file: ../orte/mca/rml/ofi/.opal_ignore
new file: ../orte/mca/rml/ofi/.opal_unignore
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Removing the error logging for now
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Multiple conduits can exist at the same time, and can even point to the same base transport. Each conduit can have its own characteristics (e.g., flow control) based on the info keys provided to the "open_conduit" call. For ease during the transition period, the "legacy" RML interfaces remain as wrappers over the new conduit-based APIs using a default conduit opened during orte_init - this default conduit is tied to the OOB framework so that current behaviors are preserved. Once the transition has been completed, a one-time cleanup will be done to update all RML calls to the new APIs and the "legacy" interfaces will be deleted.
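The relationship between conduits and the "legacy" wrappers described above might look roughly like the sketch below. Every name here (`conduit_t`, `open_conduit`, `conduit_send`, `legacy_send`, `init_default_conduit`) and the integer transport codes are hypothetical stand-ins chosen for illustration; the actual RML API takes info keys and differs in detail.

```c
#include <stddef.h>

#define MAX_CONDUITS 8

/* Hypothetical per-conduit state: which base transport it uses and one
 * characteristic (flow control) chosen when the conduit was opened. */
typedef struct {
    int in_use;
    int transport;      /* illustrative values: 0 = OOB, 1 = OFI */
    int flow_control;   /* nonzero if flow control was requested */
    size_t bytes_sent;  /* bookkeeping so the example has visible state */
} conduit_t;

static conduit_t conduits[MAX_CONDUITS];
static int default_conduit = -1;

/* Open a conduit; returns its id, or -1 if the table is full. Two
 * conduits may name the same transport and still be distinct. */
int open_conduit(int transport, int flow_control)
{
    int i;
    for (i = 0; i < MAX_CONDUITS; i++) {
        if (!conduits[i].in_use) {
            conduits[i].in_use = 1;
            conduits[i].transport = transport;
            conduits[i].flow_control = flow_control;
            conduits[i].bytes_sent = 0;
            return i;
        }
    }
    return -1;
}

/* New-style call: the caller names the conduit explicitly. */
int conduit_send(int id, const void *buf, size_t len)
{
    (void)buf;  /* a real implementation would hand buf to the transport */
    if (id < 0 || id >= MAX_CONDUITS || !conduits[id].in_use) {
        return -1;
    }
    conduits[id].bytes_sent += len;
    return 0;
}

/* Legacy-style call: a thin wrapper that always uses the default conduit. */
int legacy_send(const void *buf, size_t len)
{
    return conduit_send(default_conduit, buf, len);
}

/* Stand-in for the init step that opens the default (OOB-backed) conduit. */
void init_default_conduit(void)
{
    default_conduit = open_conduit(0, 0);
}
```

Keeping the legacy entry point as a one-line wrapper means existing callers keep working unchanged during the transition, and removing it later is a mechanical change.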
While we are at it: Remove oob/usock component to eliminate the TMPDIR length problem - get all working, including oob_stress