This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.
All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
If we really get a catastrophic error from a libfabric call, don't
bother trying to continue (because data has been corrupted and there's
nothing sane left to do). Just call opal_btl_usnic_exit() (which
tries to call the PML error callback, but we're so early in the
module_init process that this likely hasn't been setup yet, so the job
will likely abort).
Nothing too substantial here, but two of the messages moved from
"libfabric API failed" to "internal error during init", just to be a
bit more descriptive.
When we get errors, the entry.data field tells us how many errors are
being reported. So decrement the loop count variable by that much.
This fixes CSCut30441.
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.
This first patch moves orte_cr_continue_like_restart from ORTE
to opal_cr_continue_like_restart in OPAL. This only leaves three
calls from OPAL to ORTE in the FT code. As it is not yet 100%
clear how to handle these calls the code orte_sstore.set_attr()
has been #ifdef'd out for now.
The derived segment type (btl_openib_segment_t) was intended to store
the registration info needed for put and get. In BTL 3.0 this is no
longer required. I intended to remove this type as part of
open-mpi/ompi@74f1af4548 .
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit should resolve an issue seen with CUDA-aware support. The
problem came in with BTL 3.0. Before 3.0 the size of the copy was
stored in the incoming segment's des_remote_count field. This field
does not exist in BTL 3.0 so I stored the value in the
des_segment_count field. This caused problems with the cuda support
code. To fix the issue the endpoint pointer is now stored in the in
fragment's endpoint pointer which free's up the segment's des_cbdata
pointer for storing the transfer size.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
If we ended up with no modules (e.g., all usnic devices were
excluded), there was a race condition in that the connectivity agent
could tear down its local socket before one or more of the local
clients saw it. Therefore, the local clients would timeout waiting
for the socket to appear.
So move the connectivity checker init later in the bootstrapping
process (it *must* be setup before module_init()), and have it only
invoked if we actually ended up with one or more modules.
Fix previously-unfinished error paths during startup/bootstrapping.
Instead of just blindly continuing on when an fi_* function call
fails, opal_show_help and skip that device.
Also, only check the usnic config minimums once. They're VIC-wide and
won't change on a per-device basis -- we only need to check them once.
Fixes CSCut19179.
call the opal_common_verbs_mca_register function to make sure that
opal_common_verbs_want_fork_support mca parameter is created and therefore
can be used to control the fork support.
Also include two other minor changes:
1. More C99-style member initialization in the component struct
1. Fix the BTL module member initialization to not be redundant
In order to have an effect, ibv_fork_init should be called in the
beginning of the verbs initialization flow - before the calls to the
ibv_create_qp and ibv_create_cq verbs.
These functions are called from the oob/ud code and by the time the
other verbs components (btl openib, pml yalla, ...) call ibv_fork_init,
it's too late. This commit forces the call to ibv_fork_init (if it's
requested) right at the beginning of all the components that are using
verbs.
(ibv_fork_init() can be safely called multiple times)
This commit also removes the btl_openib_want_fork_support mca parameter
and adds a new mca parameter instead - opal_verbs_want_fork_support.
Through this new parameter, fork support may be requested for ALL
components.
The default value for this parameter is set to 1.
Before this commit the btl_openib_want_fork_support parameter didn't
provide fork support for the openib btl if its value was set to 1.
(because when openib called ibv_fork_init, it was already after the
calls to ibv_create_* in oob/ud and thereofre it failed).
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.
This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.
Notes:
OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Please verify your components have been updated correctly. Keep in
mind that in terms of threading:
OPAL_FREE_LIST_GET -> opal_free_list_get_st
OPAL_FREE_LIST_RETURN -> opal_free_list_return_st
I used the opal_using_threads() variant anytime it appeared multiple
threads could be operating on the free list. If this is not the case
update to _st. If multiple threads are always in use change to _mt.
This commit adds an owner file in each of the component directories
for each framework. This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page. Currently there are two
"fields" in the file, an owner and a status. A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
Some BTLs do not require local registration for some rdma
transactions. For example: inline put on openib, fma put on ugni. This
commit adds code to expose the local registration thresholds to BTL
users. Optimized code can take advantage of this information to
improve rdma performance.
Add the functions that changed between BTL 2.0 and 3.0 into compat.h
and compat.c:
* module.btl_prepare_src: the signature and body of this method
changed between 2.0 and 3.0. However, the functions that this
method calls did *not* need to change, so they are copied over
wholesale (with the exception that they no longer accept the unused
`registration` parameter).
* module.btl_prepare_dst: this method does not exist in BTL 3.0.
* module.btl_put: the signature and body of this method changed
between 2.0 and 3.0.
The send inline optimization uses the btl_sendi function to achieve lower
latency and higher message rates. Before this commit BTLs were allowed to
assume the descriptor was non-NULL and were expected to return a valid
descriptor if the send could not be completed using btl_sendi. This
behavior was fine until the usage of btl_sendi was changed in ob1. This
commit allows the caller to specify NULL for the descriptor. The affected
btls have been updated to handle this case.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds an interface for btl's to export support for 64-bit atomic
operations on integers. BTL's that can support atomic operations should
implement these functions and set the appropriate btl_flags and btl_atomic_flags.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The old BTL interface provided support for RDMA through the use of
the btl_prepare_src and btl_prepare_dst functions. These functions were
expected to prepare as much of the user buffer as possible for the RDMA
operation and return a descriptor. The descriptor contained segment
information on the prepared region. The btl user could then pass the
RDMA segment information to a remote peer. Once the peer received that
information it then packed it into a similar descriptor on the other
side that could then be passed into a single btl_put or btl_get
operation.
Changes:
- Added functions to register and deregister memory regions with the
btl. If no registration is needed a btl should set these function
pointers to NULL. These function take over for btl_prepare_src/dst
and btl_free for RDMA operations. The caller should specify the
maximum permissions needed on the memory.
- Changed the function signatures for both btl_put and btl_get. In
place of a prepared descriptor the caller should provide the source
and destination addresses and registration handles as well as a
new callback function. The callback will be provided with the local
address and registration handle, callback context, callback data, and
status. See mca_btl_base_rdma_completion_fn_t in btl.h.
- Added a new btl constraint: MCA_BTL_REG_HANDLE_MAX_SIZE. This
value specifies the maximum size of any btl's registration handle.
- Removed the btl_prepare_dst function. This reflects the fact that
RDMA operations no longer depend on "prepared" descriptors.
- Removed the btl_seg_size member. There is no need to btl's to
subclass the mca_btl_base_segment_t class anymore.
- Expose the btl's put/get limitations with new struct members:
btl_put_limit, btl_put_alignment, btl_get_limit, btl_get_alignment.
- Remove the mca_mpool_base_registration_t argument from the btl_prepare_src
function. The argument was intended to support RDMA operations and is no
longer necessary.
- Remove des_remote/des_remote_count from the mca_btl_base_descriptor_t
structure. This structure member was originally used to specify the remote
segment for RDMA operations. Since the new btl interface no longer uses
desriptors for RDMA this member no longer has a purpose. In addition
to removing these members the local segment structure fields have been
renamed to from des_local/des_local_count to des_segments/des_segment_count.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
usnic_fls() can actually return 0, leading us to incorrectly free() a
buffer instead of OMPI_FREE_LIST_RETURN_MT'ing it.
So add an explicit bool in the struct that tracks whether the buffer
came from malloc or a freelist.
This was CID 1269660.
in this code and it ended up changing the logic that is used to set up eager RDMA.
Rather than setting up eager RDMA with a high priority message, it did it the other
way around. For some reason, CUDA-aware support did not like this. So, basically,
restore the logic to the way it was prior to the refactoring. The refactoring did not
intend to change this. Lightly reviewed by hjelmn.