CID 1269821 Dereference null return value (NULL_RETURNS)
This is another false positive that can be silenced by looping on
opal_list_remove_first instead of using both opal_list_is_empty and
opal_list_remove_first.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
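The drain pattern described for CID 1269821 can be sketched as below. This is a minimal illustration, not the Open MPI code: list_t, item_t, list_push, list_remove_first and drain_sum are hypothetical stand-ins, and the only assumption carried over from the message is that the removal call returns NULL on an empty list. Because the loop tests the returned pointer directly, an analyzer no longer has to infer non-NULL from a separate emptiness check.

```c
#include <stddef.h>

/* Hypothetical minimal list; it only stands in for a real list class. */
typedef struct item {
    struct item *next;
    int value;
} item_t;

typedef struct {
    item_t *head;
} list_t;

/* Push an item onto the front of the list. */
void list_push(list_t *list, item_t *item)
{
    item->next = list->head;
    list->head = item;
}

/* Remove and return the first item, or NULL if the list is empty. */
item_t *list_remove_first(list_t *list)
{
    item_t *item = list->head;
    if (item != NULL) {
        list->head = item->next;
        item->next = NULL;
    }
    return item;
}

/* Drain the list by looping on the removal result itself, so the
 * NULL check and the dereference are visibly tied together. */
int drain_sum(list_t *list)
{
    int sum = 0;
    item_t *item;

    while ((item = list_remove_first(list)) != NULL) {
        sum += item->value;
    }
    return sum;
}
```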
CID 1269674 Ignoring number of bytes read (CHECKED_RETURN)
Check that we read enough bytes to get a complete async command.
CID 1269793 Missing break in switch (MISSING_BREAK)
Added comment to indicate fall through was intentional.
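As a rough illustration of the MISSING_BREAK fix, the sketch below marks an intentional fall through with a comment. The enum values and the handle function are hypothetical and do not come from the Open MPI sources.

```c
/* Hypothetical command codes, used only for this illustration. */
enum cmd { CMD_RESET, CMD_START, CMD_STOP };

/* Return the number of work steps performed for a command. */
int handle(enum cmd c)
{
    int steps = 0;

    switch (c) {
    case CMD_RESET:
        steps++; /* reset-specific work */
        /* fall through */
    case CMD_START:
        steps++; /* work shared by reset and start */
        break;
    case CMD_STOP:
        break;
    }
    return steps;
}
```

The comment documents that CMD_RESET deliberately continues into the CMD_START case, which is what static analyzers and compilers typically look for before reporting a missing break.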
CID 1269702: Constant variable guards dead code (DEADCODE)
Remove an unused argument to opal_show_help. This will quiet the
Coverity issue.
CID 1269675 Ignoring number of bytes read (CHECKED_RETURN)
Check that at least sizeof(int) bytes are read. If this is not the
case then it is an error.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
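The CHECKED_RETURN fixes above amount to comparing the return value of read() against the number of bytes expected. A minimal sketch, assuming a POSIX file descriptor; read_int is a hypothetical helper, not a function from the Open MPI tree:

```c
#include <sys/types.h>
#include <unistd.h>

/* Read one int from fd. Return 0 on success, or -1 if read() fails,
 * hits end-of-file, or delivers fewer than sizeof(int) bytes. */
int read_int(int fd, int *out)
{
    ssize_t rc = read(fd, out, sizeof(*out));

    if (rc < (ssize_t) sizeof(*out)) {
        return -1; /* error, EOF, or short read */
    }
    return 0;
}
```

Treating a short read as an error follows the commit message. A caller that expects partial reads, for example on a socket, would instead loop until all bytes have arrived.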
The derived segment type (btl_openib_segment_t) was intended to store
the registration info needed for put and get. In BTL 3.0 this is no
longer required. I intended to remove this type as part of
open-mpi/ompi@74f1af4548 .
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.
This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.
Notes:
OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
in this code and it ended up changing the logic that is used to set up eager RDMA.
Rather than setting up eager RDMA with a high priority message, it did it the other
way around. For some reason, CUDA-aware support did not like this. So, basically,
restore the logic to the way it was prior to the refactoring. The refactoring did not
intend to change this. Lightly reviewed by hjelmn.
structure
This structure member was originally used to specify the remote segment
for an RDMA operation. Since the new btl interface no longer uses
descriptors for RDMA, this member no longer has a purpose. In addition
to removing these members the local segment information has been
renamed to des_segments/des_segment_count.
Properly set up the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init, as the user may choose to remove the fqdn and strip prefixes during that time. Set up the job_session_dir and other such info as soon as it becomes available during orte_init.
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL
All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.