Embedding libfabric is a temporary measure; I'm removing some warning
notifications so that the output isn't so cluttered (we're getting
the real warnings fixed upstream, but the OMPI community doesn't
really care/need to see the warnings in the meantime).
This commit adds an MCA variable to select Portals4 logical
addressing, populates the logical-to-physical mapping table and
initializes the NI in this mode.
Switch to using the query/priority method for selecting
MTLs. This switch was motivated by the fact that now
on some platforms, its possible for multiple MTLs to
be initializable, but only one MTL should be selected.
In addition, there is a complication with the PSM and
IFO (with PSM provider) MTLs owing to the fact that
they cannot both intialize the underlying PSM context,
i.e. only one call to psm_init is allowed per process.
The mxm component has not been compiled as the author
doesn't currently have access to a system with a recent
enough mxm installed to allow for a compile.
The portals4, ofi, and psm components have been checked
for compilation. The ofi and psm components have been
checked for runtime correctness on a intel/qlogic system
with up to date PSM installed.
This document is the result of a George Bosilca/Jeff Squyres
discussion at the developer meeting in Dallas in January of 2015. It
attempts to provide some rules of thumb / guidance for naming
conventions of symbols in MPI extensions.
http://www.open-mpi.org/community/lists/devel/2015/01/16820.php,
coll ML has two pending issues: a deadlock and a performance critical
on every communicator creation. After confirmation over IM from
Pasha, the ML collective module will be disabled until it is
fixed. Token to Pasha.
The rml/oob was not doing sanity checks on the input peer
parameter for the orte_rml_oob_send_nb and orte_rml_oob_send_buffer_nd.
Owing to the fact that there are places in the ompi/orte stack
where things like orte_show_help_norender are called way before
ORTE_PROC_MY_HNP, are setup properly, all kinds of weird
startup failures can occur as the rml/oob tries to process send
requests where the peer is junk.
Rather than try to expand this kind of thing:
/* if we are the HNP, or the RML has not yet been setup,
* or ROUTED has not been setup,
* or we weren't given an HNP, or we are running in standalone
* mode, then all we can do is process this locally
*/
if (ORTE_PROC_IS_HNP || orte_standalone_operation ||
NULL == orte_rml.send_buffer_nb ||
NULL == orte_routed.get_route ||
NULL == orte_process_info.my_hnp_uri) {
rc = show_help(filename, topic, output, ORTE_PROC_MY_NAME);
}
do the right thing in the rml level and return an error rather than
eventually failing in the send owing to peer not being valid.