As noted by Alexander Pozdneev, as of MPI-3.0 applications are allowed
to *access* (e.g. read) the buffers of pending non-blocking send
operations; the buffers just can't be *modified*.
Communicators can now be split by locality, based on the underlying
hardware identification, using the MPI_Comm_split_type function.
The currently implemented split types are:
HWTHREAD
CORE
L1CACHE
L2CACHE
L3CACHE
SOCKET
NUMA
NODE
BOARD
HOST
CU
CLUSTER
However, only NODE is defined in the standard (as MPI_COMM_TYPE_SHARED),
which is why the remaining split types use the OMPI_ prefix (e.g.
OMPI_COMM_TYPE_SOCKET) instead of the standard MPI_ prefix.
I have tested this with both --without-hwloc and --with-hwloc=<path>;
both configurations give the same output.
NOTE: I suspect something fishy is going on in the locality operators.
In my test program I couldn't get the correct split for these requests:
NUMA, SOCKET, L3CACHE
where I expected a full communicator but only got a communicator
containing a single process.
Per the discussion starting at
http://www.open-mpi.org/community/lists/users/2014/12/26018.php, at
least note that OMPI does not allow an attribute copy or delete
callback (or any function it calls) to add or delete attributes on the
same object on which the callback was invoked.
Each proc is inserted in the ompi_proc_list as soon as it is created,
and it is removed only when its destructor is called. In
ompi_proc_finalize we loop over all procs in the list and release each
one. However, as a proc is not removed from this list right away, the
loop keeps decreasing the ref count of each proc until it reaches zero
and the proc is finally removed. Thus, we cannot clean up the BML/BTL
after the call to ompi_proc_finalize.
A quick fix is to delay the call to ompi_proc_finalize until all
other frameworks have been finalized; the behavior described above
then gives the expected outcome.
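The ordering problem can be modeled with a small, self-contained C sketch. Everything in it is a hypothetical simplification of ompi_proc_list and OBJ_RELEASE, not the actual Open MPI code. This includes `proc_t`, `proc_list`, `proc_create`, `proc_retain`, `proc_release` and `proc_finalize`. Each proc joins the list when created and leaves it only when its count reaches zero. As a result, a finalize loop that keeps releasing the list head drops every reference, including references another component still holds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a ref-counted proc in a global list. */
typedef struct proc {
    int refcount;
    struct proc *next;
} proc_t;

static proc_t *proc_list = NULL;   /* stands in for ompi_proc_list */
static int procs_destroyed = 0;

/* Constructor: the proc is inserted into the list immediately. */
static proc_t *proc_create(void)
{
    proc_t *p = malloc(sizeof *p);
    assert(p != NULL);
    p->refcount = 1;
    p->next = proc_list;
    proc_list = p;
    return p;
}

static void proc_retain(proc_t *p) { p->refcount++; }

/* The "destructor" runs only at refcount zero; only then is the proc
 * unlinked from the list and freed. */
static void proc_release(proc_t *p)
{
    if (--p->refcount > 0)
        return;
    proc_t **link = &proc_list;
    while (*link != p)
        link = &(*link)->next;
    *link = p->next;
    free(p);
    procs_destroyed++;
}

/* Finalize loop: release the list head until the list is empty. Since a
 * proc stays in the list until its count hits zero, this drops every
 * remaining reference, including ones held by other components. */
static void proc_finalize(void)
{
    while (proc_list != NULL)
        proc_release(proc_list);
}
```

In this model, any pointer a BTL still holds is dangling once `proc_finalize()` returns. A BTL cleanup that releases its references must therefore run first, which is what delaying ompi_proc_finalize achieves.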
Background: To support atomics, each btl needs to support communicating
with self unless the btl module can guarantee global atomicity. Before
this commit, bml/r2 discarded any btl with lower exclusivity than an
existing send btl. For the loopback endpoint, this caused the BML to
discard every btl other than self.
The new behavior is as follows:
- If an existing send btl has higher exclusivity, then the new btl is
not added to the send btl list for the endpoint.
- If a btl provides RDMA support then it is always added to the rdma btl
list.
- bml_btl weight for send btls is now calculated across all send btls.
- bml_btl weight for rdma btls is now calculated across all rdma btls.
With this change, self should still win as the only send btl for
loopback, without disqualifying other btls (ugni, openib) from atomic
operations.
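The selection rules above can be sketched as follows. This is a simplified model, not the bml/r2 code. The `btl_info_t` and `btl_list_t` types and the function names are hypothetical. The caller is assumed to pass btls sorted by decreasing exclusivity. A btl's weight is assumed to be its share of the total bandwidth of the list it belongs to.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BTLS 8

/* Hypothetical, simplified description of one btl reaching an endpoint. */
typedef struct {
    const char *name;
    int exclusivity;
    int has_rdma;      /* non-zero if the btl provides RDMA support */
    double bandwidth;  /* input to the (assumed) weight formula */
} btl_info_t;

typedef struct {
    const btl_info_t *btl[MAX_BTLS];
    double weight[MAX_BTLS];
    size_t count;
} btl_list_t;

/* Assumed weight rule: each btl's share of the list's total bandwidth,
 * computed separately for the send list and the rdma list. */
static void compute_weights(btl_list_t *list)
{
    double total = 0.0;
    for (size_t i = 0; i < list->count; i++)
        total += list->btl[i]->bandwidth;
    for (size_t i = 0; i < list->count; i++)
        list->weight[i] = total > 0.0 ? list->btl[i]->bandwidth / total : 0.0;
}

/* Builds the send and rdma lists for one endpoint. `btls` is assumed to
 * be sorted by decreasing exclusivity. */
static void select_btls(const btl_info_t *btls, size_t n,
                        btl_list_t *send_list, btl_list_t *rdma_list)
{
    assert(n <= MAX_BTLS);
    send_list->count = rdma_list->count = 0;
    for (size_t i = 0; i < n; i++) {
        /* Skip the send list if an existing send btl has higher exclusivity. */
        int outranked = 0;
        for (size_t j = 0; j < send_list->count; j++)
            if (send_list->btl[j]->exclusivity > btls[i].exclusivity)
                outranked = 1;
        if (!outranked)
            send_list->btl[send_list->count++] = &btls[i];
        /* RDMA-capable btls are always added to the rdma list. */
        if (btls[i].has_rdma)
            rdma_list->btl[rdma_list->count++] = &btls[i];
    }
    compute_weights(send_list);
    compute_weights(rdma_list);
}
```

Consider illustrative values where self has exclusivity 3, no RDMA and bandwidth 100. Suppose ugni and openib both have exclusivity 2, RDMA support, and bandwidths 25 and 75. Self then becomes the only send btl, with weight 1.0. The rdma list keeps ugni and openib, with weights 0.25 and 0.75.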