In the OPAL_ENABLE_FT_CR code path there used to be a variable
'mca_base_component_distill_checkpoint_ready' which got removed.
The FT code was not compiling and while trying to get it to compile
again the old variable was #ifdef'd out. This re-introduces the
variable with a new name 'opal_base_distill_checkpoint_ready'
and enables the code previously #ifdef'd out.
This removes the last hack introduced to get the FT code to compile
again.
This commit was SVN r30928.
When compiling --with-ft there are a few compiler warnings about
unused variables. This patch fixes those compiler warnings.
This commit was SVN r30927.
Realistically, the usnic BTL doesn't need to know anything about the
underlying transport except for its header length (so that it knows
where the payload begins in a received buffer). So remove the use of
the specific transport prefix union and just rely on the usnic verbs
extension to tell us what the header length is if we're using the
usNIC/UDP transport, or sizeof(struct ibv_grh) if we're using usNIC/L2
transport.
This commit was SVN r30914.
Check the IBV_TRANSPORT_* values. In the case of IBV_TRANSPORT_IWARP,
there's an ambiguity and we need to also check to see whether the
usnic verbs externsion probe exists.
This commit was SVN r30913.
If they exist, call the usnic verbs extensions to both enable UDP
support and get the UD receiver header length that should be used
(rather than assume 40/struct GRH).
This commit was SVN r30912.
The problem arises when a hostfile is used, and the user provides host names without specifying the slots= paramater. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default.
In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information.
Also cleanup some a couple of issues in the mapping/binding system:
* ensure we only override the binding directive if we are oversubscribed *and* overload is not allowed
* ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch
* minor cleanup to the warning message when oversubscribed and binding was requested
cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system
This commit was SVN r30909.
- now add_procs can be called more than once (during MPI_INIT and Inter_Comm_Create)
- adjust MXM to this reality
fixed by Alina, reviewed by Yossi/Mike
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r30907.
This allows compilers to know that the code path(s) where
ompi_rte_abort() is invoked won't return (and therefore won't warn in
certain cases).
cmr=v1.8:reviewer=rhc
This commit was SVN r30891.
* Include opal_stdint.h so that we have uin32_t
cmr=v1.7.5:ticket=trac:4298
This commit was SVN r30890.
The following Trac tickets were found above:
Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
We don't use this functionality any more; we use the transport_type
and device name to identify usnic devices. It's slightly easier
because we can transport_type+name from ibv_device_open() and don't
have to do an additional ibv_query_device() to get its attributes.
Reviewed by Dave Goodell.
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r30882.
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style. That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint. If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later. We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.
cmr=v1.7.5:ticket=trac:4253
This commit was SVN r30879.
The following SVN revision numbers were found above:
r30850 --> open-mpi/ompi@3641500442
r30852 --> open-mpi/ompi@4e282a3295
The following Trac tickets were found above:
Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
fix situations where cluster nodes can have different btls
Fixed by Roman, reviewed by Igor, Mike
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r30877.
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server. Upon first send in any
process, the connectivty-checker agent will send some UDP pings to the
peer to ensure that we can reach it. If we can't, we'll abort the job
with a nice show_help message.
There's a lengthy comment in btl_usnic_connectivity.h explains the
scheme and how it works.
Reviewed by Dave Goodell.
cmr=v1.7.5:ticket=trac:4253
This commit was SVN r30860.
The following Trac tickets were found above:
Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
dtypes support to oshmem scoll mpi. added comment to
oshmem scoll mpi component regarding casting size_t to int.
Fixed by Elena, Reviewed by Igor/Mike
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r30856.
- similar to opal/shmem
- next step is some refactoring and merge into opal/shmem
Developed by Igor, reviewed by AlexM, MikeD
This commit fixes trac:4261.
This commit was SVN r30855.
The following Trac tickets were found above:
Ticket 4261 --> https://svn.open-mpi.org/trac/ompi/ticket/4261
finalize.
Closes trac:4290
cmr=v1.7.5:reviewer=miked
This commit was SVN r30854.
The following Trac tickets were found above:
Ticket 4290 --> https://svn.open-mpi.org/trac/ompi/ticket/4290
- Fix several typos is osc/rdma.
- Fix a locking issue in osc/sm that was caused by an incorrect
assumption about the semantics of opal_atomic_add_32.
- Always unlock the accumulation lock in osc/sm.
- The base of a processes shared memory window should be NULL if
the size is zero. Fixed.
cmr=v1.7.5:ticket=trac:4304
This commit was SVN r30853.
The following Trac tickets were found above:
Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style. That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint. If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later. We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.
cmr=v1.7.5:ticket=trac:4253
This commit was SVN r30852.
The following SVN revision numbers were found above:
r30850 --> open-mpi/ompi@3641500442
The following Trac tickets were found above:
Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253