A bus error occurs in sm OSC under the following conditions.
- sparc64 or any other architectures which need strict alignment.
- `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created
by sm OSC.
- The communicator size is odd and greater than 3.
The lines 283-285 in current `ompi/mca/osc/sm/osc_sm_component.c` has
the following code.
```c
module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
module->posts[0] = (uint64_t *) (module->node_states + comm_size);
```
The size of `ompi_osc_sm_node_state_t` is multiples of 4 but not
multiples of 8. So if `comm_size` is odd, `module->posts[0]` does
not aligned to 8. This causes a bus error when accessing
`module->posts[i][j]`.
This patch fixes the alignment of `module->posts[0]` by setting
`module->posts[0]` first.
Add primitive magic number and version checking in the connectivity
checker protocol. These checks doesn't *guarantee* to we won't get
false PINGs and ACKs, but they do significantly reduce the possibility
of interpretating random incoming fragments as PINGs or ACKs.
The btl_recv.h:lookup_sender() function uses the hashed ORTE proc name
to determine the sender of the packet. With add_procs_cutoff>0, the
usnic BTL may not have knowledge of all the senders.
Until the usNIC BTL can be adjusted to do something like the
openib/ugni BTLs (i.e., use opal_proc_for_name() to lookup unknown
sender proc names), set MCA_BTL_FLAGS_SINGLE_ADD_PROCS, which means
that ob1 will only all add_procs() once -- with all the procs in it.
Also in this commit, adapt the connectivity checker to not rely on
knowing all the senders (which is a bit easier than adapting the main
BTL send path): the receiving connectivity agent will simply echo back
the same PING message (which contains the sender's IP address+UDP
port) back to the sender without checking that it knows who the sender
is. If the sender receives the echoed PING back on the expexted
interface, it will find a match in the pending pings list. If the
sender receives the echoed PING back an unexpected interface, a match
will not be found, and the incoming PING message will be dropped.
Fixesopen-mpi/ompi#1440
This commit fixes an inconsistency between btl_openib_receive_queues,
btl_openib_max_send_size and btl_openib_eager_limit. Before this
commit if the ini file specified a set of default receive queues that
happen to not contain one large enough for the default max_send_size
of eager_limit users would see an error like:
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is smaller than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
that no queue is large enough to receive the largest possible incoming
message fragment. The OpenFabrics (openib) BTL will therefore be
deactivated for this run.
Local host: somehost
Largest buffer size: 65536
Maximum send fragment size: 131072
This commit adds code that detects the source of the max_send_size and
eager_limit values and sets either or both of them to the size
supported by the largest queue pair if both 1) the value is larger
than the largest queue pair size, and 2) the value was not set by the
user or a MCA configuration file.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds two m4 macros: OPAL_SUMMARY_ADD, OPAL_SUMMARY_PRINT.
OPAL_SUMMARY_ADD adds an item to a section in the summary. For example
OPAL_SUMMARY_ADD([[Transports]],[[Foo]],...,[yes]) will add the
following to the summary:
Transports
-----------------------
Foo: yes
With this commit two sections are added: Transports, Resource Managers.
The OPAL_SUMMARY_PRINT macro is called after AC_OUTPUT and prints out
some information about the build (version, projects, etc) and then
the summarys sections. It will additionally print a warning if
internal debugging is enabled.
Example output:
Open MPI configuration:
-----------------------
Version: 3.0.0 a1
Build Open Platform Abstration project: yes
Build Open Runtime project: yes
Build Open MPI project: yes
Build Open SHMEM project: no
MPI C++ bindings (deprecated): no
MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
Debug build: yes
Transports
-----------------------
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
KNEM Shared Memory: no
Linux CMA IPC: no
Mellanox MXM: no
Open UCX: no
OpenFabrics libfabric: no
OpenFabrics Verbs: no
portals4: no
QLogic Infinipath (PSM): no
tcp: yes
XPMEM Shared Memory: no
Resource Managers
-----------------------
Cray Alps: no
Grid Engine: no
LSF: no
Slurm: yes
Torque: yes
INTERNAL DEBUGGING IS ENABLED. DO NOT USE THIS BUILD FOR PERFORMANCE MEASUREMENTS!
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit fixes a bug in the opal key value parser that might cause
the filename parser to go past the beginning of the string.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>