At least with some providers (sockets and GNI), the mprobe/mrecv
ofi mtl methods were incorrect. For these two providers at least
one must supply the original tag and mask bits used with the
prior FI_PEEK | FI_CLAIM request that had been used to probe for
the message.
These providers take a strict interpretation of the following sentence
from the libfabric fi_tagged man page:
```
Claimed messages can only be retrieved using a subsequent, paired receive operation with the FI_CLAIM flag set.
```
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
since some tasks migth end up having /dev/null as their stdin,
simply avoid pipe creation and destruction for these tasks.
From a pragmatic and MPI point of view, and unless explicitly required
otherwise, all MPI tasks but (the first) one end up with /dev/null
as their stdin.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* Fix typo in the `opal_atomic_wmb` declaration.
* Fix lingering `eieio` reference in the XL assembly to be `lwsync`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
ompio has historically changed the WRONLY flag provided by the applicaiton
to RDWR to allow for the data sieving optimization within the two-phase I/O
fcoll component. This change did not have a performance impact
on regular UNIX file systems, but seems to hurt performance on NFS (and maybe Lustre?)
So provide an option that allows to keep the WRONLY option, and raise an error
if tha fcoll/two-phase would actually like to use the data sieving.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
of "unset".
mtl/psm2: Update some shadow mca parameters to use the default "unset".
mtl/psm2: Add new shadow parameter to allow specifying the service level.
Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>
Sometimes, the ethernet interfaces can get quite high kernel indices. struct
ifreq (see netdevice(7)) defines ifr_ifindex to be int's. The OOB component
used int16_t internally for matching (in case of -mca oob_tcp_if_[in|ex]clude)
which meant that any interface index > 32767 would never be matched because the
integer would be truncated to int16_t upon return from the function. OOB would
then refuse to work because it didn't find any usable interfaces and MPI job
would abort.
Signed-off-by: Wojtek Wasko <wwasko@nvidia.com>
gcc 5.2 complains:
```
mtl_ofi_component.c: In function ‘ompi_mtl_ofi_finalize’:
mtl_ofi_component.c:613:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
if (ret = fi_close((fid_t)ompi_mtl_ofi.fabric)) {
^
```
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Before this commit, the presence of usNIC devices -- which will
(currently) return no data when fi_getinfo() is queried for tagged
matching providers -- would cause an error message to be displayed.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The value of ret is negative (e.g., -61), but it is displayed in the
help message as `%zd`, which renders as unsigned (i.e., a giant
positive value). So make sure to negate the negative value before
rendering it (e.g., so we display "61", not "4294967235").
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
so the bool type is defined when using old compilers that do not support gcc builtin atomics (such as gcc 4.1.x from CentOS 5)
Fixesopen-mpi/ompi#4478
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
higher priority than rdma and default to psm2.
Context: the Intel Omni-path driver (hfi1) has verbs support, so the openib
btl is available to use. However, at a bad performance. Without this
change osc rdma using btl openib is the default choice when running on Intel
Omni-path, with a lower performance than osc pt2pt over mtl psm2.
Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>
The following issues have been fixed for `mindist`:
- computing the job map on the backend nodes
- using slots count (`-host node1:<s1>,nodeN:<sN>`)
- fixed `dist:span` job mapping method
- fixed `oversubcribe` option with `-host`
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
Rework the logic to handle the out-of-sequence fragments on the receiver
side. A large number of OOS messages are still arriving even in single
threaded scenarios.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code.
Refs. open-mpi/ompi#4443
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This is a bug fix based on a problem reported on the mailing list.
For very large read/write operations, ompio breaks the operation
down into multiple cycles. The problem was that
one of the variables required to maintain its values
across the different cycles did not do that, and because
of that the calculations of the memory offsets was wrong.
Fixes issue #4453
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
the fs/lustre component has missed out on a number of updates to the fs/ufs component.
This commit tries to import all the changes performed on the fs/ufs component
w.r.t to the file_open operation, including updates on how the amode is set,
error is propegated and setting the fs_block_size value (which is required for
locking purposes).
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code. Fixes an issue reported on the mailing by Wei-keng Liao
Fixes Issue #4443
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>