set grp_local_rank as MPI_UNDEFINED before invoking
ompi_comm_nexcid() in order to benefit from the optimizations
introduced in open-mpi/ompi@68167ec879
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This fixes a regression in sockets provider which could return -EINTR value
from fi_cq_read() due to a syscall being interrupted. The error value is
currently interpreted as fatal condition. Relax the rule so that we can retry
fi_cq_read() operation.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
OMPI_FORTRAN_USEMPIF08_MOD macro was removed in open-mpi/ompi@791bcee6c0
so this macro is now manually expanded to mpi/fortran/use-mpi-f08/mod
Thanks to Nathan T. Weeks for reporting
Refs open-mpi/ompi#3605
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
ompio has the unique problem, that mca parameters set in the io/ompio component
have to be accessible from other frameworks as well. This is mostly done to avoid
a replication in the parameter names and to reduce the number of mca parameters that
and end-user has to worry about.
This commit introduces a generic function to retrieve ompio mca parameters, the function pointer
is stored on the file handle. It replaces two functions that used the same concept already for
one parameter each.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
This commit adds support for fetch-and-op atomics. This is needed
because and and or are irreversible operations so there needs to be a
way to get the old value atomically. These are also the only semantics
supported by C11 (there is not atomic_op_fetch, just
atomic_fetch_op). The old op-and-fetch atomics have been defined in
terms of fetch-and-op.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit eliminates the old opal_atomic_bool_cmpset functions. They
have been replaced by the opal_atomic_compare_exchange_strong
functions.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
It should have always been #define'd in order to correctly handle the
multi-threaded case.
Also fix indentation in ompi/mpi/c/comm_get_errhandler.c
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This commit fixes the following bugs:
- Allow a btl to be used for communication if it can communicate with
all non-self peers and it supports global atomic visibility. In
this case CPU atomics can be used for self and the btl for any
other peer.
- It was possible to get into a state where different threads of an
MPI process could issue conflicting accumulate operations to a
remote peer. To eliminate this race we now update the peer flags
atomically.
- Queue up and re-issue put operations that failed during a BTL
callback. This can occur during an accumulate operation. This was
an unhandled error case.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Cleanup several places where abstraction violations crept into OMPI layer (direct reference of ORTE). Add some missing includes that were exposed by this change.
Note that this compiles, but I haven't tested it for execution yet. Handing it over to Noah Evans for completion
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
The current versions of these functions have a fatal flaw. If a
errhandler set and free call is made by another thread while the
thread calling get is between the cmpset and retain then we will
retain an invalid object. Fixing this by just using locking. This is
not a critical path so this should be ok.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Turns out there are edge cases where an MTL's isend
method may end up marking a send request complete prior
to returning to the CM code. The would end up causing
problems in the bsend path since the ompi_request_complete
would end up getting invoked a second time on this request.
This ended up causing segfaults, etc. in ompi_request_complete .
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
At least with some providers (sockets and GNI), the mprobe/mrecv
ofi mtl methods were incorrect. For these two providers at least
one must supply the original tag and mask bits used with the
prior FI_PEEK | FI_CLAIM request that had been used to probe for
the message.
These providers take a strict interpretation of the following sentence
from the libfabric fi_tagged man page:
```
Claimed messages can only be retrieved using a subsequent, paired receive operation with the FI_CLAIM flag set.
```
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
ompio has historically changed the WRONLY flag provided by the applicaiton
to RDWR to allow for the data sieving optimization within the two-phase I/O
fcoll component. This change did not have a performance impact
on regular UNIX file systems, but seems to hurt performance on NFS (and maybe Lustre?)
So provide an option that allows to keep the WRONLY option, and raise an error
if tha fcoll/two-phase would actually like to use the data sieving.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
of "unset".
mtl/psm2: Update some shadow mca parameters to use the default "unset".
mtl/psm2: Add new shadow parameter to allow specifying the service level.
Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>
gcc 5.2 complains:
```
mtl_ofi_component.c: In function ‘ompi_mtl_ofi_finalize’:
mtl_ofi_component.c:613:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
if (ret = fi_close((fid_t)ompi_mtl_ofi.fabric)) {
^
```
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Before this commit, the presence of usNIC devices -- which will
(currently) return no data when fi_getinfo() is queried for tagged
matching providers -- would cause an error message to be displayed.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The value of ret is negative (e.g., -61), but it is displayed in the
help message as `%zd`, which renders as unsigned (i.e., a giant
positive value). So make sure to negate the negative value before
rendering it (e.g., so we display "61", not "4294967235").
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
higher priority than rdma and default to psm2.
Context: the Intel Omni-path driver (hfi1) has verbs support, so the openib
btl is available to use. However, at a bad performance. Without this
change osc rdma using btl openib is the default choice when running on Intel
Omni-path, with a lower performance than osc pt2pt over mtl psm2.
Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>
Rework the logic to handle the out-of-sequence fragments on the receiver
side. A large number of OOS messages are still arriving even in single
threaded scenarios.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code.
Refs. open-mpi/ompi#4443
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This is a bug fix based on a problem reported on the mailing list.
For very large read/write operations, ompio breaks the operation
down into multiple cycles. The problem was that
one of the variables required to maintain its values
across the different cycles did not do that, and because
of that the calculations of the memory offsets was wrong.
Fixes issue #4453
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>