mtl_ofi_provider_include (resp. mtl_ofi_provider_exclude) can be used
to specify which provider(s) the OFI MTL can select (resp. ignore).
e.g. --mca mtl_ofi_provider_include "psm,sockets"
By default, mtl_ofi_provider_exclude is set to "sockets,mxm".
This deprecates the old MCA var named "mtl_ofi_provider".
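A minimal sketch of how include/exclude lists like these could be
evaluated. The function names and the rule that a non-NULL include
list takes precedence are assumptions for illustration, not the
actual OFI MTL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` is a whole entry of the comma-separated `list`. */
static int list_contains(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    while (p != NULL && *p != '\0') {
        const char *comma = strchr(p, ',');
        size_t entry_len = (comma != NULL) ? (size_t)(comma - p) : strlen(p);

        if (entry_len == len && strncmp(p, name, len) == 0) {
            return 1;
        }
        p = (comma != NULL) ? comma + 1 : NULL;
    }
    return 0;
}

/* Hypothetical policy: an include list, when given, decides alone;
 * otherwise anything not on the exclude list is allowed. */
int provider_allowed(const char *name, const char *include, const char *exclude)
{
    assert(name != NULL);
    if (include != NULL) {
        return list_contains(include, name);
    }
    if (exclude != NULL) {
        return !list_contains(exclude, name);
    }
    return 1;
}
```

With the default exclude list "sockets,mxm" and no include list, this
sketch would reject "sockets" and accept "psm".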
This commit does the following:
* s/ompi_check_treematch/ompi_topo_treematch/ (i.e., abide by the
prefix rule)
* change the value of ompi_topo_treematch_happy from yes/no to 0/1, so
that we can use -eq for numerical comparisons (vs. string
comparisons). It's the little things in life, no?
* Check the value of $OPAL_HAVE_HWLOC to ensure that hwloc support is
enabled. If not, disqualify treematch from building.
* Fixes a few places that were underquoted
* Convert from "test ... -a ..." to "test ... && test ..."
Fixes open-mpi/ompi#797
The prior code was checking string constants (which are #defines from
configure) against NULL. They can never be NULL, so the checks were
overly-defensive. If the preprocessor macros do not exist, we'll get
a different compiler error. So remove the dead code.
This fixes CID 72349.
A helper method in Request.java could cause a crash
if the request array that was passed contained nulls.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
This commit rewrites parts of libnbc to fix issues identified by
coverity and myself. The changes are as follows:
- libnbc functions would return invalid error codes (internal to
libnbc) to the mpi layer. These code names are of the form
NBC_. They do not match up with the error codes expected by the mpi
layer. I purged the use of all these error codes with the exception
of NBC_OK and NBC_CONTINUE in progress. These codes are used to
identify when a request handle is complete.
- Handles and schedules were leaked by all collective routines on
error. A new routine was added to return a collective handle
(NBC_Return_handle).
- Temporary buffers containing in/out neighbors for neighborhood
collectives were always leaked.
- Neighborhood collectives contained code to handle MPI_IN_PLACE which
is never a valid input for the send or receive buffer. Stripped this
code out.
- Files were inconsistently named. Most are nbc_isomething.c but one
was named coll_libnbc_ireduce_scatter_block.c.
- Made the NBC_Schedule "structure" an object so it can be
retained/released. This may enable the use of schedule caching at a
later time. More testing will be needed to ensure the caching code
works. If it doesn't the code should be stripped out completely.
- Added code to simplify the common case of scheduling send/recv +
barrier.
- Code cleanup for readability.
The code now passes the clang static analyzer.
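The retain/release idea behind making the schedule an object can be
sketched as a plain reference count. This is an illustration only:
the type and function names are hypothetical, it is not thread safe,
and it does not use the OPAL object system.

```c
#include <stdlib.h>

/* Hypothetical reference-counted schedule. */
typedef struct {
    int refcount;
    size_t size;
    char *data;
} schedule_t;

/* Create a schedule owned by the caller (refcount starts at 1). */
schedule_t *schedule_create(void)
{
    schedule_t *s = calloc(1, sizeof(*s));
    if (s != NULL) {
        s->refcount = 1;
    }
    return s;
}

/* Take an additional reference, e.g. for a cache entry. */
void schedule_retain(schedule_t *s)
{
    s->refcount++;
}

/* Drop a reference; returns 1 when the schedule was actually freed. */
int schedule_release(schedule_t *s)
{
    if (--s->refcount > 0) {
        return 0;
    }
    free(s->data);
    free(s);
    return 1;
}
```

A cache would retain the schedule when storing it, so that the
release at the end of a collective does not free it.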
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Some OFI providers such as "sockets" are used for debugging
purposes mostly. For these providers, other components usually
offer better performance -- e.g. for sockets, the BTL/TCP would
be a better choice.
Thus, we chose to ignore some providers unless explicitly asked
by the user on the command line:
e.g. --mca mtl_ofi_provider sockets
Added Cloneable to the implemented interface list as per
Coverity suggestion. The required methods were already
implemented, but it was not explicitly stated. This is
an intent revealing change.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
When configured with --enable-picky
topo_base_lazy_init.c compiles with a warning:
CC base/topo_base_lazy_init.lo
base/topo_base_lazy_init.c:46:67: warning: implicit conversion from enumeration type 'enum mca_base_register_flag_t' to different enumeration type 'mca_base_open_flag_t' (aka 'enum mca_base_open_flag_t') [-Wenum-conversion]
err = mca_base_framework_open (&ompi_topo_base_framework, MCA_BASE_REGISTER_DEFAULT);
This commit fixes this implicit conversion problem.
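A reduced sketch of this kind of warning and fix. The enum and
function names below are hypothetical stand-ins, not the real MCA
base declarations.

```c
#include <assert.h>

/* Two unrelated flag enumerations whose default values coincide. */
typedef enum { REGISTER_DEFAULT = 0, REGISTER_ALL = 1 } register_flag_t;
typedef enum { OPEN_DEFAULT = 0, OPEN_FIND_COMPONENTS = 1 } open_flag_t;

/* Expects an open flag; passing a register_flag_t constant here is
 * what clang reports under -Wenum-conversion. */
int framework_open(open_flag_t flags)
{
    return (flags == OPEN_DEFAULT) ? 0 : 1;
}

int lazy_init(void)
{
    /* The constants share a value in this sketch, which is why the
     * old call behaved correctly despite the wrong type. */
    assert((int)REGISTER_DEFAULT == (int)OPEN_DEFAULT);

    /* Before: framework_open(REGISTER_DEFAULT); */
    return framework_open(OPEN_DEFAULT);
}
```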
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
our optimized default file view. Otherwise, performance will suffer. file_get_view should still return the correct filetype, not our optimized default file view
Includes java bindings for MPI_GET_ELEMENTS_X and
MPI_STATUS_SET_ELEMENTS_X. This PR also adds the
Count object which represents MPI_Count.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
- Some applications use MPI_File_delete as a collective function (e.g. IOR), which I think is not really covered by the standard. Right now, one process succeeds and the other ones return an error code. Fix that by not returning an error if the file that we try to delete does not exist anymore, to make these applications work.
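A minimal sketch of the "missing file is not an error" idea, using
POSIX unlink(). The helper name is hypothetical and this is not the
actual file-delete implementation.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: remove `path`, treating "already gone" as success. */
int delete_ignoring_missing(const char *path)
{
    if (unlink(path) == 0) {
        return 0;
    }
    /* Another process may have removed the file first; do not report that. */
    if (errno == ENOENT) {
        return 0;
    }
    /* Any other failure (permissions, not a directory, ...) is still an error. */
    return -1;
}
```

When several processes call this on the same path, the one that wins
the race and the ones that find the file gone all see success.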
Retain inline progress function for ofi
mtl, but have a non-inlined progress function
which is registered with the opal progress
mechanism.
@jithinjosepkl
I've bad news about the psm provider. I still notice
segfaults - not always - but frequently at finalize
when using the psm provider. I don't notice this
when using the sockets provider.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
only define the unique fortran symbol depending on
- CAPS
- PLAIN
- SINGLE_UNDERSCORE
- DOUBLE_UNDERSCORE
and bind the f08 symbol to the uniquely defined C symbol.
Use real data structures to make the code simpler.
(perl script written by Jeff)
at Inria Bordeaux. This allows us to take advantage of the remap
capability of MPI to rearrange the ranks based on the weights
provided by the application.
Fix the indentation and protect with __DEBUG__ one fprintf.
Add the CeCILL-B license to the imported library.
Fix a compiler warning.
Restrict the TreeMatch dependencies.
The TreeMatch software is released under BSD3 (as indicated by their
copyright information @
https://gforge.inria.fr/scm/viewvc.php/COPYING?view=markup&root=treematch).
Update the README.
Even if the mutex is actually located in
sm_data->sm_offset_ptr->mutex, have sm_data->mutex point to it. This
avoids a few #if blocks that are otherwise identical.
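A sketch of the pointer-alias idea. The struct layouts are
hypothetical and lock_t is a stand-in for a real mutex type; only the
shape of the trick is shown.

```c
#include <stddef.h>

/* Stand-in lock type for illustration; not a real mutex. */
typedef struct { int locked; } lock_t;

/* Hypothetical layout: the lock may live in a shared segment. */
struct shared_segment { lock_t mutex; };

struct sm_data {
    struct shared_segment *segment; /* NULL when there is no shared segment */
    lock_t local_mutex;             /* used only when segment is NULL */
    lock_t *mutex;                  /* always points at the lock to use */
};

/* Decide once where the lock lives; callers then use d->mutex only. */
void sm_data_bind_mutex(struct sm_data *d)
{
    d->mutex = (d->segment != NULL) ? &d->segment->mutex : &d->local_mutex;
}

/* No #if needed here: both configurations go through the same pointer. */
void sm_data_lock(struct sm_data *d)   { d->mutex->locked = 1; }
void sm_data_unlock(struct sm_data *d) { d->mutex->locked = 0; }
```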
There are a few places where adding the @param for the variable
javadoc wants does not make sense, so I added suppression statements
in those areas.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
Bindings for MPI_WIN_FLUSH_LOCAL, MPI_WIN_FLUSH_LOCAL_ALL, MPI_WIN_ALLOCATE, MPI_WIN_ALLOCATE_SHARED, and MPI_COMM_SPLIT_TYPE. Also added several necessary constants.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
Java bindings for the following functions: MPI_RACCUMULATE, MPI_GET_ACCUMULATE, MPI_RGET_ACCUMULATE, MPI_WIN_LOCK_ALL, MPI_WIN_UNLOCK_ALL, MPI_WIN_SYNC, MPI_WIN_FLUSH, MPI_WIN_FLUSH_ALL, MPI_COMPARE_AND_SWAP, and MPI_FETCH_AND_OP. Also includes Java bindings for the Operations MPI_REPLACE and MPI_NO_OP.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
@ggouaillardet was right -- we should have put the
ompi_buffer_detach_f08() function in the use-mpi-f08 directory to
begin with. Putting it in the mpif-h directory made it complicated as
to whether the function would be built or not (e.g., whether weak
symbols were supported or not, whether the profiling layer was
disabled or not, ...etc.).
Just put it in the use-mpi-f08 directory and always build it (when the
mpi_f08 module is built, of course), and keep it simple.
Since there is no profiling version of the f08 buffer_detach function
(or, more specifically, the Fortran compiler does the name mangling of
MPI and PMPI to the back-end C function for us), ensure that it is
only compiled once.
Also, per Gilles' observation, the f08-related #pragmas are no longer
relevant.
Add an mpi_f08-specific implementation for MPI_BUFFER_DETACH.
Per MPI-3.1:3.6, p45, the buffer argument is ignored in
MPI_BUFFER_DETACH for mpif.h and the mpi module. But in the mpi_f08
module, the buffer argument is treated like it is in the C binding.
No real functional changes:
* Reduce #if's a little -- have a single "no hwloc" and "hwloc"
section.
* Make a common subroutine (no_hwloc_support()) for when we don't have
any hwloc support
* affinity: will build unless disabled.
* cr: will build if FT is enabled, unless it is disabled. It will
also complain/abort if you --with-mpi-ext=cr, but FT is disabled.
* example: will only build if --with-mpi-ext=example (and .ompi_ignore
is removed)
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
The definition of MPI_T_pvar_get_index was incorrect. This commit
fixes the definition and adds a missing return code.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit fixes several bugs in the static request objects used by
ob1 for blocking send/receive operations.
- Fix memory leak when using MPI_THREAD_MULTIPLE. Requests were
allocated off the free list but were destructed and NOT returned.
- Fix double-destruct of static objects. There is no reason to
CONSTRUCT/DESTRUCT the static object for each send/receive
operation. This adds overhead and no benefit. To keep the code
clean helper functions have been added to finalize ob1 send/receive
requests.
- Remove now unnecessary include of alloca.h.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This new MTL runs over PSM2 for Omni Path. PSM2 is a descendant of PSM
with changes to support more ranks and some MPI-3 features like mprobe.
PSM2 will only support Omni Path networks; PSM only supports True Scale.
Likewise, the existing PSM MTL will continue to be maintained for True
Scale, while the PSM2 MTL is developed and maintained for Omni Path.
from the message queues (a debugging feature). With this approach
all blocking (single threaded) requests are allocated from the main
freelist, so they will be accounted for during the message queues
investigation).
We've seen this a few times (e.g.,
http://www.open-mpi.org/community/lists/users/2015/06/27057.php
reported via @siegmargross). I'm not entirely sure why it happens --
the best I can come up with is a poorly-synchronized network
filesystem and/or a bug in "make". For example: this code hasn't
changed in forever, and it only happens to users *sometimes*.
Regardless, avoid the error altogether by removing the file before
making the sym link (it should be a sym link anyway -- if there's
something there, it should be safe to remove it before we re-create
the sym link that should be there in the first place).
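The same remove-then-link idea, transposed into C as a sketch. The
actual change is in the build rules; the helper below is hypothetical
and only illustrates the ordering.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: make `linkpath` a symlink to `target`,
 * replacing any stale entry that is already there. */
int force_symlink(const char *target, const char *linkpath)
{
    /* Remove whatever is there; a missing entry is the normal case. */
    if (unlink(linkpath) != 0 && errno != ENOENT) {
        return -1;
    }
    return (symlink(target, linkpath) == 0) ? 0 : -1;
}
```

Calling it twice in a row succeeds both times, whereas a bare
symlink() would fail the second time with EEXIST.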
(cherry picked from commit 0edd265ea045e649c9489e3cb8fdb657800d95c3)
The Portals4 MTL allocates two Portals IDs requesting specific
well-known IDs and assumes that those IDs are allocated. If those IDs
are in use, PtlPTAlloc() will allocate a different ID. This commit
verifies that the requested IDs were allocated.
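A sketch of the verification step. The allocator callbacks are
hypothetical stand-ins so the example is self-contained; they do not
have the real PtlPTAlloc() signature.

```c
/* Hypothetical allocator shape: returns 0 on success and stores the
 * index that was actually allocated in *actual. */
typedef int (*pt_alloc_fn)(unsigned int requested, unsigned int *actual);

/* Stand-in allocators for illustration. */
int alloc_exact(unsigned int requested, unsigned int *actual)
{
    *actual = requested;        /* requested index was free */
    return 0;
}

int alloc_shifted(unsigned int requested, unsigned int *actual)
{
    *actual = requested + 1;    /* requested index was in use */
    return 0;
}

int alloc_failing(unsigned int requested, unsigned int *actual)
{
    (void)requested;
    (void)actual;
    return 1;                   /* allocation itself failed */
}

/* Returns 0 only when allocation succeeded and yielded the requested index. */
int alloc_well_known_index(pt_alloc_fn alloc, unsigned int requested)
{
    unsigned int actual = 0;

    if (alloc(requested, &actual) != 0) {
        return -1;
    }
    if (actual != requested) {
        return -2;              /* got an index, but not the well-known one */
    }
    return 0;
}
```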
CID 71734 Self assignment (NO_EFFECT)
This code has no effect. The original author of the offending code
does not remember why the self-assignment is there. Fortran
MPI_Win_get_attr tests are working with or without it so remove the
code.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>