Keep track of the size of the blocklen_per_process and displs_per_process
arrays in the aggregator data structure to minimize the number of realloc
calls required in the shuffle_init operation.
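
The idea can be sketched as follows. This is a minimal illustration, not
the actual ompio code: the type tracked_array_t, its fields, and
tracked_array_reserve() are hypothetical names. The array remembers how
many entries it has allocated, so a later call only reallocates when it
needs more room than it already has.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one per-process array kept by an aggregator. */
typedef struct {
    int    *data;           /* e.g. block lengths for one process */
    size_t  capacity;       /* entries allocated; kept between calls */
    size_t  realloc_calls;  /* instrumentation for this example only */
} tracked_array_t;

/* Make room for at least `needed` entries.
 * Returns 0 on success and -1 if the allocation fails. */
static int tracked_array_reserve(tracked_array_t *a, size_t needed)
{
    assert(a != NULL);
    if (needed <= a->capacity) {
        return 0;           /* already large enough: no realloc */
    }
    int *tmp = realloc(a->data, needed * sizeof(*tmp));
    if (NULL == tmp) {
        return -1;          /* a->data is still valid here */
    }
    a->data = tmp;
    a->capacity = needed;
    a->realloc_calls++;
    return 0;
}
```

With a zero-initialized tracked_array_t, reserving 8 entries and then 4
entries triggers a single realloc; only a later request for more than 8
entries triggers a second one.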
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
This commit attempts to update the romio io component to not use
functions removed in MPI-3.0 (2012). This is a first cut and will
probably need to be reviewed for correctness.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
romio assumes that all predefined datatypes are contiguous. Because of
the (terribly named) composed datatypes MPI_SHORT_INT, MPI_DOUBLE_INT,
MPI_LONG_INT, etc this is an incorrect assumption. The simplest way to
fix this is to override the MPI_Type_get_envelope and
MPI_Type_get_contents calls with calls that will work on these
datatypes. Note that not all calls to these MPI functions are
replaced, only the ones used when flattening a non-contiguous
datatype.
References #5009
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fixes issue #5069, which relates to a BigMPI bug with the use of
MPI_Type_vector to construct very large datatypes (>2GB).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This commit fixes a segfault in mtl-portals4 finalize(). The segfault
occurs if finalize() is called without any calls to add_procs(). This
commit resolves the segfault by skipping the progress() loop in
finalize() if Portals was not initialized.
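
The guard can be sketched as follows. This is a simplified model, not the
mtl-portals4 code: sketch_state_t and the sketch_* functions are
hypothetical names, and the "initialized" flag is assumed to be set by
add_procs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the component state. */
typedef struct {
    bool initialized;     /* assumed to be set by add_procs() */
    int  pending;         /* outstanding operations to drain */
    int  progress_calls;  /* instrumentation for this example only */
} sketch_state_t;

static void sketch_add_procs(sketch_state_t *s, int pending)
{
    s->initialized = true;
    s->pending = pending;
}

static void sketch_progress(sketch_state_t *s)
{
    s->progress_calls++;
    if (s->pending > 0) {
        s->pending--;
    }
}

/* Drain outstanding work only if initialization actually happened. */
static int sketch_finalize(sketch_state_t *s)
{
    assert(s != NULL);
    if (s->initialized) {
        while (s->pending > 0) {
            sketch_progress(s);
        }
        s->initialized = false;
    }
    return 0;
}
```

Calling sketch_finalize() on a zero-initialized state never enters the
progress loop, which is the case that used to crash.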
Signed-off-by: Todd Kordenbrock (thkgcode@gmail.com)
Per 0ab6b201fed, note in the MPI_Comm_spawn_multiple.3in man page that
the array_of_commands does not need to be terminated -- it just needs
to have exactly "count" entries. In the Fortran binding, at least,
this is different than in prior released versions of Open MPI (it's
not a backwards incompatibility, since prior versions of Open MPI
required array_of_commands to be blank-string-terminated in Fortran --
this change makes Open MPI be *less* restrictive, and therefore still
backwards compatible).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
MPI defines the "argv" param to Fortran MPI_COMM_SPAWN as being
terminated by a blank string. While not precisely defined (except
through a non-binding example, Example 10.2, MPI-3.1 p382:6-29), one
can infer that the "array_of_argv" param to Fortran
MPI_COMM_SPAWN_MULTIPLE is also a set of argv, each of which is
terminated by a blank string.
The "array_of_commands" argument to Fortran MPI_COMM_SPAWN_MULTIPLE is
a little less well-defined. It is *assumed* to be of length "count"
(another parameter to MPI_COMM_SPAWN_MULTIPLE) -- and *not* be
terminated by a blank string. This is also given credence by the same
example 10.2 in MPI-3.1.
The previous code assumed that "array_of_commands" should also be
terminated by a blank string -- but per the above, this is incorrect.
Instead, we should just parse "count" strings from
"array_of_commands" and *not* look for a blank-string terminator.
This commit separates these two cases:
* ompi_fortran_argv_blank_f2c(): parse a Fortran array of strings out
and stop when reaching a blank string.
* ompi_fortran_argv_count_f2c(): parse a Fortran array of strings out
and stop when "count" number of strings have been parsed.
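
The difference between the two cases can be sketched as follows. This is
an illustration only: the sketch_* functions are hypothetical and do not
reproduce the signatures of the real ompi_fortran_argv_*_f2c() functions.
It assumes the Fortran array arrives as one contiguous buffer of
fixed-width, space-padded strings with no NUL terminators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy one fixed-width Fortran string, trimming trailing blanks. */
static char *sketch_fstr_to_cstr(const char *f, size_t len)
{
    while (len > 0 && ' ' == f[len - 1]) {
        --len;
    }
    char *c = malloc(len + 1);
    if (NULL == c) {
        return NULL;
    }
    memcpy(c, f, len);
    c[len] = '\0';
    return c;
}

static void sketch_argv_free(char **argv)
{
    if (NULL == argv) {
        return;
    }
    for (size_t i = 0; NULL != argv[i]; ++i) {
        free(argv[i]);
    }
    free(argv);
}

/* Parse exactly `count` strings; blank entries are kept as "". */
static char **sketch_argv_count_f2c(const char *array, size_t width,
                                    size_t count)
{
    assert(array != NULL && width > 0);
    char **argv = calloc(count + 1, sizeof(*argv)); /* NULL-terminated */
    if (NULL == argv) {
        return NULL;
    }
    for (size_t i = 0; i < count; ++i) {
        argv[i] = sketch_fstr_to_cstr(array + i * width, width);
        if (NULL == argv[i]) {
            sketch_argv_free(argv);
            return NULL;
        }
    }
    return argv;
}

/* Parse until the first all-blank string. The caller must guarantee
 * that such a terminator exists, otherwise this reads past the array. */
static char **sketch_argv_blank_f2c(const char *array, size_t width)
{
    assert(array != NULL && width > 0);
    size_t count = 0;
    for (;;) {
        const char *s = array + count * width;
        size_t j = 0;
        while (j < width && ' ' == s[j]) {
            ++j;
        }
        if (j == width) {
            break;              /* blank string: end of the list */
        }
        ++count;
    }
    return sketch_argv_count_f2c(array, width, count);
}
```

Given the four 4-character entries "ls  ", "-l  ", "    ", "cat ", the
blank-terminated parser stops after two strings, while the count-based
parser with count = 4 returns all four, including the empty one.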
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
javah is no longer available as of Java 10, so try
javac -h first (available since Java 8) and fall back on javah
Refs. open-mpi/ompi#5000
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
The ompio module resets the amode from WRONLY to RDWR in order
to accommodate data sieving in the two-phase fcoll component. This
leads, however, to an error if MPI_MODE_SEQUENTIAL has been requested
by the user, since MODE_SEQUENTIAL is incompatible with MODE_RDWR.
Since the change to the amode was done after opening the file for
individual file pointers but before opening the file for shared file pointers,
this led to an error message in the sharedfp component.
Note that data sieving is never necessary if MODE_SEQUENTIAL is set,
so this should not be a problem for any scenario.
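
One way such a check could look is sketched below. This is an assumption
about the shape of the fix, not the ompio code: the SK_MODE_* constants
are made-up bit flags (not the real MPI_MODE_* values) and
sketch_adjust_amode() is a hypothetical helper.

```c
#include <assert.h>

/* Made-up flag values for illustration; not the MPI_MODE_* constants. */
enum {
    SK_MODE_RDONLY     = 1,
    SK_MODE_WRONLY     = 2,
    SK_MODE_RDWR       = 4,
    SK_MODE_SEQUENTIAL = 8
};

/* Promote WRONLY to RDWR so data sieving can read back file contents,
 * but leave the mode alone when SEQUENTIAL is requested, because
 * SEQUENTIAL and RDWR may not be combined. */
static int sketch_adjust_amode(int amode)
{
    assert(amode > 0);
    if ((amode & SK_MODE_WRONLY) && !(amode & SK_MODE_SEQUENTIAL)) {
        amode = (amode & ~SK_MODE_WRONLY) | SK_MODE_RDWR;
    }
    return amode;
}
```

With this helper, a plain WRONLY request becomes RDWR, while
WRONLY | SEQUENTIAL is passed through unchanged.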
Fixes #4991
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Instead of using a temporary buffer and copying data into it before
sending, use a derived datatype to describe the data that needs to be
sent during a cycle in the collective I/O operation.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
Implements the recursive doubling algorithm for MPI_Scan and MPI_Exscan.
The algorithm preserves the order of operations, so it can be used with
both commutative and non-commutative operations.
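
The order-preserving property can be illustrated with a small
single-process simulation of an inclusive scan. This is a sketch of the
general recursive doubling technique, not the code added by this commit:
it simulates NPROCS ranks in one loop, uses string concatenation as a
non-commutative operation, and all names in it are hypothetical. Data
from a lower-ranked peer is always applied on the left.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NPROCS 5   /* simulated ranks; need not be a power of two */
#define BUFLEN 16

/* out = left followed by right (associative, not commutative).
 * `out` may alias either input. */
static void sketch_concat(char *out, const char *left, const char *right)
{
    char tmp[BUFLEN];
    assert(strlen(left) + strlen(right) < BUFLEN);
    snprintf(tmp, sizeof(tmp), "%s%s", left, right);
    memcpy(out, tmp, sizeof(tmp));
}

/* Rank r contributes the letter 'a' + r. On return, result[r] holds the
 * inclusive scan for rank r. */
static void sketch_scan(char result[NPROCS][BUFLEN])
{
    char partial[NPROCS][BUFLEN];   /* reduction over each rank's group */

    for (int r = 0; r < NPROCS; ++r) {
        result[r][0] = (char) ('a' + r);
        result[r][1] = '\0';
        strcpy(partial[r], result[r]);
    }

    for (int mask = 1; mask < NPROCS; mask <<= 1) {
        /* Snapshot models the simultaneous exchange of partial values. */
        char received[NPROCS][BUFLEN];
        memcpy(received, partial, sizeof(partial));

        for (int r = 0; r < NPROCS; ++r) {
            int peer = r ^ mask;
            if (peer >= NPROCS) {
                continue;           /* no partner in this round */
            }
            if (peer < r) {
                /* Lower ranks go on the left of both values. */
                sketch_concat(result[r], received[peer], result[r]);
                sketch_concat(partial[r], received[peer], partial[r]);
            } else {
                /* Higher ranks only extend the group reduction. */
                sketch_concat(partial[r], partial[r], received[peer]);
            }
        }
    }
}
```

After sketch_scan(), rank 2 holds "abc" and rank 4 holds "abcde", so each
rank sees the contributions of ranks 0..r in rank order.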
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
This commit fixes a bug in osc/rdma that can occur if the total size
of the shared memory segment gets larger than 4 GiB. The bug was
caused by a typo. The type of my_base_offset should have been size_t
not int.
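
The effect of a too-narrow offset type can be sketched as follows. This
is an illustration, not the osc/rdma code: the function names are
hypothetical, a 64-bit size_t is assumed, and the narrow version uses
uint32_t rather than int so that the wrap-around is well defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of `rank` in a shared segment: sum of all lower ranks' sizes. */
static size_t sketch_base_offset_wide(const size_t *sizes, int rank)
{
    assert(sizes != NULL && rank >= 0);
    size_t offset = 0;
    for (int i = 0; i < rank; ++i) {
        offset += sizes[i];
    }
    return offset;
}

/* Same sum with a 32-bit accumulator: wraps once it reaches 4 GiB. */
static uint32_t sketch_base_offset_narrow(const size_t *sizes, int rank)
{
    assert(sizes != NULL && rank >= 0);
    uint32_t offset = 0;
    for (int i = 0; i < rank; ++i) {
        offset += (uint32_t) sizes[i];
    }
    return offset;
}
```

With three 2 GiB contributions, the wide version yields a 6 GiB offset,
while the narrow version wraps to 2 GiB and would point into another
rank's region.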
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This will almost certainly never happen, but be defensive and
guarantee that we never return an uninitialized variable.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Ensure that ret_code is initialized. This problem will likely never occur
in practice, but we might as well be defensive about it.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The various RMA functions need to have the asynchronous property on
all buffers. This property was missing and some buffers were
incorrectly marked as intent(in). This commit fixes the function
signatures.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Somehow the flag that enables gathering performance data
on collective I/O operations was accidentally changed to 1.
It should be 0 (false) by default.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Never got around to turning this sharedfp component into anything
usable. It can easily be restored if necessary.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
plfs components are at this point not utilized by anybody as far as I know.
Easy to bring back if we want to.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
This commit is a large update to the osc/rdma component. Included in
this commit:
- Add support for using hardware atomics for fetch-and-op and single
count accumulate when using the accumulate lock. This will improve
the performance of these operations even when not setting the
single intrinsic info key.
- Rework how large accumulates are done. They now block on the get
operation to fix some bugs discovered by an IBM one-sided test. I
may roll back some of the changes if the underlying bug in the
original design is discovered. There appears to be no real
difference (on the hardware this was tested with) in performance, so
it's probably a non-issue. References #2530.
- Add support for an additional lock-all algorithm: on-demand. The
on-demand algorithm will attempt to acquire the peer lock when
starting an RMA operation. The lock algorithm default has not
changed. The algorithm can be selected by setting the
osc_rdma_locking_mode MCA variable. The valid values are two_level
and on_demand.
- Make use of the btl_flush function if available. This can improve
performance with some btls.
- When using btl_flush do not keep track of the number of put
operations. This reduces the number of atomic operations in the
critical path.
- Make the window buffers more friendly to multi-threaded
applications. This was done by dropping support for multiple
buffers per MPI window. I intend to re-add that support once the
underlying performance bug under the old buffering scheme is
fixed.
- Fix a bug in request completion in the accumulate, get, and put
paths. This also helps with #2530.
- General code cleanup and fixes.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>