update the configure logic of the gpfs component
based on what we learned from user feedback over the last
two years for the other components
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Delete the check for amode, which belongs in a higher layer, e.g. ompi_file_open.
Only perform Info value check if key is found.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Until now sqrt(n) was missed as a factor for odd square numbers n. This
led to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.
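
The effect of the boundary condition can be shown with a minimal sketch.
This is not the Open MPI implementation; prime_factors() is a hypothetical
helper that only illustrates why the loop has to test d == sqrt(n):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Open MPI sources): writes the prime
 * factors of n in ascending order into out and returns how many were
 * written. out needs room for 32 entries when unsigned is 32 bits wide. */
size_t prime_factors(unsigned n, unsigned *out)
{
    size_t count = 0;

    assert(n >= 1 && out != NULL);

    while (n % 2 == 0) {
        out[count++] = 2;
        n /= 2;
    }
    /* "d <= n / d" is "d * d <= n" without overflow. Using "<" here would
     * skip d == sqrt(n) and report 9, 25, 49, ... as a single factor. */
    for (unsigned d = 3; d <= n / d; d += 2) {
        while (n % d == 0) {
            out[count++] = d;
            n /= d;
        }
    }
    if (n > 1) {
        out[count++] = n; /* whatever remains is prime */
    }
    return count;
}
```

With the factors 3 and 3 available for n = 9, a 3 x 3 arrangement becomes
possible instead of 9 x 1.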
Refs: #7186
Signed-off-by: Michael Lass <bevan@bi-co.net>
In fcoll_two_phase_support_fns.c: the calculation of the aggregator index
failed for large offsets on 32-bit machines, due to improper handling of
64-bit offsets.
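
A minimal sketch of the idea, assuming a simplified index formula (the
real computation in the two-phase code is more involved):
aggregator_index() and its parameters are hypothetical names. The point
is only that all arithmetic stays in 64 bits and the narrowing happens
once, on the small final result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fd_size is the size in bytes of each aggregator's
 * file domain, min_off is the smallest offset accessed by any process. */
int aggregator_index(int64_t off, int64_t min_off, int64_t fd_size)
{
    assert(fd_size > 0 && off >= min_off);

    /* Keep every intermediate value 64 bits wide. Storing off - min_off
     * in a 32-bit variable would lose the high bits for offsets past
     * 2 GiB, which is what breaks on a 32-bit machine. */
    int64_t idx = (off - min_off) / fd_size;

    return (int) idx;
}
```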
Fixes Issue #7110
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Compiling OMPI on Cray systems using the latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre. Squash these warnings.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This is closely related to Platform-MPI's old -prot feature.
The long-format of the tables it prints could look like this:
> Host 0 [myhost001] ranks 0 - 1
> Host 1 [myhost002] ranks 2 - 3
> Host 2 [myhost003] ranks 4
> Host 3 [myhost004] ranks 5
> Host 4 [myhost005] ranks 6
> Host 5 [myhost006] ranks 7
> Host 6 [myhost007] ranks 8
> Host 7 [myhost008] ranks 9
> Host 8 [myhost009] ranks 10
>
> host | 0 1 2 3 4 5 6 7 8
> ======|==============================================
> 0 : sm tcp tcp tcp tcp tcp tcp tcp tcp
> 1 : tcp sm tcp tcp tcp tcp tcp tcp tcp
> 2 : tcp tcp self tcp tcp tcp tcp tcp tcp
> 3 : tcp tcp tcp self tcp tcp tcp tcp tcp
> 4 : tcp tcp tcp tcp self tcp tcp tcp tcp
> 5 : tcp tcp tcp tcp tcp self tcp tcp tcp
> 6 : tcp tcp tcp tcp tcp tcp self tcp tcp
> 7 : tcp tcp tcp tcp tcp tcp tcp self tcp
> 8 : tcp tcp tcp tcp tcp tcp tcp tcp self
>
> Connection summary:
> on-host: all connections are sm or self
> off-host: all connections are tcp
In this example hosts 0 and 1 had multiple ranks, so "sm" was more
meaningful than "self" to identify how the ranks on the host are
talking to each other. Hosts 2..8 had one rank per host, so
"self" was more meaningful as their btl.
Above a certain number of hosts (12 by default) the above table gets too big
so we shrink to a more abbreviated looking table that has the same data:
> host | 0 1 2 3 4 8
> ======|====================
> 0 : A C C C C C C C C
> 1 : C A C C C C C C C
> 2 : C C B C C C C C C
> 3 : C C C B C C C C C
> 4 : C C C C B C C C C
> 5 : C C C C C B C C C
> 6 : C C C C C C B C C
> 7 : C C C C C C C B C
> 8 : C C C C C C C C B
> key: A == sm
> key: B == self
> key: C == tcp
Then above 36 hosts we stop printing the 2d table entirely and just print the
summary:
> Connection summary:
> on-host: all connections are sm or self
> off-host: all connections are tcp
The options to control it are
-mca comm_method 1 : print the above table at the end of MPI_Init
-mca comm_method 2 : print the above table at the beginning of MPI_Finalize
-mca comm_method_max <n> : number of hosts <n> for which to print a full size 2d table
-mca comm_method_brief 1 : only print summary output, no 2d table
-mca comm_method_fakefile <filename> : for debugging only
* printing at init vs finalize:
The most important difference between these two is that when printing the table
during MPI_Init(), we send extra messages to make sure all hosts are connected to
each other. So the table ends up working against the idea of on-demand connections
(although it's only forcing the n^2 connections in the number of hosts, not the
total ranks). If printing at MPI_Finalize() we don't create any connections that
aren't already connected, so the table is more likely to have "n/a" entries if
some hosts never connected to each other.
* how many hosts <n> for which to print a full size 2d table
The option -mca comm_method_max <n> can be used to specify a number of hosts <n>
(default 12) that controls at what host-count the unabbreviated / abbreviated
2d tables get printed:
1 - n : full size 2d table
n+1 - 3n : shortened 2d table
3n+1 - inf : summary only, no 2d table
* brief
The option -mca comm_method_brief 1 can be used to skip the printing of the 2d
table and only show the short summary.
* fakefile
This is a debugging option that allows easier testing of all the printout
routines by letting all the detected communication methods between the hosts
be overridden by fake data from a file.
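
The host-count thresholds described above for comm_method_max can be
sketched as follows. choose_table_mode() and the enum are hypothetical
names used for illustration, not the functions in the hook itself:

```c
#include <assert.h>

enum table_mode { TABLE_FULL, TABLE_ABBREVIATED, TABLE_SUMMARY_ONLY };

/* Hypothetical helper: nhosts is the number of hosts in the job and
 * max_full is the comm_method_max value (12 by default). */
enum table_mode choose_table_mode(int nhosts, int max_full)
{
    assert(nhosts >= 1 && max_full >= 1);

    if (nhosts <= max_full) {
        return TABLE_FULL;          /* 1 .. n */
    }
    if (nhosts <= 3 * max_full) {
        return TABLE_ABBREVIATED;   /* n+1 .. 3n */
    }
    return TABLE_SUMMARY_ONLY;      /* 3n+1 and up */
}
```

With the default of 12 this gives the full table up to 12 hosts, the
abbreviated table up to 36 hosts, and the summary only beyond that.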
The source of the information used in the table is the .mca_component_name.
In the case of BTLs, the module always had a .btl_component linking back to the
component. The vars mca_pml_base_selected_component and ompi_mtl_base_selected_component
offer similar functionality for pml/mtl.
So with the ability to identify the component, we can then access
the component name with code like this
mca_pml_base_selected_component.pmlm_version.mca_component_name
See the three lookup_{pml,mtl,btl}_name() functions in hook_comm_method_fns.c,
and their use in comm_method() to parse the strings and produce an integer
to represent the connection type being used.
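
As a rough illustration of turning component-name strings into integers,
the sketch below keeps a small table of names seen so far and returns the
index of each name. struct method_table and method_id() are hypothetical
and not taken from hook_comm_method_fns.c:

```c
#include <assert.h>
#include <string.h>

#define MAX_METHODS 16

/* Hypothetical table of component names seen so far ("sm", "tcp", ...). */
struct method_table {
    const char *names[MAX_METHODS];
    int count;
};

/* Returns a stable small integer for name, adding it to the table on
 * first use. Returns -1 when the table is full. The pointer is stored
 * as-is, so the string must outlive the table. */
int method_id(struct method_table *t, const char *name)
{
    assert(t != NULL && name != NULL);

    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0) {
            return i;
        }
    }
    if (t->count == MAX_METHODS) {
        return -1;
    }
    t->names[t->count] = name;
    return t->count++;
}
```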
Signed-off-by: Mark Allen <markalle@us.ibm.com>
This is based on a bug reported on the mailing list using a netcdf testcase.
The problem occurs if processes are using a custom file view, but on some
of them it appears as if the default file view is being used. Because of that,
the simple-grouping option led to a different number of aggregators being used on different
processes, and ultimately to a deadlock. This patch fixes the problem by not using
the file_view size anymore for the calculation in the simple-grouping option,
but the contiguous chunk size (which is identical on all processes).
Fixes issue #7109
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
In order to work around an issue with flang based compilers,
avoid declaring bind(C) constants and use plain Fortran parameter
instead.
For example,
type(MPI_Comm), bind(C, name="ompi_f08_mpi_comm_world") OMPI_PROTECTED :: MPI_COMM_WORLD
is changed to
type(MPI_Comm), parameter :: MPI_COMM_WORLD = MPI_Comm(OMPI_MPI_COMM_WORLD)
Note that in order to preserve ABI compatibility, ompi/mpi/fortran/use-mpi-f08/constants.{c,h}
have been kept even though their symbols are no longer referenced by Open MPI.
Refs. open-mpi/ompi#7091
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This commit fixes a segfault in mtl-portals4 finalize(). The segfault
occurs if finalize() is called without any calls to add_procs(). This
commit resolves the segfault by skipping the flow control fini() call if
Portals4 was not initialized.
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
zero-size derived datatypes are now flagged as OPAL_DATATYPE_FLAG_CONTIGUOUS
so update mca_pml_ucx_init_datatype() to correctly handle them.
Since 'size' is a 'size_t', the assertion can simply be removed.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This commit updates the coll/basic component to correctly order sends
and receives for cartesian communicators with cyclic boundaries. This
addresses an issue identified by mpi-forum/mpi-issues#153. This issue
occurs when the size in any dimension is 1. This gives the same
neighbor in the positive and negative directions. The old code was
sending and receiving in the same order so the -1 buffer contained
the +1 result and vice versa. The problem is addressed by using
unique tags for each send. This should cover both the case where
overtaking is allowed and is not allowed. The former case will be
possible if a MPI_Cart_create_with_info() call is added to the
standard.
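
One way to picture the unique tags is the sketch below. The tag layout,
neighbor_tag(), and tag_base are assumptions made for illustration and
are not the values used in coll/basic. The tag is derived from the
direction in which a message travels, so sender and receiver compute the
same value: a message sent to the +1 neighbor is the one that neighbor
receives from its -1 side.

```c
#include <assert.h>

enum direction { TOWARD_NEGATIVE = 0, TOWARD_POSITIVE = 1 };

/* Hypothetical tag scheme: tag_base stands in for whatever tag range the
 * collective reserves. Each (dimension, travel direction) pair gets its
 * own tag, so two messages between the same pair of ranks cannot be
 * matched against the wrong receive. */
int neighbor_tag(int tag_base, int dim, enum direction travel)
{
    assert(dim >= 0);
    return tag_base - 2 * dim - (int) travel;
}
```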
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Change the ncounts argument to MPI_Count and use
MPI_Status_set_elements_x for enabling read/write operations beyond
the 2GB limit.
Thanks to Richard Warren from the HDF5 group for reporting the issue
and providing the suggested fix for romio.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
individual read/write operations exceeding 2GB fail in ompio
due to improper conversions from size_t to int in two different
locations. This commit fixes an issue reported by Richard Warren
from the HDF5 group.
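
The failure mode can be illustrated with a small range check. This
sketch is not the patch itself, and size_fits_int() is a hypothetical
helper: it only shows that a size_t above INT_MAX cannot be narrowed to
int without losing information.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: stores bytes in *out and returns 1 when the value
 * is representable as an int, otherwise returns 0 and leaves *out alone. */
int size_fits_int(size_t bytes, int *out)
{
    assert(out != NULL);

    if (bytes > (size_t) INT_MAX) {
        return 0; /* a plain (int) cast here would not preserve the value */
    }
    *out = (int) bytes;
    return 1;
}
```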
Fixes Issue #397
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
one off patch for v4.0.x. for some reason commit on master
didn't have this problem.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 5f3dbdb5c8a94a4f426ecca1a3a91c83035f956c)
Note that this commit is actually a cherry-pick from the v4.0.x
branch. This is the opposite direction from what we normally do: we
usually commit to master first and then cherry-pick to the release
branches (vs. the other way around).
As is probably evident from the original commit message above, through
a comedy of errors, this commit was actually applied to the v4.0.x
branch first and then cherry-picked back to master (i.e., the problem
*did* exist in the original master commit
3aca4af548a3d781b6b52f89f4d6c7e66d379609, but it was not recognized at
the time).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
INTERNAL: STL-59403
The OFI (libfabric) MTL does not respect the maximum message size
parameter that OFI provides in the fi_info data.
This patch adds this missing max_msg_size field to the mca_ofi_module_t
structure and adds a length check to the low-level send routines.
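
A minimal sketch of such a length check follows. Only the max_msg_size
field name comes from this message. The struct, the return codes, and
check_send_length() are hypothetical stand-ins, not the actual MTL code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the module structure; only the field named
 * in this commit message is modelled. */
struct ofi_module_sketch {
    size_t max_msg_size; /* limit reported by the provider */
};

enum { SKETCH_OK = 0, SKETCH_ERR_TOO_LARGE = -1 };

/* Rejects a send before it reaches the provider when the payload is
 * larger than the advertised maximum. */
int check_send_length(const struct ofi_module_sketch *m, size_t length)
{
    assert(m != NULL);
    return (length > m->max_msg_size) ? SKETCH_ERR_TOO_LARGE : SKETCH_OK;
}
```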
Change-Id: I05aa71d332f2df897133b30c28bf37d98f061996
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
Reviewed-by: Adam Goldman <adam.goldman@intel.com>
Reviewed-by: Brendan Cunningham <brendan.cunningham@intel.com>
I'm restoring the info function pointers to the IO module
but allowing the function pointers to be NULL (eg in ompio).
And letting romio321 set its function pointers for those
routines.
This means the info system uses the new OMPI-level info
system for most things, but skips it and uses the pre-existing
romio info system just for the romio module.
It's possible to convert romio, but I went a ways down that
path and found it kind of convoluted. Having pointers from
the lower level ADIO_File back to the higher level ompi_file_t
wasn't too bad, but I got stuck trying to figure out where/how
to register the infosubscribe_subscribe callbacks vs the way
initial k/v values are scattered around the romio code currently.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
open-mpi/ompi@0fe756d416 introduced
a bug in the coll/hcoll component. The ompi_requests allocated by
libhcoll would be treated as coll_base_nbc_request during
ompi_coll_base_retain_<> call. Afterwards this would lead to a
segv in the request cleanup.
Fix: since the libhcoll interface does not distinguish between
blocking/non-blocking requests, use coll_base_nbc_request all the
time and initialize it properly in
coll/hcoll/get_coll_handle(). It is still within 2 cache lines.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
Within the shuffle iteration, the aggregators have to set a displacement array needed to receive data from other processes. The array had 1 extra element. We adjust the displacement index to match the number of elements.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>