It is amazing how bad instruction scheduling can have such a drastic impact
on code performance. With this change, we get a boost of at least
50% in performance for data with a small blocklen and/or count.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Optimize contiguous loops by collapsing them into a single element.
During datatype optimization collapse similar elements into larger
blocks.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Upon detecting a datatype loop representation skip the entire loop
according to the remaining space.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
- optimize handling of contiguous with gaps datatypes.
- fix a performance issue for all datatypes with a count of 1.
- optimize the pack/unpack of contiguous with gaps datatype.
- optimize the case of blocklen == 1
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Move toward a base type of vector (count, type, blocklen, extent, disp)
with disp and extent applying to the count repetition and blocklen
being a contiguous memory region of type `type`.
Implement 2 optimizations on this description used during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy during pack/unpack.
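
The two optimizations can be illustrated with a minimal sketch. The struct and function names below (`vec_desc_t`, `vec_collapse`, `vec_fuse`) are hypothetical and are not the actual Open MPI internals; the sketch also assumes that extents and displacements are expressed in bytes and ignores corner cases such as descriptions with a count of 1, whose extent is irrelevant.

```c
#include <stddef.h>

/* Hypothetical descriptor mirroring the (count, type, blocklen, extent, disp)
 * tuple described above; NOT the real Open MPI structure. */
typedef struct {
    size_t    count;     /* number of repetitions */
    int       type;      /* opaque id of the basic type */
    size_t    blocklen;  /* contiguous elements of `type` per repetition */
    ptrdiff_t extent;    /* byte stride between repetitions (assumption) */
    ptrdiff_t disp;      /* byte displacement of the first repetition */
} vec_desc_t;

/* Collapse: if `next` repeats the same pattern exactly where `cur` stops,
 * absorb it into `cur` by increasing the count. Returns 1 if collapsed. */
int vec_collapse(vec_desc_t *cur, const vec_desc_t *next)
{
    if (cur->type != next->type || cur->blocklen != next->blocklen ||
        cur->extent != next->extent) {
        return 0;
    }
    if (next->disp != cur->disp + (ptrdiff_t)cur->count * cur->extent) {
        return 0;
    }
    cur->count += next->count;
    return 1;
}

/* Fusion: two single-repetition descriptions of the same type that are
 * adjacent in memory become one longer block, so a pack/unpack needs one
 * memcpy instead of two. Returns 1 if fused. */
int vec_fuse(vec_desc_t *cur, const vec_desc_t *next, size_t type_size)
{
    if (cur->count != 1 || next->count != 1 || cur->type != next->type) {
        return 0;
    }
    if (next->disp != cur->disp + (ptrdiff_t)(cur->blocklen * type_size)) {
        return 0;
    }
    cur->blocklen += next->blocklen;
    return 1;
}
```

In this sketch, collapsing keeps the number of memcpy calls unchanged but shortens the description, while fusion reduces the number of memcpy calls.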
Fixes at the OMPI datatype level including:
- Fix the create_hindexed and vector creation.
- Fix the handling of [get|set]_elements and _count.
- Correctly compute the displacement for block indexed types.
- Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Unfortunately, https://github.com/open-mpi/ompi/pull/6797 was merged
before all feedback was received (39b799d936). This PR is a minor
addendum to that commit.
This PR simply removes a meaningless `= {0}` operation.
The use of gethostname() here -- and many other places in the code
base -- is technically unsafe. See
https://github.com/open-mpi/ompi/issues/6801 for a further description
of the issue and a suggested fix. But the risk is quite low;
real-world hostnames are usually much shorter than
OPAL_MAXHOSTNAMELEN. Hence, this PR just removes the meaningless
operation and leaves a real fix for gethostname() usage to a potential
future PR.
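
For illustration only, one possible shape of a safer wrapper is sketched below. It is not necessarily the fix suggested in the linked issue; `sketch_gethostname` and `SKETCH_MAXHOSTNAMELEN` are hypothetical names. The sketch relies on the fact that POSIX does not specify whether the buffer is NUL-terminated when the host name is truncated.

```c
/* Request the POSIX declarations (gethostname) unless already configured. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <stddef.h>
#include <unistd.h>

/* Hypothetical limit for this sketch; Open MPI uses OPAL_MAXHOSTNAMELEN. */
#define SKETCH_MAXHOSTNAMELEN 256

/* Fill `buf` (of `len` bytes) with the host name and guarantee that the
 * result is NUL-terminated. Returns 0 on success, -1 on failure. */
int sketch_gethostname(char *buf, size_t len)
{
    if (NULL == buf || 0 == len) {
        return -1;
    }
    if (0 != gethostname(buf, len)) {
        buf[0] = '\0';      /* leave a valid empty string on failure */
        return -1;
    }
    buf[len - 1] = '\0';    /* force termination even if truncated */
    return 0;
}
```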
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Previously the verbose output of if_linux_ipv6_open looked like this:
found interface ab c: 0ab: a b: abc: 0 0: a 0:abcd: 0 0 scope 0
This changes the output to:
found interface eth0 inet6 ab0c:ab:a0b:abc:0:a00:abcd:0 scope 0
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
A typical parameter of opal_output_verbose() is ORTE_NAME_PRINT(...),
which is an expensive macro.
Most of the time, evaluating it is wasted work because the message's verbosity
level is too high for it to be printed.
Make opal_output_verbose() a macro so such arguments are only evaluated if the
verbosity is low enough.
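
The idea can be sketched as follows. All names here (`SKETCH_OUTPUT_VERBOSE`, `sketch_output`, `sketch_name_print`, `sketch_verbosity`) are hypothetical stand-ins, not the OPAL API, and the real macro may differ in how it looks up the stream's verbosity.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for this sketch, NOT the OPAL API. */
int sketch_verbosity = 1;        /* configured verbosity of the stream */
int sketch_expensive_calls = 0;  /* counts evaluations of the costly argument */

/* Pretend this is an expensive name-formatting helper. */
const char *sketch_name_print(void)
{
    ++sketch_expensive_calls;
    return "[[1234,1],0]";
}

/* The underlying output function: its arguments are always evaluated. */
void sketch_output(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* The macro checks the level first, so the variadic arguments are only
 * evaluated when the message will actually be printed. */
#define SKETCH_OUTPUT_VERBOSE(level, ...)        \
    do {                                         \
        if ((level) <= sketch_verbosity) {       \
            sketch_output(__VA_ARGS__);          \
        }                                        \
    } while (0)
```

With `sketch_verbosity` set to 1, a call such as `SKETCH_OUTPUT_VERBOSE(10, "%s\n", sketch_name_print())` never invokes `sketch_name_print()`.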
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Since array_of_errcodes is only allocated when MPI_ERRCODES_IGNORE is not used,
it should not be cleaned when MPI_ERRCODES_IGNORE is used.
Correctly allocate array_of_errcodes with the right size (e.g. maxprocs).
Thanks Gyevi-Nagy Laszlo for reporting this issue.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Do not check some input parameters when an {in,out}degree is zero.
Thanks Junchao Zhang for analyzing and reporting this issue.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
To avoid fully initializing the osc/ucx component for MPI applications
that do not use one-sided functionality, the initialization happens
at the first MPI window creation.
This commit ensures atomicity of global state modifications.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
In the calc_cost routine of common_ompi_aggregators:
do not cast the result of the real division to an intermediate int.
This patch removes the obsolete int variable c and assigns
the result of the P_a/P_x division directly to n_as.
With the intermediate int c variable, n_as gets 0 if P_a < P_x,
resulting in a division by 0 when computing n_s.
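
The truncation can be reproduced in isolation with a small sketch. The function names are hypothetical, the operands are assumed to be of type double, and the real calc_cost formula is not reproduced here.

```c
/* Old behaviour (sketch): the quotient passes through an int and is
 * truncated toward zero, so it becomes 0 whenever p_a < p_x. */
double ratio_truncated(double p_a, double p_x)
{
    int c = (int)(p_a / p_x);
    return (double)c;
}

/* New behaviour (sketch): keep the real quotient, which stays non-zero
 * for any non-zero p_a and can safely be used as a divisor later on. */
double ratio_real(double p_a, double p_x)
{
    return p_a / p_x;
}
```

For example, with p_a = 2 and p_x = 8 the truncated variant yields 0, while the real quotient is 0.25.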
Signed-off-by: Harald Klimach <harald.klimach@uni-siegen.de>