Update PMIx to latest master to get supporting updates. For
connect/accept (part of comm_spawn as well), lookup locality for all
participating procs on the node and compute the relative locality so it
can be used for MPI operations.
Signed-off-by: Ralph Castain <rhc@pmix.org>
After the OPAL_MODEX_RECV call, remote_addrs was not freed in the error
path. Moved the free call into cleanup to ensure we always free this
memory before leaving the function.
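The single-exit cleanup pattern described above can be sketched roughly as follows. This is only an illustration: fetch_remote_addrs() and setup_endpoint() are hypothetical stand-ins, not the actual function touched by this commit or the OPAL_MODEX_RECV macro.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a receive call that allocates *addrs. */
static int fetch_remote_addrs(char **addrs, size_t *size)
{
    *size = 16;
    *addrs = malloc(*size);
    if (NULL == *addrs) {
        return -1;
    }
    memset(*addrs, 0, *size);
    return 0;
}

/* Returns 0 on success and a negative value on error. remote_addrs is
 * released in exactly one place, so no error path can leak it. */
static int setup_endpoint(int fail_after_recv)
{
    char *remote_addrs = NULL;
    size_t size = 0;
    int rc;

    rc = fetch_remote_addrs(&remote_addrs, &size);
    if (0 != rc) {
        goto cleanup;
    }

    if (fail_after_recv) {      /* simulated error after the receive */
        rc = -2;
        goto cleanup;
    }

    rc = 0;

cleanup:
    free(remote_addrs);         /* free(NULL) is a no-op */
    return rc;
}
```

Both setup_endpoint(0) and setup_endpoint(1) pass through the same free() call, which is the point of moving the release into the cleanup label.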
Signed-off-by: William Zhang <wilzhang@amazon.com>
Added information on the type of objects provided in the list as well as
the required fields for them.
Signed-off-by: William Zhang <wilzhang@amazon.com>
The parameter names were misleading due to implying a single interface
instead of a list. This will provide more clarity in distinguishing the
list of interfaces from each individual interface.
Signed-off-by: William Zhang <wilzhang@amazon.com>
Start optimizing the code.
This commit divides the operations in 2 parts, the first, outside the
critical part, deals with partial blocks of predefined elements, and the
second, inside the critical path, only deals with full blocks of
elements. This reduces the number of expensive operations in the
critical path and results in a decent performance increase.
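A minimal sketch of this split, assuming a simple strided layout (blocks of blocklen bytes placed every extent bytes). pack_strided() and its parameters are hypothetical and much simpler than the real pack/unpack engine; blocklen is assumed to be non-zero.

```c
#include <stddef.h>
#include <string.h>

/* Pack `len` bytes of a strided source (blocks of `blocklen` bytes every
 * `extent` bytes) into dst, resuming at packed offset `pos`.
 * Returns the number of bytes packed. Assumes blocklen > 0. */
static size_t pack_strided(char *dst, const char *src, size_t blocklen,
                           size_t extent, size_t pos, size_t len)
{
    size_t done = 0;
    size_t block = pos / blocklen;     /* index of the current block */
    size_t within = pos % blocklen;    /* offset inside that block */

    /* Prologue, outside the hot loop: finish a partially packed block. */
    if (0 != within && len > 0) {
        size_t n = blocklen - within;
        if (n > len) {
            n = len;
        }
        memcpy(dst, src + block * extent + within, n);
        done += n;
        block++;   /* if the block was not completed, done == len and
                    * nothing below runs */
    }

    /* Hot loop: whole blocks only, no per-iteration boundary handling. */
    while (len - done >= blocklen) {
        memcpy(dst + done, src + block * extent, blocklen);
        done += blocklen;
        block++;
    }

    /* Epilogue: a trailing partial block, if any. */
    if (done < len) {
        memcpy(dst + done, src + block * extent, len - done);
        done = len;
    }
    return done;
}
```

Keeping the partial-block cases in the prologue and epilogue leaves the loop body with a single fixed-size memcpy.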
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Amazing how bad instruction scheduling can have such a drastic impact
on the code performance. With this change, we get a boost of at least
50% on the performance of data with a small blocklen and/or count.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Optimize contiguous loops by collapsing them into a single element.
During datatype optimization collapse similar elements into larger
blocks.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Upon detecting a datatype loop representation skip the entire loop
according to the remaining space.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
- optimize handling of contiguous with gaps datatypes.
- fix a performance issue for all datatypes with a count of 1.
- optimize the pack/unpack of contiguous with gaps datatype.
- optimize the case of blocklen == 1
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Move toward a base type of vector (count, type, blocklen, extent, disp)
with disp and extent applying toward the count repetition and blocklen
being a contiguous memory of type type.
Implement 2 optimizations on this description used during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy during pack/unpack.
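The collapse step can be illustrated with the sketch below. The vec_desc_t struct and the merge criterion are assumptions made for illustration (one plausible reading of "successive similar descriptions"), not Open MPI's internal description format or the exact rule used in type_commit.

```c
#include <stddef.h>

/* Hypothetical descriptor, not Open MPI's internal layout. */
typedef struct {
    int       type;      /* predefined type id */
    size_t    count;     /* number of repetitions */
    size_t    blocklen;  /* contiguous elements of `type` per repetition */
    ptrdiff_t extent;    /* bytes between successive repetitions */
    ptrdiff_t disp;      /* byte displacement of the first repetition */
} vec_desc_t;

/* b can be folded into a when it has the same shape and starts exactly
 * where the next repetition of a would start. */
static int can_collapse(const vec_desc_t *a, const vec_desc_t *b)
{
    return a->type == b->type && a->blocklen == b->blocklen &&
           a->extent == b->extent &&
           b->disp == a->disp + (ptrdiff_t) a->count * a->extent;
}

/* Collapse successive similar descriptions in place by increasing the
 * count of the first one. Returns the new number of descriptions. */
static size_t collapse(vec_desc_t *d, size_t n)
{
    size_t out = 0;

    if (0 == n) {
        return 0;
    }
    for (size_t i = 1; i < n; i++) {
        if (can_collapse(&d[out], &d[i])) {
            d[out].count += d[i].count;
        } else {
            d[++out] = d[i];
        }
    }
    return out + 1;
}
```

Fewer descriptions mean fewer iterations of the pack/unpack state machine; the fusion optimization (merging adjacent contiguous regions into one memcpy) is not shown here.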
Fixes at the OMPI datatype level including:
- Fix the create_hindexed and vector creation.
- Fix the handling of [get|set]_elements and _count.
- Correctly compute the displacement for block indexed types.
- Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Unfortunately, https://github.com/open-mpi/ompi/pull/6797 was merged
before all feedback was received (39b799d936). This PR is a minor
addendum to that commit.
This PR simply removes a meaningless `= {0}` operation.
The use of gethostname() here -- and many other places in the code
base -- is technically unsafe. See
https://github.com/open-mpi/ompi/issues/6801 for a further description
of the issue and a suggested fix. But the risk is quite low;
real-world hostnames are usually much shorter than
OPAL_MAXHOSTNAMELEN. Hence, this PR just removes the meaningless
operation and leaves a real fix for gethostname() usage to a potential
future PR.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Previously the verbose output of if_linux_ipv6_open looked like this:
found interface ab c: 0ab: a b: abc: 0 0: a 0:abcd: 0 0 scope 0
This changes the output to:
found interface eth0 inet6 ab0c:ab:a0b:abc:0:a00:abcd:0 scope 0
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
A typical parameter of opal_output_verbose() is ORTE_NAME_PRINT(...),
which is an expensive macro.
Most of the time, evaluating it is wasted work because the message's
verbosity level is above the configured one and nothing is printed.
Make opal_output_verbose() a macro so such arguments are only evaluated if the
message's verbosity level is low enough to be printed.
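A minimal sketch of the idea, using hypothetical names (output_verbose, output_emit, verbosity, expensive_name) rather than the real opal_output_verbose() signature or its per-stream verbosity lookup:

```c
#include <stdarg.h>
#include <stdio.h>

static int verbosity = 0;        /* hypothetical configured level */
static int expensive_calls = 0;  /* counts evaluations of the argument */

/* Stand-in for an expensive argument such as a name-printing macro. */
static const char *expensive_name(void)
{
    expensive_calls++;
    return "[[1,0],0]";
}

/* The function that actually formats and prints. */
static void output_emit(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* As a macro, the level check happens before the arguments are
 * evaluated, so a filtered message costs only one comparison. */
#define output_verbose(level, ...)          \
    do {                                    \
        if ((level) <= verbosity) {         \
            output_emit(__VA_ARGS__);       \
        }                                   \
    } while (0)
```

With verbosity set to 1, output_verbose(5, "%s\n", expensive_name()) never calls expensive_name(), whereas a plain function with the same arguments would always evaluate it before the level check.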
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This commit fixes an issue seen with some older versions of gcc
(verified to occur in gcc 6.x) where on x86_64 systems the
acquire memory barrier in C11 atomics acts as a no-op. On these
systems the three memory barriers should all be equivalent.
This is related to the error fixed in open-mpi/ompi@30119ee.
References #6655.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Remove compatibility code for multiple versions of BTL_IN_OPAL,
BTL_VERSION, and RCACHE_VERSION. This stuff was really only necessary
when we were actively swapping code between multiple release branches
that had large variations in core OMPI infrastructure. These large
variations have now been around for quite a while, so the need for
this "compat" layer is significantly reduced. It hasn't been removed
simply because a few of the "compat" names are slightly more friendly
than the real names (e.g., the SEND/RECV/PUT names).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit fixes an error in the 32-bit compare-and-swap atomic support
for Aries networks. The code was incorrectly using the non-fetching
version of cswap, which was causing the routine to return
OPAL_ERR_BAD_ARG.
Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either with atomic
operations such as shmem_atomic_fetch or with point-to-point synchronization
routines such as shmem_wait_until and shmem_test.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
This link-back seems to be breaking OMPI for some reason. I'm not sure we need it in PMIx anyway, but we'll investigate over there.
Signed-off-by: Ralph Castain <rhc@pmix.org>
The first category of issue I'm addressing is that recent code changes
seem to only consider -cpu-set as a binding option. Eg a command like
this
% mpirun -np 2 --report-bindings --use-hwthread-cpus \
--bind-to cpulist:ordered --map-by hwthread --cpu-set 6,7 hostname
just round-robins over the --cpu-set list.
Example output which seems fine to me:
> MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
It should also be possible though to pass a --cpu-set to most other
map/bind options and have it be a constraint on that binding. Eg
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
The first command above errors that
> Conflicting directives for mapping policy are causing the policy
> to be redefined:
> New policy: RANK_FILE
> Prior policy: BYHWTHREAD
The error check in orte_rmaps_rank_file_open() is likely too aggressive.
The intent seems to be that any option like "--map-by whatever" will
check to see if a rankfile is in use, and report that mapping via rmaps
and using an explicit rankfile is a conflict.
But the check has been expanded to not just check
NULL != orte_rankfile
but also errors out if
(NULL != opal_hwloc_base_cpu_list &&
!OPAL_BIND_ORDERED_REQUESTED(opal_hwloc_binding_policy))
which seems to be only recognizing -cpu-set as a binding option and
ignoring -cpu-set as a constraint on other binding policies.
For now I've changed the
NULL != opal_hwloc_base_cpu_list
to
OPAL_BIND_TO_CPUSET == OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy)
so it hopefully only errors out if -cpu-set is being used as a binding
policy. Whether I did that right or not it's enough to get to the next
stage of testing the example commands I have above.
Another place similar logic is used is hwloc_base_frame.c where it has
/* did the user provide a slot list? */
if (NULL != opal_hwloc_base_cpu_list) {
OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CPUSET);
}
where it used to (long ago) only do that if
!OPAL_BINDING_POLICY_IS_SET(opal_hwloc_binding_policy)
I think the new code is making it impossible to use --cpu-set as anything
other than a binding policy.
That brings us past the error detection and into the real functionality, some of
which has been stripped out, probably in moving to hwloc-2:
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0: [B.../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [.B../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
The rank_by() function in rmaps_base_ranking.c makes an array out of objects
returned from
opal_hwloc_base_get_obj_by_type(,,,i,)
which uses df_search(). That function changed quite a bit from hwloc-1 to 2
but it used to include a check for
available = opal_hwloc_base_get_available_cpus(topo, start)
which is where the bitmask from --cpu-set goes. And it used to skip objs that
had hwloc_bitmap_iszero(available).
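The skipping behavior can be illustrated with plain 64-bit masks standing in for hwloc bitmaps. nth_available() is a hypothetical helper, not the real df_search() or any hwloc API; it only shows how filtering on the available mask changes which object index i resolves to.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the position in obj_cpusets of the idx-th object whose cpuset
 * intersects `available`, skipping objects with an empty intersection.
 * Returns -1 if there are not enough available objects. */
static int nth_available(const uint64_t *obj_cpusets, size_t nobjs,
                         uint64_t available, size_t idx)
{
    size_t seen = 0;

    for (size_t i = 0; i < nobjs; i++) {
        if (0 == (obj_cpusets[i] & available)) {
            continue;               /* object excluded by --cpu-set */
        }
        if (seen == idx) {
            return (int) i;
        }
        seen++;
    }
    return -1;
}
```

With eight single-bit objects and an available mask covering bits 6 and 7, index 0 resolves to object 6 and index 1 to object 7, matching the bindings expected from --cpu-set 6,7.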
So I restored that behavior in df_search() by adding a "constrained_cpuset" to
replace start->cpuset that it was otherwise processing. With that change in
place the first command works:
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0: [..../..B./..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../...B/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
The other command uses a different path though that still ignored the
available mask:
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0: [BB../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..BB/..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
In bind_generic() the code used to call
opal_hwloc_base_find_min_bound_target_under_obj() which used
opal_hwloc_base_get_ncpus(), and that's where it would
intersect objects with the available cpuset and skip over ones
that weren't available. To match the old behavior I added a few
lines in bind_generic() to skip over objects that don't intersect
the available mask. After that we get
% mpirun -np 2 --report-bindings \
--bind-to hwthread --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0: [..../..BB/..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
> MCW rank 1: [..../..../..../BB../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../....]
I think the above changes are improvements, but I don't feel like they're
comprehensive. I only traced through enough code to fix the two specific
bugs I was dealing with.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
... to avoid using an architecture name macro in
`opal/mca/timer/linux/timer_linux_component.c`.
The function name `opal_sys_timer_freq` is also changed for
consistency with `opal_sys_timer_get_cycles`.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
... in the case of `OPAL_GCC_INLINE_ASSEMBLY == 0`
In this case, `OPAL_HAVE_SYS_TIMER_GET_CYCLES` should be 0 because
the `opal_sys_timer_get_cycles` function is not defined.
The history:
1. Before 8d4175ad89, `OPAL_HAVE_SYS_TIMER_GET_CYCLES` was 0.
2. In 8d4175ad89, adf92d6237, adf92d6237, and c62ce1593a,
`OPAL_HAVE_SYS_TIMER_GET_CYCLES` was changed to 1 by introducing
`opal/asm/base/*.asm`.
3. In ebce88b7ad, `opal/asm/base/*.asm` were removed.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
This is mostly based off recent UCX additions to their patcher:
https://github.com/openucx/ucx/pull/2703
They added triggers for
* mmap when (flags & MAP_FIXED) && (addr != NULL)
* shmat when (shmflg & SHM_REMAP) && (shmaddr != NULL)
Beyond that I noticed they already had a trigger for
* madvise when (advice == MADV_FREE)
that we didn't so I added that.
And the other main thing is we didn't really have shmat/shmdt
active for some systems because we only had a path for
syscall(SYS_shmdt, ) but we needed to also have a path for
syscall(SYS_ipc, IPCOP_shmdt, ) and same for shmat.
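The three trigger conditions listed above can be written as small predicates, sketched below. The function names are hypothetical, and the fallback values for SHM_REMAP and MADV_FREE are the Linux ones, supplied only in case the system headers do not expose them. What the patcher does once a trigger fires is not shown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* SHM_REMAP is a Linux extension */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/shm.h>

#ifndef SHM_REMAP
#define SHM_REMAP 040000        /* Linux value, assumed if not exposed */
#endif
#ifndef MADV_FREE
#define MADV_FREE 8             /* Linux value, assumed if not exposed */
#endif

/* mmap replaces an existing mapping only with MAP_FIXED at a given addr. */
static int mmap_is_trigger(const void *addr, int flags)
{
    return (flags & MAP_FIXED) && (NULL != addr);
}

/* shmat replaces an existing mapping only with SHM_REMAP at a given addr. */
static int shmat_is_trigger(const void *shmaddr, int shmflg)
{
    return (shmflg & SHM_REMAP) && (NULL != shmaddr);
}

/* madvise(MADV_FREE) lets the kernel discard the pages. */
static int madvise_is_trigger(int advice)
{
    return MADV_FREE == advice;
}
```

For example, mmap_is_trigger(addr, MAP_FIXED | MAP_PRIVATE) is true for a non-NULL addr, while an ordinary mmap(NULL, ...) call is not a trigger.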
Signed-off-by: Mark Allen <markalle@us.ibm.com>