1
1

5244 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
9121eb4ff9 opal/lifo: fix a ABA problem in opal_lifo_pop_atomic
that was introduced in open-mpi/ompi@11bb8b09a0

Fixes open-mpi/ompi#4784

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-02-09 14:48:54 +09:00
Ralph Castain
1a7dfd7d54 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-02-07 12:16:51 -08:00
Ralph Castain
9fe8153d38 Sync to IOF branch and continue fix of request for job info from unknown nspace
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 02400d30d79ce3c7e7e28f9a08f7062a5b6f4c51)
2018-02-03 19:56:35 -08:00
Howard Pritchard
1adf0873f7
Merge pull request #4766 from hppritcha/topic/squash_grdma_comp_warning
rcache/grdma: squash a compiler warning
2018-02-02 18:58:32 -07:00
Gilles Gouaillardet
43700faba1 pmix/ext3x: remove autogenerated ext3x.h header file
This header file was meant to be autogenerated, and for
some reasons, was never removed from the repository.
Update .gitignore as well

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 23:45:42 +09:00
Gilles Gouaillardet
8209fca842 pmix/ext3x: bring external component up-to-date with the embedded pmix3x
add the callback prototype for the upcoming PMIx_IOF_push() API

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:35:34 +09:00
Gilles Gouaillardet
0481277e93 pmix/ext3x: bring external component up-to-date with the embedded pmix3x
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:33:33 +09:00
Nathan Hjelm
bb212e0c94
Merge pull request #4767 from ggouaillardet/topic/vader_backing_file
btl/vader: make the backing file job specific
2018-01-30 21:27:02 -07:00
Gilles Gouaillardet
0285c63348 pmix/ext3x: generate component source when only static libraries are built
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-31 13:21:14 +09:00
Gilles Gouaillardet
611d7c2d27 btl/vader: make the backing file job specific
Since open-mpi/ompi@47fd2313ab
the backing file is now in /dev/shm by default. As a consequence,
the backing file name has to include the jobid so more than one job
can run at a time.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-30 16:52:51 +09:00
Howard Pritchard
c3cac6731f rcache/grdma: squash a compiler warning
tired of seeing this compiler warning

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-01-29 11:50:01 -07:00
Ralph Castain
a17df810ed Sync with PMIx iof rfc
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 10:51:38 -08:00
Ralph Castain
e9cd7fd7e6 Update orte
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:53:43 -08:00
Ralph Castain
9fb80bd239 Update the opal/pmix base framework elements
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:37:52 -08:00
Ralph Castain
187352eb3d Update the PMIx external components
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:35:57 -08:00
Ralph Castain
a5679ef000 Update the PMIx 3.x component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-25 08:34:44 -08:00
Yossi Itigin
7cee60346e opal_progress: check timer only once per 8 calls
Reading the system clock on every call to opal_progress() is an
expensive operation on most architectures, and it can negatively affect
the performance, for example of message rate benchmarks.

We change opal_progress() to read the clock once per 8 calls, unless
there are active users of the event mechanism.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-01-16 19:18:53 +02:00
Ralph Castain
6216225bda Ensure cleanup of registered files/dirs
Resolve a race condition between registering for a file to be removed upon termination and actual creation of that file by providing attributes that identify whether the path is a file or directory. This removes the need for PMIx to detect the difference.

Refs #4686

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-11 11:05:30 -08:00
Ralph Castain
6dacf40a8c Ensure the epilog gets executed in PMIx server
If we abnormally terminate, then we still want any cleanups to be
executed.

Remove debug

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-10 18:28:05 -08:00
Gilles Gouaillardet
1a17cb3b1c opal/datatype: add opal_datatype_is_monotonic()
return true if the datatype has non-negative displacements and
monotonically nondecreasing, and false otherwise.

Thanks George for the guidance.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-09 18:05:14 +09:00
Ralph Castain
d620070c77 Correct the comment in the default MCA param template - we do not support a param called "component_path". The correct syntax is "mca_base_component_path"
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-05 08:46:44 -08:00
Nathan Hjelm
8b8aae372d opal/asm: add atomic min/max convenience functions
This commit adds atomic functions for min/max.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-01-02 08:38:36 -07:00
Gilles Gouaillardet
125169f057 opal/bitmap: fix opal_bitmap_set_bit()
Correctly reallocate the bitmap when needed

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-12-27 14:56:43 +09:00
Nathan Hjelm
39d598899b rcache/grdma: fix crash when part of a registration is unmapped
This commit fixes an issue when a registration is created for a large
region and then invalidated while part of it is in use.

References #4509

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-12-22 10:36:35 -07:00
Ralph Castain
d5471d7898 Silence warnings in optimized build
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-20 12:00:28 -08:00
Ralph Castain
db8ebd33ad Fix the optnone attribute, add extension attribute
See how the various compilers handle these

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-18 19:18:53 -08:00
Nathan Hjelm
47fd2313ab btl/vader: move backing files into /dev/shm on Linux
This commit moves the backing files to /dev/shm to avoid limitations
that may be set on /tmp. The files are registered with pmix to ensure
they are cleaned up after an erroneous exit.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 48101278160672317ade352365592f56ef3b8977)
2017-12-18 07:09:18 -08:00
Ralph Castain
07427c6d89 Update to PMIx v3.0 PR for cleanup registration
If available, have apps use registration capability to cleanup their session directories. Setup capability for vader to register its shared memory file location - let someone familiar with that code do so.

Final cleanup to track uid/gid, update the opal/pmix API to pass flags for ignore and leave top directory alone

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-18 06:53:11 -08:00
Ralph Castain
5c4185abd8 Add the __optnone__ attribute to help avoid optimizing out MPIR_Breakpoint
Thanks to @kiranchandramohan for the suggestion

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-14 13:14:21 -08:00
Nathan Hjelm
d3fa1bbbb0 rcache/grdma: try to prevent erroneous free error messages
It is possible to have parts of an in-use registered region be passed
to munmap or madvise. This does not necessarily mean the user has made
an error but does mean the entire region should be invalidated. This
commit checks that the munmap or madvise base matches the beginning of
the cached region. If it does and the region is in-use then we print
an error. There will certainly be false-negatives where a user
unmaps something that really is in-use but that is preferrable to a
false-positive.

References #4509

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-12-12 09:18:39 -07:00
Nathan Hjelm
a82f761a4a btl/vader: change the way fast boxes are used
There were multiple paths that could lead to a fast box
allocation. One of them made little sense (in-place send) so it has
been removed to allow a rework of the fast-box send function. This
should fix a number of issues with hanging/crashing when using the
vader btl.

References #4260

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-12-11 10:38:33 -07:00
Gilles Gouaillardet
11e5f86bf8 mpool/base: plug a memory leak
set the key of all mpool_tree_item objects, so they can be retrieved
in mpool_base_free and then returned back to the
mca_mpool_base_tree_item_free_list free list.

Refs. open-mpi/ompi#4567

Thanks Philip Blakely for the bug report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-12-07 09:06:25 +09:00
Nathan Hjelm
2e74befa13
Merge pull request #4565 from benmenadue/master
Use malloc instead of posix_memalign for small (<= sizeof(void *)) alignments
2017-12-05 16:35:24 -07:00
Nathan Hjelm
ad59b93266
Merge pull request #4566 from kawashima-fj/pr/arm64-atomic
opal/asm/arm64: Fix `opal_atomic_compare_exchange_*` bug
2017-12-05 16:34:51 -07:00
Nathan Hjelm
8e0e184bc9 opal/asm: fix compilation of 128-bit compare-exchange with gcc7
This commit removes eax and edx from the clobber list. Older versions
of gcc handled these ok but gcc 7 does not. They are not required as
eax and edx are specified in output constraints.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-12-05 15:42:47 -07:00
Ben Menadue
90fa8af10b Use correct alignment request in mca_mpool_base_alloc.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2017-12-06 07:02:17 +11:00
Nathan Hjelm
641bdc4ab7 opal/asm: fix 32-bit build
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-12-05 11:49:13 -07:00
KAWASHIMA Takahiro
08254e8b12 opal/asm/arm64: Fix opal_atomic_compare_exchange_* bug
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-12-05 15:57:29 +09:00
Ben Menadue
db3e25edad Update mca_mpool_base_alloc to use malloc instead of posix_memalign for alignment requests of <= sizeof(void *). This works around issue #4564.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2017-12-05 09:51:31 +11:00
Matias Cabral
2c86b8723d
Merge pull request #4510 from matcabral/mtl_psm2_shadow_vars
New flag for MCA parameters that allows a behaving with a default value of "unset".
2017-12-04 12:25:37 -08:00
Gilles Gouaillardet
d062db1a98 sync_builtin: fix misc typos
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-12-04 11:59:50 +09:00
Nathan Hjelm
7893248c5a opal/asm: add fetch-and-op atomics
This commit adds support for fetch-and-op atomics. This is needed
because and and or are irreversible operations so there needs to be a
way to get the old value atomically. These are also the only semantics
supported by C11 (there is not atomic_op_fetch, just
atomic_fetch_op). The old op-and-fetch atomics have been defined in
terms of fetch-and-op.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:23 -07:00
Nathan Hjelm
1282e98a01 opal/asm: rename existing arithmetic atomic functions
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Nathan Hjelm
9d0b3fe9f4 opal/asm: remove opal_atomic_bool_cmpset functions
This commit eliminates the old opal_atomic_bool_cmpset functions. They
have been replaced by the opal_atomic_compare_exchange_strong
functions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Nathan Hjelm
11bb8b09a0 opal/class: use new compare-and-swap functions
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-29 12:56:32 -07:00
Nathan Hjelm
84f63d0aca opal/asm: add opal_atomic_compare_exchange_strong functions
This commit adds a new set of compare-and-exchange functions. These
functions have a signature similar to the functions found in C11. The
old cmpset functions are now deprecated and defined in terms of the
new compare-and-exchange functions. All asm backends have been
updated.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-29 12:45:44 -07:00
Nathan Hjelm
647b40f3f2
Merge pull request #4442 from bosilca/topic/ob1_pvar
Topic/ob1 pvar
2017-11-29 09:31:07 -07:00
Josh Hursey
7bbd24c868
Merge pull request #4517 from jjhursey/fix/ppc-asm-eieio
ppc/asm: Fix opal_atomic_wmb definition
2017-11-28 07:43:58 -06:00
Gilles Gouaillardet
3b4b3bb6f9 pmix/ext3x: add a missing cnctcbfunc field to ext3x_opalcaddy_t
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-11-28 16:11:08 +09:00
Ralph Castain
3906aaf41a Silence warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-25 11:50:18 -08:00