1
1

5518 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
2c8f745d8d
Merge pull request #6337 from hjelmn/btl_vader_fix_a_stupid_error_in_the_fragment_sizes_used_by_the_free_lists_that_can_cause_weird_results
btl/vader: fix fragment sizes used by free lists
2019-01-30 13:26:57 -07:00
Nathan Hjelm
b51c8f888c btl/vader: fix fragment sizes used by free lists
This commit fixes a bug introduced in
f62d26ddbc8cda4d985cceee531a2ec32406d1f6. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-30 12:31:34 -07:00
Jeff Squyres
2203f8d900
Merge pull request #6185 from ggouaillardet/topic/hwloc_macros
hwloc: remove public hwloc macros from opal_config.h
2019-01-30 07:32:22 -05:00
bosilca
29915fc943
Merge pull request #6292 from ggouaillardet/topic/opal_datatype_destruct
opal/datatype: plug a memory leak in opal_datatype_t destructor
2019-01-29 17:33:18 -05:00
Thananon Patinyasakdikul
56d3e0a43d opal/threads: reverted #6199
This commit reverted pr #6199 as it introduced deadlock in some cases.
Also removed the assert as the condition is obsoleted.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2019-01-29 13:34:44 -05:00
Gilles Gouaillardet
c8790d29de opal: remove unnecessary #include file
opal_config_bottom.h can only be #include'd in opal_config.h,
so there is no need to #include "opal_config.h" inside.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-29 07:40:54 -08:00
Gilles Gouaillardet
73d104f695 hwloc/base: fix some off-by-one errors
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-29 07:36:56 -08:00
Jeff Squyres
f22b7d4f46 hwloc/external.h: fix a clash with external HWLOC_VERSION[*]
Some macros defined by the embedded hwloc ends up in opal_config.h
because hwloc configury m4 files are slurped into Open MPI.  These
macros are not required here, and they might conflict with an external
hwloc install, so simply #undef them in hwloc/external/external.h
after including <opal_config.h> but before including the external
<hwloc.h>.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-29 07:36:01 -08:00
Gilles Gouaillardet
0832ab5acc opal/datatype: fix opal_convertor_raw
correctly handle the case in which iovec is full and the
last accessed element of the datatype is the beginning of a loop

Refs. open-mpi/ompi#6285

Thanks Axel Huebl for reporting this

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-23 15:38:43 +09:00
Gilles Gouaillardet
7c938f070f opal/datatype: plug a memory leak in opal_datatype_t destructor
correctly free ptypes if the datatype is not pre-defined.

Thanks Axel Huebl for reporting this.

Refs. open-mpi/ompi#6291

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-22 10:57:57 +09:00
Nathan Hjelm
fe5d5c75f9
Merge pull request #6279 from hjelmn/ompi_memory_usage
mpool: add new base module type "basic"
2019-01-15 09:48:38 -07:00
Nathan Hjelm
f62d26ddbc btl/vader: use basic mpool type to handle frag/fbox allocation
This commit updates btl/vader to use an mpool for handling all shared
memory allocations (frags, fboxes).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-14 15:54:16 -07:00
Nathan Hjelm
6ffc7cc96c mpool: add new base module type "basic"
This commit adds a new mpool base module type: basic. This module can
be used with an opal_free_list_t to allocate space from a
pre-allocated block (such as a shared memory region). The new module
only supports allocation and is not meant for more dynamic use cases
at this time.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-14 15:48:46 -07:00
Ralph Castain
1e70ea73f4 Update to latest PMIx master (v4.0)
Update the OPAL glue configure code to correctly link the opal/pmix4
component to the hwloc used by OMPI instead of defaulting to the
system-level hwloc. Required a corresponding update to the PMIx hwloc
configure code so we treat hwloc the same way we handle libevent in
embedded scenarios.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-01-10 09:37:56 -08:00
Nathan Hjelm
c7867ffc4f opal/runtime: fix teardown ordering
This commit fixes the ordering of the teardown for
opal_finalize_util. The installdirs and if frameworks need to come
down before the MCA system.

Fixes #6259

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-09 12:38:12 -07:00
Gilles Gouaillardet
950ba16aa1 pmix/ext3x: fix support for external PMIx v3.1
The PMIX_MODEX and PMIX_INFO_ARRAY macros were removed from the PMIx 3.1 standard.
Open MPI does not really need them (they are only used to be reported as not supported),
so smply #ifdef protect them to support an external PMIx v3.1

Refs. open-mpi/ompi#6247

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-08 10:11:05 +09:00
Nathan Hjelm
edaf08bf6d btl/vader: minor correction to match ompi coding style
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-07 14:24:58 -07:00
Nathan Hjelm
30b8336cb4 btl/vader: don't try to set reachabilty in add_procs if not requested
This commit fixes a bug where add_procs can incorrectly return an
error when going through the dynamic add_procs path. This doesn't
happen normally, only when pml/ob1 is not in use.

References #6201

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-07 13:00:48 -07:00
Gilles Gouaillardet
78fffa25f2 mpool/hugepage: plug a memory leak
Refs. open-mpi/ompi#6242

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-07 11:50:40 +09:00
Howard Pritchard
ee26ed9e8f
Merge pull request #6214 from heasterday/topic/fixmpool
Update mpool_hugepage_component.c
2019-01-02 16:24:39 -07:00
Gilles Gouaillardet
0203531695 pmix/pmi4x: refresh to latest PMIx
refresh to pmix/pmix@fae0ee7d94

Refs. open-mpi/ompi#6228

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-28 12:57:57 +09:00
Ralph Castain
529e17e0d9
Merge pull request #6223 from ggouaillardet/topic/pmix_refresh
pmix/pmi4x: refresh to latest PMIx
2018-12-24 10:51:00 -08:00
bosilca
182a2db2a4
Merge pull request #6029 from ggouaillardet/topic/large_datatypes
opal/datatype: correctly handle large datatypes
2018-12-24 12:49:52 -05:00
Jeff Squyres
a30864c634 opal: fix compiler warning
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-12-23 12:10:23 -08:00
Hunter Easterday
509380d99f Fix white extra whitespace
Signed-off-by: Hunter Easterday <heasterday@lanl.gov>
2018-12-20 14:32:10 -07:00
heasterday
ad0d2c451e Update mpool_hugepage_component.c
Signed-off-by: Hunter Easterday <heasterday@lanl.gov>
2018-12-20 14:28:19 -07:00
bosilca
f96305f96b
Merge pull request #6199 from aravindksg/opal-threads-fix
opal/sync: Fix assert during multi-threaded progress invocation
2018-12-20 13:27:07 -05:00
Nathan Hjelm
cf49957af6
Merge pull request #6207 from hjelmn/opal_cleanup
opal: fix common symbols introduced in opal cleanup
2018-12-19 15:39:39 -07:00
Jeff Squyres
9b88e60fc8 info_get: ensure to copy all requested characters
When querying an info value, copy out exactly as many characters as
the caller asked for -- do not artificially truncate the target just
to ensure that it is \0-terminated.

Specifically: do not use opal_string_copy() to copy info values,
because opal_string_copy() will guarantee to \0-terminate the target,
even if it means truncating the target.  E.g., if the caller calls
opal_info_get_nolock() with valuelen=5, opal_string_copy() will return
"1234\0" -- which is wrong.  This commit fixes the behavior to return
"12345".

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-12-19 13:01:45 -08:00
Nathan Hjelm
fd852c8f63 opal: fix common symbols introduced in opal cleanup
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-12-19 12:27:14 -07:00
Nathan Hjelm
4944508603
Merge pull request #6136 from hjelmn/opal_cleanup
opal: clean up init/finalize
2018-12-18 15:23:32 -07:00
Nathan Hjelm
0edfd328f8 opal: clean up init/finalize
This commit contains the following changes:

 - Remove the unused opal_test_init/opal_test_finalize
   functions. These functions are not used by anything in the code
   base or MTT. Tests use opal_init_util/opal_finalize_util instead.

 - Get rid of gotos in opal_init_util and opal_init. Replaced them
   with a cleaner solution.

 - Automatically register cleanup functions in init functions. The
   cleanup functions are executed in the reverse order of the
   initialization functions. The cleanup functions are run in
   opal_finalize_util() before tearing down the class system.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-12-18 14:37:04 -07:00
Gilles Gouaillardet
b0a668457c pmix/pmi4x: refresh to latest PMIx
refresh to pmix/pmix@2d4c2874fd

Refs. open-mpi/ompi#6222

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-18 17:11:01 +09:00
Ralph Castain
ece7696c2c Fix external PMIx v3 support
Don't erase the source files during "make clean"!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-12-17 15:54:56 -08:00
Aravind Gopalakrishnan
2f3c5218ff opal/sync: Fix assert during multi-threaded progress invocation
PR #5241 provided an MCA variable to allow multi-threaded opal_progress.
However, it allowed to update the linked list even when multiple threads was
allowed to call opal_progress. This caused a scenario when a more recent thread
could complete it's progress and fail the assert(sync ==
wait_sync_list).

Allowing to update the linked list only for the case when the number of threads
exceeds the threshold fixes the problem.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-12-17 12:14:27 -08:00
Gilles Gouaillardet
b89deeb1bb btl/uct: fix a typo in configure.m4
remove whitespace around '=' when setting btl_uct_LIBS

Thanks Ake Sandgren for reporting this

Refs. open-mpi/ompi#6173

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-11 14:23:53 +09:00
Gilles Gouaillardet
78aa6fdd1d btl/uct: fix a warning
Use the PRIsize_t macro to correctly print a size_t

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-07 16:16:35 +09:00
George Bosilca
88a693bf71 Add a test for very large data.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-12-06 13:30:58 +09:00
Gilles Gouaillardet
fbb5bb8860 opal/datatype: correctly handle large datatypes
Always use size_t (instead of converting to an uint32_t) in order to
correctly support large datatypes.

Thanks Ben Menadue for the initial bug report

Refs open-mpi/ompi#6016

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-06 13:30:58 +09:00
Nathan Hjelm
e07a64c52d btl/uct: fix some issues when using UCX over ugni
Though not a recommended configuration it is possible to use Open MPI
over UCX over uGNI. This configuration had some issues related to the
connection management and tl selection. This commit fixes those
issues.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-12-05 16:30:54 -07:00
Gilles Gouaillardet
fccb3e7514
Merge pull request #6137 from ggouaillardet/topic/ucx_warning
btl/openib: delay UCX warning to add_procs()
2018-12-05 11:50:10 +09:00
Yossi Itigin
83cca9d52a ucx: add owner.txt for components
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-12-01 17:14:03 +02:00
Gilles Gouaillardet
0a2ce58040 btl/openib: delay UCX warning to add_procs()
If UCX is available, then pml/ucx will be used instead of
pml/ob1 + btl/openib, so there is no need to warn about
btl/openib not supporting Infiniband.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-11-29 16:20:55 +09:00
Ralph Castain
f609542bbf Implement process set name support
Add the --pset option for app_contexts so the user can provide a string
name for each app_context. Use the new PMIx pset attribute to store the
names in the PMIx local storage for retrieval

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-11-27 11:07:58 -08:00
Josh Hursey
3339e2aa33
Merge pull request #5986 from jjhursey/vpid-unpack
Add OPAL_VPID to unpacking
2018-11-21 11:42:02 -06:00
KAWASHIMA Takahiro
cacd6f389c datatype: Remove #if HAVE_[TYPE] for C99 types
Now Open MPI requires a C99 compiler. Checking availability of
the following types is no more needed.

- `long long` (`signed` and `unsigned`)
- `long double`
- `float _Complex`
- `double _Complex`
- `long double _Complex`

Furthermore, the `#if HAVE_[TYPE]` style checking is not correct.
Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`.
`AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h`
if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]`
(instead of defining as `0`) if it is not available. So even if we
need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`.

I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac`
since someone may use `HAVE_[TYPE]` macros somewhere.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-11-14 09:32:52 +09:00
Thananon Patinyasakdikul
3dc1629771
Merge pull request #5241 from thananon/opal_progress
Add MCA param for multithread opal_progress().
2018-11-09 12:30:07 -05:00
Gilles Gouaillardet
efe72d3d92
Merge pull request #6055 from ggouaillardet/topic/c11_atomics
Misc C11 atomics related fixes
2018-11-08 09:11:42 +09:00
Gilles Gouaillardet
07970f192a atomic/c11: fix include header path
simply #include "opal_stdint.h" in order to work with --devel-headers

Refs open-mpi/ompi#6053

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-11-07 13:23:28 +09:00
Thananon Patinyasakdikul
d9bd54c628 btl/ofi: fixed compiler warning on OSX.
This commit closes #6049

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2018-11-06 15:37:25 -05:00