1
1
Граф коммитов

3653 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
24a64524ae Correctly name the package so the flags get set
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-08 15:11:48 -08:00
Ralph Castain
fc0b0938a7 Cache the old orte/regx components
In case someone wants to restore and use them - leave them as
opal_ignore'd for now

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-08 11:25:35 -08:00
Ralph Castain
125d236173 Move from the use of regex to compression
We've been fighting the battle of trying to create a regex generator and
parser that can handle arbitrary hostname schemes - without long-term
success. The worst of it is that there is no way of checking to see if
the computed regex is correct short of parsing it and doing a
character-by-character comparison with the original string. Ugh...there
has to be a better solution.

One option is to investigate using 3rd-party regex libraries as
those are coming from communities whose sole focus is resolving that
problem. However, someone would need to spend the time to investigate
it, and we'd have to find a license-friendly implementation.

Another option is to quit beating our heads against the wall and just
compress the information. It won't be as much of a reduction, but we
also won't keep hitting scenarios where things break. In this case, it
seems that "perfection" is definitely the enemy of "good enough".

This PR implements the compression option while retaining the
possibility of people adding regex-generating components. The
compression code used in ORTE is consolidated into the opal/compress
framework. That framework currently held bzip and gzip components for
use in compressing checkpoint files - since we no longer support C/R, I
have .opal_ignore'd those components.

However, I have left the original framework APIs alone in case someone
ever decides to redo C/R. The APIs of interest here are added to the
framework - specifically, the "compress_block" and "decompress_block"
functions. I then moved the ORTE zlib compression code into a new
component in this framework.

Unfortunately, the framework currently is a single-select one - i.e.,
only one active component at a time. Since I .opal_ignore'd the other
two and made the priority of zlib high, this isn't a problem. However,
if someone wants to re-enable bzip/gzip or add another component, they
might need to transition opal/compress to a multi-select framework.

Included changes:

* Consolidate the compression code into the opal/compress framework

* Move the ORTE zlib compression code into a new opal/compress/zlib
  component

* Ignore the bzip and gzip components in opal/compress framework

* Add a "compress_base_limit" MCA param to set the threshold above which
  we compress data - defaults to 4096 bytes

* Delete stale brucks and rcd components from orte/grpcomm framework

* Delete the orte/regx framework

* Update the launch system to use opal/compress instead of string regex

* Provide a default module if no zlib is available

* Fix some misc multi-node issues

* Properly generate the nidmap in response to a "connection warmup"
  message so the remote daemon knows the children it needs to launch.

* Remove stale references to orte_node_regex

* opal_byte_object_t's are not OPAL objects - properly release allocated
  memory.

* Set the topology

* Currently only handling homogeneous case

* Update the compress framework files to conform

* Consolidate open/close into one "frame" file. Ensure we open/close the
  framework

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-08 11:11:14 -08:00
Jeff Squyres
dd20174532 Remove opal/mca/common/ofi.
It never lived up to its purpose (and has caused amorphous indirect
errors such as https://github.com/open-mpi/ompi/issues/2519), so
delete it.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
f5e1a672cc ofi: revamp OPAL_CHECK_OFI configury
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - Subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
b556cabfe9 btl/ofi/Makefile.am: down with tabs!
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
59c8ab6da4 m4: remove all configury related to libibverbs
Now that all components that use libibverbs are gone, remove
OPAL_CHECK_VERBS and the confusingly-named OPAL_CHECK_OPENFABRICS
(which really just checked for verbs things -- not all the possible
OpenFabrics APIs/libraries).

The only code left in Open MPI that calls verbs is hwloc -- and that's
just the APIs that takes an IBV device and returns topological
information about it.  Since nothing in the Open MPI code base uses
the "ibv_*" API any more, we have no need for this hwloc functionality
so we'll even remove the --with-verbs configure options.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Jeff Squyres
3f4af8e51c opal/common: remove stale common components
The verbs and verbs_usnic components are now no longer necessary / no
longer used anywhere in the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Jeff Squyres
8de786f5a4 btl/openib: So long / farewell / it's time to say goodnight
So long BTL openib!  After many years of (mostly) faithful service, it
is time to remove the openib BTL.  It has been fully replaced by other
components, such as the UCX PML and OFI MTL.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:34:19 -08:00
Ralph Castain
677ce0a69f Update PMIx configure logic in the embedded component
PMIx is removing the --enable-embedded-libevent and
--enable-embedded-hwloc flags as they are confusing users. Instead, we
will use the --enable-embedded-mode to handle both of these options.
Update the embedded configury to handle it.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-06 17:15:44 -08:00
Matias Cabral
5aef3148d3
Merge pull request #6351 from aravindksg/fix_btl_ofi_valgrind
btl/ofi: Fix valgrind complaints on uninitialized pointer use
2019-02-05 16:36:45 -08:00
Ralph Castain
baef25338a Update to latest PRI master
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-04 10:10:58 -08:00
Aravind Gopalakrishnan
786e686d43 btl/ofi: Fix valgrind complaints on uninitialized pointer use
It doesn't seem like the BTL was using uninitialized pointer. But simply
setting the rcache pointer to NULL after destroying it makes the valgrind
errors go away.

Fixes Issue #6345

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-02-01 14:03:23 -08:00
Nathan Hjelm
2c8f745d8d
Merge pull request #6337 from hjelmn/btl_vader_fix_a_stupid_error_in_the_fragment_sizes_used_by_the_free_lists_that_can_cause_weird_results
btl/vader: fix fragment sizes used by free lists
2019-01-30 13:26:57 -07:00
Nathan Hjelm
b51c8f888c btl/vader: fix fragment sizes used by free lists
This commit fixes a bug introduced in
f62d26ddbc. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-30 12:31:34 -07:00
Gilles Gouaillardet
73d104f695 hwloc/base: fix some off-by-one errors
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-29 07:36:56 -08:00
Jeff Squyres
f22b7d4f46 hwloc/external.h: fix a clash with external HWLOC_VERSION[*]
Some macros defined by the embedded hwloc ends up in opal_config.h
because hwloc configury m4 files are slurped into Open MPI.  These
macros are not required here, and they might conflict with an external
hwloc install, so simply #undef them in hwloc/external/external.h
after including <opal_config.h> but before including the external
<hwloc.h>.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-29 07:36:01 -08:00
Nathan Hjelm
fe5d5c75f9
Merge pull request #6279 from hjelmn/ompi_memory_usage
mpool: add new base module type "basic"
2019-01-15 09:48:38 -07:00
Nathan Hjelm
f62d26ddbc btl/vader: use basic mpool type to handle frag/fbox allocation
This commit updates btl/vader to use an mpool for handling all shared
memory allocations (frags, fboxes).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-14 15:54:16 -07:00
Nathan Hjelm
6ffc7cc96c mpool: add new base module type "basic"
This commit adds a new mpool base module type: basic. This module can
be used with an opal_free_list_t to allocate space from a
pre-allocated block (such as a shared memory region). The new module
only supports allocation and is not meant for more dynamic use cases
at this time.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-14 15:48:46 -07:00
Ralph Castain
1e70ea73f4 Update to latest PMIx master (v4.0)
Update the OPAL glue configure code to correctly link the opal/pmix4
component to the hwloc used by OMPI instead of defaulting to the
system-level hwloc. Required a corresponding update to the PMIx hwloc
configure code so we treat hwloc the same way we handle libevent in
embedded scenarios.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-01-10 09:37:56 -08:00
Gilles Gouaillardet
950ba16aa1 pmix/ext3x: fix support for external PMIx v3.1
The PMIX_MODEX and PMIX_INFO_ARRAY macros were removed from the PMIx 3.1 standard.
Open MPI does not really need them (they are only used to be reported as not supported),
so smply #ifdef protect them to support an external PMIx v3.1

Refs. open-mpi/ompi#6247

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-08 10:11:05 +09:00
Nathan Hjelm
edaf08bf6d btl/vader: minor correction to match ompi coding style
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-07 14:24:58 -07:00
Nathan Hjelm
30b8336cb4 btl/vader: don't try to set reachabilty in add_procs if not requested
This commit fixes a bug where add_procs can incorrectly return an
error when going through the dynamic add_procs path. This doesn't
happen normally, only when pml/ob1 is not in use.

References #6201

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-07 13:00:48 -07:00
Gilles Gouaillardet
78fffa25f2 mpool/hugepage: plug a memory leak
Refs. open-mpi/ompi#6242

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-07 11:50:40 +09:00
Howard Pritchard
ee26ed9e8f
Merge pull request #6214 from heasterday/topic/fixmpool
Update mpool_hugepage_component.c
2019-01-02 16:24:39 -07:00
Gilles Gouaillardet
0203531695 pmix/pmi4x: refresh to latest PMIx
refresh to pmix/pmix@fae0ee7d94

Refs. open-mpi/ompi#6228

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-28 12:57:57 +09:00
Hunter Easterday
509380d99f Fix white extra whitespace
Signed-off-by: Hunter Easterday <heasterday@lanl.gov>
2018-12-20 14:32:10 -07:00
heasterday
ad0d2c451e Update mpool_hugepage_component.c
Signed-off-by: Hunter Easterday <heasterday@lanl.gov>
2018-12-20 14:28:19 -07:00
Gilles Gouaillardet
b0a668457c pmix/pmi4x: refresh to latest PMIx
refresh to pmix/pmix@2d4c2874fd

Refs. open-mpi/ompi#6222

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-18 17:11:01 +09:00
Nathan Hjelm
4944508603
Merge pull request #6136 from hjelmn/opal_cleanup
opal: clean up init/finalize
2018-12-18 15:23:32 -07:00
Nathan Hjelm
0edfd328f8 opal: clean up init/finalize
This commit contains the following changes:

 - Remove the unused opal_test_init/opal_test_finalize
   functions. These functions are not used by anything in the code
   base or MTT. Tests use opal_init_util/opal_finalize_util instead.

 - Get rid of gotos in opal_init_util and opal_init. Replaced them
   with a cleaner solution.

 - Automatically register cleanup functions in init functions. The
   cleanup functions are executed in the reverse order of the
   initialization functions. The cleanup functions are run in
   opal_finalize_util() before tearing down the class system.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-12-18 14:37:04 -07:00
Ralph Castain
ece7696c2c Fix external PMIx v3 support
Don't erase the source files during "make clean"!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-12-17 15:54:56 -08:00
Gilles Gouaillardet
b89deeb1bb btl/uct: fix a typo in configure.m4
remove whitespace around '=' when setting btl_uct_LIBS

Thanks Ake Sandgren for reporting this

Refs. open-mpi/ompi#6173

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-11 14:23:53 +09:00
Gilles Gouaillardet
78aa6fdd1d btl/uct: fix a warning
Use the PRIsize_t macro to correctly print a size_t

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-12-07 16:16:35 +09:00
Nathan Hjelm
e07a64c52d btl/uct: fix some issues when using UCX over ugni
Though not a recommended configuration it is possible to use Open MPI
over UCX over uGNI. This configuration had some issues related to the
connection management and tl selection. This commit fixes those
issues.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-12-05 16:30:54 -07:00
Gilles Gouaillardet
fccb3e7514
Merge pull request #6137 from ggouaillardet/topic/ucx_warning
btl/openib: delay UCX warning to add_procs()
2018-12-05 11:50:10 +09:00
Yossi Itigin
83cca9d52a ucx: add owner.txt for components
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-12-01 17:14:03 +02:00
Gilles Gouaillardet
0a2ce58040 btl/openib: delay UCX warning to add_procs()
If UCX is available, then pml/ucx will be used instead of
pml/ob1 + btl/openib, so there is no need to warn about
btl/openib not supporting Infiniband.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-11-29 16:20:55 +09:00
Ralph Castain
f609542bbf Implement process set name support
Add the --pset option for app_contexts so the user can provide a string
name for each app_context. Use the new PMIx pset attribute to store the
names in the PMIx local storage for retrieval

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-11-27 11:07:58 -08:00
KAWASHIMA Takahiro
cacd6f389c datatype: Remove #if HAVE_[TYPE] for C99 types
Now Open MPI requires a C99 compiler. Checking availability of
the following types is no more needed.

- `long long` (`signed` and `unsigned`)
- `long double`
- `float _Complex`
- `double _Complex`
- `long double _Complex`

Furthermore, the `#if HAVE_[TYPE]` style checking is not correct.
Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`.
`AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h`
if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]`
(instead of defining as `0`) if it is not available. So even if we
need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`.

I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac`
since someone may use `HAVE_[TYPE]` macros somewhere.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-11-14 09:32:52 +09:00
Thananon Patinyasakdikul
d9bd54c628 btl/ofi: fixed compiler warning on OSX.
This commit closes #6049

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2018-11-06 15:37:25 -05:00
Howard Pritchard
8126779a35 btl/openib: fix a problem with ib query
Under certain circumstances, ibv_exp_query_device was
returning an error due to uninitialized fields in the
extended attributes struct.

Fixes: #5810
Fixes: #5914

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-10-31 14:50:15 -06:00
Brian Barrett
a1e85b03aa btl tcp: Fix compile error in IPv6
In 457f058 I broke the TCP BTL with --enable-ipv6.  This patch
fixes the compile error, so IPv6 works again.

Fixed #5996

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-30 12:31:04 -07:00
Gilles Gouaillardet
b205039205 event/external: fix version requirement
Only default to the external component if its version is
greater or equal than the internal libevent (2.0.22)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-29 10:12:56 +09:00
Gilles Gouaillardet
35e77a286c event/external: misc configury fixes
- Always use the external component when configure'd with --with-libevent=external
 - Fix the external libevent library version detection
   by testing _EVENT_NUMERIC_VERSION and EVENT__NUMERIC_VERSION macros
 - Use the event2/event.h header (event.h is deprecated since libevent 2.0

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-29 10:12:56 +09:00
Aurelien Bouteiller
f2e6d7891e
Merge pull request #5975 from ICLDisco/export/pmix-fini-threadinterlock
Avoid a double lock interlock when calling pmix_finalize
2018-10-26 10:01:23 -04:00
Gilles Gouaillardet
b715dd2657 btl/uct: fix AC_CHECK_DECLS usage
AC_CHECK_DECLS take a comma separated list of macros/symbols,
so replace the whitespace separator with a comma.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-26 15:36:02 +09:00
Aurelien Bouteiller
50cf707f3f
Avoid a double lock interlock when calling pmix_finalize
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2018-10-25 16:36:58 -04:00
Nathan Hjelm
7b34c9b3fe
Merge pull request #5958 from hjelmn/btl_uct_sync
btl/uct: update for UCT_CB_FLAG_SYNC removal
2018-10-22 23:17:27 -06:00