1
1

151 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
9ffac85650 build: Move libevent to a 3rd-party package
With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE as
MCA components and instead 1) start relying on external libraries
whenever possible and 2) Open MPI builds the 3rd party
libraries (if needed) as independent libraries, rather than
linked into libopen-pal.

This patch moves libevent from an MCA framework to a stand-alone
library built outside of OPAL.  A wrapper in opal/util is provided
to minimize the unnecessary changes in the rest of the code.  When
using the internal Libevent, it will be installed as a stand-alone
libevent.a, instead of bundled in OPAL.  Any pre-installed version
of Libevent at or after 2.0.21 is preferred over the internal
version.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-10-01 16:55:58 +00:00
Tomislav Janjusic
cb1955bb53 Fix renamed interface functions for argo, q, and pthreads
Co-authored by: Artem Polykaov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-24 18:29:07 +03:00
Howard Pritchard
f136a20cae
Merge pull request #6578 from hppritcha/topic/thread_framework2
Implement a MCA framework for threads
2020-03-27 15:55:48 -06:00
Noah Evans
ee3517427e Add threads framework
Add a framework to support different types of threading models including
user space thread packages such as Qthreads and argobot:

https://github.com/pmodels/argobots

https://github.com/Qthreads/qthreads

The default threading model is pthreads.  Alternate thread models are
specificed at configure time using the --with-threads=X option.

The framework is static.  The theading model to use is selected at
Open MPI configure/build time.

mca/threads: implement Argobots threading layer

config: fix thread configury

- Add double quotations
- Change Argobot to Argobots
config: implement Argobots check

If the poll time is too long, MPI hangs.

This quick fix just sets it to 0, but it is not good for the
Pthreads version. Need to find a good way to abstract it.

Note that even 1 (= 1 millisecond) causes disastrous performance
degradation.

rework threads MCA framework configury

It now works more like the ompi/mca/rte configury,
modulo some edge items that are special for threading package
linking, etc.

qthreads module
some argobots cleanup

Signed-off-by: Noah Evans <noah.evans@gmail.com>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-03-27 10:15:45 -06:00
Ralph Castain
95dacd2086
Fix singletons and ensure adequate PMIx version
OMPI can only support PMIx v3 and above. PRRTE requires at least PMIx
v4, so protect against the case where OMPI is built against an external
PMIx v3.

Fix check of PMIx_Init return code for singleton operations.

Ensure that the PMIx framework gets properly opened.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-23 10:29:42 -07:00
Tsubasa Yanagibashi
7d5fbcfd76 opal: Fix opal_initialized reference counter
Before this change, the reference counters `opal_util_initialized`
and `opal_initialized` were incremented at the beginning of the
`opal_init_util` and the `opal_init` functions respectively.
In other words, they were incremented before fully initialized.

This causes the following program to abort by SIGFPE if
`--enable-timing` is enabled on `configure`.

```c
// need -lm option on link

int main(int argc, char *argv[])
{
    // raise SIGFPE on division-by-zero
    feenableexcept(FE_DIVBYZERO);
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
```

The logic of the SIGFPE is:

1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through
   `OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns
   `get_ts_cycle` instead of `get_ts_gettimeofday` because
   `opal_initialized` to 1.
   (This is the problem)
5. `opal_init_util` calls `get_ts_cycle` through
   `OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` executes
   `opal_timer_base_get_cycles()) / opal_timer_base_get_freq()`
   and it raises SIGFPE (division-by-zero) because the OPAL TIMER
   framework is not initialized yet and `opal_timer_base_get_freq`
   returns 0.

This commit changes the increment timing of `opal_util_initialized`
and `opal_initialized` to the end of `opal_init_util` and the
`opal_init` functions respectively.

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2020-02-26 14:09:19 +09:00
Jeff Squyres
66da0c6361 Remove "compress" OPAL framework
This framework is no longer used.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-21 06:28:16 -08:00
Jeff Squyres
8c819e2a85 opal: ensure opal_gethostname() always returns a value
We initially thought it was a safe bet that opal_gethostname() would
never be called before opal_init().  However, it turns out that there
are some cases -- e.g., developer debugging -- where it is useful to
call opal_output() (which calls opal_gethostname()) before
opal_init().

Hence, we need to guarantee that opal_gethostname() always returns a
valid value.  If opal_gethostname() finds NULL in
opal_process_info.nodename, simply call the internal function to
initialize opal_process_info.nodename.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-01-21 09:56:52 -08:00
Charles Shereda
cbc6feaab2 Created opal_gethostname() as safer gethostname substitute.
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.

opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.

-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
 returns an int and directly sets opal_process_info.nodename per
 jsquyres' modifications.

Relates to open-mpi#6801

Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
2020-01-13 08:52:17 -08:00
Gilles Gouaillardet
88ac05fca6 misc fixes
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-08 11:11:25 -08:00
Ralph Castain
125d236173 Move from the use of regex to compression
We've been fighting the battle of trying to create a regex generator and
parser that can handle arbitrary hostname schemes - without long-term
success. The worst of it is that there is no way of checking to see if
the computed regex is correct short of parsing it and doing a
character-by-character comparison with the original string. Ugh...there
has to be a better solution.

One option is to investigate using 3rd-party regex libraries as
those are coming from communities whose sole focus is resolving that
problem. However, someone would need to spend the time to investigate
it, and we'd have to find a license-friendly implementation.

Another option is to quit beating our heads against the wall and just
compress the information. It won't be as much of a reduction, but we
also won't keep hitting scenarios where things break. In this case, it
seems that "perfection" is definitely the enemy of "good enough".

This PR implements the compression option while retaining the
possibility of people adding regex-generating components. The
compression code used in ORTE is consolidated into the opal/compress
framework. That framework currently held bzip and gzip components for
use in compressing checkpoint files - since we no longer support C/R, I
have .opal_ignore'd those components.

However, I have left the original framework APIs alone in case someone
ever decides to redo C/R. The APIs of interest here are added to the
framework - specifically, the "compress_block" and "decompress_block"
functions. I then moved the ORTE zlib compression code into a new
component in this framework.

Unfortunately, the framework currently is a single-select one - i.e.,
only one active component at a time. Since I .opal_ignore'd the other
two and made the priority of zlib high, this isn't a problem. However,
if someone wants to re-enable bzip/gzip or add another component, they
might need to transition opal/compress to a multi-select framework.

Included changes:

* Consolidate the compression code into the opal/compress framework

* Move the ORTE zlib compression code into a new opal/compress/zlib
  component

* Ignore the bzip and gzip components in opal/compress framework

* Add a "compress_base_limit" MCA param to set the threshold above which
  we compress data - defaults to 4096 bytes

* Delete stale brucks and rcd components from orte/grpcomm framework

* Delete the orte/regx framework

* Update the launch system to use opal/compress instead of string regex

* Provide a default module if no zlib is available

* Fix some misc multi-node issues

* Properly generate the nidmap in response to a "connection warmup"
  message so the remote daemon knows the children it needs to launch.

* Remove stale references to orte_node_regex

* opal_byte_object_t's are not OPAL objects - properly release allocated
  memory.

* Set the topology

* Currently only handling homogeneous case

* Update the compress framework files to conform

* Consolidate open/close into one "frame" file. Ensure we open/close the
  framework

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-08 11:11:14 -08:00
Nathan Hjelm
c7867ffc4f opal/runtime: fix teardown ordering
This commit fixes the ordering of the teardown for
opal_finalize_util. The installdirs and if frameworks need to come
down before the MCA system.

Fixes #6259

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-09 12:38:12 -07:00
Nathan Hjelm
0edfd328f8 opal: clean up init/finalize
This commit contains the following changes:

 - Remove the unused opal_test_init/opal_test_finalize
   functions. These functions are not used by anything in the code
   base or MTT. Tests use opal_init_util/opal_finalize_util instead.

 - Get rid of gotos in opal_init_util and opal_init. Replaced them
   with a cleaner solution.

 - Automatically register cleanup functions in init functions. The
   cleanup functions are executed in the reverse order of the
   initialization functions. The cleanup functions are run in
   opal_finalize_util() before tearing down the class system.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-12-18 14:37:04 -07:00
Boris Karasev
3796307a57 timings: added new timing points
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-03-21 05:16:25 +02:00
Ralph Castain
0434b615b5 Update ORTE to support PMIx v3
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":

* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch

* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.

* Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later.

* Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allows envars to be excluded.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-02 02:00:31 -08:00
Ralph Castain
fe9b584c05 Fully support OMPI spawn options. Fix a bug in the round-robin mappers where we weren't adding nodes to the job map node array, and so resources were not released
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 285d8cfef74ffc899e9c51e1d9c597b7fb2ceb89)
2017-09-21 10:29:27 -07:00
Gabe Saba
8f2df42055 reachable: Initialize / Finalize reachable framework
Initialize the reachable framework during opal_init() and tear
it back down during opal_finalize().  The framework was never
used, so the lack of initialization didn't matter, but this is
a required step in using the framework.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-09-19 19:42:54 -07:00
Ralph Castain
3c914a7a97 Complete the fix of the ORTE DVM. We will now use "prun" instead of "orterun -hnp foo" to execute jobs. This provides the feature of automatic discovery of the orte-dvm so you don't need to manually enter URI's or contact file locations. All IO is forwarded to prun.
Still in the "needs to be done" category:

* mapping/ranking/binding options aren't correctly supported

* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-16 13:13:07 -07:00
Ralph Castain
acd60a2cc4 Add missing constant to error-strings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 16:10:52 -07:00
Ralph Castain
c86f71376a Increase fine grain of timing info
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-20 00:17:40 -07:00
Ralph Castain
d645557fa0 Update to include the PMIx 2.0 APIs for monitoring and job control. Include required integration, but leave the monitors off for now. Move the sensor framework out of ORTE as it is being absorbed into PMIx
Fix typo and silence warnings

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 17:47:08 -07:00
Ralph Castain
83199979ba Remove the stale opal/sec framework
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-02 15:41:56 -08:00
Gilles Gouaillardet
b3a2bdda7b opal/threads: manually invoke thread-specific key destructors on the main thread.
there is no such thing as pthread_join(main_thread), so key destructors
are never invoked on the main thread, which causes valgrind report
some memory leaks. Manually store and then invoke the key destructors and
make valgrind happy.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
c9aeccb84e opal/if: open the if framework once in opal_init_util
the if framework is no more open in opal_if*, which plugs
several memory leaks

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-01 14:24:30 +09:00
Gilles Gouaillardet
8cc3f288c9 opal: fix opal_class_finalize() usage
the class system can be initialized/finalized as many times as we like,
so there is no more need to have opal_class_finalize() invoked in a destructor

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-26 15:15:54 +09:00
Ralph Castain
0ea1cff733 Implement notification of completion on comm_spawn'd child jobs. Add a configure flag to enable PMIx 3's shared memory datastore, and set it disable by default so that comm_spawn functions again. Will reverse the default once that feature is fully functional 2016-09-01 13:10:10 -07:00
Ralph Castain
20a91c2baf Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program.
Cleanup debug message
2016-07-13 15:28:33 -07:00
Ralph Castain
6e434d6785 Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information
Update to match PMIx RFC

Fix configury to point to correct libevent and hwloc locations
2016-06-29 19:19:19 -07:00
Ralph Castain
5d330d5220 Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler.
Add PMIx 2.0

Remove PMIx 1.1.4

Cleanup copying of component

Add missing file

Touchup a typo in the Makefile.am

Update the pmix ext114 component

Minor cleanups and resync to master

Update to latest PMIx 2.x

Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00
Jeff Squyres
5071602c59 PSM/PSM2: Disable signal handler hijacking by default
Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some
subsequent phone calls and off-issue email discussions), the PSM
library is hijacking signal handlers by default.  Specifically: unless
the environment variables `IPATH_NO_BACKTRACE=1` (for PSM / Intel
TrueScale) is set, the library constructor for this library will
hijack various signal handlers for the purpose of invoking its own
error reporting mechanisms.

This may be a bit *surprising*, but is not a *problem*, per se.  The
real problem is that older versions of at least the PSM library do not
unregister these signal handlers upon being unloaded from memory.
Hence, a segv can actually result in a double segv (i.e., the original
segv and then another segv when the now-non-existent signal handler is
invoked).

This PSM signal hijacking subverts Open MPI's own signal reporting
mechanism, which may be a bit surprising for some users (particularly
those who do not have Intel TrueScale).  As such, we disable it by
default so that Open MPI's own error-reporting mechanisms are used.

Additionally, there is a typo in the library destructor for the PSM2
library that may cause problems in the unloading of its signal
handlers.  This problem can be avoided by setting `HFI_NO_BACKTRACE=1`
(for PSM2 / Intel OmniPath).

This is further compounded by the fact that the PSM / PSM2 libraries
can be loaded by the OFI MTL and the usNIC BTL (because they are
loaded by libfabric), even when there is no Intel networking hardware
present.  Having the PSM/PSM2 libraries behave this way when no Intel
hardware is present is clearly undesirable (and is likely to be fixed
in future releases of the PSM/PSM2 libraries).

This commit sets the following two environment variables to disable
this behavior from the PSM/PSM2 libraries (if they are not already
set):

* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1

If the user has set these variables before invoking Open MPI, we will
not override their values (i.e., their preferences will be honored).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-14 11:45:23 -07:00
Ralph Castain
80f4e3b872 Fix the --tune problem by searching the argv for MCA params in advance of opal_init_util. Only search the first app_context as we historically have done - we can debate whether or not to search all app_contexts 2016-05-23 21:09:44 -07:00
Ralph Castain
42ecffb6d0 Move the registration of MCA params out of the init of the var system - put them in with the rest of the OPAL MCA param registrations
Take another shot at untangling the spaghetti

orterun: fix for command line parsing

orte-submit calls opal_init_util () before parsing out MCA command line
options (-mca, -am, etc). This prevents mpirun from setting opal MCA
variables for some frameworks as well as the MCA base. This is because
when a framework is opened all of its variables are set to read-only.
Eventually we want to lift this restriction on some MCA variables but
since -mca is affected we must parse out the MCA command line options
before opal_init_util(). This commit fixes the bug by adding a new
option to opal_cmd_line_parse (ignore unknown option) so orte-submit
can pre-parse the command line for MCA options.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>

Minor cleanups to avoid releasing/recreating the cmd line
2016-05-20 09:59:50 -07:00
Nathan Hjelm
41f00b7465 memory/patcher: initialize patcher framework when needed
This commit moves the patcher framework initialization to the
memory/patcher component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-04 12:46:42 -06:00
Karol Mroz
e1c64e6e59 opal: standardize on max hostname length
Define OPAL_MAXHOSTNAMELEN to be either:
  (MAXHOSTNAMELEN + 1) or
  (limits.h:HOST_NAME_MAX + 1) or
  (255 + 1)

For pmix code, define above using PMIX_MAXHOSTNAMELEN.

Fixup opal layer to use the new max.

Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-24 08:19:47 +02:00
Nathan Hjelm
c2b6fbb124 opal/memory: move initialization to first rcache creation
Because of the removal of the linux memory component it is no longer
necessary to initialize the memory component in opal_init(). This
commit moves the initialization to the creation of the first rcache
component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:21:46 -06:00
Nathan Hjelm
27f8a4e806 opal: add code patcher framework
This commit adds a framework to abstract runtime code patching.
Components in the new framework can provide functions for either
patching a named function or a function pointer. The later
functionality is not being used but may provide a way to allow memory
hooks when dlopen functionality is disabled.

This commit adds two different flavors of code patching. The first is
provided by the overwrite component. This component overwrites the
first several instructions of the target function with code to jump to
the provided hook function. The hook is expected to provide the full
functionality of the hooked function.

The linux patcher component is based on the memory hooks in ucx. It
only works on linux and operates by overwriting function pointers in
the symbol table. In this case the hook is free to call the original
function using the function pointer returned by dlsym.

Both components restore the original functions when the patcher
framework closes.

Changes had to be made to support Power/PowerPC with the Linux
dynamic loader patcher. Some of the changes:

 - Move code necessary for powerpc/power support to the patcher
   base. The code is needed by both the overwrite and linux
   components.

 - Move patch structure down to base and move the patch list to
   mca_patcher_base_module_t. The structure has been modified to
   include a function pointer to the function that will unapply the
   patch. This allows the mixing of multiple different types of
   patches in the patch_list.

 - Update linux patching code to keep track of the matching between
   got entry and original (unpatched) address. This allows us to
   completely clean up the patch on finalize.

All patchers keep track of the changes they made so that they can be
reversed when the patcher framework is closed.

At this time there are bugs in the Linux dynamic loader patcher so
its priority is lower than the overwrite patcher.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:13 -06:00
Ralph Castain
60a7bc2e50 Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion.
Fixes ##1225
2016-02-18 09:29:12 -08:00
Gilles Gouaillardet
0b3e3c6817 opal/runtime: add missing #include <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:56 +09:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Gilles Gouaillardet
c809aace47 initialize common symbols from opal
A few uninitialized common symbols are remaining:

common symbols generated by flex :
 * opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng
 * opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext
 * opal/util/show_help_lex.l: opal_show_help_yyleng
 * opal/util/show_help_lex.l: opal_show_help_yytext

common symbol generated by "external" hwloc library:
 * opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map
2015-05-08 09:48:51 +09:00
Nathan Hjelm
8287e1d28f Merge pull request #528 from hjelmn/add_destructor
Add opal destructor/fini function
2015-04-20 11:30:15 -06:00
Nathan Hjelm
662460b06b Modify destructor function configury
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-20 09:51:06 -06:00
Nathan Hjelm
38589c46c0 opal: add a destructor/fini function to opal
This commit is related to an RFC from June 2014. Disscussion can be
found at:

http://www.open-mpi.org/community/lists/devel/2014/07/15140.php

The finalize function is set using either the linker option -fini or
__attribute__((destructor)) depending on compiler support. I have
confirmed that this hybrid approach works with all the major
compilers. The attribute is supported by gcc, clang, llvm, xlc, and
icc. The fini function will support pgi. If a compiler/linker
combination does not support either the destructor or fini function a
message will be printed on re-init indicating it is not supported (an
improvement over the current behavior-- SEGV).

I moved the following to the destructor function:

 - Class system finalize. This solves a bug when MPI_T_finalize is
   called before MPI_Init. The only downside to this change is we will
   leave the footprint of the opal class system after
   MPI_Finalize. This footprint should be relatively small.

This is an alternative to #517 but the two PRs are not
mutually-exclusive (with some modifications). This commit should also
be safe for 1.8.x as it does not change internal or external ABI (#517
changes internal ABI).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-16 19:53:52 -06:00
Nathan Hjelm
e794658f2d Merge pull request #516 from hjelmn/repository_update
RFC: Repository update
2015-04-15 10:03:08 -06:00
Nathan Hjelm
c954f457d9 mca/base: update the way dynamic components are handled
This commit is a rework of the component repository. The changes
included in this commit are:

 - Remove the component dependency code based off .ompi_info
   files. This code is legacy code dating back 10 years that and is no
   longer used.

 - Move the plugin scanning code to the component repository. New
   calls have been added to add new scanning paths, query available
   components, and dlopen/load components.

 - Pass the framework down to mca_base_component_find/filter. Eventually
   the framework structure will be used to further validate components
   before they are used.

 - Add support to the MCA framework system to disable scanning for
   dlopened components on open (support already existed in
   register). This is really only relevant to installdirs as it has no
   register function and no DSO components.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-14 15:55:33 -06:00
Nathan Hjelm
a7b0c00ab6 fix memory leaks and valgrind errors
This commit fixes several vagrind errors. Included:

 - installdirs did not correctly reinitialize all pointers to NULL
   at close. This causes valgrind errors on a subsequent call to
   opal_init_tool.

 - several opal strings were leaked by opal_deregister_params which
   was setting them to NULL instead of letting them be freed by the
   MCA variable system.

 - move opal_net_init to AFTER the variable system is initialized and
   opal's MCA variables have been registered. opal_net_init uses a
   variable registered by opal_register_params!

 - do not leak ompi_mpi_main_thread when it is allocated by
   MPI_T_init_thread.

 - do not overwrite ompi_mpi_main_thread if it is already set (by
   MPI_T_init_thread).

 - mca_base_var: read_files was overwritting mca_base_var_file_list
   even if it was non-NULL.

 - mca_base_var: set all file global variables to initial states on
   finalize.

 - btl/vader: decrement enumerator reference count to ensure that it
   is freed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-11 09:28:35 -06:00
Jeff Squyres
336626dafe spelling: trivial spelling fix
s/interupted/interrupted/gi
2015-02-27 18:30:43 -08:00
Gilles Gouaillardet
661c35ca67 cleanup dead code caused by the removal of the --with-threads configure option 2015-01-16 19:13:59 +09:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00
Ralph Castain
e32d541c8d Bring over a slight modification to the opal_init_test routine
This commit was SVN r32676.
2014-09-07 15:46:53 +00:00