Commit graph

22553 commits

Author SHA1 Message Date
Nathan Hjelm
ef01e130aa mca/base: protect mca_base_component_repository_release if dlopen support is disabled 2015-04-15 10:06:43 -06:00
Nathan Hjelm
d5b52d3141 ompi/communicator: make comm_request internal variables static 2015-04-15 10:05:21 -06:00
Nathan Hjelm
e794658f2d Merge pull request #516 from hjelmn/repository_update
RFC: Repository update
2015-04-15 10:03:08 -06:00
Nathan Hjelm
81502fafa8 Merge pull request #379 from hjelmn/remove_enable_smp_locks
Per-RFC: remove the --disable-smp-locks configure option
2015-04-15 10:02:23 -06:00
Ralph Castain
8113b37f68 Complete update of the NEWS and README for 1.8.5 2015-04-15 08:05:08 -07:00
Mike Dubman
eb922fc321 Merge pull request #532 from elenash/master
fix for -am -tune options issue came from PR 520
2015-04-15 16:35:36 +03:00
Elena
96bdf595c2 fix for -am -tune options issue came from PR 520 2015-04-15 15:51:49 +03:00
Jeff Squyres
3869887bae NEWS: update and expand 1.8.5 bullets 2015-04-15 05:14:35 -07:00
Dave Goodell
849f882ab3 Merge pull request #381 from goodell/pr/common-syms
detect common symbols at install time
2015-04-14 17:05:40 -05:00
Nathan Hjelm
c954f457d9 mca/base: update the way dynamic components are handled
This commit is a rework of the component repository. The changes
included in this commit are:

 - Remove the component dependency code based off .ompi_info
   files. This is legacy code dating back 10 years and is no
   longer used.

 - Move the plugin scanning code to the component repository. New
   calls have been added to add new scanning paths, query available
   components, and dlopen/load components.

 - Pass the framework down to mca_base_component_find/filter. Eventually
   the framework structure will be used to further validate components
   before they are used.

 - Add support to the MCA framework system to disable scanning for
   dlopened components on open (support already existed in
   register). This is really only relevant to installdirs as it has no
   register function and no DSO components.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-14 15:55:33 -06:00
Dave Goodell
8085edc27c build: detect common symbols at install time
This adds a check at `make install` time to look for common symbols.  It
attempts to ignore "Fortran-shaped" symbols by default.  It also will
look in the source tree for any files named "common_sym_whitelist" and
will ignore any symbols listed in that file (one per line, comments
allowed).

See open-mpi/ompi#375 for more background.
2015-04-14 14:54:26 -07:00
Howard Pritchard
836eefd66f Merge pull request #524 from hppritcha/topic/no_ud_if_no_verbs
oob/config: if --with-verbs=no, no ud
2015-04-14 12:23:10 -06:00
Nathan Hjelm
f8158a5ec1 btl/vader: fix deadlock in mca_btl_vader_progress_endpoints
This commit fixes a typo in mca_btl_vader_progress_endpoints where
OPAL_THREAD_LOCK was used when OPAL_THREAD_UNLOCK was intended.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-14 09:34:45 -06:00
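The lock/unlock mix-up described in f8158a5ec1 can be illustrated with a minimal sketch. It does not use OPAL_THREAD_LOCK or the vader code; `toy_lock_t`, `toy_lock`, `toy_unlock`, `progress_buggy`, and `progress_fixed` are hypothetical names, and the toy lock returns -1 at the point where a real non-recursive mutex would block forever.

```c
#include <assert.h>

/* Hypothetical non-recursive lock: it only records whether it is held. */
typedef struct { int held; } toy_lock_t;

/* Returns 0 on success, -1 where a real mutex would block forever. */
int toy_lock(toy_lock_t *l)
{
    if (l->held) {
        return -1;
    }
    l->held = 1;
    return 0;
}

void toy_unlock(toy_lock_t *l)
{
    assert(l->held);
    l->held = 0;
}

/* Buggy shape: the second call was meant to be an unlock. */
int progress_buggy(toy_lock_t *l)
{
    if (toy_lock(l) != 0) {
        return -1;
    }
    /* ... progress the endpoints ... */
    return toy_lock(l);   /* typo: self-deadlock with a real mutex */
}

/* Fixed shape: every lock is paired with an unlock. */
int progress_fixed(toy_lock_t *l)
{
    if (toy_lock(l) != 0) {
        return -1;
    }
    /* ... progress the endpoints ... */
    toy_unlock(l);
    return 0;
}
```

In the fixed shape the lock is free again on return, so the function can be called repeatedly.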
Ralph Castain
a4b1225892 Don't register the PSM errhandler until it is certain that the PSM component can be used.
This doesn't matter on the master, but it does matter on the 1.8 branch as the MTL select logic is different over there.
2015-04-14 07:54:53 -07:00
Jeff Squyres
9ac9be15c4 opal_config_subdir.m4: do not export vars before invoking subdir configure
== Short version

Do not export special variables into the environment (e.g., LIBS,
LDFLAGS, etc.) when invoking subdir configure scripts.  This prevents
problems described in open-mpi/ompi#471.

== More detail

Exporting special env variables before invoking a subdir configure
script causes problems in some cases.  E.g., in open-mpi/ompi#471,
when the user configures with `--with-hwloc=/path/to/hwloc` and that
directory is *not* in a default linker search location, the libevent
subdir configuration fails.

This happens because:

1. We'll pass LIBS="-L/path/to/hwloc/lib -lhwloc" to the libevent
   configure script
2. Meaning: configure-generated executables will link successfully
3. But unless LD_LIBRARY_PATH (or some other
   tell-the-linker-where-to-find-things mechanism) includes
   /path/to/hwloc/lib, the executable can't run.

Specifically, the libevent "hey, does the compiler generate proper
executables?" check will fail, and configure will abort (because OMPI
needs libevent).

I checked the history: exporting these vars dates all the way back to
LAM/MPI.  I can't think of a reason why we need to export these
variables -- AC_CONFIG_SUBDIRs doesn't do it; subdir configure scripts
should be orthogonal from the upper-layer configure script (and its
variables).  So let's remove these export statements and see if
anything breaks.
2015-04-14 07:04:01 -07:00
Jeff Squyres
fadc3ad01a hwloc/external/configure.m4: no need to unset
Instead, use a safe environment variable name (that is SCOPE_PUSHed and
SCOPE_POPed).
2015-04-14 07:04:01 -07:00
Jeff Squyres
24fe86b74f opal_config_subdir[_args].m4: use OPAL_VAR_SCOPE_PUSH/POP 2015-04-14 07:04:01 -07:00
Jeff Squyres
f9b7b9f6b2 opal_config_subdir.m4: remove some dead code 2015-04-14 07:04:01 -07:00
Howard Pritchard
283ef4c05d oob/config: if --with-verbs=no, no ud
The oob/ud configure did not honor the case where
ompi is configured with --with-verbs=no.
This fixes that problem.

Fixes #522

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-14 06:31:18 -07:00
Jeff Squyres
1029dea72b configure.ac: don't call AC_C_INLINE or AC_C_RESTRICT
OMPI *requires* a C99 compiler, so we don't need the AC_C_INLINE or
AC_C_RESTRICT tests anymore (because the C99 compiler guarantees to
support them).  Indeed, in some cases (see open-mpi/ompi#491),
AC_C_INLINE gets the wrong answer and defines `inline` to be empty,
which screws up some systems (e.g., OS X with clang).

As an added bonus, we get to get rid of a bunch of gcc 2.96-specific
code (!) in configure.ac.
2015-04-13 17:17:24 -04:00
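As a small illustration of why 1029dea72b can drop the configure checks: a C99 compiler accepts both keywords directly. This sketch is not taken from the Open MPI sources; `scale` and `scale_copy` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* C99 guarantees `inline`, so no configure-time fallback is needed. */
static inline double scale(double x, double factor)
{
    return x * factor;
}

/* `restrict` promises the compiler that dst and src do not overlap. */
void scale_copy(double *restrict dst, const double *restrict src,
                size_t n, double factor)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = scale(src[i], factor);
    }
}
```

If a configure test wrongly defined `inline` to be empty, the code would still compile, but the keyword would silently disappear from every declaration that uses it.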
Nathan Hjelm
113c890ccf Merge pull request #520 from hjelmn/valgrind_cleanness
fix memory leaks and valgrind errors
2015-04-13 10:09:34 -06:00
Jeff Squyres
49f52a5356 osc_sm_passive_target.c: update the check for lock types
Based on some on-list and IM discussion with @hjelmn about
open-mpi/ompi@40b7643119, change the testing to a switch/case.  If we
fall into the default case, assert() error (because it's an OMPI
developer programming error).
2015-04-13 12:02:15 -04:00
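The switch/case shape described in 49f52a5356 might look roughly like the sketch below. It is an assumption-laden illustration rather than the osc/sm code: the enum values, return codes, and `demo_acquire` are hypothetical stand-ins for the MPI lock-type constants and the real function.

```c
#include <assert.h>

/* Hypothetical lock types standing in for the MPI lock-type constants. */
enum demo_lock_type { DEMO_LOCK_EXCLUSIVE = 1, DEMO_LOCK_SHARED = 2 };

/* Hypothetical return codes. */
enum { DEMO_SUCCESS = 0, DEMO_ERROR = -1 };

int demo_acquire(int lock_type, int *exclusive_count, int *shared_count)
{
    int ret = DEMO_ERROR;   /* defined on every path, so no warning */

    switch (lock_type) {
    case DEMO_LOCK_EXCLUSIVE:
        (*exclusive_count)++;
        ret = DEMO_SUCCESS;
        break;
    case DEMO_LOCK_SHARED:
        (*shared_count)++;
        ret = DEMO_SUCCESS;
        break;
    default:
        /* Only reachable through a developer programming error. */
        assert(0 && "unknown lock type");
        break;
    }
    return ret;
}
```

Initializing `ret` up front also covers the compiler warning that a later-listed entry, 40b7643119, addresses.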
Jeff Squyres
40b7643119 osc_sm_passive_target.c: ensure ret is always defined
Fixes a compiler warning
2015-04-13 11:31:43 -04:00
Ralph Castain
9c6d452d6b If we are using HT cpus and have <= 2 procs, then map-by hwthread by default 2015-04-11 21:18:05 -07:00
Ralph Castain
cd686057f6 If the HNP is on a coprocessor, record it so we don't get an error log later 2015-04-11 15:30:15 -07:00
Nathan Hjelm
a7b0c00ab6 fix memory leaks and valgrind errors
This commit fixes several valgrind errors. Included:

 - installdirs did not correctly reinitialize all pointers to NULL
   at close. This causes valgrind errors on a subsequent call to
   opal_init_tool.

 - several opal strings were leaked by opal_deregister_params which
   was setting them to NULL instead of letting them be freed by the
   MCA variable system.

 - move opal_net_init to AFTER the variable system is initialized and
   opal's MCA variables have been registered. opal_net_init uses a
   variable registered by opal_register_params!

 - do not leak ompi_mpi_main_thread when it is allocated by
   MPI_T_init_thread.

 - do not overwrite ompi_mpi_main_thread if it is already set (by
   MPI_T_init_thread).

 - mca_base_var: read_files was overwriting mca_base_var_file_list
   even if it was non-NULL.

 - mca_base_var: set all file global variables to initial states on
   finalize.

 - btl/vader: decrement enumerator reference count to ensure that it
   is freed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-11 09:28:35 -06:00
Ralph Castain
91e1cbf284 Init variable 2015-04-11 07:44:57 -07:00
Ralph Castain
033418f62a Correct a typo that reversed the default binding pattern. Ensure we default bind to hwthread if user specified --use-hwthread-cpus if nprocs <= 2, and bind to hwthread if told to do so. 2015-04-10 15:58:35 -07:00
Ralph Castain
3e44d3c9e3 Enable singletons to run without any active OOB module until they attempt to comm_spawn 2015-04-10 14:06:42 -07:00
Ralph Castain
e4f6f83b9d Attempt to silence new Coverity complaint by ensuring the string read from file is NULL terminated. 2015-04-10 07:54:37 -07:00
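A minimal sketch of the kind of fix e4f6f83b9d describes, assuming the data is read with `fread`; the actual file and function involved are not shown in this log, and `read_terminated` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Reads at most cap - 1 bytes from fp into buf and always terminates buf.
 * Returns the number of bytes stored, not counting the terminator. */
size_t read_terminated(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);
    size_t n = fread(buf, 1, cap - 1, fp);
    buf[n] = '\0';   /* fread never adds a terminator itself */
    return n;
}
```

Reserving one byte for the terminator means the buffer is a valid C string even when the file is longer than the buffer.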
Ralph Castain
396700ad8b Protect the notifier macros against NULL job objects 2015-04-09 16:04:43 -07:00
Nathan Hjelm
f27bf45475 Merge pull request #519 from hjelmn/opal_err_fix
opal/util/error: check for existing converter for error range
2015-04-09 12:41:58 -06:00
Nathan Hjelm
75f210fdb9 opal/util/error: check for existing convertor for error range
This commit fixes a bug when opal_error_init is called with the same
values multiple times. If opal_error_init is called too many times it
will start failing with OPAL_ERR_OUT_OF_RESOURCE. To fix the problem,
check whether an existing convertor matches the requested one and, if
so, return that one instead.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-09 11:51:36 -06:00
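The idea in 75f210fdb9 can be sketched as a fixed-size table that reuses a slot when the same registration arrives again. This is not the opal_error_init implementation; the table size, struct layout, and all `demo_` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CONVERTERS 4   /* hypothetical table size */

typedef const char *(*demo_converter_fn)(int errnum);

typedef struct {
    int min, max;               /* error range handled by fn */
    demo_converter_fn fn;
} demo_converter_t;

static demo_converter_t demo_table[DEMO_MAX_CONVERTERS];
static int demo_count = 0;

/* Trivial converter used only to have something to register. */
const char *demo_to_string(int errnum)
{
    (void)errnum;
    return "demo error";
}

/* Returns 0 on success, -1 when the table is genuinely full. */
int demo_error_register(int min, int max, demo_converter_fn fn)
{
    assert(fn != NULL);
    for (int i = 0; i < demo_count; i++) {
        if (demo_table[i].min == min && demo_table[i].max == max &&
            demo_table[i].fn == fn) {
            return 0;           /* same registration again: reuse the slot */
        }
    }
    if (demo_count == DEMO_MAX_CONVERTERS) {
        return -1;
    }
    demo_table[demo_count].min = min;
    demo_table[demo_count].max = max;
    demo_table[demo_count].fn = fn;
    demo_count++;
    return 0;
}
```

Because the lookup runs before the capacity check, repeated identical registrations keep succeeding even after the table has filled up.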
Nathan Hjelm
2be769bc0c Merge pull request #518 from hjelmn/putenv_fix
ess/singleton: do not put component strings into the environment
2015-04-09 11:06:42 -06:00
Nathan Hjelm
c416c423bb ess/singleton: do not put component strings into the environment
putenv requires that any string put into the environment is not
changed or freed. That is not the case with constant strings as they
will go away when dlclose is called on the component. Instead, just
use opal_setenv which does not have this restriction.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-09 11:00:47 -06:00
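The putenv restriction that c416c423bb mentions can be demonstrated with plain POSIX calls. The sketch below uses `setenv` as a stand-in for a copying setter; it makes no claim about how opal_setenv is implemented, and the function and variable names are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* for the putenv and setenv declarations */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* putenv stores the caller's pointer, so editing the buffer afterwards
 * edits the environment. Returns 1 if that aliasing is observed. */
int putenv_aliases_buffer(void)
{
    static char entry[] = "DEMO_PUTENV=one";
    if (putenv(entry) != 0) {
        return -1;
    }
    char *value = strchr(entry, '=');
    assert(value != NULL);
    value[1] = 'O';                       /* "one" becomes "One" */
    const char *seen = getenv("DEMO_PUTENV");
    return seen != NULL && strcmp(seen, "One") == 0;
}

/* setenv copies the value, so later edits to the buffer are not seen.
 * Returns 1 if the environment still holds the original value. */
int setenv_copies_value(void)
{
    char value[] = "one";
    if (setenv("DEMO_SETENV", value, 1) != 0) {
        return -1;
    }
    value[0] = 'O';
    const char *seen = getenv("DEMO_SETENV");
    return seen != NULL && strcmp(seen, "one") == 0;
}
```

With putenv, a string that lives in a component's data segment would leave a dangling environment entry once the component is unloaded, which is the hazard the commit message describes.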
Rolf vandeVaart
b7913836fc Initialize variables for safety 2015-04-09 12:58:55 -04:00
Rolf vandeVaart
d6d7184703 Enhance verbose message 2015-04-09 12:29:09 -04:00
Ralph Castain
acc2c7937c Thanks Nathan - decrement the counter to ensure singletons start up correctly 2015-04-08 11:23:35 -07:00
Nathan Hjelm
eb56117405 Merge pull request #513 from hjelmn/mca_bug_fixes
opal: fix multiple bugs in MCA and opal
2015-04-08 10:29:44 -06:00
Rolf vandeVaart
7163c41a4d Fix pathname 2015-04-08 08:52:17 -04:00
Nathan Hjelm
9cd955badf opal: fix multiple bugs in MCA and opal
This commit fixes the following bugs:

 - opal_output_finalize did not properly set internal state. This
   caused problems when calling the sequence opal_output_init (),
   opal_output_finalize (), opal_output_init ().

 - opal_info support called mca_base_open () but never called the
   matching mca_base_close (). mca_base_open () and mca_base_close ()
   have been updated to use an open count instead of an open flag to
   allow mca_base_open to be called through multiple paths (as may be
   the case when MPI_T is in use).

 - orte_info support did not register opal variables. This can cause
   orte-info to not return opal variables.

 - opal_info, orte_info, and ompi_info support have been updated to
   use a register count.

 - When opening the dl framework the reference count was added to
   ensure the framework stuck around. The framework being closed
   prematurely was a bug in the MCA base that has since been
   corrected. The increment (and associated decrement) have been
   removed.

 - dl/dlopen did not set the value of
   mca_dl_dlopen_component.filename_suffixes_mca_storage on each call
   to register. Instead the value was set in the component
   structure. This caused the value to be lost when re-loading the
   component. Fixed by setting the default value in register.

 - Reset shmem framework state on close to avoid returning a stale
   component after reloading opal/shmem.

 - MCA base parameters were not properly deregistered when the MCA
   base was closed.

This commit may fix #374.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-07 19:13:20 -06:00
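The switch from an open flag to an open count in 9cd955badf follows a common reference-counting pattern, sketched below. This is not the mca_base_open/mca_base_close code; `demo_base_open`, `demo_base_close`, and both counters are hypothetical.

```c
#include <assert.h>

static int demo_open_count = 0;       /* replaces a boolean "opened" flag */
static int demo_resources_live = 0;   /* stands in for real setup/teardown */

int demo_base_open(void)
{
    if (demo_open_count++ > 0) {
        return 0;                     /* already open through another path */
    }
    demo_resources_live = 1;          /* first opener does the real work */
    return 0;
}

int demo_base_close(void)
{
    assert(demo_open_count > 0);      /* close without open is a bug */
    if (--demo_open_count > 0) {
        return 0;                     /* other users remain */
    }
    demo_resources_live = 0;          /* last closer tears everything down */
    return 0;
}
```

With a plain flag, the first close would tear the state down while a second opener was still using it; the count defers teardown until the last matching close.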
Howard Pritchard
5ee18f4f00 Merge pull request #514 from hppritcha/topic/mpi_win_lock_all_man
man pages: fix problem with MPI_Win_lock_all
2015-04-07 17:17:30 -06:00
Howard Pritchard
291c775e74 man pages: fix problem with MPI_Win_lock_all
thanks to Thomas Jahns for pointing this out -

http://www.open-mpi.org/community/lists/users/2015/04/26633.php

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-07 16:29:00 -06:00
Nathan Hjelm
2409715fc3 Merge pull request #511 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix synchronization bugs
2015-04-07 09:14:00 -06:00
Howard Pritchard
fc3a0f60c5 Merge pull request #512 from hppritcha/topic/java_better_dlopen_error
ompi/java: better error message if dlopen fails
2015-04-06 14:08:10 -06:00
Howard Pritchard
18039b34b4 ompi/java: better error message if dlopen fails
The error message emitted by ompi/java when dlopen
fails is misleading and not very informative.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-06 13:35:09 -06:00
Nathan Hjelm
80ed805a16 osc/pt2pt: fix synchronization bugs
The fragment flush code tries to send the active fragment before
sending any queued fragments. This could cause osc messages to arrive
out-of-order at the target (bad). Ensure ordering by always sending
the active fragment after sending queued fragments.

This commit also fixes a bug when a synchronization message (unlock,
flush, complete) can not be packed at the end of an existing active
fragment. In this case the source process will end up sending 1 more
fragment than claimed in the synchronization message. To fix the issue
a check has been added that fixes the fragment count if this situation
is detected.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-06 08:39:19 -06:00
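The ordering rule from 80ed805a16 can be shown with a toy flush routine: queued fragments go out first, oldest first, and the active fragment goes last. This is only a sketch under simplifying assumptions; `demo_frag_t` and `demo_flush` are hypothetical and nothing is actually sent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment: an id is enough to show ordering. */
typedef struct { int id; } demo_frag_t;

/* Records the send order in out (capacity cap) and returns how many
 * fragments were sent: queued ones first, then the active one if any. */
size_t demo_flush(const demo_frag_t *queued, size_t nqueued,
                  const demo_frag_t *active, int *out, size_t cap)
{
    size_t sent = 0;

    assert(out != NULL);
    assert(cap >= nqueued + (active != NULL ? 1u : 0u));

    for (size_t i = 0; i < nqueued; i++) {
        out[sent++] = queued[i].id;   /* older fragments go out first */
    }
    if (active != NULL) {
        out[sent++] = active->id;     /* the active fragment goes last */
    }
    return sent;
}
```

The return value is the number of fragments actually sent, which is the figure a synchronization message would have to agree with.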
Ralph Castain
108bcb70b0 Update NEWS with 1.8.5 items 2015-04-05 11:30:56 -07:00
Ralph Castain
c32609b1c7 Bring over open-mpi/hwloc@f714f8d
linux: only use the device-tree on Power machines

It's available on ARM but the assumption that cpus' "reg" start at 0
is invalid.
We could make that work but the device-tree doesn't currently
bring anything better than sysfs on ARM, so don't bother for now.
2015-04-04 09:30:21 -07:00
rhc54
657490c763 Merge pull request #510 from jithinjosepkl/pr/mtl-opt-pr
Optimizations to PML-CM, MTL-OFI
2015-04-04 09:22:55 -07:00