Commit graph

3805 commits

Author SHA1 Message Date
Gilles Gouaillardet
db4f483653 btl/sm: fix race condition
write to file and then rename, so when the file is open for read, its content is known to have been written.

Fixes open-mpi/ompi#1230
2015-12-21 16:37:51 +09:00
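The write-then-rename idea in the commit above can be sketched as follows. This is a minimal illustration, not the btl/sm code: the helper name `publish_file` and the `.tmp` suffix are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from Open MPI): publish `data` at `path` so that
 * a reader who manages to open `path` never sees a half-written file. */
int publish_file(const char *path, const char *data)
{
    char tmp[4096];
    assert(path != NULL && data != NULL);

    /* Write to a temporary name first; the ".tmp" suffix is an assumption. */
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        return -1;
    }

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL) {
        return -1;
    }
    size_t len = strlen(data);
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0) {          /* flush everything before publishing */
        remove(tmp);
        return -1;
    }

    /* On POSIX file systems rename() replaces the target atomically, so the
     * final name only ever refers to a completely written file. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

A reader that polls for the final name either fails to open it or reads the full content, which is the property the commit message describes.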
Jeff Squyres
53ca721ff4 configury: clean up .so version numbers
Move .so version numbers to their appropriate project in the top-level
VERSION file.  Also add the project name to all .so version number
names.  Remove no-longer-used .so names.
2015-12-18 12:50:23 -05:00
rhc54
978c54880d Merge pull request #1238 from rhc54/topic/cleanup
Cleanup warnings in opal and orte layers when building optimized on Mac
2015-12-17 09:37:48 -08:00
Ralph Castain
64b695669a Cleanup warnings in opal and orte layers when building optimized on Mac 2015-12-17 07:51:24 -08:00
Nathan Hjelm
e77199fd4f Merge pull request #1235 from ggouaillardet/topic/ibv_exp_fixes
btl/openib: do not mix exp and non exp verbs
2015-12-17 08:36:09 -07:00
Gilles Gouaillardet
994a627f82 btl/openib: do not mix exp and non exp verbs 2015-12-17 16:45:43 +09:00
Artem Polyakov
0951a34e95 Fix openib memory registration limit calculation if cutoff = 0. 2015-12-17 13:45:19 +06:00
Gilles Gouaillardet
75d16cfb27 Fix a few places where opal/util/argv.h was required when building pmix components (go figure) 2015-12-17 16:19:25 +09:00
Jeff Squyres
2b9341a38a usnic: fix embarrassing typo 2015-12-15 19:01:19 -08:00
rhc54
7a9106e74d Merge pull request #1226 from rhc54/extpmix
Create the pmix external component.
2015-12-15 17:33:14 -08:00
Jeff Squyres
944d5061a6 usnic: sendto() can return EPERM if we send too fast
If we send too fast, sendto() can run out of resources and return
EPERM.  So delay a little and try again.
2015-12-15 15:31:29 -08:00
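A delay-and-retry loop of the kind described in the commit above might look like the sketch below. The wrapper name `sendto_with_retry`, the retry bound, and the 1 microsecond delay are all assumptions for illustration; the commit message only says to delay a little and try again.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical wrapper (not the usnic code): call sendto() and, if it fails
 * with EPERM, sleep briefly and try again, up to `max_retries` extra times. */
ssize_t sendto_with_retry(int fd, const void *buf, size_t len,
                          const struct sockaddr *dest, socklen_t dest_len,
                          int max_retries)
{
    assert(max_retries >= 0);
    const struct timespec delay = { 0, 1000 };  /* 1 microsecond, assumed */

    for (int attempt = 0; ; ++attempt) {
        ssize_t rc = sendto(fd, buf, len, 0, dest, dest_len);
        /* Success, a different error, or retries exhausted: give up here
         * and let the caller inspect rc and errno. */
        if (rc >= 0 || errno != EPERM || attempt >= max_retries) {
            return rc;
        }
        nanosleep(&delay, NULL);
    }
}
```

Bounding the retries is a design choice of this sketch so that a persistent EPERM (for example, a firewall rule) is eventually reported rather than retried forever.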
Ralph Castain
3a56f0d34b Create the pmix external component. Fix a few places where opal/util/argv.h was required when building with an external pmix (go figure).
NOTE: Building with external pmix *requires* that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated.

Closes #1204  (replaces it)
Fixes #1064
2015-12-15 15:26:13 -08:00
Jeff Squyres
ab1bbca5b9 usnic: improve error message
When sendto() fails, it would be helpful to see the errno value.
2015-12-15 15:04:25 -08:00
Jeff Squyres
c1a6beac8d usnic: fix error message
There were too many "%s" instances.  Re-order the output so that we
show file, line, and then the error message.
2015-12-15 14:48:38 -08:00
Nathan Hjelm
c98086f028 Merge pull request #1223 from hjelmn/ib_use_srq
btl/openib: use only SRQ on ib by default
2015-12-15 14:04:19 -08:00
Nathan Hjelm
00da520fd5 Merge pull request #1222 from hjelmn/vader_fix
btl/vader: do not attempt to munmap opal/shmem pointer
2015-12-15 09:06:50 -08:00
Nathan Hjelm
b24b3a4ae4 btl/openib: use only SRQ on ib by default
It was decided some time ago that there is no benefit to using any
per-peer receive queues on infiniband. At the time we decided not to
change the default but that objection has been dropped. This commit
changes the 128 message queue to use SRQ instead of PP. This has no
impact on iWarp which sets the default in a different way.

Closes open-mpi/ompi#1156

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-15 09:48:03 -07:00
Nathan Hjelm
60591ae753 btl/vader: do not attempt to munmap opal/shmem pointer
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-15 08:48:04 -07:00
Todd Kordenbrock
7b97963669 btl-portals4: remove unnecessary PtlMDBind result check
When PtlMDBind was removed, the result check was left in which
causes intermittent failures depending on the junk value found in
the 'ret' variable.  The commit removes the result check.
2015-12-14 12:09:01 -06:00
Jeff Squyres
7977fa3f0b pmix112 config.h.in: remove generated file 2015-12-13 06:46:55 -08:00
Ralph Castain
03eb1a80bf Update the PMIx native component to release v1.1.1, with addition of one bug-fix commit beyond the official release
Rename the pmix1xx component to pmix111 so it reflects the actual release it includes

Resolve the problem of PMIx being passed a bogus --with-platform argument when configuring the PMIx tarball code. There is no reason we should be passing --with-platform arguments to any internal subdirectory, so just leave that out when constructing the opal_subdir_args variable.

Update the PMIx code and continue attempting to debug direct modex

Fix a problem in the ORTE PMIx server - there was an early intent to optimize the direct modex by fetching data for all procs from the target job on the remote node, instead of fetching the data one proc at a time. However, this was never completely implemented, and so we would hang if we had multiple overlapping requests for data from more than one proc on the node.

Update PMIx to v1.1.2
2015-12-12 18:46:38 -08:00
Ralph Castain
5e5adebf8e Port the changes from #782 to the master. Not everything applies here as the code in the 1.10 series is a little different. In addition, we asked for a few changes (e.g., using MPI_ERR_ARG instead of "13") that are incorporated here.
Thanks to @jsharpe for the PR
2015-12-12 12:40:34 -08:00
Nathan Hjelm
772172a99b mca/base: remove erroneous check in var group register function
This commit removes a check that causes mca_base_group_register to
improperly create a new group instead of using an existing group
when the project and framework names are the same. This check was
originally intended to prevent forming groups with names like
ompi_ompi, opal_opal, etc but there is no reason why we shouldn't
allow that.

Fixes open-mpi/ompi#1155

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-12-09 19:48:39 -07:00
Nathan Hjelm
f692576f1e btl/openib: add check for IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG
Mofed 2.2 does not have the IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG attribute
flag. Add a check to fix compilation for mofed 2.2. This commit only
fixes compilation with the older mofed. It will not allow an Open MPI
compiled with mofed 2.3 or newer to work on a machine with mofed 2.2.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-09 17:02:36 -07:00
Ryan Grant
e5ea2e3248 Merge pull request #1193 from tkordenbrock/topic/fix.btl.logical.endpoint.rank
btl-portals4: set endpoint rank even if endpoint already exists

--Needs to be pulled over to 2.0.0 still @tkordenbrock
2015-12-09 13:49:44 -08:00
Howard Pritchard
c2ea018ce5 Merge pull request #1194 from hppritcha/topic/fix_cray_pmix_locality
pmix/cray: fix locality bug
2015-12-08 16:25:30 -07:00
Howard Pritchard
fecb326256 pmix/cray: fix locality bug
There was a bug with the way the cray pmix component
was setting the locality property for ranks on the
same node, etc.

Improve location/syntax of a comment block.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-12-08 11:13:48 -08:00
Todd Kordenbrock
2b7e983989 btl-portals4: set endpoint rank even if endpoint already exists
If btl-portals4 is configured to use logical mapping of ranks to
physical nodes, then the endpoint must have the rank field set.
This commit fixes a bug that caused the endpoint to have the
nid/pid instead of the rank if the endpoint already exists.
2015-12-08 12:29:00 -06:00
Nathan Hjelm
c9382f23e9 mlx5: need to set comp_mask to get experimental verbs attributes
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-08 10:34:16 -07:00
Gilles Gouaillardet
4d2c7f7de1 cuda: fix missing #include opal/util/argv.h 2015-12-07 14:10:32 +09:00
George Bosilca
4d00c59b2e Cleanup the memory handling for temporary buffers in
some of the collective modules. Added a new function
opal_datatype_span, to compute the memory span of
count number of datatype, excluding the gaps in the
beginning and at the end. If a memory allocation is
made using the returned value, the gap (also returned)
should be removed from the allocated pointer.
2015-12-02 20:42:18 -05:00
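The span computation described in the commit above can be illustrated with a simplified model. The `struct layout` type and the `layout_span` function below are hypothetical stand-ins, not the real OPAL datatype structures; the sketch assumes a datatype is described only by its padded bounds and its true (data-carrying) bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layout description (not the real OPAL type). */
struct layout {
    ptrdiff_t lb, ub;            /* padded bounds: extent = ub - lb        */
    ptrdiff_t true_lb, true_ub;  /* bounds of the bytes that hold data     */
};

/* Return the number of bytes needed to hold `count` consecutive elements,
 * leaving out the unused gap before the first element and after the last.
 * The leading gap is returned through `gap`. */
ptrdiff_t layout_span(const struct layout *t, size_t count, ptrdiff_t *gap)
{
    assert(t != NULL && gap != NULL);
    if (count == 0) {
        *gap = 0;
        return 0;
    }
    *gap = t->true_lb;
    /* The first count-1 elements each occupy a full extent; the last one
     * only needs its true extent. */
    return (t->true_ub - t->true_lb)
         + (ptrdiff_t)(count - 1) * (t->ub - t->lb);
}
```

With a 16-byte extent whose data occupies bytes 4 through 11, three elements need 8 + 2 * 16 = 40 bytes, and the reported gap of 4 is what the commit message says should be removed from the allocated pointer before the buffer is used as a datatype base address.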
Ralph Castain
9803d69d02 Ensure the embedded PMIx respects an OMPI-level --disable-debug 2015-12-01 08:00:24 -08:00
rhc54
ac892a667a Merge pull request #1168 from rhc54/topic/libevent
Somehow, this got left out when merging PR #1109, so let's try it again.
2015-12-01 07:56:36 -08:00
Artem Polyakov
6a4ec3396a Merge pull request #1164 from hjelmn/mlx5_atomics_update
btl/openib: fix compile problems when using experimental verbs
2015-12-01 19:14:31 +05:00
Ralph Castain
7ac5c082ff Somehow, this got left out when merging PR #1109, so let's try it again. 2015-12-01 06:02:29 -08:00
igor-ivanov
d8c85738ab Merge pull request #1151 from igor-ivanov/pr/opal-abort-vars
Add new mca variables opal_abort_delay and opal_abort_print_stack
2015-12-01 16:27:11 +04:00
Nathan Hjelm
191aebb9c8 btl/openib: fix compile problems when using experimental verbs
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 22:21:26 -07:00
Nathan Hjelm
bb8e347371 btl/openib: update experimental verbs support
This update adds an additional check (if supported) to see if 8-byte
atomics are supported by the hardware. If 8-byte atomics are not
supported the atomics support is disabled.

This commit also includes some cleanup.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 12:32:04 -07:00
igor.ivanov@itseez.com
c15bf147bf opal: Add opal_abort_print_stack mca variable with aliases for ompi/oshmem
This commit makes it possible to control the output produced during abnormal
oshmem/ompi application termination.
Fixed an issue in backtrace output: HAVE_BACKTRACE was never set, so the user had
limited control over this variable.
Two related mca variables are moved to the opal layer. Corresponding aliases are
added for ompi and oshmem.
2015-11-25 18:18:33 +02:00
Nathan Hjelm
02a6c6856d btl/openib: add support for mlx5 atomic operations
This commit adds support for fetch-and-add and compare-and-swap when
using the mlx5 driver. The support is only enabled if the expanded
verbs interface is detected. This is required because mlx5 HCAs return
the atomic result in network byte order. This support may need to be
tweaked if Mellanox commits their changes into upstream verbs.

Closes open-mpi/ompi#1077

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-23 16:07:12 -07:00
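Because the commit above notes that mlx5 HCAs return the atomic result in network byte order, the fetched value has to be converted before the host uses it. The following is a portable sketch of that conversion for a 64-bit result; the function name `atomic_result_to_host` is an assumption, and the real code may use a platform byte-swap macro instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: `raw` holds 8 bytes exactly as the HCA wrote them
 * (big-endian / network order). Return the value in host byte order. */
uint64_t atomic_result_to_host(uint64_t raw)
{
    unsigned char bytes[sizeof(raw)];
    uint64_t value = 0;

    memcpy(bytes, &raw, sizeof(raw));
    /* The first byte in memory is the most significant one. */
    for (size_t i = 0; i < sizeof(raw); ++i) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```

Reading the bytes one at a time keeps the function correct on both little-endian and big-endian hosts; on a big-endian host it simply returns the input unchanged.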
Francois WELLENREITER
251009e0aa BTL portals4: remove useless PtlMDBind PtlMDRelease calls for RDMA 2015-11-19 14:51:00 +01:00
Matias A Cabral
254a05dbbb Default values for Intel HFI1 (OmniPath gen1 device) in openib btl 2015-11-11 12:35:35 -08:00
Nathan Hjelm
2c02294389 opal_free_list: fix strange size check
OPAL free lists can be initialized with a fragment size that differs
from the size of objects from a class. This allows the free list code
to support OPAL objects that have flexible array members.

Unfortunately the free list code will throw out the desired length in
some cases. The code in question was committed in
open-mpi/ompi@90fb58de. The side effects of this are varied and can
cause segmentation faults, assert failures, hangs, etc. This commit
adds a check to ensure the requested size is at least as large as the
class size and makes opal_free_list allocations always honor the
requested fragment size (as long as it is larger than the class
size).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-09 19:47:55 -07:00
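The size rule described in the commit above can be sketched in isolation. Everything named here (`struct frag`, `choose_item_size`) is hypothetical and only models the stated behavior: reject a requested size smaller than the class size, otherwise honor the requested size.

```c
#include <stddef.h>

/* Hypothetical object with a flexible array member, the case the commit
 * message says the free list must support. */
struct frag {
    size_t len;
    unsigned char payload[];   /* storage beyond sizeof(struct frag) */
};

/* Hypothetical helper: decide how many bytes to allocate per list item.
 * Returns 0 and stores the size in *out on success, -1 if the requested
 * size cannot hold an object of the class. */
int choose_item_size(size_t requested, size_t class_size, size_t *out)
{
    if (requested < class_size) {
        return -1;             /* too small for the object itself */
    }
    *out = requested;          /* always honor the caller's fragment size */
    return 0;
}
```

For example, asking for `sizeof(struct frag) + 64` bytes yields items with 64 bytes of payload, whereas silently falling back to `sizeof(struct frag)` would leave no room for the payload, which is consistent with the memory errors the commit message lists.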
Ralph Castain
52ea538bc1 Per fix from Nysal: set the listener_active flag before starting the progress thread, and declare the flag to be volatile 2015-11-09 09:00:59 -08:00
Ralph Castain
ee9aa67483 Update the libevent renaming file to ensure that all public symbols are covered 2015-11-07 12:52:31 -08:00
rhc54
7a9b9325a8 Merge pull request #1107 from rhc54/topic/pmix
Work on cleaning up memory leaks that are causing orte-dvm to eventua…
2015-11-06 17:16:49 -07:00
Ralph Castain
fed28e4cfc Add missing file that was previously ignored 2015-11-06 14:37:09 -08:00
Ralph Castain
5f446570d8 Work on cleaning up memory leaks that are causing orte-dvm to eventually run out of memory. Still don't have everything plugged, but getting better. Sync to the PMIx master that includes removal of the pmix_common.h.in file that really didn't need to be generated, and update to the PMIx_server_init API. 2015-11-06 14:15:30 -08:00
Jeff Squyres
b35b708979 tcp BTL: fix inconsistent whitespace problems
No code/logic changes.
2015-11-06 12:41:13 -08:00
Jeff Squyres
300cff2b89 usnic: fix/update the usnic stats
1. Fix: old v1.6-era code reset the stats-emitting event to fire twice
   for each time period.
2. Add the usNIC device name to the output for differentiating the
   output in multi-rail scenarios.
2015-11-06 12:05:34 -08:00