
21814 Commits

Author SHA1 Message Date
Ralph Castain
4e592ac434 Fix the tarball by providing the correct list of headers in the Makefile.am 2015-01-07 18:37:26 -08:00
Nathan Hjelm
7d206ae769 btl/ugni: fix a couple of bugs
Two fixes:

 - Do not try to return a mailbox to the free list if one wasn't
   allocated.

 - Do not try to tear down IRQ CQs if they were not created.
2015-01-07 13:48:17 -07:00
mjbhaskar
2d33b0a745 A fix for memory corruption seen on 32 bit machines 2015-01-07 14:41:44 -06:00
mjbhaskar
27dfcaaab2 Merge branch 'master' of https://github.com/open-mpi/ompi 2015-01-07 14:39:23 -06:00
mjbhaskar
74f8ba2acb A fix for memory corruption problem 2015-01-07 14:34:38 -06:00
Howard Pritchard
f34dd5f5fd plm/alps: update copyright 2015-01-07 12:33:38 -07:00
Howard Pritchard
c454d11b01 plm/alps: fix orted abort hang problem
Turns out the alps plm component wasn't changing the state
of the job upon terminating the orted's in the case of
an abnormal termination.  This caused mpirun to hang
with a zombie'd aprun process if an orted on a node
in the job was killed via signal.
2015-01-07 12:31:41 -07:00
Nathan Hjelm
81dc3a5db9 Merge pull request #335 from hjelmn/osc_updates
Osc updates
2015-01-07 11:16:55 -06:00
Dave Goodell
49069bc661 usnic: fix fi_av_insert (ARP resolution) bugs
We had several problems in the old code:

1. We were specifying an arbitrary timeout (100 ms) and then abandoning
   all remaining pending AV insert operations.  We would then free the
   endpoint buffer that we gave to fi_av_insert(), usually causing
   libfabric's progress thread to write to a freed buffer.

2. We were claiming in a show_help message that the timeout was
   controllable via an MCA parameter.  This commit removes that
   parameter, since there's no good method for us to specify a timeout
   like this to libfabric right now.

3. We also weren't waiting for the correct number of fi_av_insert()
   operations to complete.  We were waiting for nprocs, which is
   accidentally fine for 2 procs on separate hosts, but not for most
   other proc counts.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
2015-01-07 08:25:17 -08:00
Gilles Gouaillardet
06e071454e btl/openib: cleanup duplicate code 2015-01-07 14:07:30 +09:00
Gilles Gouaillardet
135ecce0eb btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS 2015-01-07 13:27:25 +09:00
Ralph Castain
e0927895db Grrr...how many files did they forget? 2015-01-06 19:40:18 -08:00
Ralph Castain
84c41429e9 Add missing file 2015-01-06 18:41:11 -08:00
George Bosilca
bf62bed65f Typo in the poll/epoll ops declaration. 2015-01-06 21:21:25 -05:00
Ralph Castain
a7c5ff2ace Update to libevent 2.0.22-stable 2015-01-06 16:37:25 -08:00
Howard Pritchard
061a587384 Merge pull request #336 from hppritcha/topic/odls_signal_fix
odls/base: fix an edge case with signals
2015-01-06 16:11:22 -07:00
Howard Pritchard
f0f98f13b6 odls/base: fix an edge case with signals
In the course of doing some testing with how orted's
handle signaled child processes, found out that very
often doing a kill -9 on a process on a node just
results in the job hanging. The problem was that the
orted odls/errmgr was not properly handling the exit_code
being returned from waitpid.  Now mark the proc state
as ORTE_PROC_STATE_ABORTED_BY_SIG if the exit_code
from waitpid indicates the process exited owing to
a signal.
2015-01-06 15:42:38 -07:00
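The status check described in the commit above can be illustrated with a minimal C sketch. This is not the ORTE code: the enum values and helper names are hypothetical, and only the POSIX calls (`fork`, `kill`, `waitpid`) and the `WIFSIGNALED`/`WIFEXITED` macros are standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical state names for this sketch; these are not the ORTE constants. */
typedef enum {
    CHILD_EXITED_NORMALLY,
    CHILD_ABORTED_BY_SIGNAL,
    CHILD_STATE_UNKNOWN
} child_state_t;

/* Classify a status word as filled in by waitpid(). */
static child_state_t classify_wait_status(int status)
{
    if (WIFSIGNALED(status)) {
        return CHILD_ABORTED_BY_SIGNAL;   /* terminated by a signal, e.g. kill -9 */
    }
    if (WIFEXITED(status)) {
        return CHILD_EXITED_NORMALLY;     /* returned from main() or called exit() */
    }
    return CHILD_STATE_UNKNOWN;
}

/* Fork a child that either exits with code 0 or is killed with SIGKILL
 * by the parent, then report how the child ended. */
static child_state_t run_child(int kill_it)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return CHILD_STATE_UNKNOWN;
    }
    if (pid == 0) {
        if (kill_it) {
            for (;;) {
                pause();                  /* block until the parent's signal arrives */
            }
        }
        _exit(0);
    }
    if (kill_it) {
        kill(pid, SIGKILL);
    }
    if (waitpid(pid, &status, 0) != pid) {
        return CHILD_STATE_UNKNOWN;
    }
    return classify_wait_status(status);
}
```

Calling `run_child(1)` models the `kill -9` case from the commit message, and `run_child(0)` models a normal exit.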
Nathan Hjelm
6733d89cf9 btl/vader: fix return code check when opening ptrace_scope file 2015-01-06 15:17:56 -07:00
Nathan Hjelm
e68ed2876c osc/pt2pt: threading fixes and code cleanup 2015-01-06 13:39:16 -07:00
Nathan Hjelm
3d79806805 add more internal RMA error codes 2015-01-06 13:39:04 -07:00
Nathan Hjelm
9eba7b9d35 Rename the OSC "rdma" component to pt2pt to better reflect that it does not actually use btl rdma 2015-01-06 13:38:55 -07:00
Nathan Hjelm
cde79bfa60 btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths) 2015-01-06 11:39:23 -07:00
Nathan Hjelm
9bae131589 btl/openib: fix message coalescing
There was a bug in the openib btl handling this valid sequence of
calls:

desc = btl_alloc ();
btl_free (desc);

When triggered the bug would cause either fragment loss or undefined
behavior (SEGV, etc). The problem occurred because btl_alloc contained
the logic to modify the pending fragment (length, etc) and these
changes were not corrected if the fragment was freed instead of sent.

To fix this issue I 1) moved some of the coalescing logic to the
btl_send function, and 2) retry the coalesced fragment on btl_free
if it was never sent. This appears to completely address the issue.
2015-01-06 11:39:16 -07:00
Nathan Hjelm
9aaac11648 btl/openib: fix receive queue source detection 2015-01-06 11:39:11 -07:00
Howard Pritchard
7df648f1cf btl/openib: fix problems from commit b3617e73
For systems with OFED's lacking XRC support, commit b3617e73
broke the build of the openib btl.  This commit addresses
the issues introduced by that commit.
2015-01-06 11:31:12 -07:00
Jeff Squyres
cab1379dfb Fortran: only emit real16 and complex32 if supported
This is the master version of @ggouaillardet's patch from
open-mpi/ompi-release#148 (there was a minor conflict to fix and
several fuzzings of line numbers).
2015-01-06 09:47:26 -08:00
Howard Pritchard
ec632001b1 Merge pull request #329 from ggouaillardet/topic/romio_refresh
refresh ROMIO based on v3.2a2-84-gef1cf14
2015-01-06 10:27:20 -07:00
Ralph Castain
4c38c31ccf Actually copy buffer contents when dss.copy of a buffer is requested 2015-01-06 09:09:06 -08:00
Jeff Squyres
e77838973d Merge pull request #313 from ggouaillardet/topic/OFED_3_12
btl/openib: add XRC support with OFED 3.12+
2015-01-06 11:33:19 -05:00
Jeff Squyres
3d5a1bfb7b Merge pull request #334 from yburette/topic/ofimtlbugfixes
Topic/ofimtlbugfixes
2015-01-06 11:30:34 -05:00
Gilles Gouaillardet
0914de9eae refresh ROMIO based on v3.2a2-84-gef1cf14 2015-01-06 19:43:58 +09:00
Gilles Gouaillardet
b3617e736e btl/openib: add XRC support with OFED 3.12+
based on an original patch contributed by Bull.
2015-01-06 15:30:52 +09:00
Yohann Burette
f01dd429df Reset pointer to NULL to prevent double-freeing. 2015-01-05 17:01:37 -08:00
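The idiom named in the commit title above can be sketched in a few lines of C. The `endpoint_t` structure and `endpoint_release` function are hypothetical and are not taken from the changed code; the sketch relies only on the C standard's guarantee that `free(NULL)` does nothing.

```c
#include <stdlib.h>

/* Hypothetical structure for this sketch. */
typedef struct {
    char *buffer;
} endpoint_t;

/* Release the buffer and clear the pointer. A second call then passes
 * NULL to free(), which is a no-op, so the memory is never freed twice. */
static void endpoint_release(endpoint_t *ep)
{
    free(ep->buffer);
    ep->buffer = NULL;
}
```

Without the reset, a second `endpoint_release` call would pass a dangling pointer to `free`, which is undefined behavior.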
Yohann Burette
1e24da90fe Fix fi_av_insert return code test. 2015-01-05 17:01:37 -08:00
Yohann Burette
5944c294ad Add return code testing for fi_mr_reg. 2015-01-05 17:01:37 -08:00
Howard Pritchard
c857cc926c Merge pull request #327 from hppritcha/topic/async_progress
Topic/async progress
2015-01-05 16:20:44 -07:00
Howard Pritchard
f009c8425e Merge pull request #325 from hppritcha/topic/issue_324
opal/configury: allow param usage multiple times
2015-01-05 16:19:14 -07:00
Howard Pritchard
a179d6a1d7 opal/configury: add url ref to OPAL_FLAGS_UNIQ
Add a reference to the git issue related to additions to
OPAL_FLAGS_UNIQ to handle multiple instances of --param
in the CFLAGS env. variable.
2015-01-05 16:01:18 -07:00
Dave Goodell
8afd8487f8 opal_stdint.h: fix "#pragma GCC" warnings
This was more complicated than I would like, but it's just an
unfortunate GCC/clang difference.  I don't have access to all the C
compilers out there, so this may still have problems with other
compilers that implement some form of `#pragma GCC diagnostic` support
but don't actually behave the same as some versions of GCC.

fixes #323
2015-01-05 14:44:46 -08:00
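One well-known GCC/clang difference of this kind is that clang reports itself as an old GCC through `__GNUC__`/`__GNUC_MINOR__`, so a version test alone misjudges what it supports. The C sketch below guards `#pragma GCC diagnostic push`/`pop` accordingly. It is an illustration only and does not claim to reproduce the change in `opal_stdint.h`; the macro name, the function, and the suppressed warning are all assumptions.

```c
/* Assumption for this sketch: push/pop is available in clang and in
 * GCC 4.6 or newer. clang is tested first because its __GNUC__ values
 * describe an older GCC. */
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
#  define SKETCH_HAVE_DIAG_PUSH 1
#else
#  define SKETCH_HAVE_DIAG_PUSH 0
#endif

#if SKETCH_HAVE_DIAG_PUSH
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wunused-variable"
#endif

static int answer(void)
{
    int unused_scratch;   /* would normally trigger -Wunused-variable */
    return 42;
}

#if SKETCH_HAVE_DIAG_PUSH
#  pragma GCC diagnostic pop
#endif
```

Compilers that match neither branch never see the pragmas, which avoids "unknown pragma" warnings at the cost of leaving the original warning enabled there.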
Jeff Squyres
ce2008aa88 man pages: update non-blocking send descriptions
As noted by Alexander Pozdneev, applications are now allowed to
*access* buffers in pending non-blocking send operations; the buffers
just can't be *modified*.
2015-01-05 15:44:27 -05:00
Mike Dubman
0e4ce91f5f Merge pull request #331 from miked-mellanox/topic/fix_mkey_recursion_master
fix infinite recursion during mkey exchange at scale
2015-01-04 20:23:50 +02:00
Mike Dubman
54a072caaa OSHMEM: fix infinite recursion and stack size violation
Send the reply before posting the receive request again, to limit the recursion depth to
the number of receive requests.
The send can call opal_progress, which calls this function again. If the recv req is started
first, the stack size will be proportional to the number of job ranks.
2015-01-04 16:31:19 +02:00
Devendar Bureddy
e732152304 HCOLL: Fix hcoll supported datatype checks correctly 2015-01-02 21:18:12 +02:00
Gilles Gouaillardet
e8d084e6b9 fix ABI fix
Fix an undeleted line in open-mpi/ompi@24df0ed039
Thanks to Nick Papior Andersen for pointing this out.
2014-12-28 18:07:51 +09:00
Gilles Gouaillardet
9e9261e90a pmix: correctly set locality flags in proc_flags
do not use opal_process_info.cpuset which is not
set at that time.
2014-12-26 15:37:08 +09:00
Gilles Gouaillardet
24df0ed039 MPI_Comm_split_type: fix ABI compatibility
ABI compatibility was previously broken in
open-mpi/ompi@3deda3dc82
2014-12-25 19:43:58 +09:00
Howard Pritchard
a98441cb12 Merge pull request #328 from hppritcha/topic/xpmem_configury
xpmem/config: simple xpmem search on Cray's
2014-12-24 15:04:10 -07:00
Howard Pritchard
0a6f841d5f xpmem/config: simple xpmem search on Cray's
Use the pkg-config related m4 functions to find out where
Cray's xpmem.h and libxpmem are located on a system.

With this commit, there is no longer any need to have to
explicitly indicate an xpmem install location on the configure
line, at least for Cray systems running CLE 4.X and 5.X.
2014-12-24 14:40:06 -07:00
Howard Pritchard
065c756860 btl/ugni: improve error handling
Improve error handling when pthread functions return errors.
Remove stale debug code.
2014-12-24 11:50:24 -07:00
Howard Pritchard
f8e354ce00 btl/ugni: add a request_progress_thread mca param
Replace temporary environment variables with a MCA
parameter for the ugni btl.  A user wishing to
use the ugni btl async. progress thread needs to
set the request_progress_thread param to true.
For example, using env. variable format:

export OMPI_MCA_btl_ugni_request_progress_thread=1
2014-12-24 11:50:24 -07:00
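The environment variable in the example above follows the pattern `OMPI_MCA_<framework>_<component>_<param>`. The C sketch below builds such a name and reads it back as a flag. The helper names are hypothetical, this is not how Open MPI itself parses MCA parameters, and treating only `1` and `true` as enabled is an assumption of the sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "OMPI_MCA_<framework>_<component>_<param>" into out.
 * Returns 0 on success, -1 if the name does not fit in len bytes. */
static int mca_env_name(char *out, size_t len, const char *framework,
                        const char *component, const char *param)
{
    int n = snprintf(out, len, "OMPI_MCA_%s_%s_%s", framework, component, param);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return 1 if the named variable is set to "1" or "true", else 0.
 * The accepted spellings are an assumption of this sketch. */
static int mca_env_flag_enabled(const char *name)
{
    const char *value = getenv(name);
    if (value == NULL) {
        return 0;
    }
    return strcmp(value, "1") == 0 || strcmp(value, "true") == 0;
}
```

With `framework = "btl"`, `component = "ugni"`, and `param = "request_progress_thread"`, `mca_env_name` yields the variable name used in the `export` line of the commit message.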