Commit Graph

21775 Commits

Author SHA1 Message Date
Gilles Gouaillardet
8c1698ae4a coll/libnbc: enhance fix for MPI_Ireduce_scatter on single task communicator
this improves open-mpi/ompi@b9349d2eb9
2015-01-09 13:44:01 +09:00
Gilles Gouaillardet
194d9f84d3 btl/usnic: move call to check_reg_mem_basics()
avoid annoying memlock related messages when there is no usnic device.
2015-01-09 11:37:45 +09:00
George Bosilca
1344097d35 Turn OFF the TCP dump mechanism. 2015-01-08 18:50:49 -05:00
George Bosilca
8ddd3b3b09 Cleanup the TCP dump mechanism. 2015-01-08 18:50:05 -05:00
mjbhaskar
39f9880759 Fixed the data type argument in an all reduce operation to fix a bug
seen on 32 bit machines.
2015-01-08 14:18:54 -06:00
mjbhaskar
ba5dc660f7 Merge branch 'master' of https://github.com/open-mpi/ompi 2015-01-08 14:12:01 -06:00
Nathan Hjelm
c65f026fee btl/vader: fix typo in xpmem setup 2015-01-08 12:52:38 -07:00
Nathan Hjelm
9f6faadd91 opal_fifo: add missing memory barrier in pop
Thanks to Adrian Reber for reporting this.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-01-08 09:14:56 -07:00
Gilles Gouaillardet
4c29d8e247 btl/openib: silence warning (unused code) 2015-01-08 17:18:07 +09:00
Gilles Gouaillardet
8ab605d9c5 btl/tcp: fix overflow in mca_btl_tcp_endpoint_dump() 2015-01-08 15:40:16 +09:00
Gilles Gouaillardet
b746a8f584 romio: compile openmpi mpi-io glue 2015-01-08 14:08:46 +09:00
Ralph Castain
4e592ac434 Fix the tarball by providing the correct list of headers in the Makefile.am 2015-01-07 18:37:26 -08:00
Nathan Hjelm
7d206ae769 btl/ugni: fix a couple of bugs
Two fixes:

 - Do not try to return a mailbox to the free list if one wasn't
   allocated.

 - Do not try to tear down IRQ CQs if they were not created.
2015-01-07 13:48:17 -07:00
mjbhaskar
2d33b0a745 A fix for memory corruption seen on 32 bit machines 2015-01-07 14:41:44 -06:00
mjbhaskar
27dfcaaab2 Merge branch 'master' of https://github.com/open-mpi/ompi 2015-01-07 14:39:23 -06:00
mjbhaskar
74f8ba2acb A fix for memory corruption problem 2015-01-07 14:34:38 -06:00
Howard Pritchard
f34dd5f5fd plm/alps: update copyright 2015-01-07 12:33:38 -07:00
Howard Pritchard
c454d11b01 plm/alps: fix orted abort hang problem
Turns out the alps plm component wasn't changing the state
of the job upon terminating the orted's in the case of
an abnormal termination.  This caused mpirun to hang
with a zombie'd aprun process if an orted on a node
in the job was killed via signal.
2015-01-07 12:31:41 -07:00
Nathan Hjelm
81dc3a5db9 Merge pull request #335 from hjelmn/osc_updates
Osc updates
2015-01-07 11:16:55 -06:00
Dave Goodell
49069bc661 usnic: fix fi_av_insert (ARP resolution) bugs
We had several problems in the old code:

1. We were specifying an arbitrary timeout (100 ms) and then abandoning
   all remaining pending AV insert operations.  We would then free the
   endpoint buffer that we gave to fi_av_insert(), usually causing
   libfabric's progress thread to write to a freed buffer.

2. We were claiming in a show_help message that the timeout was
   controllable via an MCA parameter.  This commit removes that
   parameter, since there's no good method for us to specify a timeout
   like this to libfabric right now.

3. We also weren't waiting for the correct number of fi_av_insert()
   operations to complete.  We were waiting for nprocs, which is
   accidentally fine for 2 procs on separate hosts, but not for most
   other proc counts.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
2015-01-07 08:25:17 -08:00
Gilles Gouaillardet
06e071454e btl/openib: cleanup duplicate code 2015-01-07 14:07:30 +09:00
Gilles Gouaillardet
135ecce0eb btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS 2015-01-07 13:27:25 +09:00
Ralph Castain
e0927895db Grrr...how many files did they forget? 2015-01-06 19:40:18 -08:00
Ralph Castain
84c41429e9 Add missing file 2015-01-06 18:41:11 -08:00
George Bosilca
bf62bed65f Typo in the poll/epoll ops declaration. 2015-01-06 21:21:25 -05:00
Ralph Castain
a7c5ff2ace Update to libevent 2.0.22-stable 2015-01-06 16:37:25 -08:00
Howard Pritchard
061a587384 Merge pull request #336 from hppritcha/topic/odls_signal_fix
odls/base: fix an edge case with signals
2015-01-06 16:11:22 -07:00
Howard Pritchard
f0f98f13b6 odls/base: fix an edge case with signals
In the course of doing some testing with how orted's
handle signaled child processes, found out that very
often doing a kill -9 on a process on a node just
results in the job hanging. The problem was that the
orted odls/errmgr was not properly handling the exit_code
being returned from waitpid.  Now mark the proc state
as ORTE_PROC_STATE_ABORTED_BY_SIG if the exit_code
from waitpid indicates the process exited owing to
a signal.
2015-01-06 15:42:38 -07:00
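
The check described in f0f98f13b6 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the orted code: `os.WIFSIGNALED`, `os.WTERMSIG`, `os.WIFEXITED` and `os.WEXITSTATUS` mirror the POSIX C macros applied to a `waitpid` status, while the function names and the state labels returned here are hypothetical stand-ins, not Open MPI identifiers.

```python
import os
import signal
import time


def classify_wait_status(status: int) -> str:
    """Map a raw waitpid() status to an illustrative (hypothetical) state label."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal, e.g. `kill -9`.
        return f"ABORTED_BY_SIG({os.WTERMSIG(status)})"
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
        return "TERMINATED" if code == 0 else f"EXITED_NONZERO({code})"
    return "UNKNOWN"


def spawn_and_reap(kill_with=None, exit_code=0) -> str:
    """Fork a child, optionally signal it, then reap it and classify the status."""
    pid = os.fork()
    if pid == 0:
        # Child: if it is going to be signaled, just wait for the signal.
        if kill_with is not None:
            time.sleep(60)
        os._exit(exit_code)
    if kill_with is not None:
        os.kill(pid, kill_with)
    _, status = os.waitpid(pid, 0)
    return classify_wait_status(status)
```

Calling `spawn_and_reap(kill_with=signal.SIGKILL)` takes the signaled branch, whereas a child that exits on its own takes the `WIFEXITED` branch. Treating the raw status as a plain exit code would conflate the two cases.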
Nathan Hjelm
6733d89cf9 btl/vader: fix return code check when opening ptrace_scope file 2015-01-06 15:17:56 -07:00
Nathan Hjelm
e68ed2876c osc/pt2pt: threading fixes and code cleanup 2015-01-06 13:39:16 -07:00
Nathan Hjelm
3d79806805 add more internal RMA error codes 2015-01-06 13:39:04 -07:00
Nathan Hjelm
9eba7b9d35 Rename the OSC "rdma" component to pt2pt to better reflect that it does not actually use btl rdma 2015-01-06 13:38:55 -07:00
Nathan Hjelm
cde79bfa60 btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths) 2015-01-06 11:39:23 -07:00
Nathan Hjelm
9bae131589 btl/openib: fix message coalescing
There was a bug in the openib btl handling this valid sequence of
calls:

desc = btl_alloc ();
btl_free (desc);

When triggered the bug would cause either fragment loss or undefined
behavior (SEGV, etc). The problem occurred because btl_alloc contained
the logic to modify the pending fragment (length, etc) and these
changes were not corrected if the fragment was freed instead of sent.

To fix this issue I 1) moved some of the coalescing logic to the
btl_send function, and 2) retry the coalesced fragment on btl_free
if it was never sent. This appears to completely address the issue.
2015-01-06 11:39:16 -07:00
Nathan Hjelm
9aaac11648 btl/openib: fix receive queue source detection 2015-01-06 11:39:11 -07:00
Howard Pritchard
7df648f1cf btl/openib: fix problems from commit b3617e73
For systems with OFED's lacking XRC support, commit b3617e73
broke the build of the openib btl.  This commit addresses
the issues introduced by that commit.
2015-01-06 11:31:12 -07:00
Jeff Squyres
cab1379dfb Fortran: only emit real16 and complex32 if supported
This is the master version of @ggouaillardet's patch from
open-mpi/ompi-release#148 (there was a minor conflict to fix and
several fuzzings of line numbers).
2015-01-06 09:47:26 -08:00
Howard Pritchard
ec632001b1 Merge pull request #329 from ggouaillardet/topic/romio_refresh
refresh ROMIO based on v3.2a2-84-gef1cf14
2015-01-06 10:27:20 -07:00
Ralph Castain
4c38c31ccf Actually copy buffer contents when dss.copy of a buffer is requested 2015-01-06 09:09:06 -08:00
Jeff Squyres
e77838973d Merge pull request #313 from ggouaillardet/topic/OFED_3_12
btl/openib: add XRC support with OFED 3.12+
2015-01-06 11:33:19 -05:00
Jeff Squyres
3d5a1bfb7b Merge pull request #334 from yburette/topic/ofimtlbugfixes
Topic/ofimtlbugfixes
2015-01-06 11:30:34 -05:00
Gilles Gouaillardet
0914de9eae refresh ROMIO based on v3.2a2-84-gef1cf14 2015-01-06 19:43:58 +09:00
Gilles Gouaillardet
b3617e736e btl/openib: add XRC support with OFED 3.12+
based on an original patch contributed by Bull.
2015-01-06 15:30:52 +09:00
Yohann Burette
f01dd429df Reset pointer to NULL to prevent double-freeing. 2015-01-05 17:01:37 -08:00
Yohann Burette
1e24da90fe Fix fi_av_insert return code test. 2015-01-05 17:01:37 -08:00
Yohann Burette
5944c294ad Add return code testing for fi_mr_reg. 2015-01-05 17:01:37 -08:00
Howard Pritchard
c857cc926c Merge pull request #327 from hppritcha/topic/async_progress
Topic/async progress
2015-01-05 16:20:44 -07:00
Howard Pritchard
f009c8425e Merge pull request #325 from hppritcha/topic/issue_324
opal/configury: allow param usage multiple times
2015-01-05 16:19:14 -07:00
Howard Pritchard
a179d6a1d7 opal/configury: add url ref to OPAL_FLAGS_UNIQ
Add a reference to the git issue related to additions to
OPAL_FLAGS_UNIQ to handle multiple instances of --param
in the CFLAGS env. variable.
2015-01-05 16:01:18 -07:00
Dave Goodell
8afd8487f8 opal_stdint.h: fix "#pragma GCC" warnings
This was more complicated than I would like, but it's just an
unfortunate GCC/clang difference.  I don't have access to all the C
compilers out there, so this may still have problems with other
compilers that implement some form of `#pragma GCC diagnostic` support
but don't actually behave the same as some versions of GCC.

fixes #323
2015-01-05 14:44:46 -08:00