Jeff Squyres
9e6b157cb6
opal: minor update to guess_strlen
...
This is a minor update to open-mpi/ompi@c52601f0c5.
If we have vsnprintf(), we might as well not have the rest of the
guess_strlen() routine. Also document the nifty trick/behavior of
vsnprintf() that enables this shortcut (it was new to me!).
2014-12-13 08:09:34 -05:00
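The vsnprintf() behavior that 9e6b157cb6 refers to is presumably the C99 rule that vsnprintf() returns the length the complete output would have had, even when the buffer is too small (or has size 0). A minimal sketch of that idea follows; formatted_length() and format_alloc() are hypothetical names for illustration and are not the actual guess_strlen() code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not the Open MPI routine): return the number of
 * characters the formatted output needs, excluding the trailing NUL. */
int formatted_length(const char *fmt, va_list ap)
{
    va_list copy;
    int len;

    assert(fmt != NULL);
    va_copy(copy, ap);   /* leave the caller's va_list untouched */
    /* C99: with a size of 0 nothing is written, and the return value is
     * the length the complete output would have had. */
    len = vsnprintf(NULL, 0, fmt, copy);
    va_end(copy);
    return len;
}

/* Hypothetical asprintf-like wrapper built on the helper above.
 * Returns a malloc'ed string, or NULL on failure. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    char *buf = NULL;
    int len;

    va_start(ap, fmt);
    len = formatted_length(fmt, ap);
    if (len >= 0) {
        buf = malloc((size_t)len + 1);
        if (buf != NULL) {
            vsnprintf(buf, (size_t)len + 1, fmt, ap);
        }
    }
    va_end(ap);
    return buf;
}
```

With this behavior available, a single call replaces any manual walk over the format string to estimate the output size.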
George Bosilca
3430714989
Correctly propagate the requested level of thread support during the
...
component init calls.
2014-12-13 02:36:21 -05:00
George Bosilca
2edbe16c47
Add the necessary infrastructure to allow the dumping of all TCP
...
information related to an endpoint (status and all pending fragments).
Do some minor space cleanup.
2014-12-13 01:59:55 -05:00
George Bosilca
5b8616d890
Fix the race condition in endpoint connection initialization. The race
...
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).
The proposed solution lets the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
its job. As the event is not executed, the same trigger will fire
again at the next opportunity, so this solution introduces a minimal
delay in the connection establishment.
2014-12-13 01:45:00 -05:00
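The back-off described in 5b8616d890 can be sketched with a try-lock: the callback, which runs with the event base lock held, gives up if the endpoint's recv lock is busy instead of blocking on it. This is only an illustration under assumed names (endpoint_t, recv_callback and the fields are hypothetical), not the actual TCP BTL code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an endpoint and its recv event lock. */
typedef struct {
    pthread_mutex_t recv_lock;  /* per-endpoint recv event lock */
    int pending;                /* amount of data waiting on the socket */
    int processed;              /* amount of data handled so far */
} endpoint_t;

/* Event callback, assumed to be invoked with the event base lock held.
 * Returns true if the data was processed, false if it backed off. */
bool recv_callback(endpoint_t *ep)
{
    assert(ep != NULL);
    if (pthread_mutex_trylock(&ep->recv_lock) != 0) {
        /* Another thread owns the recv lock and may be waiting for the
         * event base lock. Do not block; the event fires again later. */
        return false;
    }
    ep->processed += ep->pending;
    ep->pending = 0;
    pthread_mutex_unlock(&ep->recv_lock);
    return true;
}
```

Because the callback never waits for the second lock, the circular wait cannot form; the cost is that the data is handled one event-loop pass later.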
Ralph Castain
c52601f0c5
It looks like the guess_strlen function in our local printf.c has some questionable code in it. Now that we are checking in configure for vsnprintf, take advantage of that check to use the far simpler method if it is available. Given that we no longer support such ancient systems where this might not be available, one suspects the other questionable code may no longer be required - but set that aside for another day.
2014-12-12 17:47:17 -08:00
Ralph Castain
bffb2b7a4b
Correct some issues with variables used before being set
2014-12-12 17:23:32 -08:00
Ralph Castain
0630680f36
Two cleanups required for transfer to 1.8.4:
...
* Use %d format for the topo signature as some systems apparently have problems with %u
* Use correct variable in show_help message
2014-12-12 17:23:32 -08:00
Howard Pritchard
6cf258638a
mpool/udreg: minor comment improvement
2014-12-12 14:05:18 -07:00
Ralph Castain
66a860c1e2
Okay, okay - use the oshmem word here
2014-12-12 08:42:54 -08:00
Ralph Castain
064a241145
Don't install the Java shmem tools if we aren't building Java support. Thanks to Paul Hargrove for noticing.
2014-12-12 08:25:38 -08:00
Nathan Hjelm
38d66272c5
btl/vader: fix compile on SGI UV
2014-12-12 09:09:01 -07:00
Rolf vandeVaart
f4aecdbfd2
Change logging function name from log to logfn. Fixes issue with PGI compile
2014-12-12 09:46:44 -05:00
Jeff Squyres
e4b3c6f1c4
libfabric psm: fix (void*) dereference
...
Committed upstream to libfabric as well.
2014-12-11 20:12:13 -08:00
Jeff Squyres
0f28233b35
libfabric: don't use __thread
...
There's no real reason that this routine should use thread local
storage. Plus, __thread appears to be a GCC extension.
2014-12-11 14:10:48 -08:00
Rolf vandeVaart
9ee8e1dcf4
With the PGI compiler we need stdarg.h for the va_list definition
2014-12-11 16:14:57 -05:00
Jeff Squyres
4551cab6f1
help messages: fix obvious typos
2014-12-11 12:23:33 -08:00
Jeff Squyres
0b00e980e0
.gitignore: ignore executables for new tests
2014-12-11 10:02:10 -08:00
Nathan Hjelm
da5e3ce936
test/class: update class tests to also use opal_finalize_util
2014-12-10 17:50:26 -07:00
Jeff Squyres
b1e9e7f56f
Whitespace cleanup only; no code changes
2014-12-10 13:32:04 -08:00
Jeff Squyres
8b2410f554
class tests: re-enable a bunch of tests
...
Many of these tests were failing due to opal_init() failing in some
cases (because the opal shmem framework needs installed components, so
"make distcheck" would fail these tests because the opal shmem
components were not installed). However, all of these tests seem to
be fine with opal_init_util() -- so let's re-enable these tests.
2014-12-10 13:30:14 -08:00
Jeff Squyres
ff2a75b29b
class tests: change from opal_init() to opal_init_util()
2014-12-10 13:29:38 -08:00
Nathan Hjelm
7e5af9cecf
opal_lifo: fix potential race condition when using 128-bit atomics
...
On x86_64 reading a 128-bit value requires multiple instructions.
Under some conditions if the counted pointer counter is read before
the item pointer the lifo can be left in an inconsistent state. This
commit forces the counter to always be read first.
The fifo does not appear to suffer from the same race.
2014-12-10 12:51:44 -07:00
rolfv
f471b09ae9
Add support for CUDA Unified memory. Basically, add a new flag and disable some
...
optimizations when that flag is detected. Lightly reviewed by bosilca.
2014-12-10 05:46:00 -08:00
Artem Polyakov
8ffad75a0a
Introduce timing interval measurement facility in timing framework
2014-12-10 16:47:49 +06:00
Nathan Hjelm
52ed5a9bf8
opal_lifo: fix one more potential issue with the new 128-bit lifo atomics
...
It is possible the compiler can reorder the read of the head item and
the head itself. This could lead to a situation where the item
returned was not really the head item.
2014-12-09 21:48:14 -07:00
Nathan Hjelm
a40fe8311f
opal_lifo: add missing memory barriers in 128-bit atomic functions
2014-12-09 19:50:08 -07:00
Nathan Hjelm
ccbb869274
Use AC_TRY_LINK not AC_TRY_COMPILE when testing for __sync_bool_compare_and_swap on 128-bit values
2014-12-09 18:56:21 -07:00
Nathan Hjelm
1231bb7479
Update lifo and fifo tests to use opal_init/finalize_util so they work during make distcheck
2014-12-09 17:41:18 -07:00
Ralph Castain
04c6d1d01d
Silence warnings
2014-12-09 16:10:58 -08:00
Jeff Squyres
e6c8bfc201
libfabric: Gah -- also remove the "pragma pop" line
...
Thanks to Nathan for pointing out that I missed snipping one line in
2f9c69f016 (I removed the trailing comment, but not the trailing
pragma -- oops!).
2014-12-09 14:03:39 -08:00
Jeff Squyres
2f9c69f016
libfabric: use correct C99 notation for var-length array
...
Nathan pointed out the correct C99 way to notate a variable-length
array in a struct. This change has now been accepted upstream in
libfabric.
2014-12-09 13:33:15 -08:00
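The C99 notation mentioned in 2f9c69f016 is presumably the flexible array member, written with empty brackets instead of the zero-length [0] extension. A small sketch, with a hypothetical struct that is not taken from libfabric:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type, for illustration only. */
struct message {
    size_t len;
    char payload[];   /* C99 flexible array member: no size, must be last */
};

/* Allocate the header and the payload in one block and copy the data in.
 * Returns NULL if the allocation fails. */
struct message *message_create(const char *data, size_t len)
{
    struct message *msg;

    assert(data != NULL);
    /* sizeof(*msg) does not include the payload, so add it explicitly. */
    msg = malloc(sizeof(*msg) + len);
    if (msg == NULL) {
        return NULL;
    }
    msg->len = len;
    memcpy(msg->payload, data, len);
    return msg;
}
```

Unlike `char payload[0]`, this form is standard C99, so compilers have no reason to warn about zero-length arrays.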
Nathan Hjelm
d0da29351f
opal_progress: fix sched_yield check
2014-12-09 14:14:20 -07:00
Jeff Squyres
cd0a54d76f
usnic: short term fix to enable builds on non-libfabric platforms
...
This isn't quite the Right fix yet, because it doesn't address usnic
for external libfabric builds. I'll fix that separately / later.
2014-12-09 09:19:26 -08:00
Nathan Hjelm
b2b7ecc7c4
Merge pull request #300 from hjelmn/topic/atomic_lifo_fifo
...
Add opal_fifo_t class and rename opal_atomic_lifo_t to opal_lifo_t
2014-12-09 10:54:50 -06:00
Jeff Squyres
c40fd09d2a
libfabric: fix providers to conditionally add libs/flags
...
Only allow the usnic and PSM providers to add CPPFLAGS and LIBADD
flags when they are going to be built.
2014-12-09 07:15:25 -08:00
Jeff Squyres
45d6f29a27
Merge pull request #310 from yburette/master
...
libfabric: add optional PSM provider.
2014-12-09 06:39:34 -08:00
Jeff Squyres
6e24a1eb85
usnic: update for libfabric API change
...
Use FI_ADDR_UNSPEC for posting a receive from an unspecified source.
2014-12-09 06:06:52 -08:00
Jeff Squyres
f5a07f651c
libfabric: Open MPI addition to stem a flood of warnings
...
Add a pragma to not warn about zero-length arrays. This needs to be
addressed upstream, but for now, do it here.
2014-12-09 06:04:37 -08:00
Jeff Squyres
f331f48796
libfabric: update embedded libfabric to 934a714
...
Update the embedded copy of libfabric to the github ofiwg/libfabric
repo hash 934a714ca85f1a30a1e384a7d5f714ee962dc253.
2014-12-09 06:03:51 -08:00
Jeff Squyres
09d03a154b
libfabric: fix some typos in the usnic configury
2014-12-09 05:52:24 -08:00
Ralph Castain
18d9fdfd8d
Restore full topology comparison to support inventory monitoring
2014-12-09 01:33:06 -08:00
Ralph Castain
9b2f8cd840
Add the processor architecture to the topology signature
2014-12-09 01:17:00 -08:00
Ralph Castain
9d5135e6cd
Function definition should use the correct type
2014-12-09 01:04:31 -08:00
Ralph Castain
06e49d0e92
Per contribution from Pascal Deveze of Bull: move opal_set_using_threads earlier in MPI_Init (before datatype init) so the value gets set in time to be properly used.
2014-12-09 00:37:57 -08:00
Howard Pritchard
3a14c8eeff
fix build for cray xc
...
The recent addition of the embedded libfabric broke the build on Cray XC/XE.
This commit fixes this problem.
2014-12-08 22:21:13 -08:00
Yohann Burette
f90a7b51d2
libfabric: add optional PSM provider.
2014-12-08 16:49:41 -08:00
Ralph Castain
bb529ebd8e
Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pick up major differences (e.g., where we have different numbers of sockets or assigned core bindings).
...
Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.
Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
2014-12-08 15:38:14 -08:00
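The signature idea in bb529ebd8e can be pictured as a short string built from per-type object counts and compared on the remote side. The sketch below is an assumption-laden illustration: the struct, the function names and the "2S:16C:32H" format are hypothetical, and the counts would come from hwloc in the real code rather than being filled in by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-node object counts (a real version would query hwloc). */
typedef struct {
    int sockets;
    int cores;
    int hwthreads;
} topo_counts_t;

/* Write a signature such as "2S:16C:32H" into buf.
 * Returns the snprintf() result (length needed, excluding the NUL). */
int topo_signature(const topo_counts_t *t, char *buf, size_t size)
{
    assert(t != NULL && buf != NULL);
    return snprintf(buf, size, "%dS:%dC:%dH",
                    t->sockets, t->cores, t->hwthreads);
}

/* A daemon compares the signature it received with its own. */
bool topo_matches(const char *local_sig, const char *remote_sig)
{
    return strcmp(local_sig, remote_sig) == 0;
}
```

A string comparison like this only detects differences in the counted quantities, which matches the commit's caveat that the check is not comprehensive.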
Jeff Squyres
f1629b66da
Merge pull request #309 from yburette/master
...
libfabric: fix typo in Makefile.am
2014-12-08 13:24:56 -08:00
Yohann Burette
f33a9afd22
libfabric: fix typo in Makefile.am
2014-12-08 13:19:43 -08:00
Jeff Squyres
ac8e9d103c
libfabric: need to make AM_CONDITIONALs always be run
...
Ensure that the usnic-specific AM_CONDITIONAL for the embedded
libfabric is always run.
2014-12-08 11:51:26 -08:00