1
1
Граф коммитов

2967 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
c1b43b6753 libfabric: the LIBADD should be unconditional
The LIBADD for the common libfabric library does not belong down in
the providers; it needs to be set when the libfabric core itself
decides to build.
2014-12-17 14:02:08 -08:00
Jeff Squyres
f1a5d3a90d configury: propagate a libtool shared lib version for libfabric 2014-12-17 13:36:01 -08:00
Jeff Squyres
d6f059f538 configury: add some descriptive output messages in configure
Ensure that the ofi MTL and the usnic BTL have good descriptive output
messages in configure.
2014-12-17 13:36:01 -08:00
Jeff Squyres
6edc19d78d libfabric: ensure that shell variables are initialized
Ensure that the <provider>_happy shell variables are initialized to
0.  Without this, the --without-libfabric case would leave them
initialized, resulting in "test: -eq operator expecting a value" kinds
of errors.
2014-12-17 13:36:01 -08:00
Rolf vandeVaart
f55de452ab Change the way we register the sm memory pool with CUDA. Rather than just registering local free lists, register the entire pool as the local process does not know which memory the remote processes are using for free lists. Fixes performance problem we were seeing with copying out of memory (since host piece was not pinned). 2014-12-17 14:21:34 -05:00
George Bosilca
830df07202 Fix the indentation. 2014-12-16 16:07:42 -05:00
George Bosilca
146ab96e29 These variables are now unnecessary. 2014-12-16 16:05:00 -05:00
Aurélien Bouteiller
ee3b090316 The fallback case when yama is not installed was not correct in CMA vader 2014-12-16 14:39:14 -05:00
Aurélien Bouteiller
0bf860ef02 indentation 2014-12-16 14:22:26 -05:00
Jeff Squyres
95da4a5a0e usnic: no longer use opal_using_threads()
Instead, use the flag that is passed in.
2014-12-16 08:49:01 -08:00
Artem Polyakov
01601f3284 Merge pull request #305 from artpol84/timing
Timing framework improvement
2014-12-16 15:13:48 +06:00
George Bosilca
357daa834e Stay on the safe side: Only one thread is allowed
to handle an event_base.
2014-12-15 23:19:51 -05:00
George Bosilca
2fec570fe7 There is no need to keep track of these events. They are scheduled
as triggers in libevent, so one bookkepping should be enough.
2014-12-15 22:35:29 -05:00
George Bosilca
46baab350c The event is automatically deleted by default. 2014-12-15 21:59:20 -05:00
George Bosilca
b01abfa0d7 Don't over-do it! 2014-12-15 21:33:32 -05:00
George Bosilca
f87a4b691b Solve another handshake problem, where one threads was calling del_event
while cleaning up after receiving a zero byte on the connect socket
(localyy started connection), while another was trying to accept a
new connection from the same peer. Create a zero-timed event and
delocalize the accept into a timer_event.
Add support for registering an error callback, that can be used when a
connection is discovered as failed during the initialization process.
2014-12-15 20:27:32 -05:00
George Bosilca
e20413c885 Rearrange the code to remove a compiler complaint about
the missing return from a non-void function.
2014-12-15 15:42:57 -05:00
Ralph Castain
573a574a3c Remove an unused dstore type that was redundant with another one. Define a corresponding PMIX_NODE_ID type (contains the vpid of the daemon hosting the proc) and ensure that the PMIx server includes that info in its process map 2014-12-15 12:11:13 -08:00
Mike Dubman
2fbe87defe Merge pull request #314 from miked-mellanox/topic/fix_opal_path_nfs
add support for autofs and make check pass. jenkins: check,src_rpm
2014-12-15 20:52:52 +02:00
Ralph Castain
9658256a98 Restore the passing of the complete job map to the local proc on first get_attr so the info can be used by the MPI layer without continual calls back to the server. We'll find a more memory efficient method later. 2014-12-13 18:44:09 -08:00
Mike Dubman
42f3fa0d1e OPAL: add support for autofs magic type 2014-12-13 20:27:47 +02:00
Jeff Squyres
9e6b157cb6 opal: minor update to guess_strlen
This is a minor update to
open-mpi/ompi@c52601f0c5.

If we have vsnprintf(), we might as well not have the rest of the
guess_strlen() routine.  Also document the nifty trick/behavior of
vsnprintf() that enables this shortcut (it was new to me!).
2014-12-13 08:09:34 -05:00
George Bosilca
2edbe16c47 Add the necessary infrastructure to allow the dumping of all TCP
informations related to an endpoint (status and all pending fragments).
Do some minor space cleanup.
2014-12-13 01:59:55 -05:00
George Bosilca
5b8616d890 Fix the race condition in endpoint connection initialization. The race
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).

The proposed solution let the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
it's job. As the event is not execute the same triggered will trigger
again at the next opportunity, so this solution introduce a minimal
delay in the connection establishement.
2014-12-13 01:45:00 -05:00
Ralph Castain
c52601f0c5 It looks like the guess_len function in our local printf.c has some questionable code in it. Now that we are checking in configure for vsnprintf, take advantage of that check to use the far simpler method if it is available. Given that we no longer support such ancient systems where this might not be available, one suspects the other questionable code may no longer be required - but set that aside for another day. 2014-12-12 17:47:17 -08:00
Ralph Castain
bffb2b7a4b Correct some issues with variables used before being set 2014-12-12 17:23:32 -08:00
Ralph Castain
0630680f36 Two cleanups required for transfer to 1.8.4:
* Use %d format for the topo signature as some systems apparently have problems with %u
* Use correct variable in show_help message
2014-12-12 17:23:32 -08:00
Howard Pritchard
6cf258638a mpool/udreg: minor comment improvement 2014-12-12 14:05:18 -07:00
Nathan Hjelm
38d66272c5 btl/vader: fix compile on SGI UV 2014-12-12 09:09:01 -07:00
Jeff Squyres
e4b3c6f1c4 libfabric psm: fix (void*) dereference
Committed upstream to libfabric as well.
2014-12-11 20:12:13 -08:00
Jeff Squyres
0f28233b35 libfabric: don't use __thread
There's no real reason that this routine should use thread local
storage.  Plus, __thread appears to be a GCC extension.
2014-12-11 14:10:48 -08:00
Rolf vandeVaart
9ee8e1dcf4 With PGI compile we need stdarg.h for va_list define 2014-12-11 16:14:57 -05:00
Jeff Squyres
4551cab6f1 help messages: fix obvious typos 2014-12-11 12:23:33 -08:00
Nathan Hjelm
7e5af9cecf opal_lifo: fix potential race condition when using 128-bit atomics
On x86_64 reading a 128-bit value requires multiple instructions.
Under some conditions if the counted pointer counter is read before
the item pointer the fifo can be left in an inconsistent state. This
commit forces the read of the counter to always be read first.

The fifo does not appear to suffer from the same race.
2014-12-10 12:51:44 -07:00
rolfv
f471b09ae9 Add support for CUDA Unified memory. Basically, add a new flag and disable some
optimizations when that flag is detected.  Lightly reviewed by bosilca.
2014-12-10 05:46:00 -08:00
Artem Polyakov
8ffad75a0a Introduce timing interval measurement facility in timing framework 2014-12-10 16:47:49 +06:00
Nathan Hjelm
52ed5a9bf8 opal_lifo: fix one more potential issue with the new 128-bit lifo atomics
It is possible the compiler can reorder the read of the head item and
the head itself. This could lead to a situation where the item
returned was not really the head item.
2014-12-09 21:48:14 -07:00
Nathan Hjelm
a40fe8311f opal_lifo: add missing memory barriers in 128-bit atomic functions 2014-12-09 19:50:08 -07:00
Jeff Squyres
e6c8bfc201 libfabric: Gah -- also remove the "pragma pop" line
Thanks to Nathan for pointing out that I missed snipping one line in
2f9c69f016 (I removed the trailing
comment, but not the trailing pragma -- oops!).
2014-12-09 14:03:39 -08:00
Jeff Squyres
2f9c69f016 libfabric: use correct C99 notation for var-length array
Nathan pointed out the correct C99 way to notate a variable-length
array in a struct.  This change has now been accepted upstream in
libfabric.
2014-12-09 13:33:15 -08:00
Nathan Hjelm
d0da29351f opal_progress: fix sched_yield check 2014-12-09 14:14:20 -07:00
Jeff Squyres
cd0a54d76f usnic: short term fix to enable builds on non-libfabric platforms
This isn't quite the Right fix yet, because it doesn't address usnic
for external libfabric builds.  I'll fix that separately / later.
2014-12-09 09:19:26 -08:00
Nathan Hjelm
b2b7ecc7c4 Merge pull request #300 from hjelmn/topic/atomic_lifo_fifo
Add opal_fifo_t class and rename opal_atomic_lifo_t to opal_lifo_t
2014-12-09 10:54:50 -06:00
Jeff Squyres
c40fd09d2a libfabric: fix providers to conditionally add libs/flags
Only allow the usnic and PSM providers to add CPPFLAGS and LIBADD
flags when they are going to be built.
2014-12-09 07:15:25 -08:00
Jeff Squyres
45d6f29a27 Merge pull request #310 from yburette/master
libfabric: add optional PSM provider.
2014-12-09 06:39:34 -08:00
Jeff Squyres
6e24a1eb85 usnic: update for libfabric API change
Use FI_ADDR_UNSPEC for posting a receive from an unspecified source.
2014-12-09 06:06:52 -08:00
Jeff Squyres
f5a07f651c libfabric: Open MPI addition to stem a flood of warnings
Add a pragma to not warn about zero-length arrays.  This needs to be
addressed upstream, but for now, do it here.
2014-12-09 06:04:37 -08:00
Jeff Squyres
f331f48796 libfabric: update embedded libfabric to 934a714
Update the embedded copy of libfabric to the github ofiwg/libfabric
repo hash 934a714ca85f1a30a1e384a7d5f714ee962dc253.
2014-12-09 06:03:51 -08:00
Jeff Squyres
09d03a154b libfabric: fix some typos in the usnic configury 2014-12-09 05:52:24 -08:00
Ralph Castain
18d9fdfd8d Restore full topology comparison to support inventory monitoring 2014-12-09 01:33:06 -08:00