Gilles Gouaillardet
f6da257477
configury: test external hwloc version is 1.8 or greater
...
hwloc_topology_dup is only available from hwloc 1.8
2014-12-22 13:42:38 +09:00
Jeff Squyres
40dd4c5b76
configury: manually remove some stamp-h? files
...
Due to what might be a bug in Automake, we need to remove stamp-h?
files manually. See
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19418 .
2014-12-20 08:32:57 -08:00
Jeff Squyres
d5b3e5802e
libfabric configury: add more tests
...
Properly test for some dependent libraries; don't just assume
elsewhere in Open MPI's configury will find those libraries. Also
consolidate some CPPFLAGS and clarify some comments.
2014-12-20 08:32:47 -08:00
Jeff Squyres
012e008649
libfabric configury: make AC_CONFIG_FILES be unconditional
...
Also add the generated config.h file to .gitignore.
2014-12-20 08:32:47 -08:00
Jeff Squyres
45ef0352d7
libfabric: do a proper check for intrinsic atomics
2014-12-20 08:32:46 -08:00
Jeff Squyres
ff1364cbe4
Revert "libfabric: add missing header file"
...
That wasn't a missing header file; in fact, it should have been
.gitignored!
This reverts commit 35bf5fc60c.
2014-12-19 17:39:30 -08:00
Jeff Squyres
35bf5fc60c
libfabric: add missing header file
2014-12-19 17:33:11 -08:00
Jeff Squyres
e0f660cb9e
libfabric: fix clang compile error in usnic provider
...
From ofiwg/libfabric@0078c93ae4
2014-12-19 15:45:16 -08:00
Jeff Squyres
75797c4f30
libfabric: update embedded libfabric configury
...
To support the newly-copied libfabric downloaded from github
ofiwg/libfabric@8da3957de3 .
2014-12-19 14:45:30 -08:00
Jeff Squyres
e2362988a9
libfabric: update to ofiwg/libfabric@8da3957de3
...
Pull down a new embedded copy of libfabric from
https://github.com/ofiwg/libfabric .
2014-12-19 14:45:21 -08:00
Howard Pritchard
91b0d03bf2
pmix/cray: remove dead code
2014-12-19 13:08:23 -08:00
Ralph Castain
123fdd603f
If we are using hwthread cpus, then default to binding there, letting the user override to whatever they want
2014-12-19 08:04:28 -08:00
Rolf vandeVaart
26482db736
Bump up max send size. Gives much better performance for GPU transfers while only decreasing host transfers by a small amount.
2014-12-18 13:22:58 -08:00
Jeff Squyres
c621d1e622
libfabric: don't LIBADD the common library in the static case
...
Adding the libfabric common library in the --disable-dlopen case will
result in duplicate symbols.
2014-12-18 11:04:08 -08:00
Jeff Squyres
140bb3d421
hwloc configure: fix typo -- add missing $
...
Arrgh! Missed a "$" in the last commit, making the test always
false.
2014-12-18 10:25:43 -08:00
Jeff Squyres
be6d46490f
hwloc: only add CPPFLAGS if hwloc is actually being built
...
As pointed out by @ggouaillardet, we were adding some unnecessary -I
flags to CPPFLAGS when --without-hwloc was being used. This commit
slightly updates the hwloc191 component configury to only add such
things when the component is, in fact, going to be
compiled/installed.
2014-12-18 08:56:49 -08:00
Jeff Squyres
c205c70f39
usnic libfabric: remove useless "config.h" includes
...
This change was also committed upstream in libfabric.
2014-12-18 08:47:59 -08:00
Jeff Squyres
269d7f9713
openib: don't use opal_using_threads() in component_init
...
Use the flag that was passed in, instead.
2014-12-17 15:08:43 -08:00
Jeff Squyres
c1b43b6753
libfabric: the LIBADD should be unconditional
...
The LIBADD for the common libfabric library does not belong down in
the providers; it needs to be set when the libfabric core itself
decides to build.
2014-12-17 14:02:08 -08:00
Jeff Squyres
f1a5d3a90d
configury: propagate a libtool shared lib version for libfabric
2014-12-17 13:36:01 -08:00
Jeff Squyres
d6f059f538
configury: add some descriptive output messages in configure
...
Ensure that the ofi MTL and the usnic BTL have good descriptive output
messages in configure.
2014-12-17 13:36:01 -08:00
Jeff Squyres
6edc19d78d
libfabric: ensure that shell variables are initialized
...
Ensure that the <provider>_happy shell variables are initialized to
0. Without this, the --without-libfabric case would leave them
uninitialized, resulting in "test: -eq operator expecting a value" kinds
of errors.
2014-12-17 13:36:01 -08:00
Rolf vandeVaart
f55de452ab
Change the way we register the sm memory pool with CUDA. Rather than just registering local free lists, register the entire pool as the local process does not know which memory the remote processes are using for free lists. Fixes performance problem we were seeing with copying out of memory (since host piece was not pinned).
2014-12-17 14:21:34 -05:00
George Bosilca
830df07202
Fix the indentation.
2014-12-16 16:07:42 -05:00
George Bosilca
146ab96e29
These variables are now unnecessary.
2014-12-16 16:05:00 -05:00
Aurélien Bouteiller
ee3b090316
The fallback case when yama is not installed was not correct in CMA vader
2014-12-16 14:39:14 -05:00
Aurélien Bouteiller
0bf860ef02
indentation
2014-12-16 14:22:26 -05:00
Jeff Squyres
95da4a5a0e
usnic: no longer use opal_using_threads()
...
Instead, use the flag that is passed in.
2014-12-16 08:49:01 -08:00
George Bosilca
357daa834e
Stay on the safe side: Only one thread is allowed
...
to handle an event_base.
2014-12-15 23:19:51 -05:00
George Bosilca
2fec570fe7
There is no need to keep track of these events. They are scheduled
...
as triggers in libevent, so one set of bookkeeping should be enough.
2014-12-15 22:35:29 -05:00
George Bosilca
46baab350c
The event is automatically deleted by default.
2014-12-15 21:59:20 -05:00
George Bosilca
b01abfa0d7
Don't over-do it!
2014-12-15 21:33:32 -05:00
George Bosilca
f87a4b691b
Solve another handshake problem, where one thread was calling del_event
...
while cleaning up after receiving a zero byte on the connect socket
(locally started connection), while another was trying to accept a
new connection from the same peer. Create a zero-timed event and
delocalize the accept into a timer_event.
Add support for registering an error callback, that can be used when a
connection is discovered as failed during the initialization process.
2014-12-15 20:27:32 -05:00
George Bosilca
e20413c885
Rearrange the code to remove a compiler complaint about
...
the missing return from a non-void function.
2014-12-15 15:42:57 -05:00
Ralph Castain
573a574a3c
Remove an unused dstore type that was redundant with another one. Define a corresponding PMIX_NODE_ID type (contains the vpid of the daemon hosting the proc) and ensure that the PMIx server includes that info in its process map
2014-12-15 12:11:13 -08:00
Ralph Castain
9658256a98
Restore the passing of the complete job map to the local proc on first get_attr so the info can be used by the MPI layer without continual calls back to the server. We'll find a more memory efficient method later.
2014-12-13 18:44:09 -08:00
George Bosilca
2edbe16c47
Add the necessary infrastructure to allow the dumping of all TCP
...
information related to an endpoint (status and all pending fragments).
Do some minor space cleanup.
2014-12-13 01:59:55 -05:00
George Bosilca
5b8616d890
Fix the race condition in endpoint connection initialization. The race
...
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).
The proposed solution lets the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
its job. As the event is not executed, the same trigger will fire
again at the next opportunity, so this solution introduces a minimal
delay in the connection establishment.
2014-12-13 01:45:00 -05:00
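The "let the callback fail instead of blocking" idea from 5b8616d890 can be sketched with a try-lock. This is a minimal illustration, not the TCP BTL code: `endpoint_t`, `endpoint_trylock`, and `recv_callback` are hypothetical names, and a C11 `atomic_flag` stands in for the real recv event lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an endpoint and its recv-event lock. */
typedef struct {
    atomic_flag recv_lock;
    int bytes_handled;
} endpoint_t;

/* Try to take the lock without blocking; true on success. */
static bool endpoint_trylock(endpoint_t *ep)
{
    return !atomic_flag_test_and_set(&ep->recv_lock);
}

static void endpoint_unlock(endpoint_t *ep)
{
    atomic_flag_clear(&ep->recv_lock);
}

/* Event callback: returns true if the data was processed, false if the
 * lock was busy and the work is left for the next trigger. */
static bool recv_callback(endpoint_t *ep, int nbytes)
{
    assert(ep != NULL);
    if (!endpoint_trylock(ep)) {
        return false;            /* never wait here: waiting is what deadlocked */
    }
    ep->bytes_handled += nbytes;
    endpoint_unlock(ep);
    return true;
}
```

Because the callback never waits on the second lock, the two threads can no longer hold one lock each while waiting for the other; the cost is that the skipped event has to fire again.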
Ralph Castain
bffb2b7a4b
Correct some issues with variables used before being set
2014-12-12 17:23:32 -08:00
Ralph Castain
0630680f36
Two cleanups required for transfer to 1.8.4:
...
* Use %d format for the topo signature as some systems apparently have problems with %u
* Use correct variable in show_help message
2014-12-12 17:23:32 -08:00
Howard Pritchard
6cf258638a
mpool/udreg: minor comment improvement
2014-12-12 14:05:18 -07:00
Nathan Hjelm
38d66272c5
btl/vader: fix compile on SGI UV
2014-12-12 09:09:01 -07:00
Jeff Squyres
e4b3c6f1c4
libfabric psm: fix (void*) dereference
...
Committed upstream to libfabric as well.
2014-12-11 20:12:13 -08:00
Jeff Squyres
0f28233b35
libfabric: don't use __thread
...
There's no real reason that this routine should use thread local
storage. Plus, __thread appears to be a GCC extension.
2014-12-11 14:10:48 -08:00
Jeff Squyres
4551cab6f1
help messages: fix obvious typos
2014-12-11 12:23:33 -08:00
rolfv
f471b09ae9
Add support for CUDA Unified memory. Basically, add a new flag and disable some
...
optimizations when that flag is detected. Lightly reviewed by bosilca.
2014-12-10 05:46:00 -08:00
Jeff Squyres
e6c8bfc201
libfabric: Gah -- also remove the "pragma pop" line
...
Thanks to Nathan for pointing out that I missed snipping one line in
2f9c69f016 (I removed the trailing comment, but not the trailing
pragma -- oops!).
2014-12-09 14:03:39 -08:00
Jeff Squyres
2f9c69f016
libfabric: use correct C99 notation for var-length array
...
Nathan pointed out the correct C99 way to notate a variable-length
array in a struct. This change has now been accepted upstream in
libfabric.
2014-12-09 13:33:15 -08:00
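For reference, the C99 notation mentioned in 2f9c69f016 is the flexible array member: a trailing array declared with empty brackets rather than a size of zero. The struct below is a hypothetical example, not a libfabric type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type, for illustration only. */
struct msg {
    size_t len;
    char   payload[];   /* C99 flexible array member: last field, no size */
};

/* Allocate a msg with room for len payload bytes; NULL on failure. */
static struct msg *msg_create(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    struct msg *m = malloc(sizeof(*m) + len);
    if (m == NULL) {
        return NULL;
    }
    m->len = len;
    if (len > 0) {
        memcpy(m->payload, data, len);
    }
    return m;
}
```

The older `payload[0]` spelling is a compiler extension, which is why it tends to draw zero-length-array warnings when pedantic warnings are enabled.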
Jeff Squyres
cd0a54d76f
usnic: short term fix to enable builds on non-libfabric platforms
...
This isn't quite the Right fix yet, because it doesn't address usnic
for external libfabric builds. I'll fix that separately / later.
2014-12-09 09:19:26 -08:00
Nathan Hjelm
b2b7ecc7c4
Merge pull request #300 from hjelmn/topic/atomic_lifo_fifo
...
Add opal_fifo_t class and rename opal_atomic_lifo_t to opal_lifo_t
2014-12-09 10:54:50 -06:00
Jeff Squyres
c40fd09d2a
libfabric: fix providers to conditionally add libs/flags
...
Only allow the usnic and PSM providers to add CPPFLAGS and LIBADD
flags when they are going to be built.
2014-12-09 07:15:25 -08:00
Jeff Squyres
45d6f29a27
Merge pull request #310 from yburette/master
...
libfabric: add optional PSM provider.
2014-12-09 06:39:34 -08:00
Jeff Squyres
6e24a1eb85
usnic: update for libfabric API change
...
Use FI_ADDR_UNSPEC for posting a receive from an unspecified source.
2014-12-09 06:06:52 -08:00
Jeff Squyres
f5a07f651c
libfabric: Open MPI addition to stem a flood of warnings
...
Add a pragma to not warn about zero-length arrays. This needs to be
addressed upstream, but for now, do it here.
2014-12-09 06:04:37 -08:00
Jeff Squyres
f331f48796
libfabric: update embedded libfabric to 934a714
...
Update the embedded copy of libfabric to the github ofiwg/libfabric
repo hash 934a714ca85f1a30a1e384a7d5f714ee962dc253.
2014-12-09 06:03:51 -08:00
Jeff Squyres
09d03a154b
libfabric: fix some typos in the usnic configury
2014-12-09 05:52:24 -08:00
Ralph Castain
18d9fdfd8d
Restore full topology comparison to support inventory monitoring
2014-12-09 01:33:06 -08:00
Ralph Castain
9b2f8cd840
Add the processor architecture to the topology signature
2014-12-09 01:17:00 -08:00
Howard Pritchard
3a14c8eeff
fix build for cray xc
...
Recent addition of embedded libfabric broke build on Cray XC/XE.
This commit fixes this problem.
2014-12-08 22:21:13 -08:00
Yohann Burette
f90a7b51d2
libfabric: add optional PSM provider.
2014-12-08 16:49:41 -08:00
Ralph Castain
bb529ebd8e
Revise the way we handle hetero nodes as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check to see if they match. This isn't comprehensive of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pick up major differences (e.g., where we have different numbers of sockets or assigned core bindings).
...
Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.
Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
2014-12-08 15:38:14 -08:00
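The signature idea in bb529ebd8e can be sketched as a string built from per-type object counts and compared on the remote side. Everything here is an assumption made for illustration: the `topo_counts_t` fields, the `"2S:16C:32H"` layout, and the function names are hypothetical, and the real code obtains its counts from hwloc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-node object counts; the real code queries hwloc. */
typedef struct {
    int sockets;
    int cores;
    int hwthreads;
} topo_counts_t;

/* Write a signature such as "2S:16C:32H" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int topo_signature(const topo_counts_t *t, char *buf, size_t len)
{
    assert(t != NULL && buf != NULL);
    int n = snprintf(buf, len, "%dS:%dC:%dH",
                     t->sockets, t->cores, t->hwthreads);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* A daemon compares its local signature with the one mpirun passed in. */
static int topo_matches(const char *local_sig, const char *mpirun_sig)
{
    return strcmp(local_sig, mpirun_sig) == 0;
}
```

A count-only signature cannot distinguish two nodes that have the same number of objects arranged differently, which is why the hetero-nodes flag is retained as a manual override.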
Yohann Burette
f33a9afd22
libfabric: fix typo in Makefile.am
2014-12-08 13:19:43 -08:00
Jeff Squyres
ac8e9d103c
libfabric: need to make AM_CONDITIONALs always be run
...
Ensure that the usnic-specific AM_CONDITIONAL for the embedded
libfabric is always run.
2014-12-08 11:51:26 -08:00
Jeff Squyres
d64881f040
psm_am.h: add missing file from libfabric snapshot
...
This is just about to be fixed upstream, but "make dist" was not
including this file in the libfabric tarball.
2014-12-08 11:39:08 -08:00
Jeff Squyres
d02756cdbb
libfabric: various configury updates
...
1. Ensure that CFLAGS is overridden properly. Move the setting of CFLAGS outside the AM_CONDITIONAL so that Automake doesn't get confused (because CFLAGS is already set inside an AM_CONDITIONAL -- moving it outside the conditional ensures that this local CFLAGS override trumps all other CFLAGS overrides).
2. Only build libfabric on Linux. Add a little more configury to ensure that we only try to build libfabric on Linux.
3. Remove a dead/unused file
4. Fix typo in condition check
5. Use "false", not "/bin/false"
2014-12-08 11:39:07 -08:00
Jeff Squyres
92818d1fa5
usnic: remove SVN-style $Id$ tokens (and #idents)
...
This commit is also upstream in libfabric.
2014-12-08 11:39:07 -08:00
Jeff Squyres
9547345b18
usnic: fix show_help message
...
Rename a few symbols to use libfabric-friendly names. Fix a show_help
message when fi_av_insert times out.
2014-12-08 11:39:07 -08:00
Jeff Squyres
8e49cc754f
usnic: update to latest libfabric API changes
2014-12-08 11:37:37 -08:00
Jeff Squyres
c4e8d67515
libfabric: sync to upstream libfabric github
...
Bring down the latest from the libfabric github, as of
9d051567c8eb7adc2af89516f94c7d0539152948.
2014-12-08 11:37:37 -08:00
Jeff Squyres
7a96b58882
common verbs: remove usnic-specific code
...
Now that the usnic BTL uses libfabric, we can remove the
usnic-specific code from opal/mca/common/verbs.
2014-12-08 11:37:37 -08:00
Jeff Squyres
984982790a
usnic: convert from verbs to libfabric (yay!)
...
This commit represents the conversion of the usnic BTL from verbs to
libfabric.
For the moment, libfabric is embedded in Open MPI (currently in the
usnic BTL). This is because the libfabric API is still changing, and
also has not yet been released. Ultimately, this embedded copy of
libfabric will likely disappear and the usnic BTL will rely on an
external installation of libfabric.
New configure options:
* --with-libfabric: will cause configure to fail if libfabric support
cannot be built
* --without-libfabric: will prevent libfabric support from being built
* --with-libfabric=DIR: use an external libfabric installation
* --with-libfabric-libdir=LIBDIR: when paired with --with-libfabric=DIR,
use LIBDIR for the libfabric installation library dir
The --with-libnl3[-libdir] arguments are now gone.
2014-12-08 11:37:37 -08:00
Nathan Hjelm
20c6eb5237
Rename opal_atomic_lifo_t to opal_lifo_t and improve interface
...
- Rename opal_atomic_lifo_t to opal_lifo_t to reflect both atomic and
non-atomic usage. Added new routines (opal_lifo_*_st) for non-atomic
usage as well as routines conditioned off opal_using_threads(). The
atomic versions are always thread safe and the non-atomic are always
not thread safe.
- Add a new atomic lifo implementation that makes use of 128-bit
compare-and-swap. The new implementation should scale better with
larger numbers of threads.
- Add threading unit test for opal_lifo_t.
2014-12-04 15:30:02 -07:00
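The split between atomic and single-threaded (`_st`) routines in 20c6eb5237 can be illustrated with a small C11 sketch. The names `lifo_t`, `lifo_push_atomic`, `lifo_push_st`, and `lifo_pop_st` are hypothetical and are not the opal_lifo_t interface; this only shows the general shape of a compare-and-swap push next to a plain one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct item {
    struct item *next;
    int value;
} item_t;

typedef struct {
    _Atomic(item_t *) head;
} lifo_t;

/* Thread-safe push: retry the compare-and-swap until head is updated. */
static void lifo_push_atomic(lifo_t *l, item_t *it)
{
    assert(l != NULL && it != NULL);
    item_t *old = atomic_load(&l->head);
    do {
        it->next = old;          /* old is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak(&l->head, &old, it));
}

/* Single-threaded push: plain load and store, no retry loop. */
static void lifo_push_st(lifo_t *l, item_t *it)
{
    assert(l != NULL && it != NULL);
    it->next = atomic_load_explicit(&l->head, memory_order_relaxed);
    atomic_store_explicit(&l->head, it, memory_order_relaxed);
}

/* Single-threaded pop; returns NULL when the lifo is empty. */
static item_t *lifo_pop_st(lifo_t *l)
{
    assert(l != NULL);
    item_t *it = atomic_load_explicit(&l->head, memory_order_relaxed);
    if (it != NULL) {
        atomic_store_explicit(&l->head, it->next, memory_order_relaxed);
    }
    return it;
}
```

An atomic pop is deliberately left out: a pointer-only CAS pop is exposed to the ABA problem. Pairing the head pointer with a counter and swapping both with a double-width compare-and-swap is a common remedy; whether the 128-bit implementation in this commit works exactly that way is an assumption.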
George Bosilca
04a4cbd77a
Fix the clock_gettime monotonic timer. Thanks to Gilles for the
...
first sketch of the patch.
2014-12-04 00:20:56 -05:00
Jeff Squyres
983bd49f11
opal_timer_require_monotonic: change to bool / level 5
2014-12-03 17:09:43 -08:00
Jeff Squyres
8880b070b8
Merge pull request #295 from jsquyres/topic/bosilca-accurate-timers
...
Topic/bosilca accurate timers
2014-12-03 19:46:14 -05:00
Howard Pritchard
c67afadcfc
Merge pull request #289 from hppritcha/topic/remove_pmi
...
Topic/remove pmi
2014-12-03 16:58:35 -07:00
Nathan Hjelm
f989fe27b8
btl/vader: workaround to make jenkins happy
2014-12-03 15:51:58 -07:00
Todd Kordenbrock
c0c680bccb
Portals4 BTL: Do not disqualify if a peer does not put Portals4 BTL modex info
...
If OPAL_MODEX_RECV() returns OPAL_ERR_NOT_FOUND, the peer didn't
send any Portals4 BTL info. This is not a fatal error. Instead of
disqualifying the Portals4 BTL just ignore that peer.
@jsquyres reported this in #194 .
2014-12-03 14:22:10 -06:00
Howard Pritchard
c75dccede1
pmix/cray: remove finalize call from comp close
...
The finalize call in component close method is
no longer being matched by an equivalent init call,
so remove this call in the close method.
2014-12-03 09:44:18 -07:00
Ralph Castain
d9b23c1054
Increment the init_count in the Slurm pmix components so they correctly respond to calls to pmix.initialized
2014-12-02 20:20:29 -08:00
Ralph Castain
cb15cc06e1
Minor changes per Jeff's request on PR for 1.8.4
2014-12-02 19:54:10 -08:00
George Bosilca
a35d2b9fb5
Update copyrights and mark ia32 timers as non-monotonic.
2014-12-01 14:03:54 -08:00
George Bosilca
5277fd5aa2
Various cleanups.
2014-12-01 14:03:47 -08:00
George Bosilca
00300f464d
Add support for clock_gettime on Linux. Allow the user to
...
request a monotonic timer via MCA parameters.
2014-12-01 14:03:40 -08:00
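A minimal sketch of a clock_gettime-based timer along the lines of 00300f464d, assuming a POSIX system. The function name and the `require_monotonic` argument are hypothetical stand-ins for the MCA parameter, and falling back to CLOCK_REALTIME when monotonic time is not requested is an assumption, not something the commit states.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Return a timestamp in nanoseconds. A nonzero require_monotonic selects
 * CLOCK_MONOTONIC, which never jumps backwards when the wall clock is
 * adjusted; otherwise CLOCK_REALTIME is used (assumed fallback). */
static uint64_t timer_now_ns(int require_monotonic)
{
    struct timespec ts;
    clockid_t id = require_monotonic ? CLOCK_MONOTONIC : CLOCK_REALTIME;
    int rc = clock_gettime(id, &ts);
    assert(rc == 0);
    (void)rc;                    /* silence unused warning under NDEBUG */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

Two successive calls with `require_monotonic` set should never go backwards, which is the property interval measurements rely on.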
Ralph Castain
960ef34988
Ensure the LSF ras adds the hosts to the allocation. Correctly handle the semi-colon vs comma situation in hwloc slot_lists
2014-11-30 14:37:37 -08:00
Ralph Castain
3f9d9ae8b6
Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids.
...
Once validated, a version of this will be backported to the v1.8.4 release.
2014-11-30 11:50:31 -08:00
George Bosilca
dee243c58d
ompi_proc_finalize has an interesting side effect. A proc is
...
inserted in the ompi_proc_list as soon as it is created and it
is removed only upon the call to the destructor. In ompi_proc_finalize
we loop over all procs in ompi_proc_list and release them once.
However, as a proc is not removed from this list right away, we
decrease the ref count for each proc until it reaches zero and the
proc is finally removed. Thus, we cannot clean the BML/BTL after
the call to ompi_proc_finalize.
A quick fix is to delay the call to ompi_proc_finalize until all
other frameworks have been finalized, and then the behavior
depicted above will give the expected outcome.
2014-11-28 18:26:36 -05:00
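The side effect described in dee243c58d can be modeled in a few lines. This is a simplified, hypothetical model (fixed-size array, `proc_t` with only a ref count), not the Open MPI object system; it only shows why a "release the head until the list is empty" loop drives every ref count to zero, including references still held elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 8

/* Hypothetical, simplified proc: a ref count plus list membership. */
typedef struct {
    int refcount;
    int on_list;                 /* cleared only by the destructor */
} proc_t;

typedef struct {
    proc_t *items[MAX_PROCS];
    size_t  count;
} proc_list_t;

/* Destructor: the only place where a proc leaves the list. */
static void proc_destruct(proc_list_t *list, proc_t *p)
{
    for (size_t i = 0; i < list->count; i++) {
        if (list->items[i] == p) {
            list->items[i] = list->items[--list->count];
            break;
        }
    }
    p->on_list = 0;
}

/* Drop one reference; destruct when the count reaches zero. */
static void proc_release(proc_list_t *list, proc_t *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        proc_destruct(list, p);
    }
}

/* Finalize: keep releasing the head until the list is empty.
 * Returns the number of releases performed. */
static int proc_finalize(proc_list_t *list)
{
    int releases = 0;
    while (list->count > 0) {
        proc_release(list, list->items[0]);
        releases++;
    }
    return releases;
}
```

In this model a proc that starts with three references is released three times by finalize alone, so any component that still held one of those references is left pointing at a destroyed object. Running finalize last, as the commit does, avoids that.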
bosilca
8cae899a42
Merge pull request #285 from bosilca/master
...
Reenable high accuracy timers
2014-11-25 17:09:34 -05:00
Gilles Gouaillardet
578fe41788
fix hangs introduced by previous commit a6744b8177
2014-11-25 17:50:44 +09:00
Gilles Gouaillardet
a6744b8177
fix misc memory leaks specific to the master
2014-11-25 13:52:10 +09:00
George Bosilca
261684858f
Improved support for OSX timers.
2014-11-24 17:15:49 -05:00
George Bosilca
1877dfd0df
On Darwin make sure the field we expect to be 0 is indeed 0.
2014-11-24 14:16:36 -05:00
George Bosilca
766cfece36
Remove useless header.
2014-11-24 00:57:54 -05:00
George Bosilca
5f49a11b29
Minor cleanups.
2014-11-24 00:44:50 -05:00
George Bosilca
e27759956f
Allow the use of the optimized used timers
2014-11-23 23:51:13 -05:00
George Bosilca
324e43909d
Enable CUDA support on Mac OS X.
2014-11-20 13:51:10 -06:00
Gilles Gouaillardet
758f7ab768
Revert "btl/vader: use FRAG_ALLOC_USER when single_copy_mechanism is VADER_NONE"
...
as discussed with @hjelmn in open-mpi/ompi-release#86
This reverts commit d2d7f39a4b.
2014-11-20 16:04:55 +09:00
Nathan Hjelm
1b564f62bd
Revert "Merge pull request #275 from hjelmn/btlmod"
...
This reverts commit ccaecf0fd6, reversing
changes made to 6a19bf85dd.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
b1f9569b7d
Revert "btl/openib: fix warnings"
...
This reverts commit 6e6c786b49.
2014-11-19 23:16:16 -07:00
Nathan Hjelm
6e6c786b49
btl/openib: fix warnings
2014-11-19 15:57:01 -07:00