Jeff Squyres
9547345b18
usnic: fix show_help message
...
Rename a few symbols to use libfabric-friendly names. Fix a show_help
message when fi_av_insert times out.
2014-12-08 11:39:07 -08:00
Jeff Squyres
8e49cc754f
usnic: update to latest libfabric API changes
2014-12-08 11:37:37 -08:00
Jeff Squyres
c4e8d67515
libfabric: sync to upstream libfabric github
...
Bring down the latest from the libfabric github, as of
9d051567c8eb7adc2af89516f94c7d0539152948.
2014-12-08 11:37:37 -08:00
Jeff Squyres
7cd4832a0d
.gitignore: ignore usnic unit test executable
2014-12-08 11:37:37 -08:00
Jeff Squyres
7a96b58882
common verbs: remove usnic-specific code
...
Now that the usnic BTL uses libfabric, we can remove the
usnic-specific code from opal/mca/common/verbs.
2014-12-08 11:37:37 -08:00
Jeff Squyres
984982790a
usnic: convert from verbs to libfabric (yay!)
...
This commit represents the conversion of the usnic BTL from verbs to
libfabric.
For the moment, libfabric is embedded in Open MPI (currently in the
usnic BTL). This is because the libfabric API is still changing, and
also has not yet been released. Ultimately, this embedded copy of
libfabric will likely disappear and the usnic BTL will rely on an
external installation of libfabric.
New configure options:
* --with-libfabric: will cause configure to fail if libfabric support
  cannot be built
* --without-libfabric: will prevent libfabric support from being built
* --with-libfabric=DIR: use an external libfabric installation
* --with-libfabric-libdir=LIBDIR: when paired with --with-libfabric=DIR,
  use LIBDIR for the libfabric installation library dir
The --with-libnl3[-libdir] arguments are now gone.
2014-12-08 11:37:37 -08:00
elenash
baf32fe480
Merge pull request #308 from elenash/master
...
restored _process_name_print_for_opal function in orte_init: it's requir...
2014-12-08 19:14:36 +03:00
Ralph Castain
b757b3f452
Ensure that the #nodes in the job map gets properly updated when using the sequential mapper. Provide some further diagnostic info to help understand the problem when encountered.
2014-12-08 08:03:53 -08:00
Elena
6cf3925b09
restored _process_name_print_for_opal function in orte_init: it's required for opal output from daemons which never called ompi_init so didn't set opal_process_name_print pointer
2014-12-08 13:13:35 +02:00
Ralph Castain
d6d69e2b13
Get the direct routed component to work with both TCP and USOCK OOB components. We had previously set up the direct component so that it would only support direct-launched applications. Thus, all routes went direct between processes. However, if the job had been launched by mpirun, this made no sense: what you wanted instead was to have each app proc talk directly to its daemon, but have the daemons all directly connect to each other.
...
So we need all the routing code for dealing with cross-job communications, lifelines, etc. The HNP will be directly connected to all daemons as they must callback at startup, and so we need to track those children correctly so we know when it is okay to terminate.
We still have to support direct launch, though, as this is the only component we can use in that scenario. So if the app doesn't have daemon URI info, then it must fall back to directly connecting to everything.
2014-12-07 09:11:48 -08:00
Ralph Castain
595740a8e3
Sigh - readd missing headers
2014-12-05 21:54:41 -08:00
Ralph Castain
4a0b4ad5ef
You can't have a variable of the same name as the function...
2014-12-05 21:50:40 -08:00
Ralph Castain
aff1f0ee49
Add missing header files
2014-12-05 19:03:21 -08:00
Ralph Castain
b1bf557024
Fix the hostfile parser so it correctly ignores binding directives that are just integers. Fix the create_dmns function so we don't hang if we get an error before creating the job map for an application.
2014-12-05 15:47:09 -08:00
Howard Pritchard
328a408dd0
comment out alps select in cray_xe6 platform
...
This alps selection stuff in the platform file is no longer required.
2014-12-05 13:22:59 -07:00
Nathan Hjelm
23d59b0f5d
Fix one typo in opal_path_nfs.c
2014-12-05 13:13:35 -07:00
Nathan Hjelm
0fc8777aa8
opal_path_nfs test: do not try to test filesystems that can not be stat'd
2014-12-05 13:11:45 -07:00
elenash
9cb3e7d181
Merge pull request #307 from elenash/master
...
these changes fix direct routed component under mpirun; oob tcp and oob ...
2014-12-05 18:53:54 +03:00
Nathan Hjelm
113b6bbdca
opal_stdint.h: fix GCC diagnostic pragma
2014-12-05 07:19:14 -07:00
Jeff Squyres
9b18b4b2d2
opal_path_nfs: enable debugging output
...
Now that "make check" siphons off stdout/stderr to logfiles, it's ok
to have output by default from tests. This test fails often enough
that it's useful to see the diagnostic output.
2014-12-05 03:19:51 -08:00
Elena
af38a762a2
these changes fix direct routed component under mpirun; oob tcp and oob ud are working with direct routed component, but usock doesn't work with direct routed component yet.
2014-12-05 12:38:59 +02:00
Gilles Gouaillardet
32bac600f7
opal: fix a warning caused by the introduction of opal_int128_t type
2014-12-05 12:14:31 +09:00
Nathan Hjelm
a0083ceab4
Adjust cmpxchg16b clobber list
2014-12-04 15:29:28 -07:00
Nathan Hjelm
077f4a2982
Merge pull request #297 from hjelmn/topic/cmpset128
...
Add support for 128-bit compare and swap on x86_64 when available.
2014-12-04 16:05:13 -06:00
Nathan Hjelm
0efe6baf64
Add check for -mcx16 flag for 128-bit compare and swap
...
Some versions of gcc require this flag to be set before the __sync
builtin atomic compare and swap will support 128-bit values. If the
flag is required this check adds the flag to the CFLAGS.
2014-12-04 14:25:53 -07:00
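Note on 0efe6baf64: a minimal sketch of what the -mcx16 check enables, not the actual Open MPI code. It assumes GCC or Clang, which predefine __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 when the 16-byte __sync builtins are usable (on x86_64, typically only with -mcx16). All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
#define SKETCH_HAVE_CAS128 1
/* __extension__ keeps pedantic builds quiet about the non-standard type. */
__extension__ typedef unsigned __int128 sketch_u128_t;

/* 128-bit compare-and-swap via the __sync builtin (hypothetical wrapper). */
static bool sketch_cas128(volatile sketch_u128_t *addr,
                          sketch_u128_t oldval, sketch_u128_t newval)
{
    /* cmpxchg16b requires a 16-byte aligned operand. */
    assert(0 == ((uintptr_t) addr & 15u));
    return __sync_bool_compare_and_swap(addr, oldval, newval);
}
#else
#define SKETCH_HAVE_CAS128 0
#endif

/* Returns 1 if the 128-bit CAS behaves as expected, or if it is not
 * available in this build (nothing to check); returns 0 on a mismatch. */
int sketch_cas128_selftest(void)
{
#if SKETCH_HAVE_CAS128
    volatile sketch_u128_t word = 1;

    /* A stale "old" value must fail and leave the word untouched. */
    if (sketch_cas128(&word, 2, 3)) {
        return 0;
    }
    /* A matching "old" value must succeed and store all 128 bits. */
    if (!sketch_cas128(&word, 1, ((sketch_u128_t) 5 << 64) | 7)) {
        return 0;
    }
    return 5 == (uint64_t) (word >> 64) && 7 == (uint64_t) word;
#else
    return 1;
#endif
}
```

Building the same file with and without -mcx16 shows the effect of the flag: SKETCH_HAVE_CAS128 flips between 1 and 0 on compilers that need it.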
Ralph Castain
c4fd6d1cde
Fix typo
2014-12-04 12:24:35 -08:00
Ralph Castain
c4002a8485
Further cleanups on the LSF integration - the affinity file is apparently always present, but simply empty if affinity wasn't set.
2014-12-04 12:24:35 -08:00
Howard Pritchard
dc311e3a4b
minor ugni config cleanup
2014-12-04 12:16:55 -07:00
Howard Pritchard
53dd5b6379
minor cray pmi config cleanup
2014-12-04 11:04:02 -07:00
Nathan Hjelm
fe787512d8
Add support for __sync builtin compare and swap on 128-bit values
2014-12-04 09:23:51 -07:00
Nathan Hjelm
250f749602
Fix return type of opal_atomic_cmpset_128.
...
The return type will be opal_int128_t after the fetching atomics
changes but for now it is int.
2014-12-04 09:23:51 -07:00
Nathan Hjelm
b1632dfb3c
Define opal_int128_t type if a 128-bit integer is available.
...
There is currently no standard support for 128-bit integer types. Any use
of the __int128 and int128_t types can lead to warnings from the compiler
when using -Wpedantic. Additionally, some compilers may support __int128
and others may support int128_t. This commit addresses both issues by
defining opal_int128_t if there is a supported 128-bit type. In the
case of GCC a pragma has been added to suppress warnings about __int128
not being a standard C type.
2014-12-04 09:23:51 -07:00
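Note on b1632dfb3c: a rough sketch of the pattern described above, assuming a GCC-compatible compiler that predefines __SIZEOF_INT128__. It is not the contents of opal_stdint.h; sketch_uint128_t and SKETCH_HAVE_INT128 are hypothetical names, and an unsigned type is used here only so the shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SIZEOF_INT128__)
/* Silence the pedantic "not a standard C type" warning for the typedef only. */
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
#endif
typedef unsigned __int128 sketch_uint128_t;
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
#define SKETCH_HAVE_INT128 1
static_assert(16 == sizeof(sketch_uint128_t), "expected a 16-byte integer");
#else
#define SKETCH_HAVE_INT128 0
#endif

/* Packs (high, low) into the 128-bit type and returns the high half again.
 * Without a 128-bit type it simply returns `high`, so callers can be
 * written once and compiled either way. */
uint64_t sketch_high_half(uint64_t high, uint64_t low)
{
#if SKETCH_HAVE_INT128
    sketch_uint128_t value = ((sketch_uint128_t) high << 64) | low;
    return (uint64_t) (value >> 64);
#else
    (void) low;
    return high;
#endif
}
```

The rest of the code then refers only to the wrapper type and the feature macro, never to __int128 directly.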
Nathan Hjelm
b2b58b31a2
Add support for 128-bit compare and swap on x86_64 when available.
...
A 128-bit compare-and-swap will enable a better atomic lifo implementation
that uses the pointer + counter method to avoid ABA issues. This commit
adds configury to check for the instruction (cmpxchg16b) and adds an
implementation that uses the __int128 type, which is a compiler extension
rather than part of standard C.
2014-12-04 08:53:28 -07:00
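Note on b2b58b31a2: the pointer + counter method can be illustrated at reduced scale. The sketch below is an assumption-laden stand-in, not the Open MPI lifo: it packs a 32-bit node index (in place of a pointer) and a 32-bit counter into one 64-bit word so that an ordinary C11 64-bit compare-and-swap is enough to run it. The commit's approach would pair a full pointer with a counter and update both with the 128-bit compare-and-swap. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_CAPACITY 8u
#define SKETCH_NIL      UINT32_MAX

typedef struct {
    _Atomic uint32_t next[SKETCH_CAPACITY]; /* link of each node, as an index */
    _Atomic uint64_t head;                  /* low 32 bits: top index, high 32 bits: counter */
} sketch_lifo_t;

static uint64_t sketch_pack(uint32_t index, uint32_t counter)
{
    return ((uint64_t) counter << 32) | index;
}

void sketch_lifo_init(sketch_lifo_t *lifo)
{
    atomic_init(&lifo->head, sketch_pack(SKETCH_NIL, 0));
}

/* Number of successful head updates so far (wraps at 2^32). */
uint32_t sketch_lifo_counter(sketch_lifo_t *lifo)
{
    return (uint32_t) (atomic_load(&lifo->head) >> 32);
}

void sketch_lifo_push(sketch_lifo_t *lifo, uint32_t node)
{
    uint64_t old_head = atomic_load(&lifo->head);
    uint64_t new_head;

    assert(node < SKETCH_CAPACITY);
    do {
        atomic_store(&lifo->next[node], (uint32_t) old_head);
        new_head = sketch_pack(node, (uint32_t) (old_head >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&lifo->head, &old_head, new_head));
}

/* Returns the popped node index, or SKETCH_NIL if the lifo is empty. */
uint32_t sketch_lifo_pop(sketch_lifo_t *lifo)
{
    uint64_t old_head = atomic_load(&lifo->head);
    uint64_t new_head;
    uint32_t node;

    do {
        node = (uint32_t) old_head;
        if (SKETCH_NIL == node) {
            return SKETCH_NIL;
        }
        /* If the head changed after it was read, even back to the same
         * index (the ABA case), the counter differs and the CAS fails. */
        new_head = sketch_pack(atomic_load(&lifo->next[node]),
                               (uint32_t) (old_head >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&lifo->head, &old_head, new_head));
    return node;
}
```

Bumping the counter on every successful update is one possible design choice; incrementing only on pop would also distinguish a recycled head from the original one.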
Jeff Squyres
a71b5dd5c7
debuggers: update warning messages when types not found
...
Fixes #302.
2014-12-04 03:01:51 -08:00
George Bosilca
04a4cbd77a
Fix the clock_gettime monotonic timer. Thanks to Gilles for the
...
first sketch of the patch.
2014-12-04 00:20:56 -05:00
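Note on 04a4cbd77a: for readers unfamiliar with the timer in question, here is a minimal sketch of reading a monotonic clock through POSIX clock_gettime. It does not reproduce the fix itself; sketch_monotonic_ns is a hypothetical helper, and very old glibc versions may additionally need -lrt at link time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stores the current CLOCK_MONOTONIC reading, in nanoseconds, in *ns_out.
 * Returns 0 on success and -1 if the clock cannot be read. */
int sketch_monotonic_ns(uint64_t *ns_out)
{
    struct timespec ts;

    assert(NULL != ns_out);
    if (0 != clock_gettime(CLOCK_MONOTONIC, &ts)) {
        return -1;
    }
    *ns_out = (uint64_t) ts.tv_sec * UINT64_C(1000000000)
            + (uint64_t) ts.tv_nsec;
    return 0;
}
```

Unlike a wall-clock reading, two successive values from this clock never go backwards, which is why it suits interval timing of the MPI_Wtime kind.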
Ralph Castain
c88f181efe
Fix singleton comm-spawn, yet again. The new grpcomm collectives require a complete knowledge of every active proc in the system in case they participate in a collective. So ensure we pass the required job info when we spawn new daemons, and construct the necessary connections to allow grpcomm to operate.
2014-12-03 18:11:17 -08:00
Jeff Squyres
983bd49f11
opal_timer_require_monotinic: change to bool / level 5
2014-12-03 17:09:43 -08:00
Jeff Squyres
1dd68d48a8
MPI_Wtime.3: give further explanation about high-res timers
2014-12-03 17:07:42 -08:00
Jeff Squyres
8880b070b8
Merge pull request #295 from jsquyres/topic/bosilca-accurate-timers
...
Topic/bosilca accurate timers
2014-12-03 19:46:14 -05:00
Jeff Squyres
cf35e0c28c
timers: fix 32 bit compile of timer
2014-12-03 16:43:33 -08:00
Howard Pritchard
c67afadcfc
Merge pull request #289 from hppritcha/topic/remove_pmi
...
Topic/remove pmi
2014-12-03 16:58:35 -07:00
Nathan Hjelm
f989fe27b8
btl/vader: workaround to make jenkins happy
2014-12-03 15:51:58 -07:00
rhc54
148c5d8b27
Merge pull request #303 from jsquyres/pr/more-lsf-configury-fixes
...
Wrapper compiler static library fixes
2014-12-03 14:47:40 -08:00
Jeff Squyres
4e8ea6f716
tm: add proper libraries for static builds
...
Ensure that the proper WRAPPER_EXTRA flags are set for static builds.
2014-12-03 13:32:56 -08:00
Jeff Squyres
a3af7d6dbb
Revert "lsf configury: add dependent libraries for static linking"
...
This reverts commit 56cfa90ddaaf9939926ff54b5f2f34e32254681e.
2014-12-03 13:32:56 -08:00
Jeff Squyres
92c2ff91ec
Revert "Cleanup static build requirements by adding the wrapper flags back to the component configure.m4's. Minor cleanup of the lsf configure logic."
...
This reverts commit open-mpi/ompi@32bf0e7b7e.
2014-12-03 13:15:20 -08:00
Todd Kordenbrock
c0c680bccb
Portals4 BTL: Do not disqualify if a peer does not put Portals4 BTL modex info
...
If OPAL_MODEX_RECV() returns OPAL_ERR_NOT_FOUND, the peer didn't
send any Portals4 BTL info. This is not a fatal error. Instead of
disqualifying the Portals4 BTL just ignore that peer.
@jsquyres reported this in #194.
2014-12-03 14:22:10 -06:00
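Note on c0c680bccb: the control flow described above might look roughly like the sketch below. Everything here is a hypothetical stand-in: the sketch_* names, the error values, and the callback replace OPAL_MODEX_RECV() and the real Portals4 BTL structures, which are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes standing in for the OPAL constants. */
enum {
    SKETCH_SUCCESS       = 0,
    SKETCH_ERR_NOT_FOUND = -1,
    SKETCH_ERR_OTHER     = -2
};

/* Hypothetical stand-in for a per-peer modex lookup. */
typedef int (*sketch_modex_recv_fn)(size_t peer);

/* Marks each peer as reachable (1) or not (0). A peer that published no
 * modex info is skipped; any other failure still aborts the whole call. */
int sketch_add_procs(size_t npeers, sketch_modex_recv_fn modex_recv,
                     unsigned char *reachable)
{
    assert(NULL != modex_recv && NULL != reachable);
    for (size_t i = 0; i < npeers; ++i) {
        int rc = modex_recv(i);

        if (SKETCH_ERR_NOT_FOUND == rc) {
            reachable[i] = 0;   /* not fatal: just ignore this peer */
            continue;
        }
        if (SKETCH_SUCCESS != rc) {
            return rc;          /* genuine error: give up */
        }
        reachable[i] = 1;
    }
    return SKETCH_SUCCESS;
}

/* Demo lookup: pretends that peer 1 published no info. */
int sketch_fake_modex_recv(size_t peer)
{
    return (1 == peer) ? SKETCH_ERR_NOT_FOUND : SKETCH_SUCCESS;
}
```

With the demo lookup, a three-peer call succeeds and leaves only peer 1 marked unreachable, instead of disqualifying the whole component.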
Ralph Castain
54c955c92d
Fix a race condition that only appears to be affecting certain setups. The pmix.finalize function closes the file descriptor to the server, which then triggers the errhandler callback. Since the errmgr is about to be unloaded, it might be getting hit.
2014-12-03 12:19:00 -08:00
Howard Pritchard
c75dccede1
pmix/cray: remove finalize call from comp close
...
The finalize call in component close method is
no longer being matched by an equivalent init call,
so remove this call in the close method.
2014-12-03 09:44:18 -07:00
Howard Pritchard
666344a081
orte/mca/common/alps: fix configure file
...
Fix configure file for alps to actually check for
alps being available.
Also include stdio.h explicitly in common_alps.c
2014-12-03 09:44:18 -07:00