Jeff Squyres
c40fd09d2a
libfabric: fix providers to conditionally add libs/flags
Only allow the usnic and PSM providers to add CPPFLAGS and LIBADD
flags when they are going to be built.
2014-12-09 07:15:25 -08:00
Jeff Squyres
45d6f29a27
Merge pull request #310 from yburette/master
libfabric: add optional PSM provider.
2014-12-09 06:39:34 -08:00
Jeff Squyres
6e24a1eb85
usnic: update for libfabric API change
Use FI_ADDR_UNSPEC for posting a receive from an unspecified source.
2014-12-09 06:06:52 -08:00
Jeff Squyres
f5a07f651c
libfabric: Open MPI addition to stem a flood of warnings
Add a pragma to not warn about zero-length arrays. This needs to be
addressed upstream, but for now, do it here.
2014-12-09 06:04:37 -08:00
Jeff Squyres
f331f48796
libfabric: update embedded libfabric to 934a714
Update the embedded copy of libfabric to the github ofiwg/libfabric
repo hash 934a714ca85f1a30a1e384a7d5f714ee962dc253.
2014-12-09 06:03:51 -08:00
Jeff Squyres
09d03a154b
libfabric: fix some typos in the usnic configury
2014-12-09 05:52:24 -08:00
Ralph Castain
18d9fdfd8d
Restore full topology comparison to support inventory monitoring
2014-12-09 01:33:06 -08:00
Ralph Castain
9b2f8cd840
Add the processor architecture to the topology signature
2014-12-09 01:17:00 -08:00
Ralph Castain
9d5135e6cd
Function definition should use the correct type
2014-12-09 01:04:31 -08:00
Ralph Castain
06e49d0e92
Per contribution from Pascal Deveze of Bull: move opal_set_using_threads earlier in MPI_Init (before datatype init) so the value gets set in time to be properly used.
2014-12-09 00:37:57 -08:00
Howard Pritchard
3a14c8eeff
fix build for cray xc
The recent addition of the embedded libfabric broke the build on Cray XC/XE.
This commit fixes this problem.
2014-12-08 22:21:13 -08:00
Yohann Burette
f90a7b51d2
libfabric: add optional PSM provider.
2014-12-08 16:49:41 -08:00
Ralph Castain
bb529ebd8e
Revise the way we handle hetero nodes, as users are finding this (a) a significant surprise, and (b) confusing as to when it is required. So try to automate it a bit by creating a topology "signature" that mpirun can share on the cmd line with the remote daemons, thus allowing them to check whether they match. This isn't comprehensive, of course - for now, it only checks the number of each type of hwloc object on the node. This is good enough to pick up major differences (e.g., different numbers of sockets or assigned core bindings).
Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.
Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
2014-12-08 15:38:14 -08:00
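The signature idea in bb529ebd8e can be sketched as follows. This is a minimal illustration, not the ORTE code: the object kinds, the `topo_counts` struct, the `"2SK:16CR:32PU"` string format, and the helpers `topo_signature()` and `topo_matches()` are all hypothetical. It only shows the mechanism the commit describes: reduce a node's topology to per-type object counts, serialize them into a short string that fits on a command line, and let each daemon compare its own string with mpirun's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical object kinds; a real implementation would count hwloc
 * objects (sockets, cores, PUs, ...) by walking the node's topology. */
enum obj_kind { OBJ_SOCKET, OBJ_CORE, OBJ_PU, OBJ_NKINDS };

static const char *const obj_tag[OBJ_NKINDS] = { "SK", "CR", "PU" };

/* Hypothetical per-node summary: how many objects of each kind exist. */
struct topo_counts {
    unsigned count[OBJ_NKINDS];
};

/* Write a compact signature such as "2SK:16CR:32PU" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int topo_signature(const struct topo_counts *t, char *buf, size_t len)
{
    size_t used = 0;

    assert(t != NULL && buf != NULL && len > 0);
    buf[0] = '\0';
    for (int i = 0; i < OBJ_NKINDS; i++) {
        int n = snprintf(buf + used, len - used, "%s%u%s",
                         i > 0 ? ":" : "", t->count[i], obj_tag[i]);
        if (n < 0 || (size_t)n >= len - used) {
            return -1;  /* output would be truncated */
        }
        used += (size_t)n;
    }
    return 0;
}

/* A daemon compares the signature it computes locally with the one
 * mpirun passed on the command line; a mismatch indicates that the
 * nodes are heterogeneous. */
static int topo_matches(const char *local_sig, const char *mpirun_sig)
{
    return strcmp(local_sig, mpirun_sig) == 0;
}
```

Because only counts are compared, two nodes with the same number of sockets, cores, and PUs but different cache sizes would still match, which is why the hetero-nodes flag is kept for cases the automated check cannot see.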
Jeff Squyres
f1629b66da
Merge pull request #309 from yburette/master
libfabric: fix typo in Makefile.am
2014-12-08 13:24:56 -08:00
Yohann Burette
f33a9afd22
libfabric: fix typo in Makefile.am
2014-12-08 13:19:43 -08:00
Jeff Squyres
ac8e9d103c
libfabric: need to make AM_CONDITIONALs always be run
Ensure that the usnic-specific AM_CONDITIONAL for the embedded
libfabric is always run.
2014-12-08 11:51:26 -08:00
Jeff Squyres
d64881f040
psm_am.h: add missing file from libfabric snapshot
This is just about to be fixed upstream, but "make dist" was not
including this file in the libfabric tarball.
2014-12-08 11:39:08 -08:00
Jeff Squyres
d02756cdbb
libfabric: various configury updates
1. Override CFLAGS properly. Move the setting of CFLAGS outside the AM_CONDITIONAL so that Automake doesn't get confused (CFLAGS is already set inside an AM_CONDITIONAL; moving it outside the conditional ensures that this local CFLAGS override trumps all other CFLAGS overrides).
2. Only build libfabric on Linux. Add a little more configury to ensure that we only try to build libfabric on Linux.
3. Remove a dead/unused file
4. Fix typo in condition check
5. Use "false", not "/bin/false"
2014-12-08 11:39:07 -08:00
Jeff Squyres
92818d1fa5
usnic: remove SVN-style $Id$ tokens (and #idents)
This commit is also upstream in libfabric.
2014-12-08 11:39:07 -08:00
Jeff Squyres
9547345b18
usnic: fix show_help message
Rename a few symbols to use libfabric-friendly names. Fix a show_help
message when fi_av_insert times out.
2014-12-08 11:39:07 -08:00
Jeff Squyres
8e49cc754f
usnic: update to latest libfabric API changes
2014-12-08 11:37:37 -08:00
Jeff Squyres
c4e8d67515
libfabric: sync to upstream libfabric github
Bring down the latest from the libfabric github, as of
9d051567c8eb7adc2af89516f94c7d0539152948.
2014-12-08 11:37:37 -08:00
Jeff Squyres
7cd4832a0d
.gitignore: ignore usnic unit test executable
2014-12-08 11:37:37 -08:00
Jeff Squyres
7a96b58882
common verbs: remove usnic-specific code
Now that the usnic BTL uses libfabric, we can remove the
usnic-specific code from opal/mca/common/verbs.
2014-12-08 11:37:37 -08:00
Jeff Squyres
984982790a
usnic: convert from verbs to libfabric (yay!)
This commit represents the conversion of the usnic BTL from verbs to
libfabric.
For the moment, libfabric is embedded in Open MPI (currently in the
usnic BTL). This is because the libfabric API is still changing, and
also has not yet been released. Ultimately, this embedded copy of
libfabric will likely disappear and the usnic BTL will rely on an
external installation of libfabric.
New configure options:
* --with-libfabric: will cause configure to fail if libfabric support
cannot be built
* --without-libfabric: will prevent libfabric support from being built
* --with-libfabric=DIR: use an external libfabric installation
* --with-libfabric-libdir=LIBDIR: when paired with --with-libfabric=DIR,
use LIBDIR for the libfabric installation library dir
The --with-libnl3[-libdir] arguments are now gone.
2014-12-08 11:37:37 -08:00
elenash
baf32fe480
Merge pull request #308 from elenash/master
restored _process_name_print_for_opal function in orte_init: it's requir...
2014-12-08 19:14:36 +03:00
Ralph Castain
b757b3f452
Ensure that the #nodes in the job map gets properly updated when using the sequential mapper. Provide some further diagnostic info to help understand the problem when encountered.
2014-12-08 08:03:53 -08:00
Elena
6cf3925b09
restored _process_name_print_for_opal function in orte_init: it is required for opal output from daemons, which never call ompi_init and therefore never set the opal_process_name_print pointer
2014-12-08 13:13:35 +02:00
Ralph Castain
d6d69e2b13
Get the direct routed component to work with both the TCP and USOCK OOB components. We had previously set up the direct component so that it would only support direct-launched applications. Thus, all routes went directly between processes. However, if the job had been launched by mpirun, this made no sense - what you wanted instead was to have each app proc talk directly to its daemon, but have the daemons all directly connect to each other.
So we need all the routing code for dealing with cross-job communications, lifelines, etc. The HNP will be directly connected to all daemons, as they must call back at startup, so we need to track those children correctly to know when it is okay to terminate.
We still have to support direct launch, though, as this is the only component we can use in that scenario. So if the app doesn't have daemon URI info, then it must fall back to directly connecting to everything.
2014-12-07 09:11:48 -08:00
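The routing rule described in d6d69e2b13 can be sketched as a next-hop function. This is an illustrative sketch only, not the `direct` routed component: the `proc_name` and `route_ctx` structs, `DAEMON_JOBID`, `next_hop()`, and the block mapping of app ranks to daemons (`procs_per_node`) are all hypothetical. It encodes the three cases the commit lists: app procs go through their daemon when they have its URI, fall back to direct connections under direct launch, and daemons (including the HNP) connect directly to each other.

```c
#include <assert.h>
#include <stdbool.h>

#define DAEMON_JOBID 0u  /* hypothetical: mpirun and daemons share job 0 */

struct proc_name {
    unsigned jobid;
    unsigned vpid;
};

/* Hypothetical routing state for the calling process. */
struct route_ctx {
    struct proc_name self;
    bool is_daemon;          /* true for mpirun (the HNP) and its daemons */
    bool have_daemon_uri;    /* false when the app was direct-launched */
    unsigned my_daemon;      /* vpid of this app process's local daemon */
    unsigned procs_per_node; /* hypothetical block mapping of app ranks */
};

/* Return the next hop for a message from ctx->self to target. */
static struct proc_name next_hop(const struct route_ctx *ctx,
                                 struct proc_name target)
{
    struct proc_name hop = target;

    if (!ctx->is_daemon) {
        /* App process: with daemon contact info, send everything to the
         * local daemon; under direct launch, connect straight to the peer. */
        if (ctx->have_daemon_uri) {
            hop.jobid = DAEMON_JOBID;
            hop.vpid = ctx->my_daemon;
        }
        return hop;
    }
    if (target.jobid == DAEMON_JOBID) {
        return hop;  /* daemons and mpirun connect directly to each other */
    }
    /* Daemon to app process: forward to the daemon hosting the target,
     * or deliver directly if the target is one of our own children. */
    unsigned host = target.vpid / ctx->procs_per_node;
    if (host != ctx->self.vpid) {
        hop.jobid = DAEMON_JOBID;
        hop.vpid = host;
    }
    return hop;
}
```

Keeping the direct-launch fallback as a single `have_daemon_uri` check mirrors the commit's point that the same component must serve both launch modes.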
Ralph Castain
595740a8e3
Sigh - re-add missing headers
2014-12-05 21:54:41 -08:00
Ralph Castain
4a0b4ad5ef
You can't have a variable of the same name as the function...
2014-12-05 21:50:40 -08:00
Ralph Castain
aff1f0ee49
Add missing header files
2014-12-05 19:03:21 -08:00
Ralph Castain
b1bf557024
Fix the hostfile parser so it correctly ignores binding directives that are just integers. Fix the create_dmns function so we don't hang if we can't get an error before creating the job map for an application.
2014-12-05 15:47:09 -08:00
Howard Pritchard
328a408dd0
comment out alps select in cray_xe6 platform
The ALPS selection settings in the platform file are no longer required.
2014-12-05 13:22:59 -07:00
Nathan Hjelm
23d59b0f5d
Fix one typo in opal_path_nfs.c
2014-12-05 13:13:35 -07:00
Nathan Hjelm
0fc8777aa8
opal_path_nfs test: do not try to test filesystems that cannot be stat'd
2014-12-05 13:11:45 -07:00
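The skip-on-stat-failure behavior from 0fc8777aa8 can be sketched like this. It is a minimal illustration, not the actual opal_path_nfs test: `test_paths()`, the `fs_check_fn` callback type, and the example `check_is_directory()` check are hypothetical. A path that `stat()` rejects (for example, because of permissions) is reported and skipped instead of being counted as a test failure.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical per-filesystem check supplied by the caller (0 = pass). */
typedef int (*fs_check_fn)(const char *path, const struct stat *sb);

/* Example check: a mount point should be a directory. */
static int check_is_directory(const char *path, const struct stat *sb)
{
    (void)path;
    return S_ISDIR(sb->st_mode) ? 0 : 1;
}

/* Run check on every path that can be stat'd; paths that cannot be
 * stat'd are skipped rather than counted as failures.
 * Returns the number of paths actually tested. */
static int test_paths(const char *const *paths, int npaths,
                      fs_check_fn check, int *failures)
{
    int tested = 0;

    *failures = 0;
    for (int i = 0; i < npaths; i++) {
        struct stat sb;
        if (stat(paths[i], &sb) != 0) {
            printf("skipping %s: cannot stat\n", paths[i]);
            continue;
        }
        tested++;
        if (check(paths[i], &sb) != 0) {
            (*failures)++;
        }
    }
    return tested;
}
```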
elenash
9cb3e7d181
Merge pull request #307 from elenash/master
these changes fix direct routed component under mpirun; oob tcp and oob ...
2014-12-05 18:53:54 +03:00
Nathan Hjelm
113b6bbdca
opal_stdint.h: fix GCC diagnostic pragma
2014-12-05 07:19:14 -07:00
Jeff Squyres
9b18b4b2d2
opal_path_nfs: enable debugging output
Now that "make check" siphons off stdout/stderr to logfiles, it's ok
to have output by default from tests. This test fails often enough
that it's useful to see the diagnostic output.
2014-12-05 03:19:51 -08:00
Elena
af38a762a2
these changes fix the direct routed component under mpirun; oob tcp and oob ud now work with the direct routed component, but usock doesn't work with it yet.
2014-12-05 12:38:59 +02:00
Gilles Gouaillardet
32bac600f7
opal: fix a warning caused by the introduction of opal_int128_t type
2014-12-05 12:14:31 +09:00
Nathan Hjelm
a0083ceab4
Adjust cmpxchg16b clobber list
2014-12-04 15:29:28 -07:00
Nathan Hjelm
077f4a2982
Merge pull request #297 from hjelmn/topic/cmpset128
Add support for 128-bit compare and swap on x86_64 when available.
2014-12-04 16:05:13 -06:00
Nathan Hjelm
0efe6baf64
Add check for -mcx16 flag for 128-bit compare and swap
Some versions of gcc require this flag to be set before the __sync
builtin atomic compare and swap will support 128-bit values. If the
flag is required, this check adds it to CFLAGS.
2014-12-04 14:25:53 -07:00
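The -mcx16 dependency can be illustrated with a compile-time sketch. This is not OPAL's `opal_atomic_cmpset_128`; the `cas128()` helper, the `u128` typedef, and the spinlock fallback are hypothetical. It relies on GCC's predefined `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` macro, which on x86_64 is defined only when `cmpxchg16b` is enabled (for example, by compiling with `-mcx16`); only then is the `__sync_bool_compare_and_swap` builtin used on 128-bit values. Otherwise the sketch falls back to a lock so it still builds.

```c
#include <assert.h>
#include <stdbool.h>

#if !defined(__SIZEOF_INT128__)
#error "this sketch needs a compiler with __int128 support"
#endif

typedef unsigned __int128 u128;

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
/* The compiler can inline a 16-byte compare and swap (cmpxchg16b). */
#define CAS128_NATIVE 1
static bool cas128(volatile u128 *addr, u128 oldval, u128 newval)
{
    return __sync_bool_compare_and_swap(addr, oldval, newval);
}
#else
/* Illustration-only fallback: serialize through a spinlock so the
 * sketch still builds when no 16-byte CAS is available. */
#define CAS128_NATIVE 0
static volatile int cas128_lock;
static bool cas128(volatile u128 *addr, u128 oldval, u128 newval)
{
    bool swapped = false;

    while (__sync_lock_test_and_set(&cas128_lock, 1)) {
        /* spin until the lock is free */
    }
    if (*addr == oldval) {
        *addr = newval;
        swapped = true;
    }
    __sync_lock_release(&cas128_lock);
    return swapped;
}
#endif
```

Building with `gcc -mcx16` on x86_64 selects the native path; a configure-time check like the one in this commit does the same job one level up, by testing whether the builtin works and adding the flag only when it is needed.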
Ralph Castain
c4fd6d1cde
Fix typo
2014-12-04 12:24:35 -08:00
Ralph Castain
c4002a8485
Further cleanups on the LSF integration - the affinity file is apparently always present, but simply empty if affinity wasn't set.
2014-12-04 12:24:35 -08:00
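The LSF change in c4002a8485 amounts to testing for content rather than existence. A minimal sketch, assuming the caller already knows the affinity file's path; `lsf_affinity_was_set()` is a hypothetical helper and the file's format is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Treat affinity as set only if the file exists AND has content.
 * Per the commit message, LSF always creates the file, so its mere
 * presence does not mean affinity was requested. */
static bool lsf_affinity_was_set(const char *path)
{
    FILE *fp;
    int c;

    if (path == NULL || (fp = fopen(path, "r")) == NULL) {
        return false;
    }
    c = fgetc(fp);  /* EOF on the first read means the file is empty */
    fclose(fp);
    return c != EOF;
}
```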
Howard Pritchard
dc311e3a4b
minor ugni config cleanup
2014-12-04 12:16:55 -07:00
Howard Pritchard
53dd5b6379
minor cray pmi config cleanup
2014-12-04 11:04:02 -07:00
Nathan Hjelm
fe787512d8
Add support for __sync builtin compare and swap on 128-bit values
2014-12-04 09:23:51 -07:00
Nathan Hjelm
250f749602
Fix return type of opal_atomic_cmpset_128.
The return type will be opal_int128_t after the fetching atomics
changes, but for now it is int.
2014-12-04 09:23:51 -07:00