
3623 Commits

Author SHA1 Message Date
Ralph Castain
291afe502f Add the libs required for PMIx to support static builds
Remove unneeded CPPFLAGS
2015-09-08 10:21:06 -07:00
Ralph Castain
e6add86e4f Deal with connect/accept between two jobs from different mpirun's. Somewhat optimize connect/accept by using MPI bcast to distribute the participants instead of another PMIx lookup. Cleanup some Coverity issues. 2015-09-07 09:19:24 -07:00
Ralph Castain
37c3ed68e7 Cleanup connect/disconnect and bring comm_spawn back online! 2015-09-06 10:27:39 -07:00
Jeff Squyres
f782a7640e usnic: minor re-order of Makefile.am sources
Put the hwloc.c file alphabetically in the list.
2015-09-05 05:02:00 -07:00
rhc54
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
2ecbbc84e7 Hide a symbol that is only used in one file and is not properly prefixed 2015-09-04 17:08:24 -07:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
rhc54
d45ccda813 Merge pull request #866 from rhc54/topic/updatepmix
Update PMIx support
2015-09-04 11:09:36 -07:00
Ralph Castain
f6948c2bb4 Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working 2015-09-04 10:07:17 -07:00
Rolf vandeVaart
ebfd00b66e While debugging user problems, these extra verbosity statements would be helpful 2015-09-03 17:15:39 -04:00
Howard Pritchard
0557beee22 Merge pull request #864 from hppritcha/topic/pmix_cray_more_funcs
pmix/cray: more stubs plus a get_version method
2015-09-03 14:52:46 -06:00
Howard Pritchard
6e7345c790 pmix/cray: more stubs plus a get_version method
Add more stubs to reduce likelihood of future
mysterious segfaults if some of the newer pmix
funcs start to get used within ompi.

Add a get_version to return the version of the
Cray PMI library being used, since the Cray PMI
library actually has a function to get that info.

Be more accurate about which functions have a hope
of being implemented using Cray PMI and those which
never will.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-03 12:51:50 -07:00
Ralph Castain
a772b46c15 Bring the MPI_Publish and friends online 2015-09-02 12:04:07 -07:00
Ralph Castain
95dbd70f44 Sync to PMIx 1.1, sha- 51479b0 2015-09-01 14:09:25 -07:00
Rolf vandeVaart
30b1a6e003 Merge pull request #836 from rolfv/pr/fix-cuda-war
Add config code to check for need of workaround. Add runtime way to turn it off just in case.
2015-09-01 15:05:29 -04:00
Nathan Hjelm
f926796e57 Merge pull request #828 from hjelmn/openib_thread_fix
openib thread fixes
2015-09-01 09:12:50 -06:00
rhc54
d8cb3fe705 Merge pull request #852 from rhc54/topic/pmix
Sync to PMIx tarball - includes:
2015-09-01 06:54:34 -07:00
Gilles Gouaillardet
6dfa996760 configury: fix a typo in opal/mca/pmix/pmix1xx/configure.m4 2015-09-01 14:59:07 +09:00
Ralph Castain
c1bbd7bc78 Sync to PMIx tarball - includes:
* update to configury to silence ident messages (thanks Gilles!)
* fix for warnings Jeff saw when get didn't find the requested data
* fix for Mac OSX operations
2015-08-31 21:51:02 -07:00
rhc54
2d3c6af8ad Merge pull request #851 from rhc54/topic/copyfix
Only copy the value across if the "get" operation succeeded
2015-08-31 19:51:13 -07:00
Ralph Castain
ef69958e01 Only copy the value across if the "get" operation succeeded 2015-08-31 17:11:26 -07:00
Jeff Squyres
8558458bb9 usnic: adjust for new PMIX argument type 2015-08-31 14:55:58 -07:00
Rolf vandeVaart
54ab0d1a51 Add config code to check for need of workaround. Add runtime way to turn it off just in case 2015-08-31 17:18:47 -04:00
Nathan Hjelm
3c34f6f25c Merge pull request #517 from hjelmn/class_fix
opal/class: enable use of opal classes after opal_class_finalize
2015-08-31 12:13:58 -07:00
Nathan Hjelm
faf06edb5b Merge pull request #824 from hjelmn/opal_mutex_mod
opal/mutex: remove unnecessary ()s from OPAL_SCOPED_LOCK macro
2015-08-31 12:08:25 -07:00
rhc54
6e78e2c89b Merge pull request #846 from rhc54/topic/pmix
Sync to PMIx tarball
2015-08-31 08:53:07 -07:00
Nathan Hjelm
2aab6ad90f Merge pull request #827 from hjelmn/recursive_locks
Add support for recursive locks (revisited)
2015-08-31 07:52:23 -07:00
Ralph Castain
a3842af709 Sync to PMIx tarball 2015-08-31 07:47:46 -07:00
Ralph Castain
bcabd1e282 Sync with PMIx tarball, bringing across the warning fixes pointed out by Gilles 2015-08-30 21:13:55 -07:00
Gilles Gouaillardet
7e6a213465 pmix: fix compilation error
compilation failed because of missing prototypes when configure'd with --enable-debug --enable-picky on a CentOS 7 box
2015-08-31 10:33:13 +09:00
rhc54
51a8a0f5d7 Merge pull request #842 from rhc54/topic/smfix
Fix shared memory operations by resolving local peers
2015-08-30 14:49:43 -07:00
Ralph Castain
b0d7564400 Sync to PMIx 1.1 - do not check pmix version when making connections 2015-08-30 12:15:30 -07:00
Ralph Castain
38ba54366c Fix shared memory operations by resolving local peers 2015-08-30 12:07:14 -07:00
Ralph Castain
0d5814b5ca Cleanup Coverity issues 2015-08-29 21:19:27 -07:00
Ralph Castain
3cab860a01 Some cleanups - still some errors that impact shared memory operations 2015-08-29 18:11:11 -07:00
Ralph Castain
1d71037139 Update some APIs 2015-08-29 17:26:32 -07:00
Ralph Castain
79827ceaa8 Remove stale directory 2015-08-29 17:15:17 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Nathan Hjelm
1d56007ab1 rcache/vma: make rcache lock recursive
There is currently a path through the grdma mpool and vma rcache that
leads to deadlock. It happens during the rcache insert. Before the
insert the rcache mutex is locked. During the call a new vma item is
allocated and then inserted into the rcache tree. The allocation
currently goes through the malloc hooks which may (and does) call back
into the mpool if the ptmalloc heap needs to be reallocated. This
callback tries to lock the rcache mutex which leads to the
deadlock. This has been observed with multi-threaded tests and the
openib btl.

This change may lead to some minor slowdown in the rcache vma when
threading is enabled. This will only affect larger message paths in
some of the btls.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-26 10:01:37 -06:00
Nathan Hjelm
54998e5745 opal: add recursive mutex
This new class is the same as the opal_mutex_t class but has a
different constructor. This constructor adds the recursive flag to the
mutex attributes for the lock. This class can be used where there may
be re-entry into the lock from within the same thread.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-26 10:01:37 -06:00
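Note: the recursive-lock idea described in the two commits above can be sketched with plain POSIX threads. This is an illustration only, not the opal_recursive_mutex_t code; the names rec_mutex_t, rec_mutex_init, insert_with_callback, and inner_callback are hypothetical. The outer function stands in for the rcache insert, and the inner one for the malloc-hook callback that re-locks the same mutex on the same thread. With a default (non-recursive) mutex the inner lock would block or fail; with the recursive attribute it succeeds.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_RECURSIVE under strict -std modes */
#endif
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper; NOT the Open MPI opal_recursive_mutex_t type. */
typedef struct {
    pthread_mutex_t lock;
    int depth;              /* how many times the owner currently holds the lock */
} rec_mutex_t;

/* Initialize the mutex with the recursive attribute set. Returns 0 on success. */
static int rec_mutex_init(rec_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(&m->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    m->depth = 0;
    return rc;
}

/* Stand-in for a callback that re-enters the lock on the same thread. */
static int inner_callback(rec_mutex_t *m)
{
    pthread_mutex_lock(&m->lock);   /* second acquisition by the same thread */
    int depth = ++m->depth;
    --m->depth;
    pthread_mutex_unlock(&m->lock);
    return depth;
}

/* Stand-in for an insert that holds the lock while the callback runs. */
static int insert_with_callback(rec_mutex_t *m)
{
    pthread_mutex_lock(&m->lock);   /* first acquisition */
    ++m->depth;
    int seen = inner_callback(m);   /* nesting depth observed by the callback */
    --m->depth;
    pthread_mutex_unlock(&m->lock);
    return seen;
}
```

Calling rec_mutex_init() and then insert_with_callback() returns the nesting depth seen inside the callback. The cost hinted at in the rcache commit comes from the extra ownership bookkeeping a recursive mutex performs on every lock and unlock.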
Nathan Hjelm
f451876058 Merge pull request #825 from hjelmn/white_space_purge
periodic trailing whitespace purge
2015-08-25 19:23:52 -06:00
Nathan Hjelm
64e4419d76 btl/openib: allow the use of the openib btl in thread multiple
There were several issues preventing the openib btl from running in
thread multiple mode:

 - Missing locks in UDCM when generating a loopback endpoint. Fixed in
   open-mpi/ompi@8205d79819.

 - Incorrect sequence numbers generated in debug mode. This did not
   prevent the openib btl from running but instead produced incorrect
   error messages in debug builds.

 - Recursive locking of the rcache lock caused by the malloc
   hooks. This is fixed by open-mpi/ompi#827

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 16:04:52 -06:00
Nathan Hjelm
c101385f64 btl/openib: fix sequence number generation for debug mode
When using eager RDMA in debug builds the openib btl generates a
sequence number for each send. The code independently updated the head
index and the sequence number for the eager rdma transaction. If
multiple threads enter this code at the same time and run in the
following order:

thread 1: update sequence (0 -> 1)
thread 2: update sequence (1 -> 2)
thread 2: update head (0 -> 1)
thread 1: update head (1 -> 2)

the sequence number for head[0] gets 1 and the sequence number for
head[1] gets 0. The fix is to generate the sequence number from the
head index.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 16:00:06 -06:00
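Note: the fix described above (derive the sequence number from the head index instead of updating two counters independently) can be sketched with C11 atomics. This is a minimal illustration under assumptions, not the openib btl code; eager_ring_t, ring_init, claim_slot, and RING_SIZE are hypothetical names. A single atomic fetch-and-add hands each caller a unique ticket, and both the slot and its sequence number are computed from that one value, so the two cannot disagree regardless of thread interleaving.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 4u   /* hypothetical ring length */

/* Hypothetical stand-in for an eager RDMA ring. */
typedef struct {
    atomic_uint head;          /* ticket counter; next slot to claim */
    uint32_t seq[RING_SIZE];   /* sequence number recorded for each slot */
} eager_ring_t;

static void ring_init(eager_ring_t *r)
{
    atomic_init(&r->head, 0u);
    for (unsigned i = 0; i < RING_SIZE; ++i) {
        r->seq[i] = 0;
    }
}

/* Claim the next slot. The sequence number is derived from the same
 * ticket that selects the slot, so there is no second counter that
 * another thread could advance in between. */
static unsigned claim_slot(eager_ring_t *r)
{
    unsigned ticket = atomic_fetch_add(&r->head, 1u);
    unsigned slot = ticket % RING_SIZE;
    r->seq[slot] = ticket;
    return slot;
}
```

In the broken ordering from the commit message, two threads could pair sequence 1 with head 0 and sequence 0 with head 1; here each ticket maps to exactly one slot and one sequence value.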
Nathan Hjelm
8205d79819 btl/openib: add missing lock calls
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 12:21:49 -06:00
Nathan Hjelm
156ce6af21 periodic whitespace purge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:32:33 -06:00
Nathan Hjelm
f59b3ed7ed opal/mutex: remove unnecessary ()s from OPAL_SCOPED_LOCK macro
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-21 10:36:49 -06:00
Nathan Hjelm
209a7a0721 opal/lifo: add load-linked store-conditional support
This commit adds implementations for opal_atomic_lifo_pop and
opal_atomic_lifo_push that make use of the load-linked and
store-conditional instructions. These instructions allow for a more
efficient implementation on supported platforms.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-18 14:01:52 -06:00
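Note: portable C cannot issue load-linked/store-conditional instructions directly, so the sketch below substitutes a C11 compare-exchange loop to show the general shape of a lock-free LIFO push and pop. It is not the opal_atomic_lifo implementation; node_t, lifo_t, lifo_init, lifo_push, and lifo_pop are hypothetical names. On LL/SC architectures such as POWER, compilers typically lower the compare-exchange to an LL/SC retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node and LIFO head; not the Open MPI types. */
typedef struct node {
    struct node *next;
    int value;
} node_t;

typedef struct {
    _Atomic(node_t *) head;
} lifo_t;

static void lifo_init(lifo_t *l)
{
    atomic_init(&l->head, NULL);
}

/* Push: link the new node to the current head, then try to swing the
 * head to the new node. On failure 'old' is refreshed and we retry. */
static void lifo_push(lifo_t *l, node_t *n)
{
    node_t *old = atomic_load(&l->head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&l->head, &old, n));
}

/* Pop: try to swing the head to its successor. Returns NULL when empty.
 * Caveat: a compare-exchange pop like this is exposed to the ABA problem
 * if nodes are recycled concurrently; real LL/SC avoids that particular
 * hazard because the store fails if the location was written at all. */
static node_t *lifo_pop(lifo_t *l)
{
    node_t *old = atomic_load(&l->head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&l->head, &old, old->next)) {
        /* 'old' now holds the current head; loop re-checks for NULL */
    }
    return old;
}
```

Items come back in reverse order of insertion, and popping an empty list yields NULL.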
Nathan Hjelm
2a7e191dd8 opal/fifo: if available use load-linked store-conditional
These instructions allow a more efficient implementation of the
opal_fifo_pop_atomic function.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-18 14:01:52 -06:00
Nathan Hjelm
6a19a10fbb atomic/ppc: add atomics for load-link, store-conditional, and swap
This commit adds implementations of opal_atomic_ll_32/64 and
opal_atomic_sc_32/64. These atomics can be used to implement more
efficient lifo/fifo operations on supported platforms. The only
supported platform with this commit is powerpc/power.

This commit also adds an implementation of opal_atomic_swap_32/64 for
powerpc.

Tested with Power8.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-18 14:01:52 -06:00
Nathan Hjelm
f87dbca042 Merge pull request #817 from hjelmn/remove_alpha
opal/asm: remove alpha support
2015-08-18 13:52:03 -06:00