1
1

2186 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
1cdb86b8c7 Cleanup s1 and s2 components, and ensure that mpirun and orteds only use non-direct-launch pmix components. 2015-09-08 18:37:09 -07:00
Jeff Squyres
bc9e5652ff whitespace: purge whitespace at end of lines
Generated by running "./contrib/whitespace-purge.sh".
2015-09-08 09:47:17 -07:00
Ralph Castain
e6add86e4f Deal with connect/accept between two jobs from different mpirun's. Somewhat optimize connect/accept by using MPI bcast to distribute the participants instead of another PMIx lookup. Cleanup some Coverity issues. 2015-09-07 09:19:24 -07:00
Ralph Castain
37c3ed68e7 Cleanup connect/disconnect and bring comm_spawn back online! 2015-09-06 10:27:39 -07:00
Jeff Squyres
f782a7640e usnic: minor re-order of Makefile.am sources
Put the hwloc.c file alphabetically in the list.
2015-09-05 05:02:00 -07:00
rhc54
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
rhc54
d45ccda813 Merge pull request #866 from rhc54/topic/updatepmix
Update PMIx support
2015-09-04 11:09:36 -07:00
Ralph Castain
f6948c2bb4 Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working 2015-09-04 10:07:17 -07:00
Rolf vandeVaart
ebfd00b66e While debugging user problems, these extra verbosity statements would be helpful 2015-09-03 17:15:39 -04:00
Howard Pritchard
0557beee22 Merge pull request #864 from hppritcha/topic/pmix_cray_more_funcs
pmix/cray: more stubs plus a get_version method
2015-09-03 14:52:46 -06:00
Howard Pritchard
6e7345c790 pmix/cray: more stubs plus a get_version method
Add more stubs to reduce likelihood of future
mysterious segfaults if some of the newer pmix
funcs start to get used within ompi.

Add a get_version to return the version of the
Cray PMI library being used, since the Cray PMI
library actually has a function to get that info.

Be more accurate about which functions have a hope
of being implemented using Cray PMI and those which
never will.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-03 12:51:50 -07:00
Ralph Castain
a772b46c15 Bring the MPI_Publish and friends online 2015-09-02 12:04:07 -07:00
Ralph Castain
95dbd70f44 Sync to PMIx 1.1, sha- 51479b0 2015-09-01 14:09:25 -07:00
Rolf vandeVaart
30b1a6e003 Merge pull request #836 from rolfv/pr/fix-cuda-war
Add config code to check for need of workaround. Add runtime way to turn oiff just in case.
2015-09-01 15:05:29 -04:00
Nathan Hjelm
f926796e57 Merge pull request #828 from hjelmn/openib_thread_fix
openib thread fixes
2015-09-01 09:12:50 -06:00
rhc54
d8cb3fe705 Merge pull request #852 from rhc54/topic/pmix
Sync to PMIx tarball - includes:
2015-09-01 06:54:34 -07:00
Gilles Gouaillardet
6dfa996760 configury: fix a typo in opal/mca/pmix/pmix1xx/configure.m4 2015-09-01 14:59:07 +09:00
Ralph Castain
c1bbd7bc78 Sync to PMIx tarball - includes:
* update to configury to silence ident messages (thanks Gilles!)
* fix for warnings Jeff saw when get didn't find the requested data
* fix for Mac OSX operations
2015-08-31 21:51:02 -07:00
rhc54
2d3c6af8ad Merge pull request #851 from rhc54/topic/copyfix
Only copy the value across if the "get" operation succeeded
2015-08-31 19:51:13 -07:00
Ralph Castain
ef69958e01 Only copy the value across if the "get" operation succeeded 2015-08-31 17:11:26 -07:00
Jeff Squyres
8558458bb9 usnic: adjust for new PMIX argument type 2015-08-31 14:55:58 -07:00
Rolf vandeVaart
54ab0d1a51 Add config code to check for need of workaround. Add runtime way to turn it off just in case 2015-08-31 17:18:47 -04:00
rhc54
6e78e2c89b Merge pull request #846 from rhc54/topic/pmix
Sync to PMIx tarball
2015-08-31 08:53:07 -07:00
Nathan Hjelm
2aab6ad90f Merge pull request #827 from hjelmn/recursive_locks
Add support for recursive locks (revisited)
2015-08-31 07:52:23 -07:00
Ralph Castain
a3842af709 Sync to PMIx tarball 2015-08-31 07:47:46 -07:00
Ralph Castain
bcabd1e282 Sync with PMIx tarball, bringing across the warning fixes pointed out by Gilles 2015-08-30 21:13:55 -07:00
Gilles Gouaillardet
7e6a213465 pmix: fix compilation error
compilation failed because of missing prototypes when configure'd with --enable-debug --enable-picky on a CentOS 7 box
2015-08-31 10:33:13 +09:00
rhc54
51a8a0f5d7 Merge pull request #842 from rhc54/topic/smfix
Fix shared memory operations by resolving local peers
2015-08-30 14:49:43 -07:00
Ralph Castain
b0d7564400 Sync to PMIx 1.1 - do not check pmix version when making connections 2015-08-30 12:15:30 -07:00
Ralph Castain
38ba54366c Fix shared memory operations by resolving local peers 2015-08-30 12:07:14 -07:00
Ralph Castain
0d5814b5ca Cleanup Coverity issues 2015-08-29 21:19:27 -07:00
Ralph Castain
3cab860a01 Some cleanups - still some errors that impact shared memory operations 2015-08-29 18:11:11 -07:00
Ralph Castain
1d71037139 Update some APIs 2015-08-29 17:26:32 -07:00
Ralph Castain
79827ceaa8 Remove stale directory 2015-08-29 17:15:17 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Nathan Hjelm
1d56007ab1 rcache/vma: make rcache lock recursive
There is currently a path through the grdma mpool and vma rcache that
leads to deadlock. It happens during the rcache insert. Before the
insert the rcache mutex is locked. During the call a new vma item is
allocated and then inserted into the rcache tree. The allocation
currently goes through the malloc hooks which may (and does) call back
into the mpool if the ptmalloc heap needs to be reallocated. This
callback tries to lock the rcache mutex which leads to the
deadlock. This has been observed with multi-threaded tests and the
openib btl.

This change may lead to some minor slowdown in the rcache vma when
threading is enabled. This will only affect larger message paths in
some of the btls.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-26 10:01:37 -06:00
Nathan Hjelm
f451876058 Merge pull request #825 from hjelmn/white_space_purge
periodic trailing whitespace purge
2015-08-25 19:23:52 -06:00
Nathan Hjelm
64e4419d76 btl/openib: allow the use of the openib btl in thread muliple
There were several issues preventing the openib btl from running in
thread multiple mode:

 - Missing locks in UDCM when generating a loopback endpoint. Fixed in
   open-mpi/ompi@8205d79819.

 - Incorrect sequence numbers generated in debug mode. This did not
   prevent the openib btl from running but instead produced incorrect
   error messages in debug builds.

 - Recursive locking of the rcache lock caused by the malloc
   hooks. This is fixed by open-mpi/ompi#827

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 16:04:52 -06:00
Nathan Hjelm
c101385f64 btl/openib: fix sequence number generation for debug mode
When using eager RDMA in debug builds the openib btl generates a
sequence number for each send. The code independently updated the head
index and the sequence number for the eager rdma transaction. If
multiple threads enter this code at the same time and run in the
following order:

thread 1: update sequence (0 -> 1)
thread 2: update sequence (1 -> 2)
thread 2: update head (0 -> 1)
thread 1: update head (1 -> 2)

the sequence number for head[0] gets 1 and the sequence number for
head[1] gets 0. The fix is to generate the sequence number from the
head index.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 16:00:06 -06:00
Nathan Hjelm
8205d79819 btl/openib: add missing lock calls
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 12:21:49 -06:00
Nathan Hjelm
156ce6af21 periodic whitespace purge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:32:33 -06:00
Nathan Hjelm
0a968de53f mca/base: use standard verbosity levels
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-17 11:48:06 -06:00
Nathan Hjelm
8cbf743cfa mca/base: standize MCA verbosity levels
Up until this point we have had inconsistent usage for MCA verbosity
levels. This commit attempts to correct this by recommending
components use these standard levels: none (0), error (1), warn (10),
info (20), debug (40), and trace (60).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-17 11:47:57 -06:00
Gilles Gouaillardet
950b07d17b Merge pull request #809 from ggouaillardet/topic/xrc_runtime_check
btl/openib: remove OFED version runtime check when XRC is used
2015-08-16 10:54:09 +09:00
Gilles Gouaillardet
d02ccd67de btl/openib: remove OFED version runtime check when XRC is used
this test seems broken :
 - some false positive were reported
 - it fails to detect some OFED version mismatch
this commit simply removes this test, which means the application
will likely fail if XRC is used ad OFED version is different
between compile time and runtime
2015-08-14 09:10:03 +09:00
Rolf vandeVaart
cb8c86910e Add static definitions where needed and remove one unused definition 2015-08-13 14:59:07 -04:00
Jeff Squyres
14340770c4 usnic: remove some logically dead code
This code really had no purpose; just assign FI_VERSION(1, 1).  This
fixes CID 1315274.

Also clarify the commet about why we still retain libfabric v1.0.0
compatibility code, even though configure.m4 requires libfabric >= v1.1.0.
2015-08-12 05:21:18 -07:00
Jeff Squyres
7f857034d9 common verbs: check return value of sscanf()
Fixes CID 1304563.
2015-08-12 05:14:58 -07:00
Gilles Gouaillardet
92b2d2ffeb configury: fix libevent configure.ac
fix interleaved messages :
checking for working epoll library interface... checking if epoll can build... yes
yes
2015-08-12 15:37:22 +09:00