1
1

57 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
6dacf40a8c Ensure the epilog gets executed in PMIx server
If we abnormally terminate, then we still want any cleanups to be
executed.

Remove debug

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-10 18:28:05 -08:00
Ralph Castain
d5471d7898 Silence warnings in optimized build
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-20 12:00:28 -08:00
Ralph Castain
07427c6d89 Update to PMIx v3.0 PR for cleanup registration
If available, have apps use registration capability to cleanup their session directories. Setup capability for vader to register its shared memory file location - let someone familiar with that code do so.

Final cleanup to track uid/gid, update the opal/pmix API to pass flags for ignore and leave top directory alone

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-18 06:53:11 -08:00
Jeff Squyres
c19822dad4 pmix: pack pointer to object (vs. pointer to pointer)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-13 09:50:44 -08:00
Ralph Castain
9c84e1485b Some minor cleanups of the DVM
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-12 16:27:37 -08:00
Ralph Castain
d75d0bc5f6 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-11 17:06:41 -08:00
Ralph Castain
d4b83cc951 Sync with PMIx master
Implement direct modex protection to turn off PMIx dstore when direct modex scenario is detected

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-07 18:10:56 -08:00
Ralph Castain
b97caf8f05 Correct copy/paste error in example
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-02 10:33:28 -07:00
Ralph Castain
27f3d417ca Revert the MPI_Init fence operations to use volatile bool instead of thread macros.
The problem is that the waiting thread is cycling using OMPI_LAZY_WAIT_FOR_COMPLETION so it can exercise opal_progress. This probably isn't as critical for the modex step, but definitely necessary for the barrier at the end of mpi_init. The problem this creates is that the lazy macro exits as soon as "active" becomes false, and then we destruct the lock.

However, wakeup_thread sets "active" to false - and then calls the condition broadcast to wakeup any waiting threads. So there is a race condition between that broadcast and the lock destruct.

Add OPAL_ACQUIRE_OBJECT and OPAL_POST_OBJECT memory barriers to help protect against thread race conditions on some platforms

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-31 08:09:02 -07:00
Ralph Castain
7839dc91a8 Sync to PMIx v3.0 (master)
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-30 13:06:41 -07:00
Ralph Castain
36d7e752b6 I think we have all concluded that there is no good answer to locating the external libevent library, so surrender to the situation and simply remove that requirement. Users wanting to utilize the embedded PMIx library can install it, but will have to use mpicc _and_ add an explicit -lpmix to their cmd line to compile their application.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-29 07:39:02 -07:00
Ralph Castain
ea3508b26b Sync to PMIx master (now v3.0)
Fix an apparent typo in external libevent configury
Require external libevent for install of separate libpmix

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-26 21:05:17 -07:00
Ralph Castain
01ed7548c4 Update to PMIx v3.0a
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-25 12:25:27 -07:00
Ralph Castain
8fbfe68754 Alter the PMIx embedded configuration so that we can build static with devel headers - if the builder requests that we install a separate libpmix, then don't prefix the PMIx variables.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-24 21:45:27 -07:00
Ralph Castain
292983261a We should never block when requesting dmodex data from the PMIx server as this will block it from being able to accept connections from local clients. Do not deregister standing dmodx requests when a fence completes unless we actually collected the data in the fence
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-24 07:51:10 -07:00
bosilca
ac348da13a Merge pull request #4374 from bosilca/topic/osx_syslog
Topic/osx syslog
2017-10-23 18:06:36 -04:00
Ralph Castain
6ea3c8a0bd Update the interlib example to show an alternative method for model declaration. Add a missing range value to the OPAL layer. Make it easier to see OMPI model callbacks
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 11:27:42 -07:00
George Bosilca
8f32b345de
Address syslog issues on OSX 10.13 with gcc 7.x
gcc 7.[1,2] (at least) fails to correctly parse the OSX 10.13 sys/syslog.h
header. As a results we need to potect syslog support in OPAL, PMIX and
ORTE.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-10-23 14:02:10 -04:00
Ralph Castain
a63904d47f Updates to support cross-version operations with OMPI v2.x
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-22 08:38:33 -07:00
Ralph Castain
f8ce31f13c Fix event registration so OpenMP/MPI coordination sides can both get notification of model declarations
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-19 18:06:38 -07:00
Ralph Castain
60b338e857 Sync to PMIx v3. Ensure prun uses the ess/tool component.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-14 08:24:57 -07:00
Ralph Castain
388034c814 Add support for the -v (verbose) option to prun and silence the "executing" and "completed" output otherwise.
Debounce "unreachable" notifications for tools when they disconnect
Enable the -x cmd line option for prun

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 0a5b36180a22959654461ac1303cec35313f8b4a)
2017-10-10 12:54:49 -07:00
Ralph Castain
c696e04c5e Since PMIx is moving to release v3.0, embed the new release candidate in opal/pmix framework. Move the pmix2x code over to the ext2x component. Create a new ext3x component
Remove some build product. Tell PMIx that we don't need a new nspace generated when OMPI calls connect
Add missing Makefile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-09 13:51:08 -07:00
Ralph Castain
1a0bccb536 Now that PMIx has settled on its release strategy and numbering, update the OPAL pmix framework to track
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-02 15:44:43 -08:00
Ralph Castain
6041467df0 Update to latest PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-01 14:47:44 -08:00
Ralph Castain
64873487b4 Remove the max_connections parameter from the radix component as it is confusing. Modify PMIx client init so that it simply returns the nspace/rank if called by a server - this allows the server to retrieve its assigned ID. Register the server's nspace so client-side operations can succeed
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-01 12:17:11 -07:00
Ralph Castain
f4a55118e6 Update to latest PMIx master - mostly updates example codes, but includes one critical cleanup during finalize.
NOTE: set dstore shared memory storage option to "on" by default

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-10-27 11:16:03 -07:00
Ralph Castain
f298f294e1 Update PMIx to latest master tarball. Ensure we set the HNP name for orted's so that PMIx_Lookup can find the server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-10-26 15:48:56 -07:00
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
Ralph Castain
9131eca9c6 Update to latest PMIx master 2016-10-20 21:13:40 -07:00
Ralph Castain
8113a8d1b0 Now that we are hiding symbols in the internal PMIx component, we cannot reuse that component for integration to the external PMIx master as the symbols don't match. So create a new "ext3x" component and copy the PMIx v3 integration over there.
Also, remove a couple of build-product files from the pmix3x component.
2016-10-18 13:15:32 -07:00
Ralph Castain
50c9f3de55 Ensure the PMIx progress thread is stopped prior to tearing anything down. Thanks to Gilles for spotting this error! 2016-10-18 00:27:52 -07:00
Ralph Castain
6f65d0a173 Repair event notification support. Cleanup the long-suffering "epoll: warning" coming out of libevent whenever a process abnormally terminated.
Add changes to test program

Sync to PMIx master
2016-10-13 16:27:39 -07:00
Ralph Castain
6417f217e1 Turn PMIx dstore off by default as MTT was effectively broken 2016-10-13 08:14:51 -07:00
Ralph Castain
8f05beb1ec Sync pmix/master@cb53105 2016-10-11 20:54:59 -07:00
Ralph Castain
6ce4b6d098 Eliminate -Wall from being hardcoded 2016-10-11 12:50:31 -07:00
Ralph Castain
1859b03416 Enable PMIx shared memory support by default 2016-10-11 12:18:01 -07:00
Ralph Castain
1d7d7c201b Update PMIx support to latest PMIx master 2016-10-11 10:17:23 -07:00
Ralph Castain
5b1484a836 Implement the backend support for process-generated event notification 2016-10-08 09:24:28 -07:00
Gilles Gouaillardet
f1f1fb15eb pmix3x: configury: output major, minor and release version after checking them
and hence fix the configure output

(back-ported from upstream commit pmix/master@7b7cdda2de)
2016-10-08 13:01:28 +09:00
Gilles Gouaillardet
f3af799608 pmix3x: misc fixes to get pmix build on Solaris
- replace MAXHOSTNAMELEN with hardcoded 1024.
  unlike Linux, Solaris #define MAXHOSTNAMELEN in <netdb.h>,
  so use a hard coded value to keep the test simpl
- stdout cannot be assigned on Solaris, so use freopen instead

(back-ported from upstream commit pmix/master@a63f6e53f4)
2016-10-08 13:01:28 +09:00
Gilles Gouaillardet
5cbfddb8f1 pmix3x: fix misc memory leaks
(back-ported from upstream commit pmix/master@1eff526929)
2016-10-08 13:01:28 +09:00
Gilles Gouaillardet
b4e4e4a5f1 pmix3x: enhance pmix_nspace_t destructor
PMIX_RELEASE all elements stored in the internal and modex hash tables

(back-ported from upstream commit pmix/master@b90674fc52)
2016-10-08 13:01:27 +09:00
Gilles Gouaillardet
f1dc033767 pmix3x: add the PMIX_HASH_TABLE_FOREACH macro
this is a convenience macro similar to the PMIX_LIST_FOREACH macro,
that can be used to iterate on all the key/value pairs of a pmix_hash_table_t

(back-ported from upstream commit pmix/master@349971c68c)
2016-10-08 13:01:27 +09:00
Gilles Gouaillardet
7601e783cc pmix3x: sec/munge: add a missing include file
(cherry picked from upstream pmix/master@f7cfb11f6b)
2016-10-03 16:09:10 +09:00
Ralph Castain
e773c17cf3 Put show_help thru the PMIx "log" API. This pushes the show_help output from apps into the pmix thread, thus avoiding conflicts in the RML thread, which should help with thread lock situations. 2016-10-02 16:02:23 -07:00
Gilles Gouaillardet
1fbc9a5431 pmix3x: dstore/pmix: flock portability
Using the fcntl-locking instead of the flock

(back-ported from upstream pmix/master@3030a0cca1)
2016-09-27 13:21:03 +09:00
Gilles Gouaillardet
362a5886de pmix3x: client: fix PMIx_Finalize() sequence
pmix_progress_thread_finalize() invokes libevent event_base_free,
so all libevent stuff cannot be used after.
Hence, pmix_client_globals.myserver must be PMIX_DESTRUCT'ed
before invoking pmix_progress_thread_finalize()
2016-09-24 00:01:23 +09:00
Gilles Gouaillardet
5479c6cca7 pmix3x: add missing #include
and get Open MPI build on OpenBSD 6.0
2016-09-23 11:23:18 +09:00
Gilles Gouaillardet
041a431966 pmix3x: configury: correctly handle --disable-dlopen
the LT_* macros do overwrite the enable_dlopen variable,
so it must be tested and saved before invoking LT_INIT.
delay the invokation of the LT_* macros and use the
PMIX_ENABLE_DLOPEN_SUPPORT variable to figure out whether
--disable-dlopen was invoked
2016-09-15 13:26:20 +09:00