1
1

20 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
9178219e6b Deregister event handlers only on final call to finalize. Ensure we pass PMIx mca params
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-28 15:00:43 -07:00
Ralph Castain
8263efff65 Fix uninitialized variables
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-23 11:12:26 -07:00
Ralph Castain
3e78f84093 Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-21 13:19:51 -07:00
Ralph Castain
952726c121 Update to latest PMIx master - equivalent to 2.0rc2. Update the thread support in the opal/pmix framework to protect the framework-level structures.
This now passes the loop test, and so we believe it resolves the random hangs in finalize.

Changes in PMIx master that are included here:

* Fixed a bug in the PMIx_Get logic
* Fixed self-notification procedure
* Made pmix_output functions thread safe
* Fixed a number of thread safety issues
* Updated configury to use 'uname -n' when hostname is unavailable

Work on cleaning up the event handler thread safety problem
Rarely used functions, but protect them anyway
Fix the last part of the intercomm problem
Ensure we don't cover any PMIx calls with the framework-level lock.
Protect against NULL argv comm_spawn

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-20 09:02:15 -07:00
Ralph Castain
93cf3c7203 Update OPAL and ORTE for thread safety
(I swear, if I look this over one more time, I'll puke)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 12:30:57 -07:00
Ralph Castain
ef0e0171c9 Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c).
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
Gilles Gouaillardet
0f47310a75 pmix2x/pmix2x_client: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Ralph Castain
9eab9a1ed3 Remove stale global variables
Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers.

Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation).

Begin first cut at memory profiler

Some minor cleanups of memprobe

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-02 14:04:24 -08:00
Ralph Castain
e8aea2ebfc Minor cleanups
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-30 16:19:42 -08:00
Ralph Castain
884fb7fcf2 Update the PMIx2 support to include the latest shared memory optimizations
Update ORTE support for dynamic PMIx operations e.g., PMIx_Spawn
Update to track master
Ensure that --disable-pmix-dstore actually disables the dstore. Sync to a few debugger updates

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-14 15:00:10 -08:00
Ralph Castain
1a0bccb536 Now that PMIx has settled on its release strategy and numbering, update the OPAL pmix framework to track
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-02 15:44:43 -08:00
Ralph Castain
af67f16422 Update configury to support multiple PMIx versions, rename pmix2x component to pmix3x for support of PMIx master
Update support for external v1.1.x and v2.x libraries. Minor corrections to the v3.x component
2016-08-25 18:19:05 -07:00
Artem Polyakov
c5a91c5c9d opal/pmix: fix pmix jobid calculation if external PMIx server is used. 2016-08-15 21:13:51 +03:00
Artem Polyakov
f3c816b52e opal/pmix: fix indentation in some files. 2016-08-15 18:21:50 +07:00
Ralph Castain
cacb582ecd Support timeout values when performing connect/accept operations. Bump default timeout to 10 minutes so folks have time to start the partnering application 2016-07-28 14:09:06 -07:00
Ralph Castain
01a653d50a Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found".
Remove stale file reference

Restore autogen pass thru pmix

Remove generated file
2016-07-20 00:58:19 -07:00
Nathan Hjelm
03bce91de8 pmix/pmix2x: add missing increment in loop
This commit fixes a bug in the pmix2x client code where a loop
variable is not correctly incremented. This was leading to hangs and
crashes when creating intercommunicators. Also fixed two double
increments in other loops.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-18 10:35:05 -06:00
Jeff Squyres
72f41d4490 pmix: replace all tabs with spaces
No code or logic changes

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:08:33 -04:00
Ralph Castain
dde69e1be2 Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761.
Fixes #1792
2016-06-18 12:28:46 -07:00
Ralph Castain
5d330d5220 Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler.
Add PMIx 2.0

Remove PMIx 1.1.4

Cleanup copying of component

Add missing file

Touchup a typo in the Makefile.am

Update the pmix ext114 component

Minor cleanups and resync to master

Update to latest PMIx 2.x

Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00