Howard Pritchard
f52dd511d4
Merge pull request #1600 from hppritcha/topic/pmix_fix_for_finalize
...
pmix/cray: set fence_nb to NULL
2016-04-28 13:50:15 -06:00
hppritcha
aa1d7b9c50
pmix/cray: set fence_nb to NULL
...
Rather than have a stub function for the pmix fence_nb
operation, just set to NULL. Causes fewer problems.
Fixes #1597
Fixes #1527
Signed-off-by: hppritcha <howardp@lanl.gov>
2016-04-28 13:48:54 -05:00
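The change described in this commit (leave the fence_nb slot NULL instead of pointing it at a stub) can be illustrated with a minimal sketch. Everything below is hypothetical: the demo_ types, the return codes, and the fallback policy are assumptions made for illustration, not the actual opal pmix module definition or the way OMPI callers behave.

```c
#include <stddef.h>

#define DEMO_SUCCESS            0
#define DEMO_ERR_NOT_SUPPORTED  (-1)

/* Hypothetical function-pointer table; the real module has many more entries. */
typedef struct {
    int (*fence)(void);            /* blocking fence */
    int (*fence_nb)(void *cbdata); /* NULL when no non-blocking fence exists */
} demo_pmix_module_t;

static int demo_blocking_fence(void)
{
    return DEMO_SUCCESS;
}

/* A backend that only offers the blocking fence: the slot is NULL, not a stub. */
static const demo_pmix_module_t demo_cray_module = {
    .fence    = demo_blocking_fence,
    .fence_nb = NULL,
};

/* Caller side: test the slot before use and fall back to the blocking call. */
static int demo_do_fence(const demo_pmix_module_t *mod, void *cbdata)
{
    if (NULL != mod->fence_nb) {
        return mod->fence_nb(cbdata);
    }
    if (NULL != mod->fence) {
        return mod->fence();
    }
    return DEMO_ERR_NOT_SUPPORTED;
}
```

With a NULL slot the caller can tell "not implemented" apart from "implemented but failed", which a stub that only returns an error code hides.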
Ralph Castain
02876564d4
Silence warning of zero-byte malloc
2016-04-26 11:55:59 -07:00
Gilles Gouaillardet
d96919638f
pmix: remove autogenerated file and update .gitignore
...
removed: opal/mca/pmix/pmix114/pmix/src/include/private/autogen/config.h.in
2016-04-18 12:57:41 +09:00
Ralph Castain
b009e58d25
Roll to PMIx 1.1.4rc2 - replaces some code that was incorrectly removed in prior update
2016-04-16 18:24:24 -07:00
Ralph Castain
8ff114e668
Update to official PMIx 1.1.4rc1
2016-04-15 21:47:46 -07:00
Ralph Castain
449ec41532
Roll to PMIx 1.1.4rc1 and remove the PMIx 1.2.0 directory, as the community has decided not to do that release. This incorporates a number of bug fixes that have been identified and repaired in the PMIx and OMPI code bases. Also includes several minor corrections to the PMIx code so it now supports run-through without hanging on collectives involving a process that exits.
2016-04-15 10:11:11 -07:00
Ralph Castain
2432daf065
Some minor cleanups of a memory leak and error output
2016-04-08 07:46:18 -07:00
Rainer Keller
52080a5736
As per the pull request to pmix/master:
...
https://github.com/pmix/master/pull/71
Have OMPI's current version of pmix120 fail gracefully when
sun_path is too long (longer than 108 chars, or 103 on OS X),
and have OMPI return proper error messages with hints on how to
fix the problem.
2016-04-07 22:12:53 +02:00
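A length check of the kind this commit describes might look like the sketch below. It is only an illustration: demo_set_sun_path and the wording of the error message are hypothetical, and the limit is taken from sizeof(sun_path) on the build platform rather than hard-coded.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fills addr and returns 0, or returns -1 if path
 * does not fit in sun_path (including the terminating NUL). */
static int demo_set_sun_path(struct sockaddr_un *addr, const char *path)
{
    size_t max = sizeof(addr->sun_path) - 1; /* leave room for the NUL */
    size_t len = strlen(path);

    if (len > max) {
        /* Illustrative message only; not the text OMPI actually prints. */
        fprintf(stderr,
                "socket path is %zu chars, limit is %zu; "
                "try a shorter temporary directory\n", len, max);
        return -1;
    }

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len + 1);   /* copy including the NUL */
    return 0;
}
```

Checking before the copy avoids silently truncating the path, which would otherwise surface later as a confusing bind or connect failure.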
Gilles Gouaillardet
6f450630d8
pmix/external: fix misc missing conversion and type issues
2016-04-04 10:12:34 +09:00
Gilles Gouaillardet
2ede47c462
pmix: fix misc missing conversion and type issues
2016-04-04 10:12:34 +09:00
Nysal Jan K.A
75233573d1
pmix: Increment the reference count in PMIx_Init
...
The reference counting was broken which led PMIx_Finalize
to release resources early. This fixes the "use after free" scenarios
that I encountered.
(based on commit pmix/master@abfaa4c)
2016-03-27 04:11:25 -04:00
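The reference-counting problem this commit fixes can be sketched as follows. This is a simplified model under stated assumptions, not the PMIx source: demo_init, demo_finalize, and the two static variables are hypothetical stand-ins for PMIx_Init, PMIx_Finalize, and the library's internal state.

```c
#include <stdbool.h>

static int  demo_init_count     = 0;     /* number of outstanding init calls */
static bool demo_resources_live = false; /* stands in for the shared state */

static int demo_init(void)
{
    if (demo_init_count > 0) {
        /* Already initialized: just count the extra user. Omitting this
         * increment is what lets the first finalize free everything early. */
        ++demo_init_count;
        return 0;
    }
    demo_resources_live = true;          /* one-time setup */
    demo_init_count = 1;
    return 0;
}

static int demo_finalize(void)
{
    if (demo_init_count <= 0) {
        return -1;                       /* finalize without matching init */
    }
    if (--demo_init_count > 0) {
        return 0;                        /* other users remain; keep state */
    }
    demo_resources_live = false;         /* last user: release resources */
    return 0;
}
```

With the increment in place, two init calls need two finalize calls before the shared state is released, so a second user never touches freed resources.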
Josh Hursey
099170bb31
Merge pull request #1496 from jjhursey/topic/pmix120-obj-patch
...
pmix/pmix120: Fix OBJ_ to PMIX_ symbol name
2016-03-25 19:35:43 -05:00
Ralph Castain
0b4310b186
Remove an unnecessary header that forced exposure of the PMIx internal headers
2016-03-25 16:57:41 -07:00
Ralph Castain
e8246e079b
Minor cleanup to match the changes in the PMIx master
2016-03-25 15:12:41 -07:00
Joshua Hursey
8ebeaa5861
pmix/pmix120: Fix OBJ_ to PMIX_ symbol name
2016-03-25 16:17:08 -05:00
Ralph Castain
af1444b6e1
Cleanup a debug statement. Plug a memory leak
2016-03-08 18:27:55 -08:00
Ralph Castain
bac6290b22
Ensure the process name is positive when using direct launch
...
Fixes #1425
2016-03-08 08:31:05 -08:00
Ralph Castain
b57a191ccc
Update the external client to the new PMIx init/finalize signatures
2016-03-03 20:50:20 -08:00
Ralph Castain
4a55fba414
Fix registration of error handlers thru the pmix120 component. A thread-shift operation was hanging on the sync_event_base, which made it dependent on someone calling opal_progress. Unfortunately, a process in "sleep" or spinning outside the MPI library won't do that, and so we never complete errhandler registration.
2016-03-02 15:01:01 -08:00
Ralph Castain
011403c04a
Fix a number of issues, some of which have lingered for a long time:
...
* provide a more reliable way of determining that a process is a singleton by leveraging the schizo framework. Add new components for slurm, alps, and orte to detect when we are in a managed environment, and if we have been launched by mpirun or a native launcher. Set the correct envars to control ess and pmix selection in each case.
* change the relative priority of the pmix120 and pmix112 components to make pmix120 the default
* fix singleton comm-spawn by correctly setting the num_apps field of the orte_job_t created by the daemon - this fixes a segfault in register_nspace on newly created daemons
* ensure orterun doesn't propagate any ess or pmix directives in its environment
* Cleanup a few valgrind issues and memory leaks
* Fix a race condition that prevented the client from completing notification registrations (missing thread shift)
* Ensure the schizo/alps component detects launch by mpirun
2016-03-01 06:53:00 -08:00
Ralph Castain
d28d3ee901
Make the error message on external pmix library a little clearer by separating out the libevent from the libhwloc checks
2016-02-24 11:20:25 -06:00
Ralph Castain
d653cf2847
Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM.
2016-02-21 11:55:49 -08:00
Ralph Castain
8c92a179c0
Minor memory leak
2016-02-19 15:05:39 -08:00
Ralph Castain
6e68d758b9
Cleanup some valgrind complaints about jumps with uninitialized values. Fix a few IOF issues reported by Mark Santcroos when submitting jobs from tools. Add the ability to pass directives to the --output-filename option that tell ORTE to (a) not include the jobid in the path to the output files, and (b) not to copy the output to the tool (i.e., just store it in the files).
...
ck
Remove stale debug
Fix a segfault if no subscribers are present
2016-02-18 16:30:37 -08:00
Ralph Castain
60a7bc2e50
Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion.
...
Fixes #1225
2016-02-18 09:29:12 -08:00
rhc54
2745610eb7
Merge pull request #1377 from rhc54/topic/pmix
...
Plug a leak in the PMIx subsystem
2016-02-17 20:05:45 -08:00
Ralph Castain
efb0eff43e
Plug a leak in the PMIx subsystem
2016-02-17 19:00:36 -08:00
Ralph Castain
8f9508cace
Further enhance the support for Singularity containers. Extend the "personality" command-line option to allow specifying both model (e.g., "ompi") and container (e.g., "singularity"), and add the necessary logic to support multiple options. Add a new pmix "isolated" component to handle singletons where no HNP is available since containers cannot launch the HNP.
2016-02-17 13:33:06 -08:00
Gilles Gouaillardet
f5a53b5f1e
pmix: fix Makefile.am to correctly exclude autogenerated file from tarball
...
(back-ported from pmix/master@73daf58ee5)
2016-01-28 11:42:03 +09:00
Gilles Gouaillardet
15e26da1e1
pmix configury: add missing PMIX_CHECK_ICC_VARARGS function
...
Thanks Paul Hargrove for the report
(back-ported from pmix/master@7b16e914bf)
2016-01-26 10:57:16 +09:00
rhc54
b172b8599b
Merge pull request #1285 from ggouaillardet/topic/pmix_dist_fix
...
pmix: do not include automatically generated include/private/autogen/…
2016-01-16 20:49:41 -08:00
Gilles Gouaillardet
1d38430e43
opal: replace opal_convert_jobid_to_string with opal_snprintf_jobid
2016-01-14 10:39:03 +09:00
Gilles Gouaillardet
955fe85cb6
pmix/pmix120: add missing include file
2016-01-12 11:35:32 +09:00
Gilles Gouaillardet
73daf58ee5
pmix: do not include automatically generated include/private/autogen/config.h into dist tarball
...
Thanks Siegmar Gross for the initial report of this issue
2016-01-08 13:18:15 +09:00
Nysal Jan K.A
13f9bb9202
Use PMI2 constants for consistency
2016-01-07 11:46:22 +05:30
Jeff Squyres
e4bdad09c1
pmix: remove extra wrapper LIBS
...
These extra libs are now no longer necessary.
Fixes open-mpi/ompi#1281.
2016-01-05 12:09:53 -08:00
Ralph Castain
0a6b8d2c14
Correctly handle connection terminations during finalize so mpirun doesn't hang. Cleanup some corner cases in the error notification system
2015-12-30 07:16:43 -08:00
Ralph Castain
a04f1cd643
Silence some Coverity warnings
2015-12-29 20:37:25 -08:00
Gilles Gouaillardet
0ca1ee5156
configury: misc pmix120 fixes
2015-12-28 23:17:41 +09:00
Gilles Gouaillardet
3300d7cc00
pmix: rename pmix_munge_module
2015-12-28 23:16:27 +09:00
Ralph Castain
a5b95a0939
Continue work on error notification system
2015-12-28 23:15:59 +09:00
Ralph Castain
810f2446b7
Add pmix120 component, update the error handling functions in the PMIx API.
...
Update the configure logic for the new pmix120 component
ckpt
Get the pmix120 component to work - still not really registering or handling notifications, but infrastructure now operates
Cleanup some of the symbol scopes, and provide a more comprehensive rename.h file. Will pretty it up later - let's see how this works
Cleanup the rename files to use the pretty macros
2015-12-28 23:15:44 +09:00
Gilles Gouaillardet
c757c5c612
pmix/external: Fix error handler usage
2015-12-28 23:15:17 +09:00
Gilles Gouaillardet
1157329732
configury: misc pmix112 fixes
2015-12-28 23:15:16 +09:00
Gilles Gouaillardet
d416c7fd8a
pmix/external: no more circular dependencies if not building shared DSO
2015-12-28 23:14:03 +09:00
Gilles Gouaillardet
f0e3e16f49
pmix/base: add missing #include <unistd.h>
...
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:52 +09:00
rhc54
978c54880d
Merge pull request #1238 from rhc54/topic/cleanup
...
Cleanup warnings in opal and orte layers when building optimized on Mac
2015-12-17 09:37:48 -08:00
Ralph Castain
64b695669a
Cleanup warnings in opal and orte layers when building optimized on Mac
2015-12-17 07:51:24 -08:00
Gilles Gouaillardet
75d16cfb27
Fix a few places where opal/util/argv.h was required when building pmix components (go figure)
2015-12-17 16:19:25 +09:00