Ralph Castain
d28d3ee901
Make the error message on external pmix library a little clearer by separating out the libevent from the libhwloc checks
2016-02-24 11:20:25 -06:00
Ralph Castain
d653cf2847
Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM.
2016-02-21 11:55:49 -08:00
Ralph Castain
8c92a179c0
Minor memory leak
2016-02-19 15:05:39 -08:00
Ralph Castain
6e68d758b9
Cleanup some valgrind complaints about jumps with uninitialized values. Fix a few IOF issues reported by Mark Santcroos when submitting jobs from tools. Add the ability to pass directives to the --output-filename option that tell ORTE to (a) not include the jobid in the path to the output files, and (b) not to copy the output to the tool (i.e., just store it in the files).
...
ck
Remove stale debug
Fix a segfault if no subscribers are present
2016-02-18 16:30:37 -08:00
Ralph Castain
60a7bc2e50
Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion.
...
Fixes #1225
2016-02-18 09:29:12 -08:00
rhc54
2745610eb7
Merge pull request #1377 from rhc54/topic/pmix
...
Plug a leak in the PMIx subsystem
2016-02-17 20:05:45 -08:00
Ralph Castain
efb0eff43e
Plug a leak in the PMIx subsystem
2016-02-17 19:00:36 -08:00
Ralph Castain
8f9508cace
Further enhance the support for Singularity containers. Extend the "personality" command-line option to allow specifying both model (e.g., "ompi") and container (e.g., "singularity"), and add the necessary logic to support multiple options. Add a new pmix "isolated" component to handle singletons where no HNP is available since containers cannot launch the HNP.
2016-02-17 13:33:06 -08:00
Gilles Gouaillardet
f5a53b5f1e
pmix: fix Makefile.am to correctly exclude autogenerated file from tarball
...
(back-ported from pmix/master@73daf58ee5)
2016-01-28 11:42:03 +09:00
Gilles Gouaillardet
15e26da1e1
pmix configury: add missing PMIX_CHECK_ICC_VARARGS function
...
Thanks Paul Hargrove for the report
(back-ported from pmix/master@7b16e914bf)
2016-01-26 10:57:16 +09:00
rhc54
b172b8599b
Merge pull request #1285 from ggouaillardet/topic/pmix_dist_fix
...
pmix: do not include automatically generated include/private/autogen/…
2016-01-16 20:49:41 -08:00
Gilles Gouaillardet
1d38430e43
opal: replace opal_convert_jobid_to_string with opal_snprintf_jobid
2016-01-14 10:39:03 +09:00
Gilles Gouaillardet
955fe85cb6
pmix/pmix120: add missing include file
2016-01-12 11:35:32 +09:00
Gilles Gouaillardet
73daf58ee5
pmix: do not include automatically generated include/private/autogen/config.h into dist tarball
...
Thanks Siegmar Gross for the initial report of this issue
2016-01-08 13:18:15 +09:00
Nysal Jan K.A
13f9bb9202
Use PMI2 constants for consistency
2016-01-07 11:46:22 +05:30
Jeff Squyres
e4bdad09c1
pmix: remove extra wrapper LIBS
...
These extra libs are now no longer necessary.
Fixes open-ompi/ompi#1281.
2016-01-05 12:09:53 -08:00
Ralph Castain
0a6b8d2c14
Correctly handle connection terminations during finalize so mpirun doesn't hang. Cleanup some corner cases in the error notification system
2015-12-30 07:16:43 -08:00
Ralph Castain
a04f1cd643
Silence some Coverity warnings
2015-12-29 20:37:25 -08:00
Gilles Gouaillardet
0ca1ee5156
configury: misc pmix120 fixes
2015-12-28 23:17:41 +09:00
Gilles Gouaillardet
3300d7cc00
pmix: rename pmix_munge_module
2015-12-28 23:16:27 +09:00
Ralph Castain
a5b95a0939
Continue work on error notification system
2015-12-28 23:15:59 +09:00
Ralph Castain
810f2446b7
Add pmix120 component, update the error handling functions in the PMIx API.
...
Update the configure logic for the new pmix120 component
ckpt
Get the pmix120 component to work - still not really registering or handling notifications, but infrastructure now operates
Cleanup some of the symbol scopes, and provide a more comprehensive rename.h file. Will pretty it up later - let's see how this works
Cleanup the rename files to use the pretty macros
2015-12-28 23:15:44 +09:00
Gilles Gouaillardet
c757c5c612
pmix/external: Fix error handler usage
2015-12-28 23:15:17 +09:00
Gilles Gouaillardet
1157329732
configury: misc pmix112 fixes
2015-12-28 23:15:16 +09:00
Gilles Gouaillardet
d416c7fd8a
pmix/external: no more circular dependencies if not building shared DSO
2015-12-28 23:14:03 +09:00
Gilles Gouaillardet
f0e3e16f49
pmix/base: add missing #include <unistd.h>
...
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:52 +09:00
rhc54
978c54880d
Merge pull request #1238 from rhc54/topic/cleanup
...
Cleanup warnings in opal and orte layers when building optimized on Mac
2015-12-17 09:37:48 -08:00
Ralph Castain
64b695669a
Cleanup warnings in opal and orte layers when building optimized on Mac
2015-12-17 07:51:24 -08:00
Gilles Gouaillardet
75d16cfb27
Fix a few places where opal/util/argv.h was required when building pmix components (go figure)
2015-12-17 16:19:25 +09:00
Ralph Castain
3a56f0d34b
Create the pmix external component. Fix a few places where opal/util/argv.h was required when building with an external pmix (go figure).
...
NOTE: Building with external pmix *requires* that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated.
Closes #1204 (replaces it)
Fixes #1064
2015-12-15 15:26:13 -08:00
Jeff Squyres
7977fa3f0b
pmix112 config.h.in: remove generated file
2015-12-13 06:46:55 -08:00
Ralph Castain
03eb1a80bf
Update the PMIx native component to release v1.1.1, with addition of one bug-fix commit beyond the official release
...
Rename the pmix1xx component to pmix111 so it reflects the actual release it includes
Resolve the problem of PMIx being passed a bogus --with-platform argument when configuring the PMIx tarball code. There is no reason we should be passing --with-platform arguments to any internal subdirectory, so just leave that out when constructing the opal_subdir_args variable.
Update the PMIx code and continue attempting to debug direct modex
Fix a problem in the ORTE PMIx server - there was an early intent to optimize the direct modex by fetching data for all procs from the target job on the remote node, instead of fetching the data one proc at a time. However, this was never completely implemented, and so we would hang if we had multiple overlapping requests for data from more than one proc on the node.
Update PMIx to v1.1.2
2015-12-12 18:46:38 -08:00
Howard Pritchard
fecb326256
pmix/cray: fix locality bug
...
There was a bug with the way the cray pmix component
was setting the locality property for ranks on the
same node, etc.
Improve location/syntax of a comment block.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-12-08 11:13:48 -08:00
Ralph Castain
9803d69d02
Ensure the embedded PMIx respects an OMPI-level --disable-debug
2015-12-01 08:00:24 -08:00
Ralph Castain
52ea538bc1
Per fix from Nysal: set the listener_active flag before starting the progress thread, and declare the flag to be volatile
2015-11-09 09:00:59 -08:00
Ralph Castain
fed28e4cfc
Add missing file that was previously ignored
2015-11-06 14:37:09 -08:00
Ralph Castain
5f446570d8
Work on cleaning up memory leaks that are causing orte-dvm to eventually run out of memory. Still don't have everything plugged, but getting better. Sync to the PMIx master that includes removal of the pmix_common.h.in file that really didn't need to be generated, and update to the PMIx_server_init API.
2015-11-06 14:15:30 -08:00
Ralph Castain
bfdf08ae86
Fix intercomm_create by ensuring that both sides know how to translate jobid to/from nspace
...
Return something just to ensure that pack is happy
2015-11-06 02:19:45 -08:00
Ralph Castain
206e9a011e
Add a couple of missing translations to/from PMIx internal and OPAL error constants
2015-10-29 12:33:02 -07:00
Ralph Castain
8ad9b450c4
Silence Coverity warning
2015-10-28 20:10:28 -07:00
George Bosilca
c9d0fffab3
Add a missing include.
2015-10-28 00:50:58 -04:00
Ralph Castain
267ca8fcd3
Cleanup the PMIx direct modex support. Add an MCA parameter pmix_base_async_modex that will cause the async modex to be used when set to 1. Default it to 0 for now
...
to continue current default behavior.
Also add an MCA param pmix_base_collect_data to direct that the blocking fence shall return all data to each process. Obviously, this param has no effect if async_modex is used.
2015-10-27 17:31:56 -07:00
George Bosilca
6c28f114f1
Silence a warning regarding the format str for snprintf.
2015-10-24 15:24:40 -04:00
Jeff Squyres
b43fcb7695
Merge pull request #1028 from ggouaillardet/poc/pmix1xx_configury
...
pmix1xx configury: invoke sub-configure with CFLAGS and CPPFLAGS on t…
2015-10-24 13:19:33 -04:00
Ralph Castain
4c12022a50
Silence a couple of warnings from valgrind and compilers. Since some pmix components may return success with a NULL value from a "get", check for that situation before attempting to unload the data. Preset the hostname before calling modex_recv to get it so unload properly checks for NULL. Cast a returned value to the correct ompi_proc_t pointer
2015-10-22 20:56:02 -07:00
Gilles Gouaillardet
0221f59197
pmix1xx configury: invoke sub-configure with CFLAGS and CPPFLAGS on the command line
...
if CFLAGS and/or CPPFLAGS are passed to the ompi configure command line, pmix1xx
configure will not use the correct ones previously passed in the environment
see discussion started at http://www.open-mpi.org/community/lists/devel/2015/10/18159.php
Thanks Siegmar Gross for bringing this to our attention
2015-10-22 10:13:52 +09:00
Nathan Hjelm
9602484568
Merge pull request #1040 from hjelmn/mtl_priority
...
Change how cm's priority is calculated
2015-10-19 14:18:36 -06:00
Nathan Hjelm
8b5810f7f7
mca/base: add priority output to mca_base_select
...
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Ralph Castain
363f62a506
Fix singleton operations when running under a SLURM allocation. Sadly, SLURM's PMI will return success even if the PMI server isn't actually available. This leads to erroneous selection of pmix and ess components. So add a further requirement (namely, that we see a job_step envar) to the SLURM pmix components along with some modification of ess selection code to avoid the problem
2015-10-17 20:24:03 -07:00
annu13
cc5e1e26a5
sync with pmix master (repo_rev git69c398e)
2015-10-09 15:17:43 -07:00