Jeff Squyres
72f41d4490
pmix: replace all tabs with spaces
...
No code or logic changes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:08:33 -04:00
Jeff Squyres
1c32742c66
pmix_ext20: fix syntax error
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:04:12 -04:00
Ralph Castain
99f7096031
Fix permissions
2016-07-16 21:03:55 -07:00
Ralph Castain
d4071fbd1c
Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling
2016-07-16 13:20:41 -07:00
Ralph Castain
1ceb35ba5c
Fix singletons - do not include the PMIx tool URI in the environment provided to child processes
2016-07-13 17:33:34 -07:00
Ralph Castain
20a91c2baf
Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program.
...
Cleanup debug message
2016-07-13 15:28:33 -07:00
Artem Polyakov
72585a905f
opal/pmix: add blocking Fence to SLURM components.
...
Blocking fence is used in yalla del proc. Native pmix exposes this functionality.
We need to expose it for SLURM's s1/s2 components as well.
Also this commit fixes uninitialized `rc` in fencenb's of both
components.
2016-07-11 09:43:15 +03:00
Artem Polyakov
8e16f47492
Merge pull request #1688 from artpol84/fix_base64
...
Fix base64 implementation in pmix framework.
2016-07-07 10:47:50 +06:00
Gilles Gouaillardet
acda07472a
configury: revamp and re-indent sub configure.m4 after open-mpi/ompi@846360fd4c
2016-07-06 11:59:51 +09:00
Gilles Gouaillardet
846360fd4c
configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components
...
Thanks Jeff for the guidance
Fixes open-mpi/ompi#1683
note:
in order to keep this commit easy to review, some AS_IF([...]) were replaced with
AS_IF([false], ...) or AS_IF_([true], ...)
these will be removed and re-indented in a subsequent commit
2016-07-06 11:57:24 +09:00
Ralph Castain
ee56d9dc1a
Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field
2016-07-05 14:59:50 -07:00
Ralph Castain
7e0af3f4f0
Update pmix2x to track upstream changes
2016-07-05 11:54:22 -07:00
Gilles Gouaillardet
267821f0dd
pmix2x/pmix: fix a typo in PMIx_tool_init()
...
and remove now useless local variable i
2016-07-05 13:47:50 +09:00
Gilles Gouaillardet
efce8cc734
pmix2x/pmix: add missing include files
...
pmix cannot be built on alpine linux because of some missing includes.
uid_t and gid_t are defined in unistd.h or sys/types.h, and unistd.h
is not indirectly pulled under alpine linux, so do it manually.
Thanks N.L.K Nguyen for the report
(back-ported from upstream pmix/master@c8d55350a9 )
2016-07-05 09:03:14 +09:00
Ralph Castain
c9ada8e095
Silence Coverity warnings
2016-07-03 20:45:08 -07:00
Ralph Castain
673f82e2b6
Update the PMIx listener to avoid leaking sockets into children, and better handle race condition errors
2016-07-03 08:23:33 -07:00
Ralph Castain
6e434d6785
Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information
...
Update to match PMIx RFC
Fix configury to point to correct libevent and hwloc locations
2016-06-29 19:19:19 -07:00
Ralph Castain
08b1438f15
Add missing PMIx range value so OPAL and PMIx align again
2016-06-22 22:03:25 -07:00
Gilles Gouaillardet
bf133c401e
pmix2x: fix a typo in dereg_event_hdlr()
...
This bug was fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c
but it was not fixed in open-mpi/ompi
2016-06-22 13:45:29 +09:00
Ralph Castain
441739b5a4
Cleanup a lagging message that generates an annoying (but seemingly harmless) warning
2016-06-20 12:23:27 -07:00
Ralph Castain
0ba02821e6
Add requested key and job-level info
2016-06-19 18:22:31 -07:00
Ralph Castain
0a29f5cb77
Sigh - missed two typos
2016-06-18 20:57:53 -07:00
Ralph Castain
dd38cf1fed
Fix typo
2016-06-18 20:56:43 -07:00
Ralph Castain
dde69e1be2
Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761.
...
Fixes #1792
2016-06-18 12:28:46 -07:00
Ralph Castain
044c561cba
Roll to latest PMIx master
2016-06-16 17:30:30 -07:00
Ralph Castain
5d330d5220
Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fall back to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler.
...
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component
Add missing file
Touchup a typo in the Makefile.am
Update the pmix ext114 component
Minor cleanups and resync to master
Update to latest PMIx 2.x
Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00
Ralph Castain
d58da99dbc
Shift to memcpy to avoid Solaris issues
2016-06-09 12:07:17 -07:00
Ralph Castain
8fa935534b
Abstract the strnlen function for environments that do not have it (e.g., Solaris 10)
2016-06-08 10:12:43 -07:00
Gilles Gouaillardet
b707d138fe
pmix114/pmix1_client: fix misc memory leaks
...
Fixes CID 1325146-1325149
2016-06-06 09:33:35 +09:00
Ralph Castain
ecea1e3bb5
Update to 1.1.4rc3
2016-06-01 20:56:07 -07:00
Ralph Castain
12ecf972af
Split the pmix external component into one for the 1.1.4 release, and another for the upcoming 2.0 release. Clean up the configury so the components look for a series-specific function instead of running a program.
...
NOTE: the changes for the 2.0 series are not yet in the PMIx master.
2016-06-01 14:15:24 -07:00
Ralph Castain
55923eacd3
Stealing some pieces of Josh Hursey's PR #1583 and modifying a bit, allow the opal/pmix external component to handle both PMIx 1.1.4 and PMIx 2.0 versions. Automatically detect the version of the target external library and adjust the only two APIs that changed (PMIx_Init and PMIx_Finalize)
...
Rename temp vars in .m4 to avoid conflict with Travis
2016-05-27 08:06:31 -07:00
Artem Polyakov
725eea2819
Fix base64 implementation in pmix framework.
...
In the commit 80f07b65f1
setting of '-' marker used
as the string termination sign was moved from base64 code:
from: 80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67L491)
to: 80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67R189)
However the decoding function wasn't fixed and still expects one extra
byte at the end of the encoded string which leads to data truncation
during extraction (was noticed on standalone code that was using base64
from OMPI).
2016-05-23 23:30:31 +06:00
Gilles Gouaillardet
8466a3daf3
pmix: update .gitignore
...
git ignore opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in
git rm opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in
git ignore opal/mca/pmix/pmix*/...
2016-05-23 11:58:07 +09:00
Gilles Gouaillardet
cbbdce05b1
pmix/pmix114: silence a warning
2016-05-20 09:35:26 +09:00
Ralph Castain
6f743f81b6
Update PMIx 114 to current release candidate
2016-05-19 12:55:05 -07:00
rhc54
8b534e9897
Merge pull request #1668 from rhc54/topic/slurm
...
When direct launching applications, we must allow the MPI layer to pr…
2016-05-16 12:23:19 -07:00
Howard Pritchard
1a676e5b35
pmix/cray: fix some breakage
...
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-05-16 12:45:05 -05:00
Ralph Castain
01ba861f2a
When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread spin in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize cpu utilization.
...
Update external as well
Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro
2016-05-14 16:37:00 -07:00
Ralph Castain
7767882346
Per user request, add some missing data and definitions:
...
OPAL_PMIX_UNIV_RANK - synonym for OPAL_PMIX_GLOBAL_RANK
OPAL_PMIX_APP_SIZE - #ranks in the application of this proc
2016-05-09 08:39:01 -07:00
Ralph Castain
8ec1891d11
Silence warning
2016-05-05 20:04:10 -07:00
Ralph Castain
08022d7af1
Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required.
2016-05-05 15:28:13 -07:00
rhc54
648043597a
Merge pull request #1612 from ggouaillardet/poc/pmix_external_configury
...
pmix/external: revamp external pmix package detection
2016-05-02 09:46:05 -07:00
Jeff Squyres
265e5b9795
Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
...
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
Gilles Gouaillardet
45f9a47d77
pmix/external: fix typo and silence a warning
2016-05-02 17:15:52 +09:00
Gilles Gouaillardet
08d91b9a03
pmix/external: revamp external pmix package detection
2016-05-02 16:23:31 +09:00
Ralph Castain
42d9d861fc
Fix minor typo in PMIx packing of pmix_app_t - thanks to Gilles for pointing it out
2016-04-29 08:55:46 -07:00
Howard Pritchard
f52dd511d4
Merge pull request #1600 from hppritcha/topic/pmix_fix_for_finalize
...
pmix/cray: set fence_nb to NULL
2016-04-28 13:50:15 -06:00
hppritcha
aa1d7b9c50
pmix/cray: set fence_nb to NULL
...
Rather than have a stub function for the pmix fence_nb
operation, just set to NULL. Causes fewer problems.
Fixes #1597
Fixes #1527
Signed-off-by: hppritcha <howardp@lanl.gov>
2016-04-28 13:48:54 -05:00
Ralph Castain
02876564d4
Silence warning of zero-byte malloc
2016-04-26 11:55:59 -07:00
Karol Mroz
e1c64e6e59
opal: standardize on max hostname length
...
Define OPAL_MAXHOSTNAMELEN to be either:
(MAXHOSTNAMELEN + 1) or
(limits.h:HOST_NAME_MAX + 1) or
(255 + 1)
For pmix code, define above using PMIX_MAXHOSTNAMELEN.
Fixup opal layer to use the new max.
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-24 08:19:47 +02:00
Gilles Gouaillardet
d96919638f
pmix: remove autogenerated file and update .gitignore
...
removed: opal/mca/pmix/pmix114/pmix/src/include/private/autogen/config.h.in
2016-04-18 12:57:41 +09:00
Ralph Castain
b009e58d25
Roll to PMIx 1.1.4rc2 - replaces some code that was incorrectly removed in prior update
2016-04-16 18:24:24 -07:00
Ralph Castain
8ff114e668
Update to official PMIx 1.1.4rc1
2016-04-15 21:47:46 -07:00
Ralph Castain
449ec41532
Roll to PMIx 1.1.4rc1 and remove the PMIx 1.2.0 directory as the community has decided to not do that release version. This incorporates a number of bug fixes that have been identified and repaired in the PMIx and OMPI code bases. Also includes several minor corrections to the PMIx code so it now supports run-thru without hanging on collectives involving a process that exits
2016-04-15 10:11:11 -07:00
Ralph Castain
2432daf065
Some minor cleanups of a memory leak and error output
2016-04-08 07:46:18 -07:00
Rainer Keller
52080a5736
As per the pull request to pmix/master:
...
https://github.com/pmix/master/pull/71
Have OMPI's current version of pmix120 nicely fail in case of
too long sun_path (longer than 108 or in case of OSX 103 chars).
And have OMPI return proper error messages with hints how to
amend.
2016-04-07 22:12:53 +02:00
Gilles Gouaillardet
6f450630d8
pmix/external: fix misc missing conversion and type issues
2016-04-04 10:12:34 +09:00
Gilles Gouaillardet
2ede47c462
pmix: fix misc missing conversion and type issues
2016-04-04 10:12:34 +09:00
Nysal Jan K.A
75233573d1
pmix: Increment the reference count in PMIx_Init
...
The reference counting was broken which led PMIx_Finalize
to release resources early. This fixes the "use after free" scenarios
that I encountered.
(based on commit pmix/master@abfaa4c )
2016-03-27 04:11:25 -04:00
Josh Hursey
099170bb31
Merge pull request #1496 from jjhursey/topic/pmix120-obj-patch
...
pmix/pmix120: Fix OBJ_ to PMIX_ symbol name
2016-03-25 19:35:43 -05:00
Ralph Castain
0b4310b186
Remove an unnecessary header that forced exposure of the PMIx internal headers
2016-03-25 16:57:41 -07:00
Ralph Castain
e8246e079b
Minor cleanup to match the changes in the PMIx master
2016-03-25 15:12:41 -07:00
Joshua Hursey
8ebeaa5861
pmix/pmix120: Fix OBJ_ to PMIX_ symbol name
2016-03-25 16:17:08 -05:00
Ralph Castain
af1444b6e1
Cleanup a debug statement. Plug a memory leak
2016-03-08 18:27:55 -08:00
Ralph Castain
bac6290b22
Ensure the process name is positive when using direct launch
...
Fixes #1425
2016-03-08 08:31:05 -08:00
Ralph Castain
b57a191ccc
Update the external client to the new PMIx init/finalize signatures
2016-03-03 20:50:20 -08:00
Ralph Castain
4a55fba414
Fix registration of error handlers thru the pmix120 component. A thread-shift operation was hanging on the sync_event_base, which made it dependent on someone calling opal_progress. Unfortunately, a process in "sleep" or spinning outside the MPI library won't do that, and so we never complete errhandler registration.
2016-03-02 15:01:01 -08:00
Ralph Castain
011403c04a
Fix a number of issues, some of which have lingered for a long time:
...
* provide a more reliable way of determining that a process is a singleton by leveraging the schizo framework. Add new components for slurm, alps, and orte to detect when we are in a managed environment, and if we have been launched by mpirun or a native launcher. Set the correct envars to control ess and pmix selection in each case.
* change the relative priority of the pmix120 and pmix112 components to make pmix120 the default
* fix singleton comm-spawn by correctly setting the num_apps field of the orte_job_t created by the daemon - this fixes a segfault in register_nspace on newly created daemons
* ensure orterun doesn't propagate any ess or pmix directives in its environment
* Cleanup a few valgrind issues and memory leaks
* Fix a race condition that prevented the client from completing notification registrations (missing thread shift)
* Ensure the schizo/alps component detects launch by mpirun
2016-03-01 06:53:00 -08:00
Ralph Castain
d28d3ee901
Make the error message on external pmix library a little clearer by separating out the libevent from the libhwloc checks
2016-02-24 11:20:25 -06:00
Ralph Castain
d653cf2847
Convert the orte_job_data pointer array to a hash table so it doesn't grow forever as we run lots and lots of jobs in the persistent DVM.
2016-02-21 11:55:49 -08:00
Ralph Castain
8c92a179c0
Minor memory leak
2016-02-19 15:05:39 -08:00
Ralph Castain
6e68d758b9
Cleanup some valgrind complaints about jumps with uninitialized values. Fix a few IOF issues reported by Mark Santcroos when submitting jobs from tools. Add the ability to pass directives to the --output-filename option that tell ORTE to (a) not include the jobid in the path to the output files, and (b) not to copy the output to the tool (i.e., just store it in the files).
...
ck
Remove stale debug
Fix a segfault if no subscribers are present
2016-02-18 16:30:37 -08:00
Ralph Castain
60a7bc2e50
Enable the PMIx notification callback system. This currently is only supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion.
...
Fixes #1225
2016-02-18 09:29:12 -08:00
rhc54
2745610eb7
Merge pull request #1377 from rhc54/topic/pmix
...
Plug a leak in the PMIx subsystem
2016-02-17 20:05:45 -08:00
Ralph Castain
efb0eff43e
Plug a leak in the PMIx subsystem
2016-02-17 19:00:36 -08:00
Ralph Castain
8f9508cace
Further enhance the support for Singularity containers. Extend the "personality" command-line option to allow specifying both model (e.g., "ompi") and container (e.g., "singularity"), and add the necessary logic to support multiple options. Add a new pmix "isolated" component to handle singletons where no HNP is available since containers cannot launch the HNP.
2016-02-17 13:33:06 -08:00
Gilles Gouaillardet
f5a53b5f1e
pmix: fix Makefile.am to correctly exclude autogenerated file from tarball
...
(back-ported from pmix/master@73daf58ee5 )
2016-01-28 11:42:03 +09:00
Gilles Gouaillardet
15e26da1e1
pmix configury: add missing PMIX_CHECK_ICC_VARARGS function
...
Thanks Paul Hargrove for the report
(back-ported from pmix/master@7b16e914bf )
2016-01-26 10:57:16 +09:00
rhc54
b172b8599b
Merge pull request #1285 from ggouaillardet/topic/pmix_dist_fix
...
pmix: do not include automatically generated include/private/autogen/…
2016-01-16 20:49:41 -08:00
Gilles Gouaillardet
1d38430e43
opal: replace opal_convert_jobid_to_string with opal_snprintf_jobid
2016-01-14 10:39:03 +09:00
Gilles Gouaillardet
955fe85cb6
pmix/pmix120: add missing include file
2016-01-12 11:35:32 +09:00
Gilles Gouaillardet
73daf58ee5
pmix: do not include automatically generated include/private/autogen/config.h into dist tarball
...
Thanks Siegmar Gross for the initial report of this issue
2016-01-08 13:18:15 +09:00
Nysal Jan K.A
13f9bb9202
Use PMI2 constants for consistency
2016-01-07 11:46:22 +05:30
Jeff Squyres
e4bdad09c1
pmix: remove extra wrapper LIBS
...
These extra libs are now no longer necessary.
Fixes open-mpi/ompi#1281.
2016-01-05 12:09:53 -08:00
Ralph Castain
0a6b8d2c14
Correctly handle connection terminations during finalize so mpirun doesn't hang. Cleanup some corner cases in the error notification system
2015-12-30 07:16:43 -08:00
Ralph Castain
a04f1cd643
Silence some Coverity warnings
2015-12-29 20:37:25 -08:00
Gilles Gouaillardet
0ca1ee5156
configury: misc pmix120 fixes
2015-12-28 23:17:41 +09:00
Gilles Gouaillardet
3300d7cc00
pmix: rename pmix_munge_module
2015-12-28 23:16:27 +09:00
Ralph Castain
a5b95a0939
Continue work on error notification system
2015-12-28 23:15:59 +09:00
Ralph Castain
810f2446b7
Add pmix120 component, update the error handling functions in the PMIx API.
...
Update the configure logic for the new pmix120 component
ckpt
Get the pmix120 component to work - still not really registering or handling notifications, but infrastructure now operates
Cleanup some of the symbol scopes, and provide a more comprehensive rename.h file. Will pretty it up later - let's see how this works
Cleanup the rename files to use the pretty macros
2015-12-28 23:15:44 +09:00
Gilles Gouaillardet
c757c5c612
pmix/external: Fix error handler usage
2015-12-28 23:15:17 +09:00
Gilles Gouaillardet
1157329732
configury: misc pmix112 fixes
2015-12-28 23:15:16 +09:00
Gilles Gouaillardet
d416c7fd8a
pmix/external: no more circular dependencies if not building shared DSO
2015-12-28 23:14:03 +09:00
Gilles Gouaillardet
f0e3e16f49
pmix/base: add missing #include <unistd.h>
...
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:52 +09:00
rhc54
978c54880d
Merge pull request #1238 from rhc54/topic/cleanup
...
Cleanup warnings in opal and orte layers when building optimized on Mac
2015-12-17 09:37:48 -08:00
Ralph Castain
64b695669a
Cleanup warnings in opal and orte layers when building optimized on Mac
2015-12-17 07:51:24 -08:00
Gilles Gouaillardet
75d16cfb27
Fix a few places where opal/util/argv.h was required when building pmix components (go figure)
2015-12-17 16:19:25 +09:00
Ralph Castain
3a56f0d34b
Create the pmix external component. Fix a few places where opal/util/argv.h was required when building with an external pmix (go figure).
...
NOTE: Building with external pmix *requires* that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated.
Closes #1204 (replaces it)
Fixes #1064
2015-12-15 15:26:13 -08:00
Jeff Squyres
7977fa3f0b
pmix112 config.h.in: remove generated file
2015-12-13 06:46:55 -08:00
Ralph Castain
03eb1a80bf
Update the PMIx native component to release v1.1.1, with addition of one bug-fix commit beyond the official release
...
Rename the pmix1xx component to pmix111 so it reflects the actual release it includes
Resolve the problem of PMIx being passed a bogus --with-platform argument when configuring the PMIx tarball code. There is no reason we should be passing --with-platform arguments to any internal subdirectory, so just leave that out when constructing the opal_subdir_args variable.
Update the PMIx code and continue attempting to debug direct modex
Fix a problem in the ORTE PMIx server - there was an early intent to optimize the direct modex by fetching data for all procs from the target job on the remote node, instead of fetching the data one proc at a time. However, this was never completely implemented, and so we would hang if we had multiple overlapping requests for data from more than one proc on the node.
Update PMIx to v1.1.2
2015-12-12 18:46:38 -08:00
Howard Pritchard
fecb326256
pmix/cray: fix locality bug
...
There was a bug with the way the cray pmix component
was setting the locality property for ranks on the
same node, etc.
Improve location/syntax of a comment block.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-12-08 11:13:48 -08:00
Ralph Castain
9803d69d02
Ensure the embedded PMIx respects an OMPI-level --disable-debug
2015-12-01 08:00:24 -08:00
Ralph Castain
52ea538bc1
Per fix from Nysal: set the listener_active flag before starting the progress thread, and declare the flag to be volatile
2015-11-09 09:00:59 -08:00
Ralph Castain
fed28e4cfc
Add missing file that was previously ignored
2015-11-06 14:37:09 -08:00
Ralph Castain
5f446570d8
Work on cleaning up memory leaks that are causing orte-dvm to eventually run out of memory. Still don't have everything plugged, but getting better. Sync to the PMIx master that includes removal of the pmix_common.h.in file that really didn't need to be generated, and update to the PMIx_server_init API.
2015-11-06 14:15:30 -08:00
Ralph Castain
bfdf08ae86
Fix intercomm_create by ensuring that both sides know how to translate jobid to/from nspace
...
Return something just to ensure that pack is happy
2015-11-06 02:19:45 -08:00
Ralph Castain
206e9a011e
Add a couple of missing translations to/from PMIx internal and OPAL error constants
2015-10-29 12:33:02 -07:00
Ralph Castain
8ad9b450c4
Silence Coverity warning
2015-10-28 20:10:28 -07:00
George Bosilca
c9d0fffab3
Add a missing include.
2015-10-28 00:50:58 -04:00
Ralph Castain
267ca8fcd3
Cleanup the PMIx direct modex support. Add an MCA parameter pmix_base_async_modex that will cause the async modex to be used when set to 1. Default it to 0 for now
...
to continue current default behavior.
Also add an MCA param pmix_base_collect_data to direct that the blocking fence shall return all data to each process. Obviously, this param has no effect if async_modex is used.
2015-10-27 17:31:56 -07:00
George Bosilca
6c28f114f1
Silence a warning regarding the format str for snprintf.
2015-10-24 15:24:40 -04:00
Jeff Squyres
b43fcb7695
Merge pull request #1028 from ggouaillardet/poc/pmix1xx_configury
...
pmix1xx configury: invoke sub-configure with CFLAGS and CPPFLAGS on t…
2015-10-24 13:19:33 -04:00
Ralph Castain
4c12022a50
Silence a couple of warnings from valgrind and compilers. Since some pmix components may return success with a NULL value from a "get", check for that situation before attempting to unload the data. Preset the hostname before calling modex_recv to get it so unload properly checks for NULL. Cast a returned value to the correct ompi_proc_t pointer
2015-10-22 20:56:02 -07:00
Gilles Gouaillardet
0221f59197
pmix1xx configury: invoke sub-configure with CFLAGS and CPPFLAGS on the command line
...
if CFLAGS and/or CPPFLAGS are passed to the ompi configure command line, pmix1xx
configure will not use the correct ones previously passed in the environment
see discussion started at http://www.open-mpi.org/community/lists/devel/2015/10/18159.php
Thanks Siegmar Gross for bringing this to our attention
2015-10-22 10:13:52 +09:00
Nathan Hjelm
9602484568
Merge pull request #1040 from hjelmn/mtl_priority
...
Change how cm's priority is calculated
2015-10-19 14:18:36 -06:00
Nathan Hjelm
8b5810f7f7
mca/base: add priority output to mca_base_select
...
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Ralph Castain
363f62a506
Fix singleton operations when running under a SLURM allocation. Sadly, SLURM's PMI will return success even if the PMI server isn't actually available. This leads to erroneous selection of pmix and ess components. So add a further requirement (namely, that we see a job_step envar) to the SLURM pmix components along with some modification of ess selection code to avoid the problem
2015-10-17 20:24:03 -07:00
annu13
cc5e1e26a5
sync with pmix master (repo_rev git69c398e)
2015-10-09 15:17:43 -07:00
annu13
5787e9248f
cleaned up debug stmts
2015-10-06 06:25:36 -07:00
annu13
30ba00e05d
sync with master
2015-10-06 06:04:54 -07:00
annu13
6f37c0e3e8
sync with PMIX master
2015-10-02 17:25:48 -07:00
annu13
7434c47626
sync with PMIX master
2015-10-02 17:17:48 -07:00
Ralph Castain
8f6855459d
Cleanup some coverity warnings
2015-09-30 10:33:53 -07:00
Ralph Castain
ec5d001538
Don't set the return value pointer to NULL as it actually is required to point to real storage - just return an error code if a modex recv doesn't succeed.
2015-09-28 20:45:50 -07:00
Ralph Castain
a4a3dfd480
Cleanup the code a bit by simply adding our nspace to the top of the list of jobid <-> nspace correlations. Add two new APIs to opal_pmix for registering new jobid/nspace pairs and retrieving an nspace given a jobid - these are required to support connect/accept. No impact on the PMIx library.
2015-09-28 08:50:13 -07:00
Ralph Castain
f713e71d51
Minor cleanup - add jobid <-> nspace in one more place
2015-09-27 14:48:39 -07:00
Ralph Castain
fad5638596
Resolve the naming issue when direct-launched by PMIx-enabled RMs using a minimal-impact approach. Detect if we were launched via ORTE - if so, then use our standard methods for computing the jobid. If not, then just hash the nspace to create the jobid, and track the jobid <-> nspace correspondence down in the opal/mca/pmix/pmix1xx component. We then do the translation any time a function that passes process names is invoked.
2015-09-27 09:57:59 -07:00
Ralph Castain
209600fe26
Sync to PMIx master
2015-09-23 21:00:30 -07:00
Ralph Castain
749bd4e6fe
Plug a few memory leaks identified by valgrind
2015-09-23 15:21:04 -07:00
Nathan Hjelm
8c4da756cf
pmix: do not touch recently freed memory
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 08:44:50 -06:00
Ralph Castain
4c654ffd94
Sync to PMIx master
2015-09-21 21:27:06 -07:00
Ralph Castain
1b7930ad52
Silence some warnings and address Coverity issues
2015-09-16 07:58:22 -07:00
Ralph Castain
c1bbbb5e2f
Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds
2015-09-15 13:08:35 -07:00
Ralph Castain
22d7c0081a
Fix the no-disconnect test by resolving a segfault on free - opal_dss.unload will return the remaining unpacked portion of a buffer. As such, it cannot return the pointer to that info as it might be partway inside of a malloc'd region. So copy the data out of the buffer.
2015-09-11 13:01:35 -07:00
Ralph Castain
dc5796b8a1
Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
...
Fix the locality computation by correctly computing the vpid of the local peer
This reverts commit open-mpi/ompi@6a8fad49e5 .
2015-09-11 08:29:51 -07:00
Ralph Castain
6a8fad49e5
Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"
...
This reverts commit f94f3cda21.
2015-09-11 02:01:25 -07:00
Ralph Castain
e0a52354d4
Sync to PMIx master at open-mpi/pmix@89680d6663
...
Includes changes to support BigEndian machines
2015-09-10 20:47:40 -07:00
Ralph Castain
a2a15cea8a
Fix the s1 component so direct launch is supported for SLURM
2015-09-10 16:07:37 -07:00
rhc54
3430f154fc
Merge pull request #885 from hppritcha/topic/pmix_not_pmix1xx_u16_prob
...
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
2015-09-10 15:38:54 -07:00
Howard Pritchard
2bbf22e2d0
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
...
Looks like in ess_pmi_module.c u32 is being used
for retrieving OPAL_PMIX_LOCAL_SIZE, while s1/s2/cray
pmix components were storing as u16.
This commit fixes this problem.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-10 11:41:39 -07:00
Ralph Castain
f94f3cda21
Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local
2015-09-10 10:25:30 -07:00
Ralph Castain
4c47c498ac
Sync to latest PMIx master
...
Allow the blocking send and recv to keep trying
2015-09-09 11:48:47 -07:00
Gilles Gouaillardet
7f0ed74d24
pmix1xx: fix CPPFLAGS when DSO are not built
2015-09-09 14:20:12 +09:00
rhc54
f6b6b9a9ca
Merge pull request #877 from rhc54/topic/s1s2
...
Cleanup s1 and s2 components
2015-09-08 19:20:59 -07:00
Ralph Castain
1cdb86b8c7
Cleanup s1 and s2 components, and ensure that mpirun and orteds only use non-direct-launch pmix components.
2015-09-08 18:37:09 -07:00
rhc54
3a446c9797
Merge pull request #876 from rhc54/topic/hnp
...
Fix segfault upon job error
2015-09-08 15:10:51 -07:00
rhc54
47f437608d
Merge pull request #875 from rhc54/topic/dynamics
...
Stop a segfault in the test by correctly passing all the argv during spawn
2015-09-08 14:35:42 -07:00
Ralph Castain
459f169e06
Fix segfault upon job error
...
Silence some unnecessary error-logs
2015-09-08 14:03:06 -07:00
Ralph Castain
ae7156cabb
Stop a segfault in the test by correctly passing all the argv during spawn
2015-09-08 13:42:46 -07:00