Howard Pritchard
b65bbe017f
pmix/cray: switch to using wildcards for some items so that at least srun native launch on cray works again.
More issues to fix when using alps.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-07-26 17:07:58 -05:00
Ralph Castain
71de03fc67
Cleanup the new naming requirements to ensure that info is correctly retrieved
...
Cleanup permissions
Restore singleton operations
2016-07-21 09:46:03 -07:00
Ralph Castain
2b55ee8118
Cleanup Coverity warnings
2016-07-20 20:31:58 -07:00
Ralph Castain
01a653d50a
Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found".
...
Remove stale file reference
Restore autogen pass thru pmix
Remove generated file
2016-07-20 00:58:19 -07:00
Nathan Hjelm
03bce91de8
pmix/pmix2x: add missing increment in loop
...
This commit fixes a bug in the pmix2x client code where a loop
variable is not correctly incremented. This was leading to hangs and
crashes when creating intercommunicators. Also fixed two double
increments in other loops.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-18 10:35:05 -06:00
Jeff Squyres
72f41d4490
pmix: replace all tabs with spaces
...
No code or logic changes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:08:33 -04:00
Jeff Squyres
1c32742c66
pmix_ext20: fix syntax error
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:04:12 -04:00
Ralph Castain
99f7096031
Fix permissions
2016-07-16 21:03:55 -07:00
Ralph Castain
d4071fbd1c
Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling
2016-07-16 13:20:41 -07:00
Ralph Castain
1ceb35ba5c
Fix singletons - do not include the PMIx tool URI in the environment provided to child processes
2016-07-13 17:33:34 -07:00
Ralph Castain
20a91c2baf
Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program.
...
Cleanup debug message
2016-07-13 15:28:33 -07:00
Artem Polyakov
72585a905f
opal/pmix: add blocking Fence to SLURM components.
...
Blocking fence is used in yalla del proc. Native pmix exposes this functionality.
We need to expose it for SLURM's s1/s2 components as well.
Also this commit fixes uninitialized `rc` in fencenb's of both
components.
2016-07-11 09:43:15 +03:00
Artem Polyakov
8e16f47492
Merge pull request #1688 from artpol84/fix_base64
...
Fix base64 implementation in pmix framework.
2016-07-07 10:47:50 +06:00
Gilles Gouaillardet
acda07472a
configury: revamp and re-indent sub configure.m4 after open-mpi/ompi@846360fd4c
2016-07-06 11:59:51 +09:00
Gilles Gouaillardet
846360fd4c
configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components
...
Thanks Jeff for the guidance
Fixes open-mpi/ompi#1683
note:
in order to keep this commit easy to review, some AS_IF([...]) were replaced with
AS_IF([false], ...) or AS_IF([true], ...)
these will be removed and re-indented in a subsequent commit
2016-07-06 11:57:24 +09:00
Ralph Castain
ee56d9dc1a
Shorten the session directory name, as some OSes now provide unusually long temp directory names, causing us to overflow the sockaddr field
2016-07-05 14:59:50 -07:00
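The overflow described in the commit above can be illustrated with a small length check. This is only a sketch under an assumption: that the "sockaddr field" is the `sun_path` member of `struct sockaddr_un` (a fixed-size buffer, commonly around 100 bytes). The helper name `sketch_path_fits_sockaddr` is hypothetical and does not come from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/un.h>

/* Hypothetical helper (not from the commit): report whether a rendezvous
 * path, including its terminating NUL, fits in sockaddr_un.sun_path.
 * Assumption: the overflowed field is sun_path. */
bool sketch_path_fits_sockaddr(const char *path)
{
    struct sockaddr_un addr;

    assert(path != NULL);
    /* Strictly less than, so there is room for the trailing '\0'. */
    return strlen(path) < sizeof(addr.sun_path);
}
```

A caller could run such a check before binding and fall back to a shorter directory name when it fails.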
Ralph Castain
7e0af3f4f0
Update pmix2x to track upstream changes
2016-07-05 11:54:22 -07:00
Gilles Gouaillardet
267821f0dd
pmix2x/pmix: fix a typo in PMIx_tool_init()
...
and remove now useless local variable i
2016-07-05 13:47:50 +09:00
Gilles Gouaillardet
efce8cc734
pmix2x/pmix: add missing include files
...
pmix cannot be built on Alpine Linux because of some missing includes.
uid_t and gid_t are defined in unistd.h or sys/types.h, and unistd.h
is not indirectly pulled in under Alpine Linux, so include it manually.
Thanks N.L.K Nguyen for the report
(back-ported from upstream pmix/master@c8d55350a9 )
2016-07-05 09:03:14 +09:00
Ralph Castain
c9ada8e095
Silence Coverity warnings
2016-07-03 20:45:08 -07:00
Ralph Castain
673f82e2b6
Update the PMIx listener to avoid leaking sockets into children, and better handle race condition errors
2016-07-03 08:23:33 -07:00
Ralph Castain
6e434d6785
Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information
...
Update to match PMIx RFC
Fix configury to point to correct libevent and hwloc locations
2016-06-29 19:19:19 -07:00
Ralph Castain
08b1438f15
Add missing PMIx range value so OPAL and PMIx align again
2016-06-22 22:03:25 -07:00
Gilles Gouaillardet
bf133c401e
pmix2x: fix a typo in dereg_event_hdlr()
...
This bug was fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c,
but it was not fixed in open-mpi/ompi
2016-06-22 13:45:29 +09:00
Ralph Castain
441739b5a4
Cleanup a lagging message that generates an annoying (but seemingly harmless) warning
2016-06-20 12:23:27 -07:00
Ralph Castain
0ba02821e6
Add requested key and job-level info
2016-06-19 18:22:31 -07:00
Ralph Castain
0a29f5cb77
Sigh - missed two typos
2016-06-18 20:57:53 -07:00
Ralph Castain
dd38cf1fed
Fix typo
2016-06-18 20:56:43 -07:00
Ralph Castain
dde69e1be2
Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761.
...
Fixes #1792
2016-06-18 12:28:46 -07:00
Ralph Castain
044c561cba
Roll to latest PMIx master
2016-06-16 17:30:30 -07:00
Ralph Castain
5d330d5220
Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fall back to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler.
...
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component
Add missing file
Touchup a typo in the Makefile.am
Update the pmix ext114 component
Minor cleanups and resync to master
Update to latest PMIx 2.x
Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00
Ralph Castain
d58da99dbc
Shift to memcpy to avoid Solaris issues
2016-06-09 12:07:17 -07:00
Ralph Castain
8fa935534b
Abstract the strnlen function for environments that do not have it (e.g., Solaris 10)
2016-06-08 10:12:43 -07:00
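A minimal sketch of what a `strnlen` fallback for platforms that lack it might look like. This is not the code from the commit above; the name `sketch_strnlen` is hypothetical, and building the fallback on `memchr` is an assumed implementation choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback (name and approach are assumptions): return the
 * length of s, but never examine more than maxlen bytes. */
size_t sketch_strnlen(const char *s, size_t maxlen)
{
    const char *end;

    assert(s != NULL);
    /* memchr stops at the first NUL, or after maxlen bytes if none is found. */
    end = memchr(s, '\0', maxlen);
    return (NULL == end) ? maxlen : (size_t)(end - s);
}
```

For example, `sketch_strnlen("abcdef", 4)` yields 4 because no terminator is found within the first four bytes.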
Gilles Gouaillardet
b707d138fe
pmix114/pmix1_client: fix misc memory leaks
...
Fixes CID 1325146-1325149
2016-06-06 09:33:35 +09:00
Ralph Castain
ecea1e3bb5
Update to 1.1.4rc3
2016-06-01 20:56:07 -07:00
Ralph Castain
12ecf972af
Split the pmix external component into one for the 1.1.4 release, and another for the upcoming 2.0 release. Clean up the configury so the components look for a series-specific function instead of running a program.
...
NOTE: the changes for the 2.0 series are not yet in the PMIx master.
2016-06-01 14:15:24 -07:00
Ralph Castain
55923eacd3
Stealing some pieces of Josh Hursey's PR #1583 and modifying a bit, allow the opal/pmix external component to handle both PMIx 1.1.4 and PMIx 2.0 versions. Automatically detect the version of the target external library and adjust the only two APIs that changed (PMIx_Init and PMIx_Finalize)
...
Rename temp vars in .m4 to avoid conflict with Travis
2016-05-27 08:06:31 -07:00
Artem Polyakov
725eea2819
Fix base64 implementation in pmix framework.
...
In the commit 80f07b65f16e9538aca7fc5e124d2074e7e0b69e setting of '-' marker used
as the string termination sign was moved from base64 code:
from: 80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67L491)
to: 80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67R189)
However, the decoding function wasn't fixed and still expects one extra
byte at the end of the encoded string, which leads to data truncation
during extraction (this was noticed in standalone code that was using base64
from OMPI).
2016-05-23 23:30:31 +06:00
Gilles Gouaillardet
8466a3daf3
pmix: update .gitignore
...
git ignore opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in
git rm opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in
git ignore opal/mca/pmix/pmix*/...
2016-05-23 11:58:07 +09:00
Gilles Gouaillardet
cbbdce05b1
pmix/pmix114: silence a warning
2016-05-20 09:35:26 +09:00
Ralph Castain
6f743f81b6
Update PMIx 114 to current release candidate
2016-05-19 12:55:05 -07:00
rhc54
8b534e9897
Merge pull request #1668 from rhc54/topic/slurm
...
When direct launching applications, we must allow the MPI layer to pr…
2016-05-16 12:23:19 -07:00
Howard Pritchard
1a676e5b35
pmix/cray: fix some breakage
...
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-05-16 12:45:05 -05:00
Ralph Castain
01ba861f2a
When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread spin in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize CPU utilization.
...
Update external as well
Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro
2016-05-14 16:37:00 -07:00
Ralph Castain
7767882346
Per user request, add some missing data and definitions:
...
OPAL_PMIX_UNIV_RANK - synonym for OPAL_PMIX_GLOBAL_RANK
OPAL_PMIX_APP_SIZE - #ranks in the application of this proc
2016-05-09 08:39:01 -07:00
Ralph Castain
8ec1891d11
Silence warning
2016-05-05 20:04:10 -07:00
Ralph Castain
08022d7af1
Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required.
2016-05-05 15:28:13 -07:00
rhc54
648043597a
Merge pull request #1612 from ggouaillardet/poc/pmix_external_configury
...
pmix/external: revamp external pmix package detection
2016-05-02 09:46:05 -07:00
Jeff Squyres
265e5b9795
Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
...
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
Gilles Gouaillardet
45f9a47d77
pmix/external: fix typo and silence a warning
2016-05-02 17:15:52 +09:00