1
1

264 Коммитов

Автор SHA1 Сообщение Дата
Howard Pritchard
b65bbe017f pmix/cray: switch to using wildcards for some
items so that at least srun native launch on
cray works again.

More issues to fix when using alps.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-07-26 17:07:58 -05:00
Ralph Castain
71de03fc67 Cleanup the new naming requirements to ensure that info is correctly retrieved
Cleanup permissions

Restore singleton operations
2016-07-21 09:46:03 -07:00
Ralph Castain
2b55ee8118 Cleanup Coverity warnings 2016-07-20 20:31:58 -07:00
Ralph Castain
01a653d50a Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found".
Remove stale file reference

Restore autogen pass thru pmix

Remove generated file
2016-07-20 00:58:19 -07:00
Nathan Hjelm
03bce91de8 pmix/pmix2x: add missing increment in loop
This commit fixes a bug in the pmix2x client code where a loop
variable is not correctly incremented. This was leading to hangs and
crashes when creating intercommunicators. Also fixed two double
increments in other loops.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-18 10:35:05 -06:00
Jeff Squyres
72f41d4490 pmix: replace all tabs with spaces
No code or logic changes

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:08:33 -04:00
Jeff Squyres
1c32742c66 pmix_ext20: fix syntax error
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:04:12 -04:00
Ralph Castain
99f7096031 Fix permissions 2016-07-16 21:03:55 -07:00
Ralph Castain
d4071fbd1c Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling 2016-07-16 13:20:41 -07:00
Ralph Castain
1ceb35ba5c Fix singletons - do not include the PMIx tool URI in the environment provided to child processes 2016-07-13 17:33:34 -07:00
Ralph Castain
20a91c2baf Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program.
Cleanup debug message
2016-07-13 15:28:33 -07:00
Artem Polyakov
72585a905f opal/pmix: add blocking Fence to SLURM components.
Blocking fence is used in yalla del proc. Native pmix exposes this functionality.
We need to expose it for SLURM's s1/s2 components as well.

Also this commit fixes uninitialized `rc` in fencenb's of both
components.
2016-07-11 09:43:15 +03:00
Artem Polyakov
8e16f47492 Merge pull request #1688 from artpol84/fix_base64
Fix base64 implementation in pmix framework.
2016-07-07 10:47:50 +06:00
Gilles Gouaillardet
acda07472a configury: revamp and re-ident sub configure.m4 after open-mpi/ompi@846360fd4c 2016-07-06 11:59:51 +09:00
Gilles Gouaillardet
846360fd4c configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components
Thanks Jeff for the guidance

Fixes open-mpi/ompi#1683

note:
in order to keep this commit easy to review, some AS_IF([...]) were replaced with
AS_IF([false], ...) or AS_IF_([true], ...)
these will be removed and re-idented in a subsequent commit
2016-07-06 11:57:24 +09:00
Ralph Castain
ee56d9dc1a Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field 2016-07-05 14:59:50 -07:00
Ralph Castain
7e0af3f4f0 Update pmix2x to track upstream changes 2016-07-05 11:54:22 -07:00
Gilles Gouaillardet
267821f0dd pmix2x/pmix: fix a typo in PMIx_tool_init()
and remove now useless local variable i
2016-07-05 13:47:50 +09:00
Gilles Gouaillardet
efce8cc734 pmix2x/pmix: add missing include files
pmix cannot be built on alpine linux because of some missing includes.
uid_t and gid_t are defined in unistd.h or sys/types.h, and unistd.h
is not indirectly pulled under alpine linux, so do it manually.

Thanks N.L.K Nguyen for the report

(back-ported from upstream pmix/master@c8d55350a9)
2016-07-05 09:03:14 +09:00
Ralph Castain
c9ada8e095 Silence Coverity warnings 2016-07-03 20:45:08 -07:00
Ralph Castain
673f82e2b6 Update the PMIx listener to avoid leaking sockets into children, and better handle race condition errors 2016-07-03 08:23:33 -07:00
Ralph Castain
6e434d6785 Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information
Update to match PMIx RFC

Fix configury to point to correct libevent and hwloc locations
2016-06-29 19:19:19 -07:00
Ralph Castain
08b1438f15 Add missing PMIx range value so OPAL and PMIx align again 2016-06-22 22:03:25 -07:00
Gilles Gouaillardet
bf133c401e pmix2x: fix a typo in dereg_event_hdlr()
This bug has been fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c
but it was not fixed in open-mpi/ompi
2016-06-22 13:45:29 +09:00
Ralph Castain
441739b5a4 Cleanup a lagging message that generates an annoying (but seemingly harmless) warning 2016-06-20 12:23:27 -07:00
Ralph Castain
0ba02821e6 Add requested key and job-level info 2016-06-19 18:22:31 -07:00
Ralph Castain
0a29f5cb77 Sigh - missed two typos 2016-06-18 20:57:53 -07:00
Ralph Castain
dd38cf1fed Fix typo 2016-06-18 20:56:43 -07:00
Ralph Castain
dde69e1be2 Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761.
Fixes #1792
2016-06-18 12:28:46 -07:00
Ralph Castain
044c561cba Roll to latest PMIx master 2016-06-16 17:30:30 -07:00
Ralph Castain
5d330d5220 Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler.
Add PMIx 2.0

Remove PMIx 1.1.4

Cleanup copying of component

Add missing file

Touchup a typo in the Makefile.am

Update the pmix ext114 component

Minor cleanups and resync to master

Update to latest PMIx 2.x

Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00
Ralph Castain
d58da99dbc Shift to memcpy to avoid Solaris issues 2016-06-09 12:07:17 -07:00
Ralph Castain
8fa935534b Abstract the strnlen function for environments that do not have it (e.g., Solaris 10) 2016-06-08 10:12:43 -07:00
Gilles Gouaillardet
b707d138fe pmix114/pmix1_client: fix misc memory leaks
Fixes CID 1325146-1325149
2016-06-06 09:33:35 +09:00
Ralph Castain
ecea1e3bb5 Update to 1.1.4rc3 2016-06-01 20:56:07 -07:00
Ralph Castain
12ecf972af Split the pmix external component into one for the 1.1.4 release, and another for the upcoming 2.0 release. Clean up the configury so the components look for a series-specific function instead of running a program.
NOTE: the changes for the 2.0 series are not yet in the PMIx master.
2016-06-01 14:15:24 -07:00
Ralph Castain
55923eacd3 Stealing some pieces of Josh Hursey's PR #1583 and modifying a bit, allow the opal/pmix external component to handle both PMIx 1.1.4 and PMIx 2.0 versions. Automatically detect the version of the target external library and adjust the only two APIs that changed (PMIx_Init and PMIx_Finalize)
Rename temp vars in .m4 to avoid conflict with Travis
2016-05-27 08:06:31 -07:00
Artem Polyakov
725eea2819 Fix base64 implementation in pmix framework.
In the commit 80f07b65f16e9538aca7fc5e124d2074e7e0b69e setting of '-' marker used
as the string termination sign was moved from base64 code:
from: 80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67L491)
to: 80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67R189)

However the decoding function wasn't fixed and still expects on extra
byte at the end of the encoded string which leads to data truncation
during extraction (was noticed on standalone code that was using base64
from OMPI).
2016-05-23 23:30:31 +06:00
Gilles Gouaillardet
8466a3daf3 pmix: update .gitignore
git ignore opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in
git rm opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in
git ignore opal/mca/pmix/pmix*/...
2016-05-23 11:58:07 +09:00
Gilles Gouaillardet
cbbdce05b1 pmix/pmix114: silence a warning 2016-05-20 09:35:26 +09:00
Ralph Castain
6f743f81b6 Update PMIx 114 to current release candidate 2016-05-19 12:55:05 -07:00
rhc54
8b534e9897 Merge pull request #1668 from rhc54/topic/slurm
When direct launching applications, we must allow the MPI layer to pr…
2016-05-16 12:23:19 -07:00
Howard Pritchard
1a676e5b35 pmix/cray: fix some breakage
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-05-16 12:45:05 -05:00
Ralph Castain
01ba861f2a When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread sping in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize cpu utilization.
Update external as well

Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro
2016-05-14 16:37:00 -07:00
Ralph Castain
7767882346 Per user request, add some missing data and definitions:
OPAL_PMIX_UNIV_RANK - synonym for OPAL_PMIX_GLOBAL_RANK
OPAL_PMIX_APP_SIZE - #ranks in the application of this proc
2016-05-09 08:39:01 -07:00
Ralph Castain
8ec1891d11 Silence warning 2016-05-05 20:04:10 -07:00
Ralph Castain
08022d7af1 Some minor cleanups of warnings from gcc 6.0.0. Update s1/s2 pmix to get max_procs as required. 2016-05-05 15:28:13 -07:00
rhc54
648043597a Merge pull request #1612 from ggouaillardet/poc/pmix_external_configury
pmix/external: revamp external pmix package detection
2016-05-02 09:46:05 -07:00
Jeff Squyres
265e5b9795 Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
Gilles Gouaillardet
45f9a47d77 pmix/external: fix typo and silence a warning 2016-05-02 17:15:52 +09:00