1
1

465 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
f0e3e16f49 pmix/base: add missing #include <unistd.h>
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:52 +09:00
rhc54
978c54880d Merge pull request #1238 from rhc54/topic/cleanup
Cleanup warnings in opal and orte layers when building optimized on Mac
2015-12-17 09:37:48 -08:00
Ralph Castain
64b695669a Cleanup warnings in opal and orte layers when building optimized on Mac 2015-12-17 07:51:24 -08:00
Gilles Gouaillardet
75d16cfb27 Fix a few places where opal/util/argv.h were required when building pmix components (go figure) 2015-12-17 16:19:25 +09:00
Ralph Castain
3a56f0d34b Create the pmix external component. Fix a few places where opal/util/argv.h were required when building with an external pmix (go figure).
NOTE: Building with external pmix *requires* that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated.

Closes #1204  (replaces it)
Fixes #1064
2015-12-15 15:26:13 -08:00
Jeff Squyres
7977fa3f0b pmix112 config.h.in: remove generated file 2015-12-13 06:46:55 -08:00
Ralph Castain
03eb1a80bf Update the PMIx native component to release v1.1.1, with addition of one bug-fix commit beyond the official release
Rename the pmix1xx component to pmix111 so it reflects the actual release it includes

Resolve the problem of PMIx being passed a bogus --with-platform argument when configuring the PMIx tarball code. There is no reason we should be passing --with-platform arguments to any internal subdirectory, so just leave that out when constructing the opal_subdir_args variable.

Update the PMIx code and continue attempting to debug direct modex

Fix a problem in the ORTE PMIx server - there was an early intent to optimize the direct modex by fetching data for all procs from the target job on the remote node, instead of fetching the data one proc at a time. However, this was never completely implemented, and so we would hang if we had multiple overlapping requests for data from more than one proc on the node.

Update PMIx to v1.1.2
2015-12-12 18:46:38 -08:00
Howard Pritchard
fecb326256 pmix/cray: fix locality bug
There was a bug with the way the cray pmix component
was setting the locality property for ranks on the
same node, etc.

Improve location/syntax of a comment block.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-12-08 11:13:48 -08:00
Ralph Castain
9803d69d02 Ensure the embedded PMIx respects an OMPI-level --disable-debug 2015-12-01 08:00:24 -08:00
Ralph Castain
52ea538bc1 Per fix from Nysal: set the listener_active flag before starting the progress thread, and declare the flag to be volatile 2015-11-09 09:00:59 -08:00
Ralph Castain
fed28e4cfc Add missing file that was previously ignored 2015-11-06 14:37:09 -08:00
Ralph Castain
5f446570d8 Work on cleaning up memory leaks that are causing orte-dvm to eventually run out of memory. Still don't have everything plugged, but getting better. Sync to the PMIx master that includes removal of the pmix_common.h.in file that really didn't need to be generated, and update to the PMIx_server_init API. 2015-11-06 14:15:30 -08:00
Ralph Castain
bfdf08ae86 Fix intercomm_create by ensuring that both sides know how to translate jobid to/from nspace
Return something just to ensure that pack is happy
2015-11-06 02:19:45 -08:00
Ralph Castain
206e9a011e Add a couple of missing translations to/from PMIx internal and OPAL error constants 2015-10-29 12:33:02 -07:00
Ralph Castain
8ad9b450c4 Silence Coverity warning 2015-10-28 20:10:28 -07:00
George Bosilca
c9d0fffab3 Add a missing include. 2015-10-28 00:50:58 -04:00
Ralph Castain
267ca8fcd3 Cleanup the PMIx direct modex support. Add an MCA parameter pmix_base_async_modex that will cause the async modex to be used when set to 1. Default it to 0 for now
to continue current default behavior.

Also add an MCA param pmix_base_collect_data to direct that the blocking fence shall return all data to each process. Obviously, this param has no effect if async_
modex is used.
2015-10-27 17:31:56 -07:00
George Bosilca
6c28f114f1 Silence a warning regarding the format str for snprintf. 2015-10-24 15:24:40 -04:00
Jeff Squyres
b43fcb7695 Merge pull request #1028 from ggouaillardet/poc/pmix1xx_configury
pmix1xx configury: invoke sub-configure with CFLAGS and CPPFLAGS on t…
2015-10-24 13:19:33 -04:00
Ralph Castain
4c12022a50 Silence a couple of warnings from valgrind and compilers. Since some pmix components may return success with a NULL value from a "get", check for that situation before attempting to unload the data. Preset the hostname before calling modex_recv to get it so unload properly checks for NULL. Cast a returned value to the correct ompi_proc_t pointer 2015-10-22 20:56:02 -07:00
Gilles Gouaillardet
0221f59197 pmix1xx configury: invoke sub-configure with CFLAGS and CPPFLAGS on the command line
if CFLAGS and/or CPPFLAGS are passed to the ompi configure command line, pmix1xx
configure will not use the correct ones previously passed in the environment
see discussion started at http://www.open-mpi.org/community/lists/devel/2015/10/18159.php
Thanks Siegmar Gross for bringing this to our attention
2015-10-22 10:13:52 +09:00
Nathan Hjelm
9602484568 Merge pull request #1040 from hjelmn/mtl_priority
Change how cm's priority is calculated
2015-10-19 14:18:36 -06:00
Nathan Hjelm
8b5810f7f7 mca/base: add priority output to mca_base_select
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Ralph Castain
363f62a506 Fix singleton operations when running under a SLURM allocation. Sadly, SLURM's PMI will return success even if the PMI server isn't actually available. This leads to erroneous selection of pmix and ess components. So add a further requirement (namely, that we see a job_step envar) to the SLURM pmix components along with some modification of ess selection code to avoid the problem 2015-10-17 20:24:03 -07:00
annu13
cc5e1e26a5 sync with pmix master (repo_rev git69c398e) 2015-10-09 15:17:43 -07:00
annu13
5787e9248f cleaned up debug stmts 2015-10-06 06:25:36 -07:00
annu13
30ba00e05d sync with master 2015-10-06 06:04:54 -07:00
annu13
6f37c0e3e8 sync with PMIX master 2015-10-02 17:25:48 -07:00
annu13
7434c47626 sync with PMIX master 2015-10-02 17:17:48 -07:00
Ralph Castain
8f6855459d Cleanup some coverity warnings 2015-09-30 10:33:53 -07:00
Ralph Castain
ec5d001538 Don't set the return value pointer to NULL as it actually is required to point to real storage - just return an error code if a modex recv doesn't succeed. 2015-09-28 20:45:50 -07:00
Ralph Castain
a4a3dfd480 Cleanup the code a bit by simply adding our nspace to the top of the list of jobid <-> nspace correlations. Add two new APIs to opal_pmix for registering new jobid/nspace pairs and retrieving an nspace given a jobid - these are required to support connect/accept. No impact on the PMIx library. 2015-09-28 08:50:13 -07:00
Ralph Castain
f713e71d51 Minor cleanup - add jobid <-> nspace in one more place 2015-09-27 14:48:39 -07:00
Ralph Castain
fad5638596 Resolve the naming issue when direct-launched by PMIx-enabled RMs using a minimal-impact approach. Detect if we were launched via ORTE - if so, then use our standard methods for computing the jobid. If not, then just hash the nspace to create the jobid, and track the jobid <-> nspace correspondece down in the opal/mca/pmix/pmix1xx component. We then do the translation any time a function that passes process names is invoked. 2015-09-27 09:57:59 -07:00
Ralph Castain
209600fe26 Sync to PMIx master 2015-09-23 21:00:30 -07:00
Ralph Castain
749bd4e6fe Plug a few memory leaks identified by valgrind 2015-09-23 15:21:04 -07:00
Nathan Hjelm
8c4da756cf pmix: do not touch recently freed memory
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 08:44:50 -06:00
Ralph Castain
4c654ffd94 Sync to PMIx master 2015-09-21 21:27:06 -07:00
Ralph Castain
1b7930ad52 Silence some warnings and address Coverity issues 2015-09-16 07:58:22 -07:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Ralph Castain
22d7c0081a Fix the no-disconnect test by resolving a segfault on free - opal_dss.unload will return the remaining unpacked portion of a buffer. As such, it cannot return the pointer to that info as it might be partway inside of a malloc'd region. So copy the data out of the buffer. 2015-09-11 13:01:35 -07:00
Ralph Castain
dc5796b8a1 Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
Fix the locality computation by correctly computing the vpid of the local peer

This reverts commit open-mpi/ompi@6a8fad49e5.
2015-09-11 08:29:51 -07:00
Ralph Castain
6a8fad49e5 Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"
This reverts commit f94f3cda214ab937c46802896fb53b84bec6cc3a.
2015-09-11 02:01:25 -07:00
Ralph Castain
e0a52354d4 Sync to PMIx master at open-mpi/pmix@89680d6663
Includes changes to support BigEndian machines
2015-09-10 20:47:40 -07:00
Ralph Castain
a2a15cea8a Fix the s1 component so direct launch is supported for SLURM 2015-09-10 16:07:37 -07:00
rhc54
3430f154fc Merge pull request #885 from hppritcha/topic/pmix_not_pmix1xx_u16_prob
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
2015-09-10 15:38:54 -07:00
Howard Pritchard
2bbf22e2d0 pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
Looks like in ess_pmi_module.c u32 is being used
for retrieving OPAL_PMIX_LOCAL_SIZE, while s1/s2/cray
pmix components were storing as u16.

This commit fixes this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-10 11:41:39 -07:00
Ralph Castain
f94f3cda21 Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local 2015-09-10 10:25:30 -07:00
Ralph Castain
4c47c498ac Sync to latest PMIx master
Allow the blocking send and recv to keep trying
2015-09-09 11:48:47 -07:00
Gilles Gouaillardet
7f0ed74d24 pmix1xx: fix CPPFLAGS when DSO are not built 2015-09-09 14:20:12 +09:00