1
1

521 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
e575c4d6f9 Fix tool connection logic so we properly search for default session server, perform specified number of retries, etc.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 7c755e01004f8b86c71f1729662979ea45ab1adb)
2017-09-19 13:35:46 -07:00
Ralph Castain
3b3ce243bb Merge pull request #4214 from karasevb/pmix1_hang_fix
pmix: fixed immediate request for PMIx v1.2
2017-09-19 06:51:25 -07:00
Ralph Castain
5708872112 Implement support for "local" range when publishing data
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 2d54f7e0dd3a47260b0b2634aae3361316005933)
2017-09-18 19:34:08 -07:00
Boris Karasev
2929f52ffc pmix1: fixed immediate request
This fixes a hang of immediate PMIx request. PMIx v1.2 does not support
the info key `PMIX_IMMEDIATE` that leads to hanging. For that request
the fix uses the key `PMIX_OPTIONAL` for not go to the server.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-09-18 09:17:44 +03:00
Ralph Castain
3c914a7a97 Complete the fix of the ORTE DVM. We will now use "prun" instead of "orterun -hnp foo" to execute jobs. This provides the feature of automatic discovery of the orte-dvm so you don't need to manually enter URI's or contact file locations. All IO is forwarded to prun.
Still in the "needs to be done" category:

* mapping/ranking/binding options aren't correctly supported

* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-16 13:13:07 -07:00
Ralph Castain
7c7d8a69a0 Backport changes from PMIx reference server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-14 11:48:56 -07:00
Ralph Castain
691237801b Update to track PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-13 10:21:44 -07:00
Ralph Castain
bbd83fd4c0 Add a new launcher "prun" for starting applications against the ORTE DVM.
Unlike "orterun", "prun" is a PMIx-only program that discovers the DVM connection instead of requiring that we explicitly provide it. Only build "prun" if PMIx v2.x is available.

This gets the DVM working again, but still is showing problems for multiple executions. I'll detail those in a separate issue. Thus, the DVM should still be considered "broken".

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-12 21:40:41 -07:00
Ralph Castain
88eac797fb Silence coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-12 09:14:36 -07:00
Ralph Castain
3477079804 Repair the ORTE DVM
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-11 17:38:21 -07:00
Ralph Castain
cbc114e923 Update to track PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-06 13:15:24 -07:00
Ralph Castain
2c723f4338 Roll to track PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-01 12:30:34 -07:00
Gilles Gouaillardet
c9cca771cc pmix/ext2x: automatically generate ext2x component from pmix2x sources
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-08-30 09:41:31 +09:00
Gilles Gouaillardet
fd08b923d5 pmix: do not invoke PMIX_INFO_CREATE() with a zero size
Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#3854

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-08-28 11:25:58 +09:00
Josh Hursey
ad87aa2674 Merge pull request #4121 from jjhursey/explore/dlopen-local
mca: Dynamic components link against project lib
2017-08-25 13:15:51 -05:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la"
```

Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
Ralph Castain
68029b27e4 Fix the orte-dvm operations so that orterun can connect and execute an application. There is a lingering problem, though. The first invocation of orterun succeeds every time. However, subsequent invocations have a high probability of hanging in the OOB connection handshake.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-23 17:31:08 -07:00
Ralph Castain
0561d64748 Continue tracking PMIx v2.1.0
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-23 09:38:27 -07:00
Ralph Castain
d80b0c7990 If the HWLOC shared memory system is unable to connect, then fallback to providing the topology via XML. Do not automatically provide the XML to every process as that defeats the purpose of the shared memory system. Instead, use PMIx_Query_info_nb to get the info from the server when required.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 18:12:26 -07:00
Ralph Castain
e3213386ec Fix the internal PMIx installation - matching changes have been upstreamed
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 13:49:07 -07:00
Ralph Castain
a1b15c5666 Roll in update to PMIx master. Transfer updates from pmix2x component to ext2x
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-22 13:06:47 -07:00
Ralph Castain
d515f48885 The local PMIx server is notifying its clients of all events, but for some reason I don't recall, the broadcast notification was marked for delivery only to non-default event handlers. This creates a discrepancy between the two behaviors, so don't restrict the broadcast notifications.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-18 17:26:11 -07:00
Ralph Castain
088b6cdeee Silence coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-17 09:49:35 -07:00
Ralph Castain
c4d5dbfcdc Change test per recommendation of @jsquyres
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-16 11:19:15 -07:00
Ralph Castain
eb69df02ae Update to PMIx v2.1.0rc1
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-15 19:59:15 -07:00
Ralph Castain
65fb6070d9 Update tool support by adding MCA params to direct orted's to drop
session and/or system-level tool rendezous files. Ensure PMIx is
enabled for tools

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-15 17:49:47 -07:00
Ralph Castain
033a0eb373 Fix the --disable-dlopen --with-devel-headers case by not having libpmix link back to libopen-pal as the latter won't exist in time during this build case
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-15 10:51:35 -07:00
Ralph Castain
4290247d64 Update to latest PMIx v2.1.0a
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-10 18:48:07 -07:00
Ralph Castain
53c9270af7 Silence coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-08 06:10:14 -07:00
Ralph Castain
9921237f99 Merge pull request #4012 from rhc54/topic/p3
Cover the use-cases for OPAL_PREFIX and PMIX_INSTALL_PREFIX options
2017-08-07 11:42:53 -07:00
Ralph Castain
d593e5a4ce When we specify --with-devel-headers, we also emit a copy of libpmix. However, that library was built against the OPAL libevent component, which means all the libevent functions are prefixed with OPAL names. So ensure that the emitted libpmix is linked back against libopen-pal so those symbols will be resolved.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-07 09:36:16 -07:00
Ralph Castain
a239b4c3c3 Per discussion on the PMIx side, do a better job of detecting mismatches between location directives for OPAL and PMIx. Provide a more helpful error message and error out if we find a mismatch. If any OPAL values are set and the PMIx equivalent is not, then transfer it.
Do not clear PMIX_INSTALL_PREFIX from the daemon's launch environment

Fixes #3980
Closes #4007
Refs #3985

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-04 19:36:00 -07:00
Ralph Castain
f128b4c546 Fix incorrect usage of '==' in test comparisons
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-03 21:21:26 -07:00
Artem Polyakov
500c8be888 pmix: fix PMIx envar name for the installation prefix.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-08-02 08:03:36 +03:00
Ralph Castain
f39ce67982 Merge pull request #3951 from rhc54/topic/hwloc2
Update to hwloc 2.0.0a
2017-08-01 15:18:31 -06:00
Ralph Castain
8f34fa4a56 Move the detection of OPAL_PREFIX and subsequent posting of PMIX_PREFIX to the internal integration code for PMIx so we only do this when running with the embeddied PMIx
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-08-01 08:24:27 -06:00
Boris Karasev
e20b581529 pmix: fixed immediate request
This commit fixes a hang when using external PMIx v1 module

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-07-28 15:53:48 +06:00
Ralph Castain
7a83fdb9bb Update to hwloc 2.0.0a with shmem support.
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 20:26:22 -07:00
Ralph Castain
0042c758f1 Update the tools support so it allows tools to access PMIx
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 17:10:08 -07:00
Ralph Castain
058e802b11 Add missing export directives
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-25 07:19:08 -07:00
George Bosilca
1ea8fab095
Make external symbols visible.
All symbols that need to be accessed from a MCA component must be marked
explicitly as visible using PMIX_EXPORT. This patch allows current trunk
to almost work on OsX. More on the devel mailing list.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-07-25 01:14:22 -04:00
Ralph Castain
af85e48dd7 Silence Coverity warning, silence pmix_error_log of success
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-21 15:33:16 -07:00
Ralph Castain
492f98f8a5 Update to latest PMIx v2.1.0a
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-21 12:58:09 -07:00
Ralph Castain
f7e8780a42 Remove fortran support from platform file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 21:02:30 -07:00
Ralph Castain
b225366012 Bring the ofi/rml component online by completing the wireup protocol for the daemons. Cleanup the current confusion over how connection info gets created and
passed to make it all flow thru the opal/pmix "put/get" operations. Update the PMIx code to latest master to pickup some required behaviors.

Remove the no-longer-required get_contact_info and set_contact_info from the RML layer.

Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing wi
ll be beneficial at large scales. Leave it "off" by default.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 21:01:57 -07:00
Ralph Castain
8c30958879 Update to PMIx v2.1.0alpha
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 11:12:06 -07:00
Ralph Castain
fca68b070b Merge pull request #3934 from rhc54/topic/singleton
Fix the isolated pmix component. Cleanup the ess/singleton component …
2017-07-19 16:02:37 -05:00
Ralph Castain
543c16b28d Fix the isolated pmix component. Cleanup the ess/singleton component - we shouldn't be automatically discovering the local topology as that is now done on-demand.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-19 12:14:29 -07:00
Howard Pritchard
2fa0c4c6ec pmix/s1: fix problems with ref counting in s1
s1 pmix component wasn't doing proper ref counting

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-07-18 15:59:28 -06:00
Gilles Gouaillardet
9124afbeae pmix: do not invoke PMIX_INFO_CREATE() with a zero size
Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#3854

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-07-14 15:00:05 +09:00