Only selectable when specifically requested via "-mca odls pspawn"
Note that there are several concerns (see the sketch after this list):
* we aren't getting SIGCHLD when the procs terminate
* we aren't seeing the IO pipes close on termination, though
we are getting output forwarded to mpirun
* I haven't found a way to bind the child process prior to exec.
If we want to use this method, we probably need someone to
implement a cgroup component for the orte/rtc framework
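For reference, a minimal, self-contained sketch of the posix_spawn()
launch pattern follows (hypothetical code, not the actual odls/pspawn
component). It shows the SIGCHLD handler and stdout pipe wiring behind
the first two concerns, and the comments note why there is no hook for
binding the child between fork and exec:

    /* Minimal sketch of the posix_spawn() launch pattern (hypothetical
     * names, not the actual odls/pspawn code).  There is no callback
     * between the implicit fork and exec, which is why binding the child
     * before exec is awkward without external help (e.g., cgroups). */
    #include <spawn.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    extern char **environ;

    static void sigchld_handler(int sig)
    {
        (void)sig;   /* reap children elsewhere via waitpid(WNOHANG) */
    }

    static int spawn_child(char *const argv[], int stdout_pipe_fd, pid_t *child)
    {
        posix_spawn_file_actions_t factions;
        posix_spawnattr_t attr;
        int rc;

        posix_spawn_file_actions_init(&factions);
        /* forward the child's stdout through our pipe so output can be
         * relayed to mpirun */
        posix_spawn_file_actions_adddup2(&factions, stdout_pipe_fd, STDOUT_FILENO);

        posix_spawnattr_init(&attr);
        /* No file action or spawn attribute exists for CPU binding: the
         * parent never runs code in the child between fork and exec, so
         * binding must happen after the fact or via an external mechanism
         * such as a cgroup-based rtc component. */

        rc = posix_spawn(child, argv[0], &factions, &attr, argv, environ);

        posix_spawn_file_actions_destroy(&factions);
        posix_spawnattr_destroy(&attr);
        return rc;   /* 0 on success, an errno value on failure */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = sigchld_handler;
        sigaction(SIGCHLD, &sa, NULL);   /* must be installed to see terminations */

        int pipefd[2];
        pipe(pipefd);

        char *argv[] = { "/bin/true", NULL };
        pid_t pid;
        if (0 == spawn_child(argv, pipefd[1], &pid)) {
            printf("spawned pid %d\n", (int)pid);
        }
        close(pipefd[1]);   /* parent must close its write end to ever see EOF */
        return 0;
    }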
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Remove some build products. Tell PMIx that we don't need a new nspace generated when OMPI calls connect
Add missing Makefile
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Add a test suite for the netlink and weighted reachable components. We
don't have a great way of running components through unit tests
today, so make them stand-alone tests that are run manually with
mpirun.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Cisco wrote a bipartite graph solver to handle interface
pair selection for usNIC. Using the reachable
framework, the TCP BTL (and possibly the runtime network
code) can use the graph solver to make better pair
selections. Jeff was happy to have the code more broadly
used, but didn't have time to do the move, hence this
commit.
There are a couple of minor changes to the code compared
to the usNIC version. Obviously, the functions have
been renamed to match the naming convention of their new
home. Since it's easier to write unit tests for
util/ code, the unit tests have been made first-class
tests run at "make check" time. This last bit required
moving some of the definitions into a new header,
bipartite_graph_internal.h, so that they could be
included in both the library code and the test code.
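For readers unfamiliar with the technique, here is a stand-alone
illustration of interface-pair selection posed as bipartite matching
(Kuhn's augmenting-path algorithm). The names and the reachability
table are invented for the example; this is not the moved OPAL code
or its API:

    /* Stand-alone illustration of interface-pair selection as bipartite
     * matching (Kuhn's augmenting-path algorithm).  The reachability test
     * and all names are hypothetical. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define N_LOCAL  3
    #define N_REMOTE 3

    /* reachable[i][j]: can local interface i talk to remote interface j? */
    static const bool reachable[N_LOCAL][N_REMOTE] = {
        { true,  true,  false },
        { true,  false, false },
        { false, true,  true  },
    };

    /* which local iface each remote iface is currently paired with (-1 = free) */
    static int match_for_remote[N_REMOTE];

    static bool try_augment(int local, bool visited[N_REMOTE])
    {
        for (int remote = 0; remote < N_REMOTE; remote++) {
            if (!reachable[local][remote] || visited[remote]) {
                continue;
            }
            visited[remote] = true;
            /* take the remote iface if it is free, or re-route its partner */
            if (match_for_remote[remote] < 0 ||
                try_augment(match_for_remote[remote], visited)) {
                match_for_remote[remote] = local;
                return true;
            }
        }
        return false;
    }

    int main(void)
    {
        int npairs = 0;
        memset(match_for_remote, -1, sizeof(match_for_remote));
        for (int local = 0; local < N_LOCAL; local++) {
            bool visited[N_REMOTE] = { false };
            if (try_augment(local, visited)) {
                npairs++;
            }
        }
        for (int remote = 0; remote < N_REMOTE; remote++) {
            if (match_for_remote[remote] >= 0) {
                printf("pair: local %d <-> remote %d\n",
                       match_for_remote[remote], remote);
            }
        }
        printf("%d usable interface pairs\n", npairs);
        return 0;
    }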
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Unlike "orterun", "prun" is a PMIx-only program that discovers the DVM connection instead of requiring that we explicitly provide it. Only build "prun" if PMIx v2.x is available.
This gets the DVM working again, but still is showing problems for multiple executions. I'll detail those in a separate issue. Thus, the DVM should still be considered "broken".
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Per discussion at the Summer 2017 developers meeting, generate
the AUTHORS list at "make dist" time, rather than trying to
keep it up to date and merged across the branches by hand. While
most of the data is generated from git, the organization list
was maintained by hand. The general feeling at the meeting was
that the organization list was not adding value and there were
concrete cases where it required much chasing by the RMs, so
it has been removed.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component look up its object in the distance matrix
Bring usnic up to date
Restore binding for hwloc2
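As an illustration of the distance-matrix lookup mentioned above, here
is a hedged sketch against the hwloc 2.x distances API. The object
choice (a NUMA node) and the selection logic are simplified
assumptions; the real openib lookup is more involved:

    /* Sketch of consulting the hwloc 2.x distances API; how the real
     * openib component picks its object is more involved, this only
     * shows the matrix lookup itself. */
    #include <hwloc.h>
    #include <stdio.h>

    /* Return the row index of 'obj' in the first distance matrix hwloc
     * reports, or -1 if no matrix mentions it. */
    static int find_obj_in_distances(hwloc_topology_t topo, hwloc_obj_t obj)
    {
        struct hwloc_distances_s *dist = NULL;
        unsigned nr = 1;
        int index = -1;

        /* kind==0 and flags==0: accept any distance matrix hwloc knows about */
        if (0 != hwloc_distances_get(topo, &nr, &dist, 0, 0) || 0 == nr) {
            return -1;
        }

        for (unsigned i = 0; i < dist->nbobjs; i++) {
            if (dist->objs[i] == obj) {
                index = (int)i;
                break;
            }
        }

        if (index >= 0) {
            /* values[] is an nbobjs x nbobjs matrix in row-major order */
            for (unsigned j = 0; j < dist->nbobjs; j++) {
                printf("distance[%d][%u] = %llu\n", index, j,
                       (unsigned long long)dist->values[index * dist->nbobjs + j]);
            }
        }

        hwloc_distances_release(topo, dist);
        return index;
    }

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(&topo);

        /* use the first NUMA node as a stand-in for "our" object */
        hwloc_obj_t obj = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NUMANODE, 0);
        if (NULL != obj) {
            printf("matrix index: %d\n", find_obj_in_distances(topo, obj));
        }

        hwloc_topology_destroy(&topo);
        return 0;
    }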
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out of date. Bring round_robin and PPR online, and modify the mindist component (though I cannot test/debug it).
Remove unneeded test
Fix memory corruption by re-initializing the variable to NULL inside the loop
Resolve the race condition identified by @ggouaillardet by resetting the
mapped flag within the same event where it was set. There is no need to
retain the flag beyond that point as it isn't used again.
Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them.
Note that this maintains functionality, but users of those three mappers will see larger launch messages and less scalable launch performance than users of the other mappers.
Have the mindist module add procs to the job's proc array as it is a fully described module
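A conceptual sketch of the trade-off, using entirely hypothetical
types and helpers (not the actual ORTE structures): a fully described
launch message carries each proc's placement, while the compact form
makes the daemons recompute it locally:

    /* Conceptual sketch only -- hypothetical types, not the actual ORTE
     * code.  Illustrates the ORTE_JOB_FULLY_DESCRIBED trade-off: either
     * the launch message carries every proc's placement (bigger message,
     * no backend computation) or the daemons recompute the map. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t vpid;        /* proc rank within the job */
        uint32_t node_index;  /* where it runs */
        char     cpuset[64];  /* binding, only present if fully described */
    } proc_placement_t;

    typedef struct {
        bool              fully_described;  /* analogue of ORTE_JOB_FULLY_DESCRIBED */
        uint32_t          nprocs;
        proc_placement_t *placements;       /* nprocs entries when fully described */
    } launch_msg_t;

    /* stand-in for the local mapping a daemon would do for the compact form */
    static void recompute_placement(uint32_t vpid, proc_placement_t *out)
    {
        out->vpid = vpid;
        out->node_index = vpid % 4;
        snprintf(out->cpuset, sizeof(out->cpuset), "core:%u", (unsigned)(vpid % 8));
    }

    static void daemon_process_launch(const launch_msg_t *msg)
    {
        for (uint32_t v = 0; v < msg->nprocs; v++) {
            proc_placement_t p;
            if (msg->fully_described) {
                p = msg->placements[v];      /* seq/rankfile/mindist style */
            } else {
                recompute_placement(v, &p);  /* round_robin/ppr style */
            }
            printf("proc %u -> node %u (%s)\n",
                   (unsigned)p.vpid, (unsigned)p.node_index, p.cpuset);
        }
    }

    int main(void)
    {
        /* compact form: daemons compute placement themselves */
        launch_msg_t msg = { .fully_described = false, .nprocs = 4, .placements = NULL };
        daemon_process_launch(&msg);
        return 0;
    }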
Protect the HNP-not-in-allocation case
Per the approach suggested by Gilles, protect the HNP node when it gets added in the absence of any other allocation or hostfile
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers.
Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation).
Begin first cut at memory profiler
Some minor cleanups of memprobe
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Clean up the configury so we properly check for Singularity under the various typical use cases
Bring the Singularity support online. We have to turn "off" the sm BTL as it segfaults from inside the container; the root cause remains unclear. Also turn "off" the various OPAL shmem components in case they are involved and someone else tries to use them. Happily, the vader BTL works just fine!
Update the configure logic for the new pmix120 component
ckpt
Get the pmix120 component to work. It is still not really registering or handling notifications, but the infrastructure now operates
Clean up some of the symbol scopes, and provide a more comprehensive rename.h file. Will pretty it up later; let's see how this works
Clean up the rename files to use the pretty macros
Rename the pmix1xx component to pmix111 so it reflects the actual release it includes
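For illustration, this is the general shape of such a rename header;
the macro and prefix names below are made up, not the component's
actual ones:

    /* Illustration of the symbol-renaming technique behind a rename.h
     * header; the macro and prefix names here are invented. */
    #ifndef EXAMPLE_PMIX_RENAME_H
    #define EXAMPLE_PMIX_RENAME_H

    /* one "pretty" macro builds the prefixed name... */
    #define EXAMPLE_PMIX_RENAME(name) example_pmix111_ ## name

    /* ...and every public symbol of the embedded copy is mapped through
     * it, so the embedded library can coexist with an external libpmix
     * in the same process without duplicate-symbol clashes. */
    #define PMIx_Init      EXAMPLE_PMIX_RENAME(PMIx_Init)
    #define PMIx_Finalize  EXAMPLE_PMIX_RENAME(PMIx_Finalize)
    #define PMIx_Get       EXAMPLE_PMIX_RENAME(PMIx_Get)
    #define PMIx_Put       EXAMPLE_PMIX_RENAME(PMIx_Put)

    #endif /* EXAMPLE_PMIX_RENAME_H */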
Resolve the problem of PMIx being passed a bogus --with-platform argument when configuring the PMIx tarball code. There is no reason we should be passing --with-platform arguments to any internal subdirectory, so just leave that out when constructing the opal_subdir_args variable.
Update the PMIx code and continue attempting to debug direct modex
Fix a problem in the ORTE PMIx server - there was an early intent to optimize the direct modex by fetching data for all procs from the target job on the remote node, instead of fetching the data one proc at a time. However, this was never completely implemented, and so we would hang if we had multiple overlapping requests for data from more than one proc on the node.
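The commit does not spell out the fix mechanics, but the general
pattern is to track each outstanding direct-modex request keyed by the
exact proc requested, so overlapping requests for different procs each
get answered when their data arrives. A conceptual sketch with
hypothetical names:

    /* Conceptual sketch (hypothetical names) of tracking direct-modex
     * requests per requested proc, so overlapping requests for different
     * procs on the same node each get their own reply. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct pending_req {
        struct pending_req *next;
        uint32_t jobid;
        uint32_t vpid;       /* the specific proc whose data was requested */
        int      requester;  /* who to answer when the data arrives */
    } pending_req_t;

    static pending_req_t *pending = NULL;

    /* record a request keyed by the exact proc, not just the target job/node */
    static void track_request(uint32_t jobid, uint32_t vpid, int requester)
    {
        pending_req_t *req = calloc(1, sizeof(*req));
        req->jobid = jobid;
        req->vpid = vpid;
        req->requester = requester;
        req->next = pending;
        pending = req;
    }

    /* when data for one proc arrives, answer every request for that proc only */
    static void data_arrived(uint32_t jobid, uint32_t vpid)
    {
        pending_req_t **cur = &pending;
        while (NULL != *cur) {
            pending_req_t *req = *cur;
            if (req->jobid == jobid && req->vpid == vpid) {
                printf("answering requester %d for proc %u.%u\n",
                       req->requester, (unsigned)jobid, (unsigned)vpid);
                *cur = req->next;
                free(req);
            } else {
                cur = &req->next;   /* leave requests for other procs pending */
            }
        }
    }

    int main(void)
    {
        /* two overlapping requests for different procs in the same job */
        track_request(1, 0, 100);
        track_request(1, 1, 101);
        data_arrived(1, 0);   /* only the proc-0 request is satisfied */
        data_arrived(1, 1);   /* the proc-1 request is not lost */
        return 0;
    }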
Update PMIx to v1.1.2
Bring Slurm PMI-1 component online
Bring the s2 component online
Little cleanup: let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. This is required because the different PMI environments all pass the jobid in different ways.
Bring the OMPI pubsub/pmi component online
Get comm_spawn working again
Ensure we always provide a cpuset, even if it is NULL
pmix/cray: adjust the Cray PMIx component
Make changes so the Cray PMIx component can work within the
integrated ompi/pmix framework.
Bring singletons back online. Implement the comm_spawn operation using PMIx; not tested yet
Clean up comm_spawn; procs now start, but there is an error in connect_accept
Complete integration