openmpi

Author	SHA1	Message	Date
Ralph Castain	e33f319380	Update example to show tests of various APIs Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-10-23 12:02:54 -07:00
Ralph Castain	6ea3c8a0bd	Update the interlib example to show an alternative method for model declaration. Add a missing range value to the OPAL layer. Make it easier to see OMPI model callbacks Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-10-23 11:27:42 -07:00
Ralph Castain	f8ce31f13c	Fix event registration so OpenMP/MPI coordination sides can both get notification of model declarations Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-10-19 18:06:38 -07:00
Ralph Castain	c696e04c5e	Since PMIx is moving to release v3.0, embed the new release candidate in opal/pmix framework. Move the pmix2x code over to the ext2x component. Create a new ext3x component Remove some build product. Tell PMIx that we don't need a new nspace generated when OMPI calls connect Add missing Makefile Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-10-09 13:51:08 -07:00
Ralph Castain	41df973359	Add diagnostics for hwloc get_topology Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-16 14:21:27 -07:00
Ralph Castain	7a83fdb9bb	Update to hwloc 2.0.0a with shmem support. Update to support passing of HWLOC shmem topology to client procs Update use of distance API per @bgoglin Have the openib component lookup its object in the distance matrix Bring usnic up-to-date Restore binding for hwloc2 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-25 20:26:22 -07:00
Ralph Castain	543c16b28d	Fix the isolated pmix component. Cleanup the ess/singleton component - we shouldn't be automatically discovering the local topology as that is now done on-demand. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-19 12:14:29 -07:00
Ralph Castain	7b39f19f60	Fix the backend mapper algorithm for comm_spawn. The front and back ends need to get the nodes into the job map in the same order so that the ranking algorithms will reach the same results Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-08 08:00:52 -07:00
George Bosilca	484004b03d	simple_spawn should be independent of ORTE.	2017-06-07 17:51:46 -04:00
Ralph Castain	e8759ca66b	Add minor test to ORTE test suite Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-29 15:43:52 -07:00
Ralph Castain	657e701c65	Add debug verbosity to the orte data server and pmix pub/lookup functions Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it). Remove unneeded test Fix memory corruption by re-initializing variable to NULL in loop Resolve the race condition identified by @ggouaillardet by resetting the mapped flag within the same event where it was set. There is no need to retain the flag beyond that point as it isn't used again. Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them. Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers. Have the mindist module add procs to the job's proc array as it is a fully described module Protect the hnp-not-in-allocation case Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-25 18:41:27 -07:00
Ralph Castain	0afcb1a448	Update to support server self-notifications Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-08 10:04:50 -07:00
Ralph Castain	ef0e0171c9	Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c). Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-08 10:04:50 -07:00
Ralph Castain	e8aea2ebfc	Minor cleanups Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-30 16:19:42 -08:00
Ralph Castain	d5fd635efe	Bring forward the debugger-related changes Refs https://github.com/open-mpi/ompi/pull/2425 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 13:15:20 -08:00
Ralph Castain	6f65d0a173	Repair event notification support. Cleanup the long-suffering "epoll: warning" coming out of libevent whenever a process abnormally terminated. Add changes to test program Sync to PMIx master	2016-10-13 16:27:39 -07:00
Ralph Castain	92102304b6	Minor typo - init the job_data stdin_target field to 0 for default behavior. Add test.	2016-08-22 21:03:45 -07:00
Ralph Castain	9888615e75	Restore the coll/sync module and provide a test to verify its operation	2016-08-20 10:14:52 -07:00
Ralph Castain	20a91c2baf	Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program. Cleanup debug message	2016-07-13 15:28:33 -07:00
Ralph Castain	380cc8f040	Add a test program to help diagnose binding issues	2016-06-23 06:27:18 -07:00
Ralph Castain	5d330d5220	Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler. Add PMIx 2.0 Remove PMIx 1.1.4 Cleanup copying of component Add missing file Touchup a typo in the Makefile.am Update the pmix ext114 component Minor cleanups and resync to master Update to latest PMIx 2.x Update to the PMIx event notification branch latest changes	2016-06-14 13:08:41 -07:00
Karol Mroz	5c11bdb251	orte: fixup hostname max length usage Also removes orte specific max hostname value. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-25 07:08:23 +02:00
Ralph Castain	449ec41532	Roll to PMIx 1.1.4rc1 and remove the PMIx 1.2.0 directory as the community has decided to not do that release version. This incorporates a number of bug fixes that have been identified and repaired in the PMIx and OMPI code bases. Also includes several minor corrections to the PMIx code so it now supports run-thru without hanging on collectives involving a process that exits	2016-04-15 10:11:11 -07:00
Ralph Castain	f0680008d1	Add test file for singularity	2016-03-02 05:40:41 -08:00
Ralph Castain	1748f44147	Stop a segfault that results in zombied processes by checking for NULL prior to object release	2016-02-18 13:48:41 -08:00
Ralph Castain	8f9508cace	Further enhance the support for Singularity containers. Extend the "personality" command-line option to allow specifying both model (e.g., "ompi") and container (e.g., "singularity"), and add the necessary logic to support multiple options. Add a new pmix "isolated" component to handle singletons where no HNP is available since containers cannot launch the HNP.	2016-02-17 13:33:06 -08:00
Ralph Castain	aa9e5a1a27	Add support for Singularity containers, including a .m4 file for checking if Singularity is available and an orte/schizo component for setting the proper support if a container was given as the executable Cleanup the configury so we properly check for Singularity under the various typical use-cases Bring the Singularity support online. We have to turn "off" the sm BTL as it segfaults from inside the container - root cause remains unclear. Also turned "off" the various OPAL shmem components in case they are involved and someone else tries to use them. Happily, the vader BTL works just fine!	2016-02-13 04:40:22 -08:00
Ralph Castain	03eb1a80bf	Update the PMIx native component to release v1.1.1, with addition of one bug-fix commit beyond the official release Rename the pmix1xx component to pmix111 so it reflects the actual release it includes Resolve the problem of PMIx being passed a bogus --with-platform argument when configuring the PMIx tarball code. There is no reason we should be passing --with-platform arguments to any internal subdirectory, so just leave that out when constructing the opal_subdir_args variable. Update the PMIx code and continue attempting to debug direct modex Fix a problem in the ORTE PMIx server - there was an early intent to optimize the direct modex by fetching data for all procs from the target job on the remote node, instead of fetching the data one proc at a time. However, this was never completely implemented, and so we would hang if we had multiple overlapping requests for data from more than one proc on the node. Update PMIx to v1.1.2	2015-12-12 18:46:38 -08:00
Ralph Castain	a2a049a612	Update test to match the one in MTT	2015-08-13 11:12:34 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Elena	6c6fe75c7b	added one more time interval for barrier to pmix unit test	2015-03-06 10:33:14 +02:00
Ralph Castain	d5775bf9de	Cleanup orte MPI test directory so it all builds again	2015-02-11 10:14:06 -08:00
Jeff Squyres	c9e3f22933	orte mpi tests: fix a bunch of compiler warnings	2015-02-11 12:28:10 -05:00
Jeff Squyres	07179ef669	orte mpi tests: don't use deprecated MPI functions Change MPI_Errhandler_set -> MPI_Comm_set_errhandler	2015-02-11 12:28:10 -05:00
Jeff Squyres	cc7f433c0f	Makefile: this file should not be executable	2015-02-11 07:33:56 -08:00
Elena	948c20d862	added pmix unit test to tarball	2015-02-10 13:41:15 +02:00
Elena	5919b636e1	changed output format in pmix unit test	2015-02-02 14:22:51 +02:00
Elena	472baa1284	added unit test for pmix functionality	2015-01-28 13:18:26 +02:00
Elena	b937b31693	fix for multiple spawn test	2014-10-09 06:18:16 +02:00
Ralph Castain	6c5e592785	Revert r32222, r32210, and r32203 as they created a problem when daemon collectives did not involve app procs on every node. Instead, modify the ompi/mca/rte/orte/rte_orte.h to add a new function that allows apps to request new daemon collective ids for use in barrier and modex operations. This will only appear in ORTE-based installations, but it is only being used by a couple of researchers at the moment. Update the orte/test/mpi/coll_test.c test to show the revised example. This commit was SVN r32234. The following SVN revision numbers were found above: r32203 --> open-mpi/ompi@a523dba41d r32210 --> open-mpi/ompi@2ce11ed5c4 r32222 --> open-mpi/ompi@d55f16db50	2014-07-15 03:48:00 +00:00
Ralph Castain	a523dba41d	NOTE: this modifies the MPI-RTE interface. We have been getting several requests for new collectives that need to be inserted in various places of the MPI layer, all in support of either checkpoint/restart or various research efforts. Until now, this would require that the collective id's be generated at launch, which required modifications to ORTE and other places. We chose not to make collectives reusable as the race conditions associated with resetting collective counters are daunting. This commit extends the collective system to allow self-generation of collective id's that the daemons need to support, thereby allowing developers to request any number of collectives for their work. There is one restriction: RTE collectives must occur at the process level - i.e., we don't currently have a way of tagging the collective to a specific thread. From the comment in the code: In order to allow scalable generation of collective id's, they are formed as: top 32-bits are the jobid of the procs involved in the collective. For collectives across multiple jobs (e.g., in a connect_accept), the daemon jobid will be used as the id will be issued by mpirun. This won't cause problems because daemons don't use the collective_id. Bottom 32-bits are a rolling counter that recycles when the max is hit. The daemon will cleanup each collective upon completion, so this means a job can never have more than 2^32 collectives going on at a time. If someone needs more than that - they've got a problem. Note that this means (for now) that RTE-level collectives cannot be done by individual threads - they must be done at the overall process level. This is required as there is no guaranteed ordering for the collective id's, and all the participants must agree on the id of the collective they are executing. So if thread A on one process asks for a collective id before thread B does, but B asks before A on another process, the collectives will be mixed and not result in the expected behavior. We may find a way to relax this requirement in the future by adding a thread context id to the jobid field (maybe taking the lower 16-bits of that field). This commit includes a test program (orte/test/mpi/coll_test.c) that cycles 100 times across barrier and modex collectives. This commit was SVN r32203.	2014-07-10 18:53:12 +00:00
Ralph Castain	e9d69ca370	Remove stale test This commit was SVN r32104.	2014-06-29 16:37:19 +00:00
Ralph Castain	2c3d07db24	Cleanup the test so it is MPI correct This commit was SVN r31919.	2014-06-01 17:57:36 +00:00
Ralph Castain	a91d358c48	Add/modify a couple of tests This commit was SVN r30743.	2014-02-16 20:54:34 +00:00
Ralph Castain	5b8e1180cf	Update a test This commit was SVN r30640.	2014-02-08 22:00:12 +00:00
Ralph Castain	31248c0985	Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match. Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node. Refs trac:4003 This commit was SVN r30033. The following Trac tickets were found above: Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003	2013-12-20 20:42:39 +00:00
Ralph Castain	fb0940a9d9	Add a couple of useful tests This commit was SVN r29539.	2013-10-28 13:24:16 +00:00
Ralph Castain	9902748108	*** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *** Fix two problems that surfaced when using direct launch under SLURM: 1. locally store our own data because some BTLs want to retrieve it during add_procs rather than use what they have internally 2. cleanup MPI_Abort so it correctly passes the error status all the way down to the actual exit. When someone implemented the "abort_peers" API, they left out the error status. So we lost it at that point and always exited with a status of 1. This forces a change to the API to include the status. cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch This commit was SVN r29405.	2013-10-08 18:37:59 +00:00
George Bosilca	273d66d0f2	The MPI_Intercomm_create test was broken, as the remote peer was always considered as being 1 (instead of count). This commit was SVN r29207.	2013-09-18 16:47:54 +00:00
Ralph Castain	865a7028f8	Per patch from George, with a few minor cleanups. Correctly address the complete exchange of required wireup information in Intercomm_create so all procs in the resulting communicator know how to talk to each other. Refs trac:29166 This commit was SVN r29200. The following Trac tickets were found above: Ticket 29166 --> https://svn.open-mpi.org/trac/ompi/ticket/29166	2013-09-18 02:01:30 +00:00


145 Commits
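
The collective id layout described in commit a523dba41d (jobid in the top 32 bits, a rolling counter in the bottom 32 bits) can be illustrated with a small sketch. This is not the ORTE code: the type `coll_id_t` and the functions `make_coll_id`, `coll_id_jobid`, `coll_id_counter`, and `next_coll_id` are hypothetical names chosen for illustration, and the wrap-to-zero behaviour is an assumption about what "recycles when the max is hit" means.

```c
#include <stdint.h>

/* Hypothetical 64-bit collective id: NOT the actual ORTE type. */
typedef uint64_t coll_id_t;

/* Pack the jobid into the top 32 bits and the counter into the bottom 32. */
static coll_id_t make_coll_id(uint32_t jobid, uint32_t counter)
{
    return ((uint64_t)jobid << 32) | (uint64_t)counter;
}

/* Recover the jobid from the top 32 bits. */
static uint32_t coll_id_jobid(coll_id_t id)
{
    return (uint32_t)(id >> 32);
}

/* Recover the rolling counter from the bottom 32 bits. */
static uint32_t coll_id_counter(coll_id_t id)
{
    return (uint32_t)(id & 0xFFFFFFFFu);
}

/*
 * Hand out the next id for a job and advance its counter.
 * Unsigned arithmetic wraps, so after UINT32_MAX the counter
 * returns to 0 (assumed meaning of "recycles").
 */
static coll_id_t next_coll_id(uint32_t jobid, uint32_t *counter)
{
    coll_id_t id = make_coll_id(jobid, *counter);
    *counter += 1u;
    return id;
}
```

Because the counter is per job and unsynchronized across threads in this sketch, two threads requesting ids in different orders on different processes would disagree on which id belongs to which collective, which is the restriction the commit message describes.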