openmpi

Автор	SHA1	Сообщение	Дата
Todd Kordenbrock	8a3660138e	mtl-portals4: initialize endpoint nid/pid when using logical mapping When mtl-portals4 is configured for logical mapping, coll-portals4 must disqualify because it does not yet support logical mapping. coll-portals4 looks for the endpoint pid to be zero which tells it that mtl-portals4 is configured for logical mapping. This commit initializes the endpoint nid/pid to zero for logical mapping.	2015-12-22 11:20:18 -06:00
Ryan Grant	81d482dca6	Merge pull request #1137 from francois-wellenreiter/trig_mtl_rdv MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet…	2015-11-24 17:31:31 -07:00
Ryan Grant	219581e87e	Merge pull request #1090 from tkordenbrock/topic/check.for.invalid.handles.in.finalize mtl-portals4: test for valid handle before releasing resources	2015-11-20 07:54:44 -06:00
Francois WELLENREITER	9126ea5e82	MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet operation	2015-11-19 09:52:53 +01:00
Francois WELLENREITER	b301b49a40	MTL portals4 : remove useless PtlMDBind PtlMDRelease calls for rendez-vous messages	2015-11-06 15:55:44 +01:00
Todd Kordenbrock	cefe50cf54	mtl-portals4: test for valid handle before releasing resources During component finalize, mtl-portals4 would blindly release resources without testing if the handle was valid. This was OK, but resource allocation is now delayed until add_procs(). If mtl-portals4 is deselected, it will be finalized without add_procs() ever being called. This commit ensures that invalid handles are not released.	2015-11-02 21:01:14 -06:00
Todd Kordenbrock	88d79efd9f	mtl-portals4: fix bug in the Portals4 get_peer family The Portals4 get_peer family incorrectly cast the ompi_proc_t to ptl_process_t and returned that as the peer. The ptl_process_t is actually found in the endpoint array. This commit fixes the Portals4 get_peer family to return the dereferenced endpoint pointer.	2015-10-08 07:57:48 -05:00
Todd Kordenbrock	3e63a3458c	portals4: add support for dynamic add_procs() to all Portals4 components In the default mode of operation, the Portals4 components support dynamic add_procs(). The Portals4 components have two alternate modes (flow control and logical-to-physical) that require knowledge of all procs at startup. In these modes, mtl-portals4 sets the MCA_MTL_BASE_FLAG_REQUIRE_WORLD flag and btl-portals4 sets the MCA_BTL_FLAGS_SINGLE_ADD_PROCS flag to tell the PML that we need all the procs in one add_procs() call.	2015-09-24 22:12:57 -05:00
Nathan Hjelm	54a4061d88	Add support for detecting when dynamic add_procs is not possible This commit adds support to the pml, mtl, and btl frameworks for components to indicate at runtime that they do not support the new dynamic add_procs behavior. At the high end the lack of dynamic add_procs support is signalled by the pml using the new pml_flags member to the pml module structure. If the MCA_PML_BASE_FLAG_REQUIRE_WORLD flag is set MPI_Init will generate the ompi_proc_t array passed to add_proc from ompi_proc_world () instead of ompi_proc_get_allocated (). Both cm and ob1 have been updated to detect if the underlying mtl and btl components support dynamic add_procs. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-23 16:22:05 -06:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Todd Kordenbrock	b725186768	mtl-portals4: Verify the result of PtlPTAlloc() The Portals4 MTL allocates two Portals IDs requesting specific well-known IDs and assumes that those IDs are allocated. If those IDs are in use, PtlPTAlloc() will allocate a different ID. This commit verifies that the requested IDs were allocated.	2015-06-09 14:43:50 -05:00
Todd Kordenbrock	c42e277385	mtl-portals4: thread multiple updates When activating short receive blocks on the overflow list, remove the PTL_ME_EVENT_LINK_DISABLE flag so the event gets generated. Without PTL_EVENT_LINK, the block status can't reach the activated state. Replace #ifdef with #if for Open MPI configure booleans, because Open MPI configure booleans are always defined and the value must be checked.	2015-05-13 17:06:18 -05:00
Ryan Grant	bbeaf41a52	Merge pull request #580 from tkordenbrock/topic/mtl.add.status.to.short.recv.blocks mtl-portals4: add status to short recv blocks to coordinate out of or…	2015-05-11 13:44:45 -06:00
Todd Kordenbrock	9df163f116	portals4: use a single Memory Descriptor to cover all of memory In days past, some implementations of Portals4 could not cover all of memory with a single Memory Descriptor so multiple large overlapping Memory Descriptors were created. Because none of the current implementations have this limitation (and no future implementations should either), this commit removes the overlapping Memory Descriptors code.	2015-05-11 11:49:41 -05:00
Todd Kordenbrock	074583060d	mtl-portals4: add status to short recv blocks to coordinate out of order events If OMPI is initialized as thread multiple, then it is possible for Portals events to be processed out of order by different threads. Out of order events could lead to reactivation of the block (PTL_EVENT_AUTO_FREE) before the block is removed from the active list (PTL_EVENT_AUTO_UNLINK). This commit adds a status field to ompi_mtl_portals4_recv_short_block_t that coordinates these events.	2015-05-11 11:48:25 -05:00
Ryan Grant	6ab91a6781	Merge pull request #561 from tkordenbrock/topic/mtl.fix.datatype.overflow mtl-portals4: fix datatype overflow in ompi_mtl_portals4_long_isend()	2015-04-28 15:15:59 -06:00
Todd Kordenbrock	8a4616f724	mtl-portals4: fix datatype overflow in ompi_mtl_portals4_long_isend() The length parameter of ompi_mtl_portals4_long_isend() was declared as "int", which may not be big enough depending on the platform and compiler options used. This commit changes the type to size_t to prevent overflow.	2015-04-28 14:40:25 -05:00
Todd Kordenbrock	3e437f6184	mtl-portals4: expand the source field of the match bits to 24 bits The source field was 16 bits which is not sufficient for many current and future machines. This commit expands the source field to 24 bits and reduces the tag field from 32 bits to 24 bits.	2015-04-28 14:25:30 -05:00
Todd Kordenbrock	8e56002ec7	mtl-portals4: add missing return to portals4_init_interface()	2015-04-21 11:30:33 -05:00
Todd Kordenbrock	34c50fa963	mtl-portals4: move MD cleanup closer to failure PtlMDRelease() was called if read_msg() returned a failure code. This commit moves the PtlMDRelease() inside read_msg() so that it doesn't get called in cases where the failure happens before or at the PtlMDBind().	2015-04-21 11:30:33 -05:00
Todd Kordenbrock	422be76770	mtl-portals4: add a debug message for thread multiple mode	2015-04-21 11:30:33 -05:00
Todd Kordenbrock	35e5ffd001	mtl-portals4: add the option to use the Portals4 logical to physical table This commit adds an MCA variable to select Portals4 logical addressing, populates the logical-to-physical mapping table and initializes the NI in this mode.	2015-04-21 11:30:33 -05:00
Nathan Hjelm	df75d0382f	ompi: use C99 subobject naming for component initialization This commit helps future-proof ompi components by initializing each component member by name. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-18 10:29:58 -06:00
Todd Kordenbrock	515d9e8cc9	mtl-portals4: fix compiler warnings	2015-03-12 20:34:04 -05:00
Todd Kordenbrock	b2696edeef	mtl-portals4: fix incomplete free list conversion	2015-02-26 10:53:45 -06:00
Nathan Hjelm	ed78553512	Update opal_free_list_t usage to reflect new class interface. Please verify your components have been updated correctly. Keep in mind that in terms of threading: OPAL_FREE_LIST_GET -> opal_free_list_get_st OPAL_FREE_LIST_RETURN -> opal_free_list_return_st I used the opal_using_threads() variant anytime it appeared multiple threads could be operating on the free list. If this is not the case update to _st. If multiple threads are always in use change to _mt.	2015-02-24 10:05:44 -07:00
Howard Pritchard	bf89131f9e	add owner files to opa/ompi/orte mca directories This commit adds an owner file in each of the component directories for each framework. This allows for a simple script to parse the contents of the files and generate, among other things, tables to be used on the project's wiki page. Currently there are two "fields" in the file, an owner and a status. A tool to parse the files and generate tables for the wiki page will be added in a subsequent commit.	2015-02-22 15:10:23 -07:00
Howard Pritchard	45e7c7fd60	Merge pull request #378 from hppritcha/topic/mtl_query_cast_fix mtl/query: squash compiler warning	2015-02-06 12:23:33 -07:00
Todd Kordenbrock	762b05bcda	mtl-portals4: fix mismatch between format and type-size	2015-02-04 15:35:03 -06:00
Todd Kordenbrock	87759a1b1e	mtl-portals4: fix signedness mismatch warning	2015-02-04 15:35:03 -06:00
Todd Kordenbrock	5ddce1acbc	mtl-portals4: add "unused" attribute to rc to prevent compiler warning	2015-02-04 15:35:03 -06:00
Howard Pritchard	69d2b818f7	mtl/query: squash compiler warning Squash compiler warnings now showing up in the query methods for the mtls. Cast pointers to the different mtl module specific types to the mca_base_module_t. Also, fix up a missing extern in mtl_psm_types.h. This was causing "multiple definition" errors when building the mca_mtl_psm.so shared library.	2015-02-04 14:15:54 -07:00
Todd Kordenbrock	b8b07d2d62	mtl-portals4: Fix initialization of the Portals4 MTL component Swap close and query methods in the initialization of the Portals4 MTL component. Fixes #373	2015-02-04 13:29:13 -06:00
Howard Pritchard	ed537ddca0	copyright updates for commit eb977de5 I really should start using Jeff's script..	2015-01-31 13:50:32 -07:00
Howard Pritchard	eb977de5e9	mtl: add query method to mtl components Switch to using the query/priority method for selecting MTLs. This switch was motivated by the fact that now on some platforms, its possible for multiple MTLs to be initializable, but only one MTL should be selected. In addition, there is a complication with the PSM and IFO (with PSM provider) MTLs owing to the fact that they cannot both intialize the underlying PSM context, i.e. only one call to psm_init is allowed per process. The mxm component has not been compiled as the author doesn't currently have access to a system with a recent enough mxm installed to allow for a compile. The portals4, ofi, and psm components have been checked for compilation. The ofi and psm components have been checked for runtime correctness on a intel/qlogic system with up to date PSM installed.	2015-01-29 09:02:52 -07:00
Todd Kordenbrock	6a3225d800	Fix invalid symbols left by the PMIx merge. This commit was SVN r32597.	2014-08-25 16:30:26 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “lmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Ryan Grant	caa10a5faf	Portals fixes after latest move This commit was SVN r32330.	2014-07-28 19:25:03 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Ryan Grant	ca0a7b1a9a	Correct typo in r31332, mtl_portals_enpoint.h -> mtl_portals_endpoint.h This commit was SVN r31338. The following SVN revision numbers were found above: r31332 --> open-mpi/ompi@b12ee27b3d	2014-04-08 14:41:51 +00:00
Ralph Castain	b12ee27b3d	Add missing files - thanks to Mr. Anonymous for reporting them as missing from the 1.8 tarball cmr=v1.8.1:reviewer=jsquyres:subject=add missing portals4 files This commit was SVN r31332.	2014-04-08 02:55:14 +00:00
Brian Barrett	7d472ad5a5	Improve some comments This commit was SVN r30144.	2014-01-07 23:35:04 +00:00
Brian Barrett	afde8370b3	Pull both calls to get into one function, and wrap with the appropriate reference count if flow control is enabled. This commit was SVN r30141.	2014-01-07 23:15:09 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Brian Barrett	dbcc53bc6f	Fix a threading issue Remove some unneeded UNLIKELYs This commit was SVN r30138.	2014-01-07 19:41:39 +00:00
Brian Barrett	d4bb1cbbad	* Start working on thread safety of Portals 4 MTL * Only call flowctl_add_procs if there's a new proc in the add_procs call This commit was SVN r30110.	2014-01-02 22:37:01 +00:00
Brian Barrett	16a1166884	Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a configure-time dynamic allocation of flags. The net result for platforms which only support BTL-based communication is a reduction of 8*nprocs bytes per process. Platforms which support both MTLs and BTLs will not see a space reduction, but will now be able to safely run both the MTL and BTL side-by-side, which will prove useful. This commit was SVN r29100.	2013-08-30 16:54:55 +00:00
Brian Barrett	2cc947513b	* Fix some compile errors * Need to subtract 1 off the size so that we stay in the bit length requirements This commit was SVN r28997.	2013-08-05 18:49:48 +00:00
Brian Barrett	ecbbf888d3	* Update Portals 4 MTL's multi-md code to be a bit cleaner (no if statements in the path) and not create MDs due to boundary crossing * Add the same logic to the Coll component This commit was SVN r28733.	2013-07-08 21:27:37 +00:00

1 2 3

114 Коммитов