openmpi

Author	SHA1	Message	Date
Nathan Hjelm	8f1a44e60e	bml/r2: add all rdma btls even if another btl has higher exclusivity Background: In order to support atomics each btl needs to provide support for communicating with self unless the btl module can guarantee global atomicity. Before this commit bml/r2 discarded any BTL with lower exclusivity than an existing send btl. This would cause the BML to discard any btl other than self. The new behavior is as follows: - If an existing send btl has higher exclusivity then the btl will not be added to the send btl list for the endpoint. - If a btl provides RDMA support then it is always added to the rdma btl list. - bml_btl weight for send btls is now calculated across all send btls. - bml_btl weight for rdma btls is now calculated across all rdma btls. With this change self should still win as the only send btl for loopback without disqualifying other btls (ugni, openib) for atomic operations.	2014-11-19 11:33:04 -07:00
Ralph Castain	b1a7375192	Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Cleanup some of the pmix stuff when running corner cases of errors This commit was SVN r32584.	2014-08-22 19:20:45 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
George Bosilca	cc0239d52f	Remove unused variables. This commit was SVN r31846.	2014-05-21 01:34:26 +00:00
George Bosilca	137874ec4d	In fact btl_eager and btl_dma will just vanish upon destruction of the bml_endpoint. No need to clean them before. This commit was SVN r31835.	2014-05-20 08:53:22 +00:00
George Bosilca	85e3caaa17	Handle the del_procs correctly. The btl_send is the complete list of existing BTLs for an endpoint, all the others are just partial lists. Thus, all the cleaning should first be done in the btl_send array, and then in the other arrays (btl_eager and btl_rdma). This commit was SVN r31834.	2014-05-20 08:46:57 +00:00
George Bosilca	1647664c43	Show the unreachable message for the first unreachable proc and then stop. This commit was SVN r31833.	2014-05-20 08:40:32 +00:00
Gilles Gouaillardet	ef4548a215	bml/r2 : fix mca_bml_r2_del_procs() cmr=v1.8.2:reviewer=hjelmn:ticket=trac:4645 This commit was SVN r31830. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855 The following Trac tickets were found above: Ticket 4645 --> https://svn.open-mpi.org/trac/ompi/ticket/4645	2014-05-20 02:55:47 +00:00
Nathan Hjelm	27d3e1ca25	bml/r2: fix a problem identified by Gilles This commit fixes two issues: - The intent of the code @ bml_r2.c:486 is to prevent calling the btl_del_procs more than once for a given proc. Gilles correctly identified there was a problem in this code but r31786 was not the correct fix. - Fix a segmentation fault in r2 finalize revealed by the fact we actually call del_procs now. cmr=v1.8.2:reviewer=ggouaillardet:ticket=trac:4645 This commit was SVN r31829. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855 r31786 --> open-mpi/ompi@fc96b0a7b8 The following Trac tickets were found above: Ticket 4645 --> https://svn.open-mpi.org/trac/ompi/ticket/4645	2014-05-19 20:22:34 +00:00
Gilles Gouaillardet	fc96b0a7b8	Fix a typo in mca_bml_r2_del_procs() Use bml_endpoint->btl_eager instead of bml_endpoint->btl_send. cmr=v1.8.2:reviewer=rhc This commit was SVN r31786.	2014-05-16 04:43:18 +00:00
Nathan Hjelm	faf008f527	Fix bugs that were causing leaks in finalize. This commit fixes leaks of bml endpoints in finalize. A summary of the bugs/fixes is below. 1) ompi_mpi_finalize used ompi_proc_all to get the list of procs but never released the reference to them (ompi_proc_all called OBJ_RETAIN on all the procs returned). When calling del_procs at finalize it should suffice to call ompi_proc_world which does not increment the reference count. 2) del_procs is called BEFORE ompi_comm_finalize. This leaves the references to the procs from calling the pml_add_comm function. The fix is to reorder the calls to do ompi_comm_finalize, del_procs, pml_finalize instead of del_procs, pml_finalize, ompi_comm_finalize. 3) The check in del_procs in r2 checked for a reference count of 1. This is incorrect. At this point there should be 2 references: 1 from ompi_proc, and another from the add_procs. The fix is to change this check to look for a reference count of 2. This check makes me extremely uncomfortable as nothing will call del_procs if the reference count of a proc is not 2 when del_procs is called. Maybe there should be an assert since this is a developer error IMHO. cmr=v1.8.2:reviewer=bosilca This commit was SVN r31782. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2014-05-15 18:28:03 +00:00
Nathan Hjelm	e4db2c3ebb	ompi: fix various small leaks This commit fixes three leaks: - bml/r2: fix leak of del_procs in mca_bml_r2_del_procs - Release the modex data in btl/scif, btl/ugni, and btl/vader - ompi_mpi_finalize: close the allocator framework cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31778. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2014-05-15 15:59:51 +00:00
Rolf vandeVaart	ce5274652f	Add some additional verbose output per this RFC http://www.open-mpi.org/community/lists/devel/2014/03/14282.php Reviewed by Jeff Squyres This commit was SVN r31072.	2014-03-14 20:17:47 +00:00
Jeff Squyres	da87b506bd	Remove warnings identified by clang 3.4 * Remove unused static functions * Remove unused static variables cmr=v1.8:reviewer=hjelmn This commit was SVN r31023.	2014-03-12 13:17:54 +00:00
Christoph Niethammer	4f23d8214c	Fixed incorrect calculation of reallocated memory in mca_bml_r2_del_btl. This commit was SVN r30529.	2014-02-03 08:43:59 +00:00
George Bosilca	d265981c55	Don't always retain the proc, do it only for new procs. This enforces a strict policy in the BML, it has one and only one ref on each proc. This commit was SVN r30429.	2014-01-26 17:26:04 +00:00
Christoph Niethammer	86776daf75	Fixed typo in opal output message. This commit was SVN r30392.	2014-01-23 08:37:40 +00:00
Brian Barrett	16a1166884	Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a configure-time dynamic allocation of flags. The net result for platforms which only support BTL-based communication is a reduction of 8*nprocs bytes per process. Platforms which support both MTLs and BTLs will not see a space reduction, but will now be able to safely run both the MTL and BTL side-by-side, which will prove useful. This commit was SVN r29100.	2013-08-30 16:54:55 +00:00
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Brian Barrett	312f37706e	In talking about this with Jeff and Ralph, we don't actually need ompi_show_help, because opal_show_help is replaced with an aggregating version when using ORTE, so there's no reason to directly call orte_show_help. This commit was SVN r28051.	2013-02-12 21:10:11 +00:00
Brian Barrett	f42783ae1a	Move the RTE framework change into the trunk. With this change, all non-CR runtime code goes through one of the rte, dpm, or pubsub frameworks. This commit was SVN r27934.	2013-01-27 23:25:10 +00:00
Josh Hursey	28681deffa	Backout the ORCA commit. :( There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk. This commit was SVN r26676.	2012-06-27 01:28:28 +00:00
Josh Hursey	542330e3a7	Commit of ORCA: Open MPI Runtime Collaborative Abstraction This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI. The project is described on the wiki: https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition And on this email thread: http://www.open-mpi.org/community/lists/devel/2012/06/11109.php This commit was SVN r26670.	2012-06-26 21:42:16 +00:00
Ralph Castain	bd8b4f7f1e	Sorry for mid-day commit, but I had promised on the call to do this upon my return. Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code. Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch. This commit was SVN r26242.	2012-04-06 14:23:13 +00:00
George Bosilca	80c02647c8	Each level (OPAL/ORTE/OMPI) should only return its own constants, instead of the current mismatch. This commit was SVN r25230.	2011-10-04 14:50:31 +00:00
Eugene Loh	9bbcd51c5a	Properly initialize ep->btl_max_send_size, ep->btl_pipeline_send_length, and ep->btl_send_limit in mca_bml_r2_del_proc_btl() so that the loops will correctly compute new endpoint max/min after the BTL has been removed. See http://www.open-mpi.org/community/lists/devel/2011/01/8829.php This commit was SVN r24202.	2011-01-04 20:35:33 +00:00
Abhishek Kulkarni	afbe3e99c6	* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with (OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns back the native error code. * Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to decode 'ret' to get the native error code. This commit was SVN r23162.	2010-05-17 23:08:56 +00:00
George Bosilca	5145efdc47	This typo lived way too long ... This commit was SVN r21864.	2009-08-21 15:23:11 +00:00
Rolf vandeVaart	c82e468ede	Undo revision r21767 - sorry folks This commit was SVN r21769. The following SVN revision numbers were found above: r21767 --> open-mpi/ompi@41f38110ff	2009-08-05 22:23:26 +00:00
Rolf vandeVaart	41f38110ff	HCA failover support in openib BTL This commit was SVN r21767.	2009-08-05 21:53:02 +00:00
Rainer Keller	a94438343b	- Revert r20740 This commit was SVN r20741. The following SVN revision numbers were found above: r20740 --> open-mpi/ompi@2a70618a77	2009-03-05 21:50:47 +00:00
Rainer Keller	2a70618a77	- Second patch, as discussed in Louisville. Replace short macros in orte/util/name_fns.h to the actual fct. call. - Compiles on linux/x86-64 This commit was SVN r20740.	2009-03-05 21:14:18 +00:00
Rainer Keller	811f2bd9b4	- As discussed on RFC, move the ompi_bitmap to the opal layer. Add a check against a maximum (actually get rid of ifs internally to opal_bitmap.c) -- the functionality to set the current maximum size opal_bitmap_set_max_size() is currently only used in attribute.c to set the maximum OMPI_FORTRAN_HANDLE_MAX... Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f run with 6 procs. Let's look into MTT as well... This commit was SVN r20708.	2009-03-03 22:25:13 +00:00
Jeff Squyres	14a83a6bbc	Clean up the BML shutdown. Reviewed by George. This commit was SVN r20586.	2009-02-19 13:17:01 +00:00
Rainer Keller	d81443cc5a	- On the way to get the BTLs split out and lessen dependency on orte: Often, orte/util/show_help.h is included, although no functionality is required -- instead, most often opal_output.h, or orte/mca/rml/rml_types.h Please see orte_show_help_replacement.sh committed next. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration actually showed two missing #include "orte/util/show_help.h" in orte/mca/odls/base/odls_base_default_fns.c and in orte/tools/orte-top/orte-top.c Manually added these. Let's have MTT the last word. This commit was SVN r20557.	2009-02-14 02:26:12 +00:00
George Bosilca	4747a4bb53	ompi_comm_all allocates memory and retains the objects. Therefore, after each call to ompi_comm_all we should parse the communicator list and release the objects ... This commit was SVN r20525.	2009-02-11 21:48:11 +00:00
George Bosilca	69afbc084a	Don't forget to cast or the compiler will do the division as a double and then convert. This commit was SVN r20029.	2008-11-24 15:53:56 +00:00
Jeff Squyres	57c52d373b	Minor nit: use the real filename. opal_show_help() will ultimately find the file even if we don't specify the ".txt" extension here, but we might as well do it Right in the first place. This commit was SVN r19867.	2008-10-31 21:48:18 +00:00
George Bosilca	00d24bf8ab	Scalability patch, or slim-fast effect #1 . All BML structures just got a whole lot smaller, decreasing the memory footprint of the running application. How much is a good question. Here is a breakdown: - in mca_bml_base_endpoint_t: 3 * size_t + 1 * uint32_t - in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float + 6 * size_t + 9 * (void*) The decrease in mca_bml_base_endpoint_t is for each peer and the decrease in mca_bml_base_btl_t is for each BTL for each peer. So, if we consider the most convenient case where there is only one network between all peers, this decreases the memory footprint per peer by 9 * size_t + 9 * (void*) + 2 * int32_t + 1 * double - 1 * float. On a 64-bit machine this will be 156 bytes per peer. Now we access all these fields directly from the underlying BTL structure, and as this structure is common to multiple BML endpoints, we are a lot more cache friendly. Even if this does not improve the latency, it makes the SM performance graph a lot smoother. This commit was SVN r19659.	2008-09-30 21:02:37 +00:00
George Bosilca	325d006577	Mostly cleanups, and eventually a little bit more scalable add_procs. There was an argument that was barely used, and on return at the PML level it contained nothing usable. It has been removed, so now we're using less memory ... This commit was SVN r19657.	2008-09-30 15:47:43 +00:00
Jeff Squyres	627f1ecd36	Oops -- fix silly cut-n-paste error... This commit was SVN r19630.	2008-09-24 20:17:54 +00:00
Jeff Squyres	d85aaf521a	Also show the process name; it is useful, at least to us developers ;-) This commit was SVN r19629.	2008-09-24 19:32:34 +00:00
Josh Hursey	80d05cf957	Cleanup the patch from r19566. Thanks to George and Jeff for pointing out a better way to do this. This commit was SVN r19573. The following SVN revision numbers were found above: r19566 --> open-mpi/ompi@351c3a3a86	2008-09-17 13:55:21 +00:00
Josh Hursey	351c3a3a86	The ft_event function needs access to the bml_r2_remove_btl_progress() to ensure that all progress events are flushed as needed across a checkpoint/restart. This commit was SVN r19566.	2008-09-16 19:06:53 +00:00
George Bosilca	17e65369be	Fix the deadlock when we run out of resources on the BTLs. Move the progress function from the BML into the PML. The BTL progress functions are now directly registered with the event library. This commit was SVN r19561.	2008-09-15 22:56:23 +00:00
Jeff Squyres	f794580bbe	Print a [much] better error message when MPI processes are unable to reach each other (this problem just bit me; I had forgotten how horrid our previous error message was). This commit was SVN r19548.	2008-09-11 20:52:58 +00:00
Jeff Squyres	d8860502df	Fix CID 1092: remove a useless header file (bml_base_endpoint.h -- it didn't contain anything!) and therefore remove some include file recursion. This commit was SVN r19226.	2008-08-08 12:39:30 +00:00
Ralph Castain	adf2b4dfda	Correct an error output so the process names are sensible This commit was SVN r19206.	2008-08-06 18:56:16 +00:00
George Bosilca	2d8cbc6ade	Allow other BTLs to work even if they collide with regard to the exclusivity. The problem was that by decreasing the btl_inuse if there was already a registered BTL we basically reset the changes for this new BTL to register its progress function, even if it was supposed to handle another peer. This commit was SVN r19080.	2008-07-29 18:38:11 +00:00


115 commits