The fix is simply to check whether the return value is negative: valid indices are >= 0, while SOS-encoded errors are *always* negative.
The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index through a pointer argument, so the return value carries only a status code.
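A minimal sketch of both approaches (opal_pointer_array_add_ex is a
hypothetical name, just to illustrate the pointer-argument idea):
{{{
#include "opal/class/opal_pointer_array.h"
#include "opal/constants.h"

/* Interim fix: valid indices are >= 0 and SOS-encoded errors are
 * always negative, so a simple sign check suffices. */
static int add_item(opal_pointer_array_t *array, void *item)
{
    int idx = opal_pointer_array_add(array, item);
    if (idx < 0) {
        return idx;   /* propagate the SOS-encoded error */
    }
    /* ... use idx as a valid index ... */
    return OPAL_SUCCESS;
}

/* Hypothetical shape of the "real fix": return only a status code
 * and hand the index back through a pointer argument. */
int opal_pointer_array_add_ex(opal_pointer_array_t *array, void *item,
                              int *index_out);
}}}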
This commit was SVN r23173.
(OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes a SOS-encoded error and
returns the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret); we thus avoid having to
decode 'ret' to get the native error code (see the sketch below).
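For example (a sketch of the rewritten checks; the particular error
code is just illustrative):
{{{
#include "opal/constants.h"
#include "opal/util/opal_sos.h"

static void check_return(int ret)
{
    /* Before: (OPAL_ERR_OUT_OF_RESOURCE == ret) would miss the
     * SOS-encoded form; decode first, then compare: */
    if (OPAL_ERR_OUT_OF_RESOURCE == OPAL_SOS_GET_ERR_CODE(ret)) {
        /* handle resource exhaustion */
    }

    /* OPAL_SUCCESS is preserved by SOS, so plain success/failure
     * tests need no decoding: */
    if (OPAL_SUCCESS != ret) {
        /* handle any error */
    }
}
}}}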
This commit was SVN r23162.
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh, after applying patches that
re-add headers which would otherwise be "lost" because they were
only included indirectly through one of the now-deleted headers...
In total 817 files are touched.
In ompi/mpi/c/, header files are moved up into the actual C file
where necessary (these are the only additional #includes);
otherwise the changes are only deletions of #include lines (apart
from the above additions required due to the notifier...)
- To build the different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
  - Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
  - Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
  - Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
1. register "mpi_preconnect_all" as a deprecated synonym for "mpi_preconnect_mpi"
2. remove "mpi_preconnect_oob" and "mpi_preconnect_oob_simultaneous" as these are no longer valid.
3. remove the routed framework's "warmup_routes" API. With the removal of the direct routed component, this function at best only wasted communications. The daemon routes are completely "warmed up" during launch, so having MPI procs order the sending of additional messages is simply wasteful.
4. remove the call to orte_routed.warmup_routes from MPI_Init. This was the only place it was used anyway.
The FAQs will be updated to reflect this changed situation, and a CMR filed to move this to the 1.3 branch.
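In terms of usage, the rename looks like this (illustrative command
lines):
{{{
# Preferred name going forward:
mpirun -np 16 -mca mpi_preconnect_mpi 1 ./a.out

# Old name, still accepted as a deprecated synonym:
mpirun -np 16 -mca mpi_preconnect_all 1 ./a.out
}}}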
This commit was SVN r19933.
Some minor changes to facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.
This commit was SVN r18664.
that it is >= 1, so making it a size_t makes it easier to interact
with all the other size_t variables and removes a compiler warning.
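A generic illustration of the kind of warning this removes (the names
here are made up, not the actual variable):
{{{
#include <stddef.h>

static void zero_fill(char *buf, size_t len)
{
    size_t i;   /* an int here would draw a signed/unsigned comparison
                 * warning against the size_t loop bound */
    for (i = 0; i < len; ++i) {
        buf[i] = 0;
    }
}
}}}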
This commit was SVN r15935.
mpi_preconnect_oob_simultaneous > np. Need to scale back
simultaneous to equal np in those cases. Reviewed by Brian.
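The clamp itself is straightforward (a sketch with hypothetical
variable names):
{{{
/* Cap the number of simultaneous preconnect exchanges at the number
 * of processes in the job. */
static int clamp_simultaneous(int simultaneous, int np)
{
    return (simultaneous > np) ? np : simultaneous;
}
}}}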
This commit fixes trac:1064.
This commit was SVN r15916.
The following Trac tickets were found above:
Ticket 1064 --> https://svn.open-mpi.org/trac/ompi/ticket/1064
* bml.h had a change that introduced a variable named "_order" to
avoid a conflict with a local variable. The namespace starting
with _ belongs to the OS/compiler/kernel, not us, so we can't start
symbols with _. I therefore replaced it with arg_order, and also
updated the threaded equivalent of the macro that was modified (see
the schematic below).
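Schematically (this is not the actual bml.h macro, just the naming
pattern):
{{{
/* Identifiers beginning with an underscore are reserved for the
 * OS/compiler/libc, so the macro-local temporary is spelled
 * arg_order rather than _order: */
#define BML_SEND_SKETCH(bml_btl, des, order)         \
    do {                                             \
        int arg_order = (order);  /* was: _order */  \
        (void) arg_order;                            \
    } while (0)
}}}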
* in btl_openib_proc.c, one opal_output accidentally had its string
reverted from "ompi_modex_recv..." to
"mca_pml_base_modex_recv....". This was fixed.
* The change to ompi/runtime/ompi_preconnect.c was entirely
reverted; it was an artifact of debugging.
This commit was SVN r15475.
The following SVN revision numbers were found above:
r15474 --> open-mpi/ompi@8ace07efed
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
2. Pasha's new implementation of asynchronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be set up to use PP or SRQ receive buffers, and gives
fine-grained control over the receive buffer size, the number of
receive buffers to post, and when to replenish the receive queue (low
water mark); for SRQ QPs, the number of outstanding sends can also be
specified. The following is an example of the syntax used to describe
QPs to the OpenIB BTL using the new MCA parameter
btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
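Putting the fields together, the example above decodes as:
{{{
P,128,16,4          per-peer QP: 128-byte buffers, 16 posted,
                    repost at the low water mark of 4
S,1024,256,128,32   SRQ QP: 1024-byte buffers, 256 posted, repost
                    at 128, at most 32 outstanding sends
S,4096,256,128,32   SRQ QP: same, with 4096-byte buffers
S,65536,256,128,32  SRQ QP: same, with 65536-byte buffers
}}}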
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed prior to 1.3 release.
This commit was SVN r15474.
wireup. For small clusters or clusters with decent ARP lookup and
connect times, this will have marginal impact. For systems with either
bad ARP lookup times or long connect times, increasing this number
to something much closer to SOMAXCONN (128 on most modern machines) will
result in a faster OOB wireup. Don't set it higher than SOMAXCONN, or
you can end up with lots of connect() retries and the wireup will
actually be slower.
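For reference, the backlog being tuned here is the standard listen(2)
argument (a generic POSIX sketch, not the OMPI code itself):
{{{
#include <sys/socket.h>

/* SOMAXCONN is the usual upper bound (128 on most modern machines);
 * pending connections beyond the backlog force connect() retries. */
static int start_listening(int sockfd)
{
    return listen(sockfd, SOMAXCONN);
}
}}}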
This commit was SVN r14742.
* Don't need the 2 process case -- we'll send an extra message, but
at very little cost and less code is better.
* Use COMPLETE sends instead of STANDARD sends so that the connection
is fully established before we move on to the next connection. The
previous code was still causing minor connection flooding for huge
numbers of processes.
* mpi_preconnect_all now connects both OOB and MPI layers. There's
also mpi_preconnect_mpi and mpi_preconnect_oob should you want to
be more specific.
* Since we're only using the MCA parameters once at the beginning
of time, there's no need for global constants. Just do the quick
param lookup right before the parameter is needed (see the sketch
below). Save some of that global variable space for the next guy.
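A sketch of that lookup-at-point-of-use pattern (using the
mca_base_param API of this era; exact arguments may differ):
{{{
#include "opal/mca/base/mca_base_param.h"

/* Register/look up the parameter once, right where it is consumed,
 * instead of caching it in a global at startup. */
static int want_preconnect_all(void)
{
    int value;
    mca_base_param_reg_int_name("mpi", "preconnect_all",
                                "Connect all process pairs during "
                                "MPI_Init", false, false, 0, &value);
    return value;
}
}}}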
Fixes trac:963
This commit was SVN r14553.
The following Trac tickets were found above:
Ticket 963 --> https://svn.open-mpi.org/trac/ompi/ticket/963
support SEND_IN_PLACE causes badness because the BTL tries to use the
not-exactly-complete convertor. Don't need it in this situation anyway.
This commit was SVN r13700.
CM PML. The req_mtl structure is cast into an mtl_*_request structure
for each MTL, which is larger than req_mtl itself. The cast will cause
the *_request to overwrite parts of the heavy requests if req_mtl
isn't the *LAST* member of each structure (hence the comment). The
field was moved at some point as an optimization, which caused
buffered sends to fail...
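Schematically (hypothetical type names; the real structs live in the
CM PML and the MTLs):
{{{
#include "ompi/mca/mtl/mtl.h"

/* Hypothetical MTL-specific request: larger than mca_mtl_request_t,
 * with MTL fields extending past the base structure. */
typedef struct {
    mca_mtl_request_t super;
    int extra_state[8];
} mca_mtl_foo_request_t;

/* Heavy request sketch: req_mtl MUST be last, because the MTL uses
 * sizeof(mca_mtl_foo_request_t) bytes starting at its address, and
 * anything placed after req_mtl would be overwritten. */
struct hvy_request_sketch_t {
    int heavy_bookkeeping;        /* ... other heavy fields ... */
    mca_mtl_request_t req_mtl;    /* MUST be last */
};

static void bind_mtl_request(struct hvy_request_sketch_t *req)
{
    mca_mtl_foo_request_t *foo =
        (mca_mtl_foo_request_t *) &req->req_mtl;
    (void) foo;   /* the MTL now owns the trailing storage */
}
}}}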
Refs trac:669
This commit was SVN r12871.
The following Trac tickets were found above:
Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
r10877:
add warm-up connection option... of course this only warms up the
first eager BTL, but this should be adequate for now...
r10881:
Consulted with Galen and did a few things:
- Fix the algorithm to actually make the connections that we want
- Rename the MCA param to mpi_preconnect_all
- Clean up the code a bit:
- move the logic to a separate .c file
- check return codes properly
This commit was SVN r11114.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r10877
r10881