openmpi

Автор	SHA1	Сообщение	Дата
Rolf vandeVaart	66725f6973	Enable some CUDA-aware support on tcp btl. Only when configured in. This commit was SVN r29364.	2013-10-04 12:50:16 +00:00
Nathan Hjelm	f3d18028e5	Fix typo in uGNI prepare source that could cause incorrect results with non-contiguous datatypes. cmr=v1.7.3 This commit was SVN r29294.	2013-09-30 16:00:58 +00:00
Dave Goodell	a42fa78da7	usnic: SEGV in OSU benchmarks Prevent frag from being freed out from under us in the case the PML callback routine calls usnic_free(). We accomplish this by delaying decrement of sf_bytes_to_ack until after the callback is performed, since sf_bytes_to_ack == 0 is condition of freeing the frag. Fixes Cisco bug CSCuj45094. Authored-by: Reese Faucette <rfaucett@cisco.com> cmr=v1.7.3 This commit was SVN r29264.	2013-09-26 21:48:04 +00:00
Ralph Castain	34fbec1f49	Sadly, the connection priorities being defined at time of variable instantiation were being overridden just before registering the param. Thus, changes people made to the relative priority of the cpc methods were being lost. Fix it be removing the duplicate initializiation, letting the value defined at instantiation be the one actually used. cmr:v1.7.4:reviewer=hjelmn This commit was SVN r29212.	2013-09-19 19:45:00 +00:00
Jeff Squyres	74d1278f48	btl_usnic_util.c:ompi_btl_usnic_util_abort() also passes in the strerror(). This commit was SVN r29188.	2013-09-17 12:35:51 +00:00
Reese Faucette	8f235e6977	usnic: wrong SG entry used to compute length for small put()s This commit was SVN r29186.	2013-09-17 08:18:02 +00:00
Reese Faucette	651d61f1a3	Clean up debugging logging a bit. MSGDEBUG2 now means "print a one-liner for all PML calls into BTL, and also when BTL calls PML with a recv completion (not send completions)" MSGDEBUG1 means print more internal gory detail MSGDEBUG is gone, replaced by MSGDEBUG1 In the process also found that PUT_DEST style fragments could potentially be leaked in usnic_free() since send_fragment tests were being applied to see if it was eligible to be freed. This commit was SVN r29185.	2013-09-17 07:29:40 +00:00
Reese Faucette	f35d9b50e3	Cisco CSCuj22803: fixes for Bsend changes required to support MPI_Bsend(). Introduces concept of attaching a buffer to a large segment that the PML can scribble into and we will send from. The reason we don't use a pinned buffer and send directly from that is that usnic_verbs does not (yes) support num_sge>1 for regular sends. This means the data gets copied twice, but that is unavoidable. changed the logic in handle_large_send to be more sensible Incorporated David's review comments This commit was SVN r29184.	2013-09-17 07:27:39 +00:00
Reese Faucette	25b5c84d0f	Cisco CSCuj13135: Data corruption in MPI_Bsend_ator_c Do not assume that the "size" passed to alloc_send() will be the same as the size of the message the resulting fragment will hold when usnic_send() is called. This means usnic_send()/usnic_put() can never trust any pre-computed size values, and are only allowed to look at the lengths and pointers of the elements in the desc SG list. This commit was SVN r29183.	2013-09-17 07:25:05 +00:00
Reese Faucette	b9103c0f66	Cisco CSCuj12524: c_put_big segfault - usnic_free() cannot free the fragment until ACK is received This commit was SVN r29182.	2013-09-17 07:23:15 +00:00
Reese Faucette	89b5f0899b	Cisco CSCuj12520: various problems running c_fence_put_1 - tag needs to be sent in our header, not the PML header - usnic_alloc() should return smaller value if too much data requested - be careful about callbacks vs removing items from lists (we need to remove from outr lists before the callback) - improve send callback handling - add some more MSGDEBUG2 logging and cleanup This commit was SVN r29181.	2013-09-17 07:20:44 +00:00
Rolf vandeVaart	096b8c022e	Also add flag to debug output. This commit was SVN r29163.	2013-09-13 19:47:05 +00:00
Rolf vandeVaart	c15b2a26b8	Fix some formatting. Move some CUDA-aware mca parameter initialization earlier. This commit was SVN r29162.	2013-09-13 17:43:41 +00:00
Rolf vandeVaart	d247c26b84	In the case that HAVE_IBV_FORK_INIT is not defined, we will need this variable so we can give the user an error if they ask for it. Also fixes compile error when HAVE_IBV_FORK_INIT is not defined. This commit was SVN r29160.	2013-09-13 14:38:49 +00:00
Rolf vandeVaart	ba9ec1b8bc	For debug builds, add the ability to view memory registrations and deregistrations in the openib BTL. This commit was SVN r29159.	2013-09-13 14:28:26 +00:00
Joshua Ladd	b3f88c4a1d	Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to \ncmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc This commit was SVN r29153.	2013-09-10 15:34:09 +00:00
Jeff Squyres	c9f05a2664	Delineate OMPI_FREE_LIST__MT separately. The FREE_LIST__MT stuff was introduced on the SVN trunk in r28722 (2013-07-04), but so far, has not been merged into the v1.7 branch yet (2013-09-06). So put it in its own #ifdef, rather than defining it based on OMPI_MAJOR_VERSION/OMPI_MINOR_VERSION. This commit was SVN r29148. The following SVN revision numbers were found above: r28722 --> open-mpi/ompi@c9e5ab9ed1	2013-09-06 19:22:56 +00:00
Jeff Squyres	e02cc0a7ec	No need for this header file. This commit was SVN r29147.	2013-09-06 19:22:28 +00:00
Jeff Squyres	c53b0890cf	Ensure that btl_usnic_compat.h is in the tarball. This commit was SVN r29140.	2013-09-06 15:53:56 +00:00
Dave Goodell	75fa28c303	usnic: v1.6<->trunk unification, trunk side The Cisco-maintained v1.6 port of the usnic BTL has diverged from the upstream trunk and v1.7 branches. This commit adjusts the trunk to more closely match the v1.6 branch to simplify future merging and cherry-picking. The usnic MCA parameters also need work on this side. Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29138. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:21:34 +00:00
Dave Goodell	a669bd01e6	usnic: revamp convertor handling. The fix for the HPL SEGV was incorrect because it assumed the prepare_src() routine was always allowed to return "bytes processed" less than the requested "bytes to send". It turns out this is only true if the convertor is what limits the size, we are not allowed to limit the data sent for our own reasons, else we break login in the upper layers. This means we need to learn the number of bytes out of the size requested the convertor will give us, no matter how big the size is. Unfortunately, this is a destructive test, and (currently) the only way to learn that number is to actually have the convertor copy the data out into buffers. This change implements this, copying the entire data out into a chain of send segments which are attached to the large send fragment. Now we can always return the proper size value to the PML. Fixes Cisco bug CSCuj08024 Authored-by: Reese Faucette <rfaucett@cisco.com> Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29137. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:21:21 +00:00
Dave Goodell	0ef8336502	new bookkeeping code should return value indicating whether packet is good or not. Authored-by: Reese Faucette <rfaucett@cisco.com> Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29136. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:19:32 +00:00
Dave Goodell	122890c2fd	usnic: "bookeeping" --> "bookkeeping" Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29135. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:19:20 +00:00
Dave Goodell	0df6ed4acc	usnic: squash warnings from perf improvements Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29134. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:19:08 +00:00
Dave Goodell	6dc54d372d	usnic: Basket of performance changes including: - round segment buffer allocation to cache-line - split some routines into an inline fast section and a called slower section - introduce receive fastpath in component_progress that: o returns immediately if there is a packet available on priority queue and fastpath is enabled o disables fastpath for 1 time after use to provide fairness to other processing o defers receive buffer posting o defers bookeeping for receive until next call to usnic_component_progress Authored-by: Reese Faucette <rfaucett@cisco.com> Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29133. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:18:57 +00:00
Dave Goodell	9cab9777d9	usnic: properly destroy embedded small send frag Without this, an `--enable-debug` build would hit an assertion in the list code when run under valgrind with `--malloc-fill=0xff` or any other case where malloc returned non-zeroed buffers. Also allow the normal OBJ_ machinery to handle the constructor invocation ordering for us instead of doing it by hand (which could have led to future bugs). Reviewed-by: jsquyres@cisco.com cmr=v1.7.4 Depends on trunk functionality in r29095 and r29096. Refs trac:3740,#3741. This commit was SVN r29127. The following SVN revision numbers were found above: r29095 --> open-mpi/ompi@d1b5940e97 r29096 --> open-mpi/ompi@a552921171 The following Trac tickets were found above: Ticket 3740 --> https://svn.open-mpi.org/trac/ompi/ticket/3740	2013-09-04 20:59:12 +00:00
Brian Barrett	16a1166884	Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a configure-time dynamic allocation of flags. The net result for platforms which only support BTL-based communication is a reduction of 8*nprocs bytes per process. Platforms which support both MTLs and BTLs will not see a space reduction, but will now be able to safely run both the MTL and BTL side-by-side, which will prove useful. This commit was SVN r29100.	2013-08-30 16:54:55 +00:00
Rolf vandeVaart	18962d296b	This has bothered me for a while. Change MCA_BTL_TAG_BTL to MCA_BTL_TAG_IB. They are the same value so this does not change anything. (MCA_BTL_TAG_IB = MCA_BTL_TAG_BTL + 0). This just makes it more correct. This commit was SVN r29099.	2013-08-30 14:53:59 +00:00
Dave Goodell	c5a7e8a079	usnic: stomp format specifier warnings The usnic BTL now builds cleanly under `--enable-picky` when `MSGDEBUG1` is set. Reviewed-by: jsquyres cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29097.	2013-08-29 23:24:14 +00:00
George Bosilca	305fa88d4b	Remove two warnings from the SM BTL. The return code can be safely ignored as the internals of the SM BTL will repost the fragment until the send operation succesfully complete. This commit was SVN r29077.	2013-08-28 06:36:01 +00:00
Dave Goodell	dd82bd3c19	usnic: fix invalid rfstart initialization endpoint_rfstart was being initialized from a value which was not yet set. Also ensure that rfstart is a valid index in the range 0..WINDOW_SIZE-1, since it is used as the index into endpoint_rcvd_segs, which has WINDOW_SIZE elements. Without this change there is significant risk of memory corruption or segfaults, resulting in hangs or crashes, if malloc ever returns us a value >=WINDOW_SIZE (4096). Right now we seem to be getting lucky that the malloc is returning zero-pages to us when we are allocating endpoint structures (possibly because the freelist performs a single large allocation for all endpoints). Fixes Cisco bug CSCui88781. Reviewed-by: rfaucett@cisco.com Reviewed-by: jsquyres@cisco.com cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29075.	2013-08-27 22:43:20 +00:00
Rolf vandeVaart	96457df9bc	Fix compile errors created from changeset 29058. This commit was SVN r29061.	2013-08-22 18:25:23 +00:00
Jeff Squyres	63ac60864b	Refs trac:3730 Turns out that AC_CHECK_DECLS is one of the "new style" Autoconf macros that #defines the output to be 0 or 1 (vs. #define'ing or #undef'ing it). So don't check for "#if defined(..."; just check for "#if ...". This commit was SVN r29059. The following Trac tickets were found above: Ticket 3730 --> https://svn.open-mpi.org/trac/ompi/ticket/3730	2013-08-22 17:44:20 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. * the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. * reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Rolf vandeVaart	504fa2cda9	Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically. Fixes ticket 3531. This commit was SVN r29055.	2013-08-21 21:00:09 +00:00
Rolf vandeVaart	96fdb060ea	Fix compile errors and warnings from changeset 29052. This commit was SVN r29054.	2013-08-21 19:01:54 +00:00
Steve Wise	67fe3f23ed	Use the HAVE_DECL_IBV_LINK_LAYER_ETHERNET macro. Commit r27211 added ifdef checks for #define HAVE_IBV_LINK_LAYER_ETHERNET, which is incorrect. The correct #define is HAVE_DECL_IBV_LINK_LAYER_ETHERNET. This broke OMPI over iWARP. This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29053. The following SVN revision numbers were found above: r27211 --> open-mpi/ompi@b27862e5c7 The following Trac tickets were found above: Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726	2013-08-20 20:00:46 +00:00
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Jeff Squyres	b30ad28276	Remove some unused variables and an unused goto label. This commit was SVN r29044.	2013-08-19 16:18:35 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Ralph Castain	f8a72feb25	Silence unitialized var warning This commit was SVN r29036.	2013-08-16 21:39:28 +00:00
Jeff Squyres	c09ec204ad	Change usNIC BTL to always use small fragments when there is a non-contiguous converter. We can't "convert on the fly" because the # of bytes requested may not divide evenly into the convertor data type. This commit was SVN r29014.	2013-08-11 17:04:13 +00:00
George Bosilca	837b3363fe	Silence few warnings. This commit was SVN r29004.	2013-08-06 09:38:30 +00:00
Brian Barrett	2cc947513b	* Fix some compile errors * Need to subtract 1 off the size so that we stay in the bit length requirements This commit was SVN r28997.	2013-08-05 18:49:48 +00:00
Jeff Squyres	87910daf51	Fix a collection of bugs found by QA and Coverity, and make some minor improvements: * Fix minor memory leaks during component_init * Ensure that an initialization loop does not underflow an unsigned int * Improve mlock limit checking * Fix set of BTL modules created during component_init when failing to get QP resources or otherwise excluding some (but not all) usnic verbs devices * Fix/improve error messages to be consistent with other Cisco documentation * Randomize the initial sliding window sequence number so that we silently drop incoming frames from previous jobs that still have existant processes in the middle of dying (and are still transmitting) * Ensure we don't break out of add_procs too soon and create an asymetrical view of what interfaces are available This commit was SVN r28975.	2013-08-01 16:56:15 +00:00
Jeff Squyres	f7337b8f77	Correct faulty max payload and MTU computations (and update some debugging that helped us find those). This commit was SVN r28942.	2013-07-24 16:06:28 +00:00
Jeff Squyres	5323051047	Use sysfs to check MPI has enough VFs, QPs, and CQs Use the new sysfs files to check that there are enough VFs, QPs, and CQs for all the MPI processes on this server. Move the checking code into its own subroutine to make it smaller and easier to read/grok. This commit was SVN r28937.	2013-07-24 00:38:32 +00:00
Jeff Squyres	b437041aeb	Update one more comment. This commit was SVN r28908.	2013-07-22 17:29:00 +00:00
Jeff Squyres	4b6006402d	Use the RTE framework instead of calling ORTE directly. Brian (rightfully) hit me on the head with the don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now nicely plays with the RTE framework. This commit was SVN r28907.	2013-07-22 17:28:23 +00:00
Jeff Squyres	194b285447	First commit of the Cisco usNIC BTL. This BTL accesses the Cisco usNIC Linux device via the Linux verbs API via Unreliable Datagram queue pairs. A few noteworthy points: * This BTL does most of its own fragmentation; it tells the PML that it has a very high max_send_size (much higher than the network MTU). * Since UD fragments are, by definition, unreliable, the usnic BTL handles all of its own reliability via a sliding window approach using the opal_hotel construct and many tricks stolen from the corpus of knowledge surrounding efficient TCP. * There is a fun PML latency-metric based optimization for NUMA awareness of short messages. * Note that this is ''not'' a generic UD verbs BTL; it is specific to the Cisco usNIC device. This commit was SVN r28879.	2013-07-19 22:13:58 +00:00

1 2 3 4 5 ...

1903 Коммитов