openmpi

Author	SHA1	Message	Date
Jeff Squyres	05af83d5d8	common_verbs: Remove usnic magic probe test Check the IBV_TRANSPORT_* values. In the case of IBV_TRANSPORT_IWARP, there's an ambiguity and we need to also check to see whether the usnic verbs extension probe exists. This commit was SVN r30913.	2014-03-03 21:32:44 +00:00
Dave Goodell	4875f48eaa	usnic: enable UDP support This commit decouples OMPI deployment from the version(s) of the lower layers of the stack by probing for UDP support. Verbs applications assume a 40-byte header (there is no current mechanism for querying payload offset). So to support a 42-byte UDP header without causing existing applications like ibv_ud_pingpong or older versions of OMPI to crash, we must inform libusnic_verbs that we are aware of the nonstandard payload offset. We do this by overriding the `transport_type` field of the device to be 42 before calling `ibv_open_device`. If the library resets it to something else, then we know the lower layers are UDP capable. Otherwise we use the older custom-L2 format. This necessitated some minor ugliness in common_verbs, but it's as tidy as Jeff and I know how to make it right now. This commit only adds support for UDP headers and connectivity over the same L2 network, it does not touch routing or interface pairing. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> cmr=v1.7.5:ticket=trac:4253 This commit was SVN r30838. The following Trac tickets were found above: Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253	2014-02-26 07:44:35 +00:00
Dave Goodell	4af332bd4e	Fix the logic in ompi_common_verbs_find_ports(). The logic did not correctly perform the OR behavior that is described in the doxy docs for this function. This commit fixes the logic so that a port will be included if it supports any of the capabilities indicated by the passed-in flags. Authored-by: Jeff Squyres <jsquyres@cisco.com> Reviewed-by: Dave Goodell <dgoodell@cisco.com> cmr=v1.7.5:ticket=trac:4253 This commit was SVN r30831. The following Trac tickets were found above: Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253	2014-02-26 07:39:21 +00:00
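The OR behavior described in the fix above can be sketched as follows. This is an illustration in Python, not Open MPI's C code: the flag names, the port records, and the `find_ports` helper are all hypothetical stand-ins for `ompi_common_verbs_find_ports()` and its real constants.

```python
# Hypothetical capability flags; the real constants live in Open MPI's
# common_verbs code and are not reproduced here.
FLAG_RC = 0x1
FLAG_UD = 0x2
FLAG_IWARP = 0x4


def find_ports(ports, wanted_flags):
    """Return names of ports that support ANY capability in wanted_flags."""
    if wanted_flags == 0:
        # No filter requested: keep every port (an assumption of this sketch).
        return [name for name, _caps in ports]
    # A non-zero bitwise AND means at least one requested bit is set (OR semantics).
    # Testing (caps & wanted_flags) == wanted_flags would instead demand ALL bits.
    return [name for name, caps in ports if caps & wanted_flags]


# Made-up port names and capability sets, for illustration only.
ports = [
    ("port_a", FLAG_RC | FLAG_UD),
    ("port_b", FLAG_UD),
    ("port_c", FLAG_IWARP),
]
```

Calling `find_ports(ports, FLAG_RC | FLAG_IWARP)` selects `port_a` and `port_c`, because each supports at least one of the two requested capabilities; an AND-style check would select neither.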
Joshua Ladd	9ea9bec4ad	Addressing Jeff's comments: 1. Changed rng_buff_t --> opal_rng_buff_t 2. All global variables obey the prefix rule 3. Old code has been removed 4. Found a couple of unnecessary includes Refs trac:4298 This commit was SVN r30807. The following Trac tickets were found above: Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298	2014-02-24 23:18:35 +00:00
Jeff Squyres	d07d1864ae	Revert r30804. We're going to be bringing a bunch of usnic code to the SVN trunk soon, and I basically brought this commit over out of order. So I'm reverting it for now; the same functionality will come back shortly. This commit was SVN r30805. The following SVN revision numbers were found above: r30804 --> open-mpi/ompi@5bedcc15bf	2014-02-24 19:12:49 +00:00
Jeff Squyres	5bedcc15bf	Support the IBV_*_USNIC_* verbs constants. These constants are now upstream (see https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/commit/?id=f57a9c67eabb9e7f19c624ac3c8c27b7be55796c), so let's support them properly in Open MPI. Added bonus: consolidating these checks up in ompi_check_openfabrics.m4 allowed removing some custom checks and AC_DEFINE's from the usnic configure.m4 script. Also change the usnic/configure.m4 check for IBV_EVENT_GID_CHANGE to use AC_CHECK_DECLS (vs. AC_CHECK_DECL). cmr=v1.7.5:reviewer=dgoodell This commit was SVN r30804.	2014-02-24 18:57:04 +00:00
Joshua Ladd	e39d9f4080	Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG This commit was SVN r30801.	2014-02-23 21:41:38 +00:00
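For readers unfamiliar with the technique named in the commit above, an additive lagged Fibonacci generator produces each value as the sum of two earlier values at fixed lags. The sketch below is in Python and is not the OPAL implementation: the lag pair (24, 55), the 64-bit word size, the LCG-based seeding, and the `LaggedFibonacci` class are assumptions chosen for illustration, and it does not reproduce the `opal_srand` or `opal_rand` signatures.

```python
from collections import deque

MASK64 = (1 << 64) - 1
SHORT_LAG, LONG_LAG = 24, 55  # classic lag pair; not necessarily OPAL's choice


class LaggedFibonacci:
    """x[n] = (x[n - 24] + x[n - 55]) mod 2**64."""

    def __init__(self, seed):
        # Fill the 55-word state with a 64-bit LCG (Knuth's MMIX constants);
        # this seeding scheme is an assumption of the sketch.
        state = []
        x = seed & MASK64
        for _ in range(LONG_LAG):
            x = (6364136223846793005 * x + 1442695040888963407) & MASK64
            state.append(x)
        state[0] |= 1  # at least one odd word is needed for the full period
        self.state = deque(state, maxlen=LONG_LAG)

    def next(self):
        # state[0] holds x[n - 55]; state[-24] holds x[n - 24].
        value = (self.state[0] + self.state[-SHORT_LAG]) & MASK64
        self.state.append(value)  # maxlen drops the oldest word
        return value
```

In a parallel setting, one simple (and here assumed) approach is to give each process its own generator seeded from a value that differs per rank, so that streams do not start from identical state.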
Rolf vandeVaart	d4f12148c4	Fix several issues reported in ticket #4245 . This commit was SVN r30767.	2014-02-18 17:44:08 +00:00
Rolf vandeVaart	9f3bf4747d	Provide option to have synchronous copy be asynchronous with a wait. For now, this has to be selected at runtime. Also fix up some error messages to have node name in them. This commit was SVN r30396.	2014-01-23 15:47:20 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Rolf vandeVaart	b3edca19df	Add braces per coding convention and design review. This commit was SVN r30137.	2014-01-07 17:30:37 +00:00
Rolf vandeVaart	695d854cd8	Fix return value. This commit was SVN r30034.	2013-12-20 20:57:04 +00:00
Rolf vandeVaart	4cd1958deb	Fix so we do not get warnings when running on system without CUDA software installed and CUDA-aware compiled in. This commit was SVN r30032.	2013-12-20 20:39:25 +00:00
Rolf vandeVaart	b955dbd6d9	Fix various items discovered by review of ticket #3951 . This commit was SVN r29900.	2013-12-13 21:25:07 +00:00
Jeff Squyres	bac67e0d81	Per discussion @Chicago OMPI dev meeting Dec 2013: remove all MX support. This commit was SVN r29873.	2013-12-12 18:54:47 +00:00
Rolf vandeVaart	d556b60b21	Change some CUDA configure code and macro names per review request by jsquyres in ticket #3880 . Functionally, nothing changes. This commit was SVN r29815.	2013-12-06 14:35:10 +00:00
Rolf vandeVaart	218c05a4d1	Make sure synchronous copies are complete before moving the data. This commit was SVN r29789.	2013-12-03 21:20:14 +00:00
Rolf vandeVaart	aa98b0333b	Call function from function table. Discovered during static build. This commit was SVN r29755.	2013-11-25 22:46:07 +00:00
Nathan Hjelm	24a7e7aa34	Add support for the udreg registration cache and dynamics on XE/XK/XC. To support the new mpool two changes were made to the mpool infrastructure: 1) Added an mpool flag to indicate that an mpool does not need the memory hooks to use the leave pinned protocols. This flag is checked in the mpool lookup. 2) Add a mpool context to the base registration. This new member is used by the udreg mpool to store the udreg context associated with the particular registration. The new member will not break the ABI compatibility as the new member is only currently used by the udreg mpool. Dynamics support for Cray systems makes use of the global rank provided by orte to give the ugni library a unique rank for each process. Dynamics support is not available under direct-launch (srun.) cmr=v1.7.4 This commit was SVN r29719.	2013-11-18 04:58:37 +00:00
Rolf vandeVaart	92e6aaa808	Adjust a default value. Adjust some levels of verbosity and one more debug message. This commit was SVN r29712.	2013-11-14 21:47:27 +00:00
Rolf vandeVaart	4964a5e98b	Per this RFC from October 8, 2013 and as discussed in telecon. http://www.open-mpi.org/community/lists/devel/2013/10/13072.php Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. Note that none of this is compiled in unless CUDA-aware support is requested. This commit was SVN r29680.	2013-11-13 13:22:39 +00:00
Rolf vandeVaart	a6df7bc33a	Fix issues reported in ticket #3877 . Also added additional comments. This commit was SVN r29641.	2013-11-07 20:44:47 +00:00
Rolf vandeVaart	2cf7c40ee5	Minor adjustments to error messages due to review of #3880 . This commit was SVN r29640.	2013-11-07 20:21:21 +00:00
Rolf vandeVaart	e46c0bb952	Fix one more space for consistent defines. This commit was SVN r29607.	2013-11-05 15:31:49 +00:00
Rolf vandeVaart	64b3a24fec	Fix CUDA-aware compile issues. This commit was SVN r29606.	2013-11-05 14:46:58 +00:00
Rolf vandeVaart	e57795f097	Revert r29594. That was just plain wrong. Sorry about workday configure change. This commit was SVN r29605. The following SVN revision numbers were found above: r29594 --> open-mpi/ompi@ed7ddcd9c7	2013-11-05 14:45:56 +00:00
Rolf vandeVaart	ed7ddcd9c7	Fix CUDA-aware compile error introduced with r29581. This commit was SVN r29594. The following SVN revision numbers were found above: r29581 --> open-mpi/ompi@ee7510b025	2013-11-05 00:08:33 +00:00
Rolf vandeVaart	ee7510b025	Remove redundant macro. This came out of the review of an earlier ticket. Fixes trac:3878. Reviewed by jsquyres. This commit was SVN r29581. The following Trac tickets were found above: Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878	2013-11-01 12:19:40 +00:00
Rolf vandeVaart	99f9fdee01	Fix corner case involving threads and CUDA-aware support. This commit was SVN r29579.	2013-10-31 20:53:46 +00:00
Nathan Hjelm	fd25b7af01	Fix common ugni Makefile.am for non-DSO builds. This commit was SVN r29571.	2013-10-30 19:37:14 +00:00
Rolf vandeVaart	fa5d20a5ec	Add optimization that can be used when CUDA 6.0 comes out. Use new pointer attribute. This commit was SVN r29514.	2013-10-24 21:17:58 +00:00
Jeff Squyres	09fae6e62b	Prefix DSO filenames with "lib" so that Automake doesn't complain. Follow the convention established by the ompi/mca/common/sm tree and prefix both the "install" and "no install" versions of the build with "lib" so that Automake doesn't complain. Differentiate the two by adding a "_noinst" suffix to the "no install" version. This commit was SVN r29462.	2013-10-22 13:16:33 +00:00
Rolf vandeVaart	9f83405c78	Fix one more corner case initialization issue. This commit was SVN r29443.	2013-10-16 16:39:19 +00:00
Rolf vandeVaart	fbf143f3b4	Move another function that was missed in r29347. This commit was SVN r29422. The following SVN revision numbers were found above: r29347 --> open-mpi/ompi@ce61985503	2013-10-10 14:48:56 +00:00
Jeff Squyres	d9be19f011	Added shared library versions to those who were missing it. The following common shared libraries did not have versioning: * ompi/common/ofacm * ompi/common/verbs * ompi/common/ugni Additionally, we still had shared library versions in VERSION for the following libraries, which no longer exist: * ompi/common/portals * opal/common/hwloc This commit was SVN r29421.	2013-10-10 13:25:57 +00:00
Rolf vandeVaart	4dd1c86b36	Add a few support functions for future features. This commit was SVN r29353.	2013-10-03 21:06:17 +00:00
Rolf vandeVaart	ce61985503	Move registration function inside initial initialization function. This commit was SVN r29347.	2013-10-03 14:14:42 +00:00
Dave Goodell	2c7975eb86	common_verbs: fix bad opal_output args Spotted by Reese Faucette <rfaucett@cisco.com>. cmr=v1.7.3 This commit was SVN r29267.	2013-09-26 21:59:00 +00:00
Rolf vandeVaart	d67e3077f5	Add a check for the CUDA 6.0 version of the cuda.h header file. This commit was SVN r29250.	2013-09-26 12:46:06 +00:00
Rolf vandeVaart	3b5e0736a3	Adjust verbosity levels upward. This commit was SVN r29232.	2013-09-24 14:35:48 +00:00
Rolf vandeVaart	96457df9bc	Fix compile errors created from changeset 29058. This commit was SVN r29061.	2013-08-22 18:25:23 +00:00
Jeff Squyres	63ac60864b	Refs trac:3730 Turns out that AC_CHECK_DECLS is one of the "new style" Autoconf macros that #defines the output to be 0 or 1 (vs. #define'ing or #undef'ing it). So don't check for "#if defined(..."; just check for "#if ...". This commit was SVN r29059. The following Trac tickets were found above: Ticket 3730 --> https://svn.open-mpi.org/trac/ompi/ticket/3730	2013-08-22 17:44:20 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
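The subnet-based reachability test described in the OOB rewrite above can be sketched as follows. This is a Python illustration using the standard `ipaddress` module, not the ORTE code: the `reachable_nics` helper, the NIC names, and the addresses are hypothetical. Like the commit's stated limitation, it only matches peers on a shared subnet and knows nothing about gateways.

```python
import ipaddress


def reachable_nics(local_nics, peer_addr):
    """Return names of local NICs whose subnet contains peer_addr.

    local_nics is a list of (nic_name, "address/prefix") pairs; a NIC with
    several addresses (e.g. IPv4 and IPv6) simply appears more than once.
    """
    peer = ipaddress.ip_address(peer_addr)
    matches = []
    for name, cidr in local_nics:
        # ip_interface accepts a host address with a prefix, e.g. 10.0.0.5/24,
        # and .network yields the covering subnet (10.0.0.0/24).
        network = ipaddress.ip_interface(cidr).network
        same_family = peer.version == network.version
        if same_family and peer in network and name not in matches:
            matches.append(name)
    return matches


# Made-up interfaces for illustration.
local_nics = [
    ("eth0", "10.0.0.5/24"),
    ("eth1", "192.168.1.5/24"),
    ("eth1", "fd00::5/64"),
]
```

With these interfaces, a peer at `192.168.1.77` or `fd00::9` is reachable only through `eth1`, while a peer at `172.16.0.1` matches no NIC; in the design described above, that is the case where the message would be handed to another component or reported as an error.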
Rolf vandeVaart	504fa2cda9	Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically. Fixes ticket 3531. This commit was SVN r29055.	2013-08-21 21:00:09 +00:00
Steve Wise	67fe3f23ed	Use the HAVE_DECL_IBV_LINK_LAYER_ETHERNET macro. Commit r27211 added ifdef checks for #define HAVE_IBV_LINK_LAYER_ETHERNET, which is incorrect. The correct #define is HAVE_DECL_IBV_LINK_LAYER_ETHERNET. This broke OMPI over iWARP. This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29053. The following SVN revision numbers were found above: r27211 --> open-mpi/ompi@b27862e5c7 The following Trac tickets were found above: Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726	2013-08-20 20:00:46 +00:00
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. 
Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
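The "fetch and decode everything on first request, then serve locally" change described above can be sketched as follows. This is a Python illustration, not the OPAL db code: `CachingModexStore`, `fake_pmi_fetch`, and the key names are hypothetical, and the remote fetch is a stand-in for the real PMI calls.

```python
class CachingModexStore:
    """Fetch all of a peer's data once, then answer later lookups locally."""

    def __init__(self, remote_fetch):
        # remote_fetch(peer) -> dict of every key published by that peer;
        # it stands in for the expensive remote "get" calls.
        self._remote_fetch = remote_fetch
        self._cache = {}
        self.remote_calls = 0  # counts how often we actually go remote

    def get(self, peer, key):
        if peer not in self._cache:
            # First request for this peer: pull and store everything at once.
            self.remote_calls += 1
            self._cache[peer] = dict(self._remote_fetch(peer))
        # Every later request for the same peer is served from local storage.
        return self._cache[peer].get(key)


def fake_pmi_fetch(peer):
    # Made-up payload for illustration only.
    return {"endpoint": f"addr-of-{peer}", "arch": "x86_64"}
```

Looking up two different keys for the same peer triggers one remote fetch rather than two, which is the saving the commit is after when scaled across all processes in a job.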
Rolf vandeVaart	cd72024a3c	Refactor some of the initialization code. This commit was SVN r29009.	2013-08-09 14:54:17 +00:00
Rolf vandeVaart	67badf384c	Only search SONAME of library. Expand comments. This commit was SVN r28904.	2013-07-22 15:54:45 +00:00
Rolf vandeVaart	49663fb802	Move CUDA-aware configure code to its own file and other minor changes due to review. This commit was SVN r28832.	2013-07-17 22:12:29 +00:00


264 commits