openmpi

Author	SHA1	Message	Date
Rolf vandeVaart	e46c0bb952	Fix one more space for consistent defines. This commit was SVN r29607.	2013-11-05 15:31:49 +00:00
Rolf vandeVaart	64b3a24fec	Fix CUDA-aware compile issues. This commit was SVN r29606.	2013-11-05 14:46:58 +00:00
Rolf vandeVaart	e57795f097	Revert r29594. That was just plain wrong. Sorry about workday configure change. This commit was SVN r29605. The following SVN revision numbers were found above: r29594 --> open-mpi/ompi@ed7ddcd9c7	2013-11-05 14:45:56 +00:00
Rolf vandeVaart	ed7ddcd9c7	Fix CUDA-aware compile error introduced with r29581. This commit was SVN r29594. The following SVN revision numbers were found above: r29581 --> open-mpi/ompi@ee7510b025	2013-11-05 00:08:33 +00:00
Rolf vandeVaart	ee7510b025	Remove redundant macro. This came from the review of an earlier ticket. Fixes trac:3878. Reviewed by jsquyres. This commit was SVN r29581. The following Trac tickets were found above: Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878	2013-11-01 12:19:40 +00:00
Rolf vandeVaart	99f9fdee01	Fix corner case involving threads and CUDA-aware support. This commit was SVN r29579.	2013-10-31 20:53:46 +00:00
Nathan Hjelm	fd25b7af01	Fix common ugni Makefile.am for non-DSO builds. This commit was SVN r29571.	2013-10-30 19:37:14 +00:00
Rolf vandeVaart	fa5d20a5ec	Add optimization that can be used when CUDA 6.0 comes out. Use new pointer attribute. This commit was SVN r29514.	2013-10-24 21:17:58 +00:00
Jeff Squyres	09fae6e62b	Prefix DSO filenames with "lib" so that Automake doesn't complain. Follow the convention established by the ompi/mca/common/sm tree and prefix both the "install" and "no install" versions of the build with "lib" so that Automake doesn't complain. Differentiate the two by adding a "_noinst" suffix to the "no install" version. This commit was SVN r29462.	2013-10-22 13:16:33 +00:00
Rolf vandeVaart	9f83405c78	Fix one more corner case initialization issue. This commit was SVN r29443.	2013-10-16 16:39:19 +00:00
Rolf vandeVaart	fbf143f3b4	Move another function that was missed in r29347. This commit was SVN r29422. The following SVN revision numbers were found above: r29347 --> open-mpi/ompi@ce61985503	2013-10-10 14:48:56 +00:00
Jeff Squyres	d9be19f011	Added shared library versions to those who were missing it. The following common shared libraries did not have versioning: * ompi/common/ofacm * ompi/common/verbs * ompi/common/ugni Additionally, we still had shared library versions in VERSION for the following libraries, which no longer exist: * ompi/common/portals * opal/common/hwloc This commit was SVN r29421.	2013-10-10 13:25:57 +00:00
Rolf vandeVaart	4dd1c86b36	Add a few support functions for future features. This commit was SVN r29353.	2013-10-03 21:06:17 +00:00
Rolf vandeVaart	ce61985503	Move registration function inside initial initialization function. This commit was SVN r29347.	2013-10-03 14:14:42 +00:00
Dave Goodell	2c7975eb86	common_verbs: fix bad opal_output args Spotted by Reese Faucette <rfaucett@cisco.com>. cmr=v1.7.3 This commit was SVN r29267.	2013-09-26 21:59:00 +00:00
Rolf vandeVaart	d67e3077f5	Add a check for the CUDA 6.0 version of the cuda.h header file. This commit was SVN r29250.	2013-09-26 12:46:06 +00:00
Rolf vandeVaart	3b5e0736a3	Adjust verbosity levels upward. This commit was SVN r29232.	2013-09-24 14:35:48 +00:00
Rolf vandeVaart	96457df9bc	Fix compile errors created from changeset 29058. This commit was SVN r29061.	2013-08-22 18:25:23 +00:00
Jeff Squyres	63ac60864b	Refs trac:3730 Turns out that AC_CHECK_DECLS is one of the "new style" Autoconf macros that #defines the output to be 0 or 1 (vs. #define'ing or #undef'ing it). So don't check for "#if defined(..."; just check for "#if ...". This commit was SVN r29059. The following Trac tickets were found above: Ticket 3730 --> https://svn.open-mpi.org/trac/ompi/ticket/3730	2013-08-22 17:44:20 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Rolf vandeVaart	504fa2cda9	Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically. Fixes ticket 3531. This commit was SVN r29055.	2013-08-21 21:00:09 +00:00
Steve Wise	67fe3f23ed	Use the HAVE_DECL_IBV_LINK_LAYER_ETHERNET macro. Commit r27211 added ifdef checks for #define HAVE_IBV_LINK_LAYER_ETHERNET, which is incorrect. The correct #define is HAVE_DECL_IBV_LINK_LAYER_ETHERNET. This broke OMPI over iWARP. This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29053. The following SVN revision numbers were found above: r27211 --> open-mpi/ompi@b27862e5c7 The following Trac tickets were found above: Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726	2013-08-20 20:00:46 +00:00
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. 
Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Rolf vandeVaart	cd72024a3c	Refactor some of the initialization code. This commit was SVN r29009.	2013-08-09 14:54:17 +00:00
Rolf vandeVaart	67badf384c	Only search SONAME of library. Expand comments. This commit was SVN r28904.	2013-07-22 15:54:45 +00:00
Rolf vandeVaart	49663fb802	Move CUDA-aware configuration to its own file and other minor changes due to review. This commit was SVN r28832.	2013-07-17 22:12:29 +00:00
Rolf vandeVaart	7a45be8bde	Fix variable initialization. This commit was SVN r28819.	2013-07-17 17:37:35 +00:00
Rolf vandeVaart	f95c95cf79	Additional cleanup of how libraries and paths are searched. This commit was SVN r28815.	2013-07-16 18:40:55 +00:00
Rolf vandeVaart	54b1fbdb4a	Better error message code. Remove commented out code. This commit was SVN r28793.	2013-07-15 22:27:34 +00:00
Rolf vandeVaart	4d2c2bcefe	Better error message. Remove a tab. This commit was SVN r28791.	2013-07-15 19:39:54 +00:00
Rolf vandeVaart	858ef65142	Fix loop limit. This commit was SVN r28755.	2013-07-11 17:15:43 +00:00
Rolf vandeVaart	adda653fc1	Fix two bugs from previous commit. This commit was SVN r28684.	2013-06-28 16:32:51 +00:00
Rolf vandeVaart	850d325f32	Adjust how search is done for dynamic load of library. CUDA only. This commit was SVN r28683.	2013-06-27 22:13:25 +00:00
Jeff Squyres	e3d0782788	Move the assignment after the bozo check. This commit was SVN r28669.	2013-06-22 12:38:32 +00:00
Rolf vandeVaart	5e1dde419c	Fix some compile errors in CUDA-aware code that has crept in. This commit was SVN r28346.	2013-04-18 15:34:16 +00:00
Pavel Shamis	aa1f5697b4	In order to prevent name conflicts in XRC (MOFED) enabled mode OFACM's ib_address_t was renamed to ofacm_ib_address_t This commit was SVN r28289.	2013-04-04 20:02:17 +00:00
Jeff Squyres	64d39a4e97	Technically speaking, we're creating a QP with 1 send WQE and 1 receive WQE, so it's good form to have a CQ with 2 entries, not 1. This commit was SVN r28256.	2013-03-28 13:11:31 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file can not be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int , or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few component actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). 
opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
Jeff Squyres	44e371a65d	Remove (bogus) port number from the opal_output -- there's no port number associated with creating a QP. This commit was SVN r28222.	2013-03-26 19:48:50 +00:00
Rolf vandeVaart	037729dcbb	Add a search path. Refactor code. This commit was SVN r28142.	2013-03-01 21:50:56 +00:00
Rolf vandeVaart	5c761d701d	Remove tabs for spaces, fix some error messages. This commit was SVN r28141.	2013-03-01 19:13:06 +00:00
Rolf vandeVaart	ebe63118ac	Remove dependency on libcuda.so when building in CUDA-aware support. Dynamically load it if needed. This commit was SVN r28140.	2013-03-01 13:21:52 +00:00
Ralph Castain	8d2fa3693b	First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package. This commit was SVN r28116.	2013-02-26 20:44:56 +00:00
Rolf vandeVaart	da3e9ff906	Add show_help.h where needed. This commit was SVN r28071.	2013-02-19 15:42:09 +00:00
Brian Barrett	3c83618799	fix a missing header file issue with IB This commit was SVN r28070.	2013-02-18 18:29:14 +00:00
Jeff Squyres	bbddd6ea03	Add header file for opal_show_help(). This commit was SVN r28056.	2013-02-13 16:31:59 +00:00
Brian Barrett	312f37706e	In talking about this with Jeff and Ralph, we don't actually need ompi_show_help, because opal_show_help is replaced with an aggregating version when using ORTE, so there's no reason to directly call orte_show_help. This commit was SVN r28051.	2013-02-12 21:10:11 +00:00
Pavel Shamis	a31bc57849	Moving mca/common/netpatterns and commpaterns to ompi/patterns. This commit was SVN r28035.	2013-02-05 21:52:55 +00:00
Rolf vandeVaart	729caaf0cd	Remove any dependency on libcuda.so in opal layer. All changes are within OMPI_CUDA_SUPPORT code. This commit was SVN r27986.	2013-01-30 23:07:32 +00:00
Rolf vandeVaart	aa04de4f1e	Add run-time parameter to enable and disable CUDA GPU support. This commit was SVN r27970.	2013-01-29 20:24:04 +00:00
Brian Barrett	b8442ba505	Revamp the handling of wrapper compiler flags. The user flags, main configure flags, and mca flags are kept separate until the very end. The main configure wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD macro. MCA components should either let <framework>_<component>_{LIBS,LDFLAGS} be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}. The situations in which WRAPPER CPPFLAGS can be set by MCA components were made very small to match the one use case where it makes sense. This commit was SVN r27950.	2013-01-29 00:00:43 +00:00
Rolf vandeVaart	c6412f6dff	Add new rte headers in files that need them. This commit was SVN r27943.	2013-01-28 19:32:33 +00:00
Pavel Shamis	1f1e1efb7b	Removing leftovers of old infrastructure. cmr:v1.7 This commit was SVN r27942.	2013-01-28 19:11:42 +00:00
Brian Barrett	f42783ae1a	Move the RTE framework change into the trunk. With this change, all non-CR runtime code goes through one of the rte, dpm, or pubsub frameworks. This commit was SVN r27934.	2013-01-27 23:25:10 +00:00
Samuel Gutierrez	4c28c8cbd0	New sm BTL initialization take two. This approach is pretty simple. Instead of using the modex or RML to share sm initialization information, have node rank 0 create a file containing initialization information in a well-known place. Then during add_procs, the rest of the node processes requiring sm BTL initialization will just read from that file to complete their initialization. This commit was SVN r27789.	2013-01-11 16:24:56 +00:00
Samuel Gutierrez	c4acd20eb9	Backout r27739. This commit was SVN r27745. The following SVN revision numbers were found above: r27739 --> open-mpi/ompi@a159bfaf25	2013-01-05 01:54:23 +00:00
Samuel Gutierrez	a159bfaf25	sm BTL initialization via modex, as discussed at last year's meeting. This commit was SVN r27739.	2013-01-03 21:52:20 +00:00
Brian Barrett	702451111b	Remove Portals 3.3 support This commit was SVN r27656.	2012-12-06 20:11:27 +00:00
Jeff Squyres	d6e9a14b14	Fix minor issue: the argv_delete may change the top list pointer. So be sure to save it. This commit was SVN r27568.	2012-11-06 16:05:58 +00:00
Brian Barrett	a1a52c9e90	Rather than use the fake mpool for handling callbacks into the MX library, use the memory hooks interface (which does allow for multiple callbacks to be registered) directly. This commit was SVN r27485.	2012-10-25 21:50:07 +00:00
Jeff Squyres	2ab48b997e	Give a better verbose message if we're able to make an RC QP and we didn't want one. This commit was SVN r27438.	2012-10-11 19:21:21 +00:00
Jeff Squyres	8c369224bf	More common/verbs improvements: * Add OMPI_COMMON_VERBS_FLAGS_NOT_RC, which looks for a device that does ''not'' support RC * Add ompi_common_verbs_find_max_inline(), and remove that code from the openib BTL component This commit was SVN r27393.	2012-10-03 00:57:39 +00:00
George Bosilca	48f528f142	icc complains about the missing prototype. This commit was SVN r27373.	2012-09-26 09:56:14 +00:00
Jeff Squyres	3cc8b0461a	More updates to common verbs infrastructure: * Moved "check basics" sanity check from openib BTL to common/verbs (which also allows us to have openib ''not'' include <infiniband/driver.h>, which is a Very Good Thing) * Add new ompi_common_verbs_qp_test() function, which tests to see whether a device supports RC and/or UD QPs. The openib BTL now uses this function to ensure that the device supports RC QPs. * Rename ompi_common_verbs_find_ibv_ports() to be ompi_common_verbs_find_ports() -- the "ibv" was redundant. * Re-work ompi_common_verbs_find_ports() to use ompi_common_verbs_qp_test() instead of testing for RC/UD QPs itself * Add bunches of opal_output_verbose() to the find_ports() routine (to help diagnosing connectivity problems -- imaging running with --mca btl_base_verbose 10; you'll see all the find_ports() test results) * Make ompi_common_verbs_qp_test() warn if devices/ports are supplied in the if_include/if_exclude strings that do not exists (quite similar to what the openib BTL does today). * Add ompi_common_verbs_mca_register() function, which registers common verbs MCA params. It will also register MCA param synonyms for thse MCA params to upper-level components (e.g., btl_<upper-level-component>_<the-mca-param>). * common_verbs_warn_nonexistent_if: warn if if_include/if_exclude-specified devices or ports do not exist. This commit was SVN r27332.	2012-09-12 20:47:47 +00:00
Pavel Shamis	1e7b958c2a	Cleaning warning in collectives code This commit was SVN r27331.	2012-09-12 19:47:23 +00:00
Jeff Squyres	171f6efd70	Don't free heap objects! This commit was SVN r27326.	2012-09-12 15:11:56 +00:00
Jeff Squyres	9feb8d8879	Oops; the error paths were not correct on the initial commit. Fixed. This commit was SVN r27228.	2012-09-04 15:48:44 +00:00
Yevgeny Kliteynik	3fe239702a	Fixed compilation error Thanks to Alex Margolin for the fix This commit was SVN r27215.	2012-09-02 08:26:30 +00:00
Jeff Squyres	341ce2f9a4	Per some discussions between LANL, Cisco, ORNAL, and Mellanox, move some new common OpenFabrics functionality to ompi/mca/common/verbs. Also move everything that was in ompi/mca/common/ofautils under ompi/mca/common/verbs. * Move ofautils -> verbs * Add new functionality in ompi/mca/common/verbs (see doxygen * comments in ompi/mca/common/verbs/common_verbs.h for details): * ompi_common_verbs_find_ibv_ports() * ompi_common_verbs_port_bw() * ompi_common_verbs_mtu() * '''If you're writing verbs-based code, you should be using this common functionality''' * Adapt openib BTL to use some trivial common functionality in common/verbs * Don't use "#ifdef OMPI_HAVE_RDMAOE",use "#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)" * Update the following to include/link against common/verbs * bcol/iboffload * sbgp/ibnet * btl/openib This commit was SVN r27212.	2012-09-01 01:42:37 +00:00
Shiqing Fan	9986cea044	BEGIN_C_DECLS is missing. This commit was SVN r27107.	2012-08-22 14:14:45 +00:00
Shiqing Fan	f746fe152f	* change variable iov_len to iovec_len, in order to fix the conflict with the io vector support on Windows. * several include header protection * do not use ERROR, it's preserved for Visual Studio, use error instead. This commit was SVN r27106.	2012-08-22 13:36:23 +00:00
Shiqing Fan	b0ef486304	exclude one file that is not compatible for Windows. This commit was SVN r27105.	2012-08-22 13:06:33 +00:00
Pavel Shamis	6fac989588	Cleaning warnings in collectives code. Refs trac:3243. This commit was SVN r27089. The following Trac tickets were found above: Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243	2012-08-17 15:36:13 +00:00
Jeff Squyres	7642656aa7	Add more missing files so that dist tarballs aren't borked. Refs trac:3243. This commit was SVN r27086. The following Trac tickets were found above: Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243	2012-08-17 00:47:10 +00:00
Jeff Squyres	2102c05504	Add missing .windows files. Refs trac:3243. This commit was SVN r27083. The following Trac tickets were found above: Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243	2012-08-16 23:38:03 +00:00
Pavel Shamis	b89f8fabc9	Adding Hierarchical Collectives project to the Open MPI trunk. The project includes following components and frameworks: - ML Collective component - NETPATTERNS and COMMPATTERNS common components - BCOL framework - SBGP framework Note: By default the ML collective component is disabled. In order to enable new collectives user should bump up the priority of ml component (coll_ml_priority) ============================================= Primary Contributors (in alphabetical order): Ishai Rabinovich (Mellanox) Joshua S. Ladd (ORNL / Mellanox) Manjunath Gorentla Venkata (ORNL) Mike Dubman (Mellanox) Noam Bloch (Mellanox) Pavel (Pasha) Shamis (ORNL / Mellanox) Richard Graham (ORNL / Mellanox) Vasily Filipov (Mellanox) This commit was SVN r27078.	2012-08-16 19:11:35 +00:00
Samuel Gutierrez	6188d97e1a	Getting out of bed this morning was a bad idea... Reverting the sm update once more because it breaks direct launch. Will address this issue and commit the update once it has all been tested. Sorry everyone! This commit was SVN r27001.	2012-08-10 22:20:38 +00:00
Samuel Gutierrez	159bd2e62e	Let's try this again: sm BTL initialization via modex. This commit was SVN r26989.	2012-08-10 20:12:36 +00:00
Samuel Gutierrez	6a70063812	Yikes - that's not right! Back out 26987. I'll try again in a bit... Sorry! This commit was SVN r26988.	2012-08-10 19:57:51 +00:00
Samuel Gutierrez	2c80273246	sm BTL initialization via modex. This commit was SVN r26987.	2012-08-10 19:51:41 +00:00
Jeff Squyres	9f8265eccb	The files for automake to generate are specified via AC_CONFIG_FILES in the */configure.m4 files. configure.params files are obsolete. This commit was SVN r26897.	2012-07-27 14:33:17 +00:00
Shiqing Fan	204fbfe4b1	update the wv btl component. This commit was SVN r26872.	2012-07-26 15:35:01 +00:00
Jeff Squyres	5ec6a65a72	After I spent a while looking in libibverbs for ibv_get_device_list_compat() and not finding it, I finally realized that it was a function in OMPI. So let's name it with a proper ompi_ prefix, not an ibv_ prefix. This commit was SVN r26867.	2012-07-25 16:32:51 +00:00
Samuel Gutierrez	76d94bf9bf	Plug leak. Thanks, Nathan. This commit was SVN r26846.	2012-07-23 21:11:21 +00:00
Samuel Gutierrez	8096852a16	Towards RML-less shared-memory initialization (primarily for eventual BTL move). Extended common sm API with: mca_common_sm_module_create and mca_common_sm_module_attach. Please note that the new routines aren't currently used -- but will be... This commit was SVN r26845.	2012-07-23 19:38:13 +00:00
Pavel Shamis	f7664b3814	1. Adding 2 new components: ofacm - generic connection manager for IB interconnects. ofautils - IB common utilities and compatibility code 2. Updating OpenIB configure code - ORNL & Mellanox Teams This commit was SVN r26707.	2012-07-02 15:20:12 +00:00
Jeff Squyres	5d030278e1	Refs trac:3130: Per comment 8 on the ticket, this MX patch fixes the cases where the MX BTL and MTL are stepping on each other regarding the mpool. Thanks to Yong Qin for assistance in tracking this down. This commit was SVN r26698. The following Trac tickets were found above: Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130	2012-06-29 13:52:40 +00:00
Josh Hursey	28681deffa	Backout the ORCA commit. :( There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk. This commit was SVN r26676.	2012-06-27 01:28:28 +00:00
Josh Hursey	542330e3a7	Commit of ORCA: Open MPI Runtime Collaborative Abstraction This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI. The project is described on the wiki: https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition And on this email thread: http://www.open-mpi.org/community/lists/devel/2012/06/11109.php This commit was SVN r26670.	2012-06-26 21:42:16 +00:00
Rolf vandeVaart	d6881f3a4f	Rename one function. Add some new functions that can support asynchronous CUDA copies. This commit was SVN r26611.	2012-06-15 16:56:30 +00:00
Rolf vandeVaart	f8ace21366	Rename a few things for clarity. Add a stream. This commit was SVN r26447.	2012-05-17 18:10:59 +00:00
Nathan Hjelm	a753fe91f7	fix merge This commit was SVN r26332.	2012-04-24 21:16:51 +00:00
Nathan Hjelm	37ca31b295	ugni: remove unused completion queue This commit was SVN r26315.	2012-04-23 21:11:39 +00:00
Nathan Hjelm	1340f9c65a	ugni update: - Move endpoint code back up to BTL - Use opal_pointer_array_t for bounce buffer to identify local smsg completions. - Update and reenable sendi - Create a new endpoint for FMA/BTE transactions (keep local smsg/fma transactions separate) - Move reverse get code into btl_ugni_put.c - Move eager get code into btl_ugni_get.c - Handle remote SMSG overruns correctly - Added support for inplace sends - etc This commit was SVN r26307.	2012-04-19 21:51:55 +00:00
Nathan Hjelm	135ac32b64	ugni: use hash table to keep track of smsg frag completion This commit was SVN r26154.	2012-03-15 20:15:59 +00:00
Rolf vandeVaart	41870ce6ee	Mostly fix some of the verbose output. Also fix issue where memory handle was blocking other registration. This commit was SVN r26124.	2012-03-09 21:28:56 +00:00
Rolf vandeVaart	c7a0ce2755	Two new mpools. They are not used now (and by default, not compiled) but they will be soon. Provide support for GPU buffer transfers within a node. This commit was SVN r26008.	2012-02-22 23:32:36 +00:00
Nathan Hjelm	97dad0ac49	ugni: don't release eager fragments until we get local smsg completion This commit was SVN r25796.	2012-01-27 00:32:43 +00:00
Ralph Castain	f5c43e8d60	Get the header files into the tarball This commit was SVN r25746.	2012-01-19 21:02:20 +00:00


291 Commits