openmpi

Author	SHA1	Message	Date
Jeff Squyres	b30ad28276	Remove some unused variables and an unused goto label. This commit was SVN r29044.	2013-08-19 16:18:35 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. 
Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Ralph Castain	f8a72feb25	Silence uninitialized var warning This commit was SVN r29036.	2013-08-16 21:39:28 +00:00
Ralph Castain	c74c54e18d	Cleanup uninitialized warnings This commit was SVN r29033.	2013-08-16 21:23:09 +00:00
Ralph Castain	33beab5918	Avoid segfault due to uninitialized variable This commit was SVN r29030.	2013-08-16 21:10:38 +00:00
Ralph Castain	bebe852057	Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite This commit was SVN r29028.	2013-08-14 04:21:17 +00:00
Ralph Castain	318467c04f	If we only have global scope, then don't fall back to looking at local scope if the lookup target wasn't found else we will hang This commit was SVN r29025.	2013-08-13 04:45:33 +00:00
Nathan Hjelm	6c75699068	coll/ml: fix typo in assert that could cause an abort in debug builds. cmr=v1.7.3:reviewer=manjugv This commit was SVN r29024.	2013-08-12 14:31:44 +00:00
Jeff Squyres	c09ec204ad	Change usNIC BTL to always use small fragments when there is a non-contiguous converter. We can't "convert on the fly" because the # of bytes requested may not divide evenly into the convertor data type. This commit was SVN r29014.	2013-08-11 17:04:13 +00:00
Nathan Hjelm	47320713bb	coll/ml: do not register variables in open and fix a bug in the coll/ml parser cmr=v1.7.3:reviewer=pasha This commit was SVN r29010.	2013-08-09 17:55:30 +00:00
Rolf vandeVaart	cd72024a3c	Refactor some of the initialization code. This commit was SVN r29009.	2013-08-09 14:54:17 +00:00
Edgar Gabriel	f7391eca23	Lazy open does not work for the addproc sharedfp component since it starts by spawning a process using MPI_Comm_spawn. For this, the first operation has to be collective, which we cannot guarantee outside of the MPI_File_open operation. This commit was SVN r29008.	2013-08-06 20:48:20 +00:00
Edgar Gabriel	e348f5567f	add unignore for me. This commit was SVN r29007.	2013-08-06 20:47:08 +00:00
George Bosilca	837b3363fe	Silence few warnings. This commit was SVN r29004.	2013-08-06 09:38:30 +00:00
Brian Barrett	2cc947513b	* Fix some compile errors * Need to subtract 1 off the size so that we stay in the bit length requirements This commit was SVN r28997.	2013-08-05 18:49:48 +00:00
Jeff Squyres	87910daf51	Fix a collection of bugs found by QA and Coverity, and make some minor improvements: * Fix minor memory leaks during component_init * Ensure that an initialization loop does not underflow an unsigned int * Improve mlock limit checking * Fix set of BTL modules created during component_init when failing to get QP resources or otherwise excluding some (but not all) usnic verbs devices * Fix/improve error messages to be consistent with other Cisco documentation * Randomize the initial sliding window sequence number so that we silently drop incoming frames from previous jobs that still have extant processes in the middle of dying (and are still transmitting) * Ensure we don't break out of add_procs too soon and create an asymmetrical view of what interfaces are available This commit was SVN r28975.	2013-08-01 16:56:15 +00:00
Nathan Hjelm	8429485a39	mpool/grdma: use the rcache even if not using mpi_leave_pinned or mpi_leave_pinned_pipeline This change should improve performance in the non-pinned case where the same memory region is involved in multiple simultaneous transfers. cmr=v1.7.3:reviewer=brbarret This commit was SVN r28973.	2013-07-31 23:50:41 +00:00
Nathan Hjelm	1382d4fb53	Fix typos in _Complex ops This commit was SVN r28959.	2013-07-26 17:02:45 +00:00
Nathan Hjelm	e4f105ffb3	revert change that shouldn't have been part of r28952 This commit was SVN r28953. The following SVN revision numbers were found above: r28952 --> open-mpi/ompi@cb90a4a7fc	2013-07-25 20:23:55 +00:00
Nathan Hjelm	cb90a4a7fc	Add simple algorithms to support MPI_IN_PLACE for MPI_Alltoall, MPI_Alltoallv, and MPI_Alltoallw. Working on faster algorithms for tuned that will come at a later time. cmr=v1.7.3:ticket=trac:2965 This commit was SVN r28952. The following Trac tickets were found above: Ticket 2965 --> https://svn.open-mpi.org/trac/ompi/ticket/2965	2013-07-25 19:19:41 +00:00
Nathan Hjelm	99adeb7f6e	Fix support for complex datatypes when fortran is not available but _Complex is This commit was SVN r28951.	2013-07-25 19:08:21 +00:00
Edgar Gabriel	012f99c3b6	ompi_ignore this component until we find a solution for the dependence on libmpi for the external process that is being spawned by this component. This commit was SVN r28945.	2013-07-24 20:02:47 +00:00
Jeff Squyres	f7337b8f77	Correct faulty max payload and MTU computations (and update some debugging that helped us find those). This commit was SVN r28942.	2013-07-24 16:06:28 +00:00
Ralph Castain	db214a2321	Refs trac:3697 - use the opal_pmi_error function instead of ompi_error as the returned error codes are from PMI This commit was SVN r28941. The following Trac tickets were found above: Ticket 3697 --> https://svn.open-mpi.org/trac/ompi/ticket/3697	2013-07-24 04:05:41 +00:00
Jeff Squyres	5323051047	Use sysfs to check MPI has enough VFs, QPs, and CQs Use the new sysfs files to check that there are enough VFs, QPs, and CQs for all the MPI processes on this server. Move the checking code into its own subroutine to make it smaller and easier to read/grok. This commit was SVN r28937.	2013-07-24 00:38:32 +00:00
Ralph Castain	59a71765cf	Hmmm...these error outputs will never occur, which is probably not what the author intended. So do the output and THEN jump to the error exit. This commit was SVN r28918.	2013-07-22 22:58:03 +00:00
Edgar Gabriel	8ffc1aac89	update the _component.c files in ompio to use the explicit assignment of the mca_register_component_params element of the structure. This commit was SVN r28914.	2013-07-22 21:11:05 +00:00
Nathan Hjelm	b17cd13c09	sharedfp: ensure sharedfp components register their parameters in mca_register_component_params not mca_component_open This commit was SVN r28910.	2013-07-22 17:53:58 +00:00
Jeff Squyres	b437041aeb	Update one more comment. This commit was SVN r28908.	2013-07-22 17:29:00 +00:00
Jeff Squyres	4b6006402d	Use the RTE framework instead of calling ORTE directly. Brian (rightfully) hit me on the head with the don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now nicely plays with the RTE framework. This commit was SVN r28907.	2013-07-22 17:28:23 +00:00
Jeff Squyres	ca9da8a554	Fix minor typo in the comments/docs. This commit was SVN r28905.	2013-07-22 17:24:17 +00:00
Rolf vandeVaart	67badf384c	Only search SONAME of library. Expand comments. This commit was SVN r28904.	2013-07-22 15:54:45 +00:00
Brian Barrett	e1d72409cd	add missing header This commit was SVN r28897.	2013-07-21 19:40:31 +00:00
Brian Barrett	704f1ecc18	fix non-orte builds of PSM This commit was SVN r28893.	2013-07-21 19:12:32 +00:00
Brian Barrett	05ab9cbaa6	Need to ship pmi_internal.h This commit was SVN r28891.	2013-07-21 19:00:50 +00:00
Brian Barrett	495384d8b7	Update documentation in rte.h to match recent changes This commit was SVN r28887.	2013-07-20 22:14:12 +00:00
Brian Barrett	414ba3dad8	Update PMI RTE to match error handling changes that were part of r28852. Note that the PMI RTE still doesn't listen for asynchronous errors, so the error handler still won't ever actually do anything :). This commit was SVN r28886. The following SVN revision numbers were found above: r28852 --> open-mpi/ompi@e4e678e234	2013-07-20 22:09:02 +00:00
Brian Barrett	5bfd980968	update PMI RTE component to adapt to ORTE changes This commit was SVN r28885.	2013-07-20 22:06:47 +00:00
Brian Barrett	d984d25da3	Remove orte header file from sharedfp components (OMPI layer should not include ORTE layer with the RTE framework). Thankfully, nothing used orte_show_help, so easy fix. This commit was SVN r28884.	2013-07-20 22:03:44 +00:00
Jeff Squyres	194b285447	First commit of the Cisco usNIC BTL. This BTL accesses the Cisco usNIC Linux device via the Linux verbs API via Unreliable Datagram queue pairs. A few noteworthy points: * This BTL does most of its own fragmentation; it tells the PML that it has a very high max_send_size (much higher than the network MTU). * Since UD fragments are, by definition, unreliable, the usnic BTL handles all of its own reliability via a sliding window approach using the opal_hotel construct and many tricks stolen from the corpus of knowledge surrounding efficient TCP. * There is a fun PML latency-metric based optimization for NUMA awareness of short messages. * Note that this is ''not'' a generic UD verbs BTL; it is specific to the Cisco usNIC device. This commit was SVN r28879.	2013-07-19 22:13:58 +00:00
Jeff Squyres	3546163c48	Devices that do not support RC QP's are also intentionally skipped; don't warn about skipping them. This commit was SVN r28874.	2013-07-19 19:05:18 +00:00
Ralph Castain	e4e678e234	Per the RFC and discussion on the devel list, update the RTE-MPI error handling interface. There are a few differences in the code from the original RFC that came out of the discussion - I've captured those in the following writeup George and I were talking about ORTE's error handling the other day in regards to the right way to deal with errors in the updated OOB. Specifically, it seemed a bad idea for a library such as ORTE to be aborting the job on its own prerogative. If we lose a connection or cannot send a message, then we really should just report it upwards and let the application and/or upper layers decide what to do about it. The current code base only allows a single error callback to exist, which seemed unduly limiting. So, based on the conversation, I've modified the errmgr interface to provide a mechanism for registering any number of error handlers (this replaces the current "set_fault_callback" API). When an error occurs, these handlers will be called in order until one responds that the error has been "resolved" - i.e., no further action is required - by returning OMPI_SUCCESS. The default MPI layer error handler is specified to go "last" and calls mpi_abort, so the current "abort" behavior is preserved unless other error handlers are registered. In the register_callback function, I provide an "order" param so you can specify "this callback must come first" or "this callback must come last". Seemed to me that we will probably have different code areas registering callbacks, and one might require it go first (the default "abort" will always require it go last). So you can append and prepend, or go first. Note that only one registration can declare itself "first" or "last", and since the default "abort" callback automatically takes "last", that one isn't available. 
:-) The errhandler callback function passes an opal_pointer_array of structs, each of which contains the name of the proc involved (which can be yourself for internal errors) and the error code. This is a change from the current fault callback which returned an opal_pointer_array of just process names. Rationale is that you might need to see the cause of the error to decide what action to take. I realize that isn't a requirement for remote procs, but remember that we will use the SAME interface to report RTE errors internal to the proc itself. In those cases, you really do need to see the error code. It is legal to pass a NULL for the pointer array (e.g., when reporting an internal failure without error code), so handlers must be prepared for that possibility. If people find that too burdensome, we can remove it. Should we ever decide to create a separate callback path for internal errors vs remote process failures, or if we decide to do something different based on experience, then we can adjust this API. This commit was SVN r28852.	2013-07-19 01:08:53 +00:00
Ralph Castain	8a8b4896be	Need to protect libgen.h as some systems might not have it This commit was SVN r28845.	2013-07-18 20:21:37 +00:00
Edgar Gabriel	185e365dad	make the sm sharedfp component compile on Mac. This commit was SVN r28844.	2013-07-18 20:17:14 +00:00
Edgar Gabriel	93cef82873	remove the ylib component from the fcoll framework. It is not used, there are no plans to use it. We can always recover it from svn if we would ever change our minds. This commit was SVN r28840.	2013-07-18 16:18:06 +00:00
Pavel Shamis	68969ba6e5	Removing bogus references in iboffload code. cmr:v1.7:reviewer=hjelmn This commit was SVN r28834.	2013-07-17 22:35:24 +00:00
Rolf vandeVaart	49663fb802	Move CUDA-aware configury to its own file and other minor changes due to review. This commit was SVN r28832.	2013-07-17 22:12:29 +00:00
Edgar Gabriel	6e8522fec5	infuse life into the shared file pointer framework. For this: - extend the framework API - remove the dummy component, not required anymore - add four components to perform the actual job. This commit was SVN r28828.	2013-07-17 21:55:24 +00:00
Edgar Gabriel	ac694b7056	in preparation for the new shared file pointer components to be committed soon: - add a new abstraction layer to be used internally for some operations - add a new mca parameter to control lazy initialization of shared file pointer structures This commit was SVN r28826.	2013-07-17 21:30:50 +00:00
Vishwanath Venkatesan	ce8f8f0829	Changing the MPI Datatype from MPI_LONG to OMPI_OFFSET_DATATYPE for send/recv offsets This commit was SVN r28822.	2013-07-17 19:16:53 +00:00
Nathan Hjelm	d4c6029cf3	sbgp/ibnet: set mca_sbgp_ibnet_component.mtu to IBV_MTU_1024 before registering it. cmr:v1.7:reviewer=pasha This commit was SVN r28821.	2013-07-17 19:16:31 +00:00
Rolf vandeVaart	7a45be8bde	Fix variable initialization. This commit was SVN r28819.	2013-07-17 17:37:35 +00:00
Nathan Hjelm	f0aeb36d80	Fix warnings in ob1 introduced by the pvar commit This commit was SVN r28817.	2013-07-17 03:41:05 +00:00
Rolf vandeVaart	f95c95cf79	Additional cleanup of how libraries and paths are searched. This commit was SVN r28815.	2013-07-16 18:40:55 +00:00
Nathan Hjelm	e6e9f2c6fd	Add profiling function definitions for MPI_T and add a missing type into mpi.h This commit was SVN r28803.	2013-07-16 16:03:33 +00:00
Nathan Hjelm	35673ea400	Add example performance variables to ob1: unexpected message queue length, posted receive length This commit was SVN r28801.	2013-07-16 16:02:25 +00:00
Rolf vandeVaart	54b1fbdb4a	Better error message code. Remove commented out code. This commit was SVN r28793.	2013-07-15 22:27:34 +00:00
Rolf vandeVaart	4d2c2bcefe	Better error message. Remove a tab. This commit was SVN r28791.	2013-07-15 19:39:54 +00:00
Mike Dubman	5bd2e15cbb	support for ConnectX3-Pro card. cmr:v1.7:reviewer=jsquyres cmr:v1.6:reviewer=jsquyres This commit was SVN r28787.	2013-07-14 06:44:19 +00:00
Nathan Hjelm	dfca3d4804	fix typos in the ugni and vader btls This commit was SVN r28772.	2013-07-12 17:55:33 +00:00
Nathan Hjelm	1119cd3e8a	Merge branch 'vader_fix' This commit was SVN r28764.	2013-07-11 23:30:20 +00:00
Brian Barrett	2f19fc52de	use the same multi-md workaround the rest of the Portals code is using. This commit was SVN r28761.	2013-07-11 21:00:11 +00:00
Nathan Hjelm	b5281778b0	btl/vader: improve small message performance This commit improved the small message latency and bandwidth when using the vader btl. These improvements should make performance competitive with other MPI implementations. This commit was SVN r28760.	2013-07-11 20:54:12 +00:00
Brian Barrett	bea54eeeb1	First take at a BTL for Portals 4 This commit was SVN r28759.	2013-07-11 20:47:08 +00:00
Jeff Squyres	baa3182794	Per RFC (http://www.open-mpi.org/community/lists/devel/2013/07/12534.php), remove a bunch of dead code. This commit was SVN r28756.	2013-07-11 17:34:28 +00:00
Rolf vandeVaart	858ef65142	Fix loop limit. This commit was SVN r28755.	2013-07-11 17:15:43 +00:00
Rolf vandeVaart	5051cd53fd	Use new API. This commit was SVN r28754.	2013-07-11 17:06:14 +00:00
Joshua Ladd	16beaa3878	This fixes the nasty configure.m4 hack that was added long ago and not removed. My fault for not catching earlier. I've also removed the '.ompi_ignore' in coll/hcoll. Throwing this to Nathan for review. Upon successful review, this should be added to cmr:v1.7:reviewer=hjelmn This commit was SVN r28753.	2013-07-11 09:55:46 +00:00
Jeff Squyres	28dac8010b	The hcoll component configure.m4 commits multiple sins, and breaks many builds. I am temporarily .ompi_ignore'ing this component until it can be fixed by its owner. * It calls AC_MSG_ERROR, which configure.m4 scripts are ''never'' supposed to do. If you don't want to build, then call $2. * All static and --disable-dlopen builds are broken; they fall afoul of whatever test configure.m4 is doing and therefore error out of configure entirely (vs. simply disabling the hcoll component). * There appear to be multiple shell scripting errors in the configure.m4. Here's the output of "./configure --disable-dlopen": {{{ --- MCA component coll:hcoll (m4 configuration macro) checking for MCA component coll:hcoll compile mode... static checking --with-hcoll value... simple ok (unspecified) ./configure: line 421: test: basic: integer expression expected configure: error: Can not use coll/hcoll and coll/ml (static build) simultaneously. You have two options: 1. Use static build & disable ml with: --enable-mpi-no-build=coll-ml 2. Use dso build for ML & disable ml at runtime: -mca coll self ./configure: line 310: return: basic: numeric argument required ./configure: line 320: exit: basic: numeric argument required }}} Finally, all of these configure.m4 errors aside, I don't understand why there is a ''compile-time'' exclusion between the hcoll and ml components. Why isn't this a ''run-time'' decision? Having what seems to be an unnecessary compile-time exclusion goes against the general Open MPI philosophy. Note: Open MPI 1.7 is also broken in all the same ways. I suggest that the RM's .ompi_ignore hcoll over there, too. Mellanox: please fix. This commit was SVN r28748.	2013-07-10 16:03:15 +00:00
Jeff Squyres	80145742a3	Fix typo in comment This commit was SVN r28747.	2013-07-10 15:13:08 +00:00
Jeff Squyres	ea94936531	First cut at assigning some fine-grained "levels" to MCA parameters for the SM and TCP BTLs, as well as the mca_btl_base_param_register() function (which registers MCA params for all BTLs). The guidelines in https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels were used to pick these levels. This commit was SVN r28746.	2013-07-10 00:47:52 +00:00
Aurelien Bouteiller	e1066143a4	rename ompi_free_list operations to _mt, as per discussions at last face to face meeting This commit was SVN r28734.	2013-07-08 22:07:52 +00:00
Brian Barrett	ecbbf888d3	* Update Portals 4 MTL's multi-md code to be a bit cleaner (no if statements in the path) and not create MDs due to boundary crossing * Add the same logic to the Coll component This commit was SVN r28733.	2013-07-08 21:27:37 +00:00
Brian Barrett	84aeb6a6a5	Update request alloc to use free list get instead of free list wait. This commit was SVN r28729.	2013-07-05 20:24:43 +00:00
George Bosilca	dc9352faf6	Remove some unused variables. This commit was SVN r28726.	2013-07-05 13:31:54 +00:00
George Bosilca	8b01c3da33	Slightly reorder the code. This commit was SVN r28725.	2013-07-05 13:29:29 +00:00
George Bosilca	483ed8da8c	Remove an unused variable resulting from the removal of the last parameter of the OMPI_FREE_LIST_GET macro. This commit was SVN r28723.	2013-07-04 09:19:00 +00:00
George Bosilca	c9e5ab9ed1	Our macros for the OMPI-level free list had one extra argument, a possible return value to signal that the operation of retrieving the element from the free list failed. However, in this case the returned pointer was set to NULL as well, so the error code was redundant. Moreover, this was a continuous source of warnings when the picky mode is on. The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and OMPI_FREE_LIST_WAIT macros, and changes callers to check whether the item is NULL instead of using the return code. This commit was SVN r28722.	2013-07-04 08:34:37 +00:00
Brian Barrett	d3b49535b5	Only allow communication from the same user, since we don't have job-level protection. This commit was SVN r28715.	2013-07-03 17:29:02 +00:00
Jeff Squyres	d1ce64f049	Fix some "malloc of 0 bytes" warnings This commit was SVN r28713.	2013-07-03 12:05:33 +00:00
Brian Barrett	81efd0e3cf	Properly shut down Portals collective component This commit was SVN r28707.	2013-07-02 22:07:27 +00:00
Brian Barrett	133dafd3dc	First take at Barrier and Ibarrier, both of which seem to work. This commit was SVN r28706.	2013-07-02 21:42:10 +00:00
Brian Barrett	c4577723ed	fix misuse of param api This commit was SVN r28705.	2013-07-02 21:41:42 +00:00
Brian Barrett	c9a8217af6	Portals 4 doesn't have a BTL, need to default to MTL, rather than finding some stupid slow BTL. This selection logic sucks. This commit was SVN r28704.	2013-07-02 21:18:04 +00:00
Brian Barrett	e4698f5cd4	Shell of the Portals 4 collectives component This commit was SVN r28703.	2013-07-02 15:23:55 +00:00
Joshua Ladd	5d2d5e958c	Deleting garbage I accidentally committed. Thanks, Nathan! This commit was SVN r28698.	2013-07-01 22:50:54 +00:00
Joshua Ladd	d7a50343bf	Per the details and schedule outlined in the attached RFC, Mellanox Technologies would like to CMR the new 'coll/hcoll' component. This component enables Mellanox Technologies' latest HPC middleware offering - 'Hcoll'. 'Hcoll' is a high-performance, standalone collectives library with support for truly asynchronous, non-blocking, hierarchical collectives via hardware offload on supporting Mellanox HCAs (ConnectX-3 and above.) To build the component, libhcoll must first be installed on your system, then you must configure OMPI with the configure flag: '--with-hcoll=/path/to/libhcoll'. Subsequent to installing, you may select the 'coll/hcoll' component at runtime as you would any other coll component, e.g. '-mca coll hcoll,tuned,libnbc'. This has been reviewed by Josh Ladd and should be added to cmr:v1.7:reviewer=jladd This commit was SVN r28694.	2013-07-01 22:39:43 +00:00
George Bosilca	ae190246df	Oops, thanks Jeff for noticing. This commit was SVN r28693.	2013-07-01 17:51:52 +00:00
George Bosilca	e665cda6c2	Add the empty basic component where the function pointer from the base will be copied over. Without such a decoy component the entire framework will not function correctly. This commit was SVN r28692.	2013-07-01 17:47:44 +00:00
George Bosilca	dc1e68c3c1	Remove the item from the list before releasing it. This commit was SVN r28691.	2013-07-01 16:54:48 +00:00
George Bosilca	702e669636	Remove a [very] annoying warning. This commit was SVN r28690.	2013-07-01 16:49:13 +00:00
George Bosilca	5fae72b9aa	Add the MPI 2.2 MPI_Dist_graph functionality. This patch reshapes the way we deal with topologies completely. Where our topologies were mainly storage components (they were not capable of creating the new communicator), the new version is built around a [possibly] common representation (in mca/topo/topo.h), but the functions to attach and retrieve the topological information are specific to each component. As a result the ompi_create_cart and ompi_create_graph functions become useless and have been removed. In addition to adding the internal infrastructure to manage the topology information, it updates the MPI interface and the debugger support, and provides all Fortran interfaces. This commit was SVN r28687.	2013-07-01 12:40:08 +00:00
George Bosilca	b82abf6bef	Silence a compiler warning. This commit was SVN r28686.	2013-07-01 11:40:42 +00:00
Rolf vandeVaart	adda653fc1	Fix two bugs from previous commit. This commit was SVN r28684.	2013-06-28 16:32:51 +00:00
Rolf vandeVaart	850d325f32	Adjust how search is done for dynamic load of library. CUDA only. This commit was SVN r28683.	2013-06-27 22:13:25 +00:00
Jeff Squyres	e3d0782788	Move the assignment after the bozo check. This commit was SVN r28669.	2013-06-22 12:38:32 +00:00
Rolf vandeVaart	5ebb74bee3	Fix case where amount of data sent is less than expected. Otherwise, we will get a hang when running the RGET protocol. Reviewed by hjelm,bosilca. This commit was SVN r28667.	2013-06-21 18:35:16 +00:00
Joshua Ladd	0b5c1f2ea8	Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fallback to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd This commit was SVN r28666.	2013-06-21 15:28:14 +00:00
Mike Dubman	d1c82994be	fix: detect threading model to take appropriate flow in mxm This commit was SVN r28648.	2013-06-16 08:40:06 +00:00
Jeff Squyres	a0b27f5b28	Better comment than what was submitted in r28614. This commit was SVN r28631. The following SVN revision numbers were found above: r28614 --> open-mpi/ompi@9556310bd0	2013-06-13 20:52:44 +00:00
Mike Dubman	9556310bd0	cosmetic: add comment with rationale for malloc.h include This commit was SVN r28614.	2013-06-12 05:58:32 +00:00
Nathan Hjelm	9b1f32bf12	BTL: add flags for signaled BTL operations As per discussion in the June 2013 developer meeting these flags will be used by the PML in the future to request asynchronous progress on an operation. The naming was chosen to reflect that a BTL supports this mode (MCA_BTL_FLAG_SIGNALED) and that a descriptor should "signal" the remote side to wake up and progress the message (MCA_BTL_DES_FLAG_SIGNAL). Future commits will update OB1 to take advantage of this feature when performing the RDMA get or RDMA rendezvous protocols. This commit was SVN r28612.	2013-06-11 21:52:20 +00:00
Mike Dubman	d18b3ae1a7	fix malloc deprecation error with gcc 4.6.3 on ubuntu/fedora This commit was SVN r28605.	2013-06-09 18:13:16 +00:00
George Bosilca	d789423d34	Typo. This commit was SVN r28603.	2013-06-08 10:44:02 +00:00
Vishwanath Venkatesan	0b727f84da	Avoid malloc of zero bytes, add a check and avoid it. This commit was SVN r28597.	2013-06-06 14:08:57 +00:00
Edgar Gabriel	2d4655a05a	Logic has been revised compared to the previous implementation. This commit was SVN r28594.	2013-06-05 23:47:42 +00:00
Edgar Gabriel	03c1db7a3a	fix the calculation of the UNIFORM flag. This commit was SVN r28593.	2013-06-05 23:18:50 +00:00
Vishwanath Venkatesan	7d6a05982a	Removing the gather_array based on the flag UNIFORM FVIEW for read all operations (dynamic/static), + Disabling Timing data extraction by default in dynamic write all This commit was SVN r28592.	2013-06-05 21:35:37 +00:00
Vishwanath Venkatesan	55878674d7	1. Removing the allgather_array based on the flag UNIFORM FVIEW. This is not really an optimization. 2. Fixing some of the debug printf's; these are outdated. This commit was SVN r28591.	2013-06-05 21:30:15 +00:00
Jeff Squyres	713e3aa3db	Refs trac:3626: that ticket specifically refers to the v1.6 branch; this commit is the trunk version of what is needed for #3626. Add the "ignore_device" field to the INI file. This allows us to specifically list devices that should be ignored by the openib BTL (such as the Intel Phi, at least as of May 2013 -- see #3626). Also add the Intel Phi to the ini file, and set its ignore_device=1. Finally, add the concept of counting intentionally ignored verbs devices. Devices are ignored for one of two reasons: * If the number of allowed ports on that device is 0 (i.e., if if_include/if_exclude was set such that we're intentionally ignoring this device). * If the INI ignore_device field for this device is set to 1. Once we have the count of devices that were intentionally ignored, only show the "Hey, there's verbs devices that you're not using!" show_help message if there are devices that were ''unintentionally'' ignored. This commit was SVN r28589. The following Trac tickets were found above: Ticket 3626 --> https://svn.open-mpi.org/trac/ompi/ticket/3626	2013-06-05 12:12:09 +00:00
Jeff Squyres	3019b7a3f8	Oops! Remove duplicate registration. This commit was SVN r28588.	2013-06-05 11:55:19 +00:00
Jeff Squyres	1de00b17ad	Properly check the return status from registering the MCA params. This commit was SVN r28587.	2013-06-05 11:53:18 +00:00
Jeff Squyres	d692aba672	Remove the DR PML. It was abandoned long ago. It had a nice life, a few papers, and now a decent demise with respect. This commit was SVN r28582.	2013-06-04 19:36:16 +00:00
Edgar Gabriel	87b3782b7f	arghh, copy-and-paste error, status->_ucount has to be set to 0 not max_data for count=0. This commit was SVN r28576.	2013-05-30 22:00:29 +00:00
Edgar Gabriel	9daec82f17	- make a fileview of 0 bytes work in ompio - fixes the bug reported in ticket 3619 (which is already closed) also for ompio This commit was SVN r28575.	2013-05-30 21:33:13 +00:00
Rolf vandeVaart	3d1d158a80	Do not abort in BTL. Rather, callback into PML error function. Thanks George for review. This commit was SVN r28559.	2013-05-23 18:45:23 +00:00
Nathan Hjelm	721779d7ab	Per RFC: remove old MCA parameter system. This commit was SVN r28541.	2013-05-20 15:36:13 +00:00
Ralph Castain	889bf60c64	Fix bad merge This commit was SVN r28540.	2013-05-18 01:29:55 +00:00
Jeff Squyres	089c632cce	Remove a bunch of dead code: gcc 4.7 warns of set-but-unused variables. So get rid of them. This commit was SVN r28538.	2013-05-17 21:45:49 +00:00
Edgar Gabriel	1b1051da6c	fix a bug in the calculation of the explicit offset. Use the opportunity to clean up the code a bit. This commit was SVN r28537.	2013-05-17 20:22:00 +00:00
Ralph Castain	3e6e1046a3	fix a correctness issue by returning an error if waitall fails and invoking the mpi error handler cmr:v1.7.2:reviewer=jsquyres This commit was SVN r28533.	2013-05-16 15:04:37 +00:00
Rolf vandeVaart	91fdb423d7	Fix warning in CUDA-aware code. This commit was SVN r28511.	2013-05-14 21:04:15 +00:00
Rolf vandeVaart	52ebb0b17f	Change some opal_output to OPAL_OUTPUT per CMR review. This commit was SVN r28510.	2013-05-14 20:49:42 +00:00
Nathan Hjelm	32a8ff5255	btl/openib: bump up udcm priority This commit was SVN r28505.	2013-05-14 20:02:40 +00:00
Edgar Gabriel	d5cae9aced	- fix the mca stripe size and stripe depth parameter logic in the pvfs2 component - correctly recognize and handle the corresponding info objects. This commit was SVN r28497.	2013-05-14 16:11:39 +00:00
Yossi Etigin	64d98e0438	Fix data corruption in MXM by registering to OPAL memory release hooks and removing any mappings created by mxm This commit was SVN r28489.	2013-05-14 12:27:44 +00:00
Rolf vandeVaart	9d569f1487	Fix warning when compiling in CUDA aware code. This commit was SVN r28476.	2013-05-10 21:29:08 +00:00
Nathan Hjelm	422331b4da	btl/openib: fix unconnected datagram connection method (udcm) The primary issue with udcm is that the immediate data in message acks were often bogus. This caused the sender to keep trying even though a message was received and acked. The fix is to use the source LID and QP to determine which message is being acked. In most cases this should work well since only one message will be in flight to any peer. This commit was SVN r28444.	2013-05-03 17:11:38 +00:00
Jeff Squyres	c8258c06e2	In coll_sm, we alloc a huge chunk of shared memory, divvy it into lots of individual regions (each region is a multiple of page size in length), and each process claims its own regions by binding it to its local memory. Each process would end up membining something like 16 individual regions in the overall shmem segment. There were two errors in this code relating to the memory affinity pinning. Some combination of these two errors would lead to kernel panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed shared memory (not posix or sysv shared memory, curiously enough): 1. The shared memory segment is initially divided into two regions: control and data. The control starts at the beginning of the shmem segment, the data starts after that. The data portion, unfortunately, was ''not'' aligned to a page. So all the multiple-of-page-size regions that we divvy up were also not aligned on page boundaries. And therefore all the regions we tried to membind were not on page boundaries. The solution was to ensure that the data portion started on a page boundary. Then all of the individual regions were on page boundaries, too. That being said, in my tests, Linux mbind() fails gracefully when the address is not on a page boundary. So I'm not sure how this worked at all / led to a kernel panic... 2. There was some bad pointer math that resulted in membinding regions larger than they should have been, resulting in region overlaps. There were definitely overlaps between regions in the same process; it's likely that there were overlaps between regions of multiple processes, too -- I'm not sure (and don't care to figure out :-) ). The solution was to fix the pointer math so that each region membinds exactly only itself and no neighboring/overlapping regions. cmr:v1.7.2:reviewer=samuel This commit was SVN r28442.	2013-05-03 12:49:35 +00:00
Alex Mikheev	9e2fdc7d56	- correction of r28440 This commit was SVN r28441. The following SVN revision numbers were found above: r28440 --> open-mpi/ompi@93ce233530	2013-05-02 12:52:58 +00:00
Alex Mikheev	93ce233530	- btl_openib: changed default SRQ settings: - increase number of wqe to minimize number of RNRs - it is better to have high watermark and post relatively small number of wqes - increased TX queue size This commit was SVN r28440.	2013-05-02 12:46:35 +00:00
Alex Mikheev	f76680fbd0	- btl_openib: fix total registered memory calculation for ConnectIB and Ofed 2.0 This commit was SVN r28432.	2013-05-01 13:39:29 +00:00
Jeff Squyres	d92a8e01f8	Use the _SAFE list traversal macro so that we can remove each item from the list (just for good measure), and then free() it (without using _SAFE, we were accessing memory that was just free()'d to get to the next item). Also be a little more thorough -- DESTRUCT the list when we're all done. This commit was SVN r28429.	2013-05-01 12:26:16 +00:00
George Bosilca	8b0335380a	Fix the error messages to reference the correct function. This commit was SVN r28425.	2013-04-30 23:26:03 +00:00
George Bosilca	6a75c84fa8	Remove useless define. This commit was SVN r28424.	2013-04-30 23:24:59 +00:00
Ralph Castain	9de82aba55	Revert r28417 - given the non-standard way vprotocol is implemented, I see no way to use the framework verbosity here. Best to just leave it alone as those who use it know what they need to do to get debug output This commit was SVN r28418. The following SVN revision numbers were found above: r28417 --> open-mpi/ompi@b00de5be8b	2013-04-30 16:37:17 +00:00
Nathan Hjelm	b00de5be8b	vprotocol: remove the old output and use the framework output This commit was SVN r28417.	2013-04-30 15:21:42 +00:00
Ralph Castain	ceb4061214	Fix BTL_VERBOSE - when the MCA param change was committed, it left the base verbosity variable declared so things compiled. Sadly, the verbosity was now being set to a new variable, so debug never was output. This commit was SVN r28414.	2013-04-30 01:15:52 +00:00
Nathan Hjelm	f384263de7	btl/openib: fix typo This commit was SVN r28413.	2013-04-29 22:21:25 +00:00
Ralph Castain	5d7a93c032	Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed. Interesting how many includes had to be fixed here and there to fill in missing dependencies :-) This commit was SVN r28411.	2013-04-29 17:02:37 +00:00
Ralph Castain	8996ecb128	Add missing include This commit was SVN r28405.	2013-04-27 00:09:36 +00:00
Jeff Squyres	f55cea1a5b	If there are no BTLs, do ''not'' actually shut down the fd listener, because a) it may still be needed to shut down the CPCs, and b) it will be shut down during component_close(). This commit was SVN r28402.	2013-04-26 15:31:50 +00:00
Jeff Squyres	99b7a0f20d	Remove unused variables. This commit was SVN r28401.	2013-04-26 15:29:42 +00:00
Vishwanath Venkatesan	c902624b59	Using ompi_type_destroy to free ompi_datatype. This had to be updated in all the collective algorithms. Hopefully this will fix all warnings. This commit was SVN r28385.	2013-04-24 19:27:26 +00:00
Nathan Hjelm	2edff7f784	btl/openib: don't free string handle by MCA variable system This commit was SVN r28383.	2013-04-24 18:59:18 +00:00
Alex Margolin	aebd794bf6	Fixed macro definition order in MXM component headers This commit was SVN r28378.	2013-04-24 16:51:43 +00:00
Vishwanath Venkatesan	bba4a93f63	Got this wrong while replacing MPI function with OMPI functions. Fixed it now. This commit was SVN r28350.	2013-04-22 19:58:25 +00:00
Rolf vandeVaart	5e1dde419c	Fix some compile errors that have crept into the CUDA-aware code. This commit was SVN r28346.	2013-04-18 15:34:16 +00:00
Vishwanath Venkatesan	53753622d4	Changing some of the MPI_ functions to ompi_ equivalents. This commit was SVN r28342.	2013-04-17 21:06:36 +00:00
Alex Margolin	0ab7675019	Fix MXM connection establishment flow This commit was SVN r28329.	2013-04-12 16:37:42 +00:00
Steve Wise	134baaf2fa	Add Chelsio T5 device. This fixes trac:3552 and should be added to cmr:v1.6:reviewer=jsquyres and cmr:v1.7:reviewer=jsquyres This commit was SVN r28327. The following Trac tickets were found above: Ticket 3552 --> https://svn.open-mpi.org/trac/ompi/ticket/3552	2013-04-11 19:30:53 +00:00
George Bosilca	2d33c9ee39	Stop complaining about an overwritten default parameter. This commit was SVN r28322.	2013-04-10 19:44:37 +00:00
Jeff Squyres	8405975bf6	Be a little more conservative about initializing devices and modules (i.e., ensure that more data items get zeroed out/set to NULL) so that if something goes wrong during initialization, we don't try to clean up something that isn't there (and segv). The chance of this happening on the trunk is very low (and will also be low once the verbs improvements are brought over to v1.7). But it can actually happen in the v1.6 branch (e.g., if no CPC is available, we'll try to get the length of the endpoints list, but the endpoints list is NULL). Hence, even though the real goal is to get this functionality over to v1.6, I figured I'd commit to the trunk/CMR to v1.7 just to try to keep commonality in the openib between all three where possible. This commit was SVN r28317.	2013-04-09 21:55:31 +00:00
Ralph Castain	45af6cf59e	The move of the orte_db framework to opal required that we create an opaque opal_identifier_t type as OPAL cannot know anything about the ORTE process name. However, passing a value down to opal and then having the db components reference it causes alignment issues on Solaris Sparc platforms. So pass the pointer instead and do the old "memcpy" trick to avoid the problem. This commit was SVN r28308.	2013-04-08 23:34:16 +00:00
Nathan Hjelm	4e95d691a7	pml/ob1: do not reset the convertor if one was not created (size = 0). This macro is only used on the failure path so the additional if statement should not have any effect on performance. cmr:v1.7 This commit was SVN r28292.	2013-04-05 01:40:11 +00:00
Pavel Shamis	fed6e60131	Fixing OpenIB BTL compilation failure for cases when BTL_OPENIB_MALLOC_HOOKS_ENABLED is disabled. This commit was SVN r28290.	2013-04-04 20:17:18 +00:00
Pavel Shamis	aa1f5697b4	In order to prevent name conflicts in XRC (MOFED) enabled mode OFACM's ib_address_t was renamed to ofacm_ib_address_t This commit was SVN r28289.	2013-04-04 20:02:17 +00:00
Nathan Hjelm	e8d9944456	sbgp/ibnet: fix param -> var update errors This commit was SVN r28284.	2013-04-03 20:17:18 +00:00
Nathan Hjelm	75093155ab	bcol/iboffload: fix still more errors from param -> var updates This commit was SVN r28283.	2013-04-03 19:57:03 +00:00
Nathan Hjelm	47a1897710	bcol/iboffload: fix more errors from param -> var updates This commit was SVN r28281.	2013-04-03 18:55:46 +00:00
Nathan Hjelm	31a498c2a1	bcol/iboffload: fix errors from param -> var updates This commit was SVN r28280.	2013-04-03 18:33:19 +00:00
Ralph Castain	66f3a81488	Cleanup warnings found when building v1.7 cmr:v1.7 This commit was SVN r28279.	2013-04-03 17:37:02 +00:00
Vishwanath Venkatesan	74c418b860	Adding typecasting with intptr_t to remove warnings. This commit was SVN r28278.	2013-04-03 17:07:43 +00:00
Vishwanath Venkatesan	784337aab1	typecasting with intptr_t to remove warnings This commit was SVN r28276.	2013-04-03 17:06:02 +00:00
Jeff Squyres	64d39a4e97	Technically speaking, we're creating a QP with 1 send WQE and 1 receive WQE, so it's good form to have a CQ with 2 entries, not 1. This commit was SVN r28256.	2013-03-28 13:11:31 +00:00
George Bosilca	9c6374b515	Swap the open and register. This commit was SVN r28253.	2013-03-27 22:19:57 +00:00
Nathan Hjelm	f1fa290157	btl/vader: add missing return statement This commit was SVN r28252.	2013-03-27 22:16:21 +00:00
Nathan Hjelm	113fadd749	btl/vader: do not use common/sm for shared memory fragments This commit was SVN r28250.	2013-03-27 22:10:02 +00:00
Nathan Hjelm	9d4a26f47d	Update OMPI frameworks to use the MCA framework system. Notes: - This commit also eliminates the need for an available components list in use in several frameworks. None of the code in question was making use of the priority field of the priority component list item so these extra lists were removed. - Cleaned up selection code in several frameworks to sort lists using opal_list_sort. - Cleans up the ompi/orte-info functions. Expose the functions that construct the list of params so they can be used elsewhere. patches for mtl/portals4 from brian missed a few output variables in openib This commit was SVN r28241.	2013-03-27 21:17:31 +00:00
Nathan Hjelm	c041156f60	Update ORTE frameworks to use the MCA framework system. This commit was SVN r28240.	2013-03-27 21:14:43 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file can not be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int , or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE flag enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few components actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump().
opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
Ralph Castain	317915225c	Finish the binding cleanup by removing the no-longer-used binding level scheme. This proved to be fallible as there is no guarantee that the hierarchy it used matched physical reality of the machine (e.g., is L3 "above" the socket or not). Still have to complete the ppr update, but get the rest of it correct. This commit was SVN r28223.	2013-03-26 20:09:49 +00:00
Jeff Squyres	44e371a65d	Remove (bogus) port number from the opal_output -- there's no port number associated with creating a QP. This commit was SVN r28222.	2013-03-26 19:48:50 +00:00
Vishwanath Venkatesan	e092cc34e0	Fixing the read all bugs discovered by Coverity This commit was SVN r28189.	2013-03-20 20:27:09 +00:00
Samuel Gutierrez	8ce2041102	Cleanup in error path. Fixes CID 967211. Thanks, Jeff. This commit was SVN r28183.	2013-03-19 20:00:08 +00:00
Jeff Squyres	2513122d31	Remove extraneous semicolon. This commit was SVN r28180.	2013-03-18 23:58:11 +00:00
Jeff Squyres	7ac02fb9d4	Two fixes for the ROMIO io module: * Don't call PMPI_* anything from our module code; that's terribly bad form (and disallowed!). Instead, do the proper back-end stuff to reset the error handler on the file handle. * If we've already started to MPI_Finalize, then just give up and don't actually perform all the file closing actions (because ROMIO's file close calls MPI_Barrier, which will obviously fail if MPI_Finalize has already been invoked). Bad user behavior should be punished (by leaking resources, not closing the file properly, etc.). This commit was SVN r28177.	2013-03-18 20:11:20 +00:00
Vasily Filipov	7bda23dd84	SBGP, BCOL: add missing "show_help.h" includes. This commit was SVN r28163.	2013-03-10 09:11:09 +00:00
Brian Barrett	65109de931	Fix leak of comm and datatype references for mprobe/improbe and fix a request leak in improbe This commit was SVN r28157.	2013-03-07 21:55:22 +00:00
Brian Barrett	db858827df	Fill in more of the process info structure when using PMI This commit was SVN r28152.	2013-03-06 19:32:47 +00:00
Brian Barrett	a67d768ee4	quick hack to get things compiling again. Still need to fill in the fixme parts. sigh. This commit was SVN r28150.	2013-03-06 18:33:25 +00:00
Nathan Hjelm	3c5cd95087	mtl/psm: add missing header for opal_show_help (one more) This commit was SVN r28147.	2013-03-05 00:18:51 +00:00
Nathan Hjelm	25d0d97d6b	mtl/psm: add missing header for opal_show_help This commit was SVN r28146.	2013-03-05 00:17:48 +00:00
Nathan Hjelm	213cb79fab	mtl/psm: add missing header for opal_show_help This commit was SVN r28145.	2013-03-05 00:15:11 +00:00
Rolf vandeVaart	037729dcbb	Add a search path. Refactor code. This commit was SVN r28142.	2013-03-01 21:50:56 +00:00
Rolf vandeVaart	5c761d701d	Remove tabs for spaces, fix some error messages. This commit was SVN r28141.	2013-03-01 19:13:06 +00:00
Rolf vandeVaart	ebe63118ac	Remove dependency on libcuda.so when building in CUDA-aware support. Dynamically load it if needed. This commit was SVN r28140.	2013-03-01 13:21:52 +00:00
Ralph Castain	a4b6fb241f	Remove all remaining vestiges of the Windows integration This commit was SVN r28137.	2013-02-28 17:31:47 +00:00
Nathan Hjelm	b5a2cd1cce	remove csum pml This commit was SVN r28133.	2013-02-28 00:17:56 +00:00
Brian Barrett	1370d4569a	workaround for case when MD can't span all of memory (sigh) This commit was SVN r28132.	2013-02-27 17:02:45 +00:00
Vasily Filipov	f897c8a1e0	MTL MXM: STREAM support for isend and irecv. This commit was SVN r28122.	2013-02-27 13:21:30 +00:00
Ralph Castain	8d2fa3693b	First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package. This commit was SVN r28116.	2013-02-26 20:44:56 +00:00
Ralph Castain	bd9265c560	Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data. A few changes were required to support this move: 1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution). 2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time. 3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. 
This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you separately specify the "store" and "fetch" ordering. 4. the ORTE grpcomm and ess/pmi components, and the nidmap code, were updated to work with the new db framework and to specify internal/global storage options. No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework. This commit was SVN r28112.	2013-02-26 17:50:04 +00:00
Ralph Castain	70a28c8a27	Now that we are using local ranks in OMPI, we need to define an ompi_local_rank_t and equate it to orte_local_rank_t. Change the sm btl to use the correct abstraction. This commit was SVN r28098.	2013-02-22 17:48:53 +00:00
Samuel Gutierrez	af5ed9b25c	OMPI_NODE_RANK_INVALID ==> OMPI_LOCAL_RANK_INVALID This commit was SVN r28096.	2013-02-21 18:28:07 +00:00
Samuel Gutierrez	4bf0134901	Remove debug. This commit was SVN r28095.	2013-02-21 18:21:22 +00:00
Samuel Gutierrez	b7791963f2	Fix sm BTL initialization for MPI_Comm_spawn and friends. Thanks to Jeff for finding the issue. This commit was SVN r28094.	2013-02-21 18:19:46 +00:00
Nathan Hjelm	55cf850eca	Add comment about r28083 This commit was SVN r28084. The following SVN revision numbers were found above: r28083 --> open-mpi/ompi@5411e28c00	2013-02-20 21:42:13 +00:00
Nathan Hjelm	5411e28c00	btl/openib: don't align fragments on 2 byte boundaries (changed to 8) cmr:v1.6,v1.7 This commit was SVN r28083.	2013-02-20 21:27:01 +00:00
Rolf vandeVaart	da3e9ff906	Add show_help.h where needed. This commit was SVN r28071.	2013-02-19 15:42:09 +00:00
Brian Barrett	3c83618799	fix a missing header file issue with IB This commit was SVN r28070.	2013-02-18 18:29:14 +00:00
Vasily Filipov	52a9241859	MTL MXM: adapt to mxm 2.0 api changes - flags are only for send requests, and SYNC is part of the opcode. This commit was SVN r28069.	2013-02-17 10:04:19 +00:00
Vasily Filipov	8270d8f52a	MTL MXM: add #include "opal/util/show_help.h". This commit was SVN r28068.	2013-02-17 09:51:03 +00:00
Ralph Castain	ebad55b933	Apply patches from ORNL to fix compile issues - minor stuff. Thanks to Geoffroy Vallee for the patches. This commit was SVN r28065.	2013-02-15 22:14:23 +00:00
Jeff Squyres	bbddd6ea03	Add header file for opal_show_help(). This commit was SVN r28056.	2013-02-13 16:31:59 +00:00
Brian Barrett	312f37706e	In talking about this with Jeff and Ralph, we don't actually need ompi_show_help, because opal_show_help is replaced with an aggregating version when using ORTE, so there's no reason to directly call orte_show_help. This commit was SVN r28051.	2013-02-12 21:10:11 +00:00
Joshua Ladd	70ad711337	Backing out the Open SHMEM project This commit was SVN r28050.	2013-02-12 17:45:27 +00:00
Mike Dubman	ff384daab4	Added new project: oshmem. This commit was SVN r28048.	2013-02-12 15:33:21 +00:00
Mike Dubman	55cb00f8a3	Remove references to nonexistent files: ompi/mca/common/netpatterns/ ompi/mca/common/commpatterns/ This commit was SVN r28044.	2013-02-12 13:21:47 +00:00
Pavel Shamis	a31bc57849	Moving mca/common/netpatterns and commpatterns to ompi/patterns. This commit was SVN r28035.	2013-02-05 21:52:55 +00:00
Brian Barrett	d80218996f	Rather than setting up the direct call stuff in ompi_mca (which requires modifying ompi_mca for every interface that is direct called), do it in the framework's .m4 file. This commit was SVN r28031.	2013-02-04 23:26:42 +00:00
Vasily Filipov	21b170b43b	MTL MXM: push commit r27987 back, now with right user. r27987 - MTL MXM: ver. 2.0 interface changes. This commit was SVN r28026. The following SVN revision numbers were found above: r27987 --> open-mpi/ompi@2735658d81	2013-02-04 06:59:24 +00:00
Vasily Filipov	aa5e436479	Revert revision r27986; the reason is that it was submitted with the wrong user name. This commit was SVN r28025. The following SVN revision numbers were found above: r27986 --> open-mpi/ompi@729caaf0cd	2013-02-04 06:54:24 +00:00
Jeff Squyres	c8dc1905f0	Fixes trac:3494: If we get 0 bytes back for the ACK, it doesn't necessarily mean an error -- it could (and usually does) mean that the peer realized that we both initiated a connect at the same time, and therefore it decided to hang up. I also added a friendly show_help error message for other cases where recv_blocking() fails (i.e., "Something went wrong. Kaboom! Your job will abort..."). This commit was SVN r28023. The following Trac tickets were found above: Ticket 3494 --> https://svn.open-mpi.org/trac/ompi/ticket/3494	2013-02-02 01:19:03 +00:00
Jeff Squyres	f05b7aa6d8	As the help message states, it's not an ''error'' if the specified interface is not found. It should just be skipped. This commit was SVN r28016.	2013-02-01 20:17:43 +00:00
Ralph Castain	afb0db5b6f	Okay, Jeff - just for you...flow the show help thru the orte functions so help messages will be aggregated This commit was SVN r28007.	2013-02-01 00:35:48 +00:00
Ralph Castain	e6555408f4	When we say abort, we mean ABORT!! Actually implement the ompi_rte_abort and ompi_rte_show_help functions in the ORTE module. This commit was SVN r28004.	2013-01-31 23:12:11 +00:00
Igor Usarov	8d80af6c10	Support FCA v3.0 This commit was SVN r27988.	2013-01-31 11:14:27 +00:00
Pavel Shamis	2735658d81	MTL MXM: ver. 2.0 interface changes. This commit was SVN r27987.	2013-01-31 08:38:08 +00:00
Rolf vandeVaart	729caaf0cd	Remove any dependency on libcuda.so in opal layer. All changes are within OMPI_CUDA_SUPPORT code. This commit was SVN r27986.	2013-01-30 23:07:32 +00:00
Rolf vandeVaart	aa04de4f1e	Add run-time parameter to enable and disable CUDA GPU support. This commit was SVN r27970.	2013-01-29 20:24:04 +00:00
Rolf vandeVaart	de5b7f5c6a	Add mpool_base_verbose parameter. All the other base components appear to have this and it can help with debug. This commit was SVN r27968.	2013-01-29 17:52:18 +00:00
Brian Barrett	49b2b5bf4f	Fix double-install issue when --with-devel-headers is used This commit was SVN r27967.	2013-01-29 17:23:18 +00:00
Brian Barrett	b8442ba505	Revamp the handling of wrapper compiler flags. The user flags, main configure flags, and mca flags are kept separate until the very end. The main configure wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD macro. MCA components should either let <framework>_<component>_{LIBS,LDFLAGS} be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}. The set of situations in which WRAPPER CPPFLAGS can be set by MCA components was made very small to match the one use case where it makes sense. This commit was SVN r27950.	2013-01-29 00:00:43 +00:00
Rolf vandeVaart	b5672927f2	Fix build issue when building with --disable-dlopen. This commit was SVN r27945.	2013-01-28 20:14:59 +00:00
Rolf vandeVaart	c6412f6dff	Add new rte headers in files that need them. This commit was SVN r27943.	2013-01-28 19:32:33 +00:00
Pavel Shamis	1f1e1efb7b	Removing leftovers of old infrastructure. cmr:v1.7 This commit was SVN r27942.	2013-01-28 19:11:42 +00:00
Vishwanath Venkatesan	5be992f445	The pointer to the structure was also never allocated before retrieving the stripe size. Fixing that too. This commit was SVN r27941.	2013-01-28 07:21:22 +00:00
Vishwanath Venkatesan	817f6cd868	To remove the warning due to uninitialized variable. This commit was SVN r27940.	2013-01-28 06:55:46 +00:00
George Bosilca	4defdea9f2	The shortest lifespan for a BTL. This commit was SVN r27939.	2013-01-28 03:43:23 +00:00
George Bosilca	1b7dff3f2f	A copy for posterity of the Open MPI Sicortex BTL. This commit was SVN r27938.	2013-01-28 03:42:52 +00:00
Brian Barrett	f42783ae1a	Move the RTE framework change into the trunk. With this change, all non-CR runtime code goes through one of the rte, dpm, or pubsub frameworks. This commit was SVN r27934.	2013-01-27 23:25:10 +00:00
Brian Barrett	14f4aa1198	Fix memory leak in nbc init This commit was SVN r27884.	2013-01-21 22:45:59 +00:00
Brian Barrett	407714a85a	Fix a memory leak in the RDMA one-sided component. Thanks to Victor Vysotskiy for letting us know about this one. This commit was SVN r27883.	2013-01-21 22:45:37 +00:00
George Bosilca	42753b4690	Make the TCP BTL really fail-safe. It now triggers the error callback on all pending fragments when the destination goes down. This allows the PML to recalibrate its behavior, either find an alternate route or just give up. This commit was SVN r27881.	2013-01-21 11:41:08 +00:00
George Bosilca	d2281cc672	Remove the CMA related warnings. This commit was SVN r27872.	2013-01-19 14:26:43 +00:00
Rolf vandeVaart	f63c88701f	Improve CUDA GPU transfers over openib BTL. Use asynchronous copies. This is the RFC that was submitted in July and December of 2012. This commit was SVN r27862.	2013-01-17 22:34:43 +00:00
Rolf vandeVaart	a07a4bb3f7	Update smcuda to match recent changes in sm BTL. This commit was SVN r27803.	2013-01-14 14:42:19 +00:00
Rolf vandeVaart	34d1f0a585	Add some comments to the #ifdefs for clarity. No functional changes. This commit was SVN r27802.	2013-01-13 16:08:48 +00:00
Alex Mikheev	344d407ed4	fixed compilation warning always send signalled when BTL_OPENIB_FAILOVER is defined This commit was SVN r27801.	2013-01-13 10:11:03 +00:00
Jeff Squyres	b2d5d1e348	Along with the Automake 1.13.x changes in r27790, rename these third party configure.in scripts to be configure.ac so that Automake stops complaining about them. This commit was SVN r27791. The following SVN revision numbers were found above: r27790 --> open-mpi/ompi@675a2f5c48	2013-01-11 20:26:19 +00:00
Jeff Squyres	675a2f5c48	Updates for Automake 1.13.x. Without these changes, Automake 1.13.x will error out, due to use of the previously-deprecated-and-now-removed AM_CONFIG_HEADER macro. This commit was SVN r27790.	2013-01-11 20:20:02 +00:00
Samuel Gutierrez	4c28c8cbd0	New sm BTL initialization take two. This approach is pretty simple. Instead of using the modex or RML to share sm initialization information, have node rank 0 create a file containing initialization information in a well-known place. Then during add_procs, the rest of the node processes requiring sm BTL initialization will just read from that file to complete their initialization. This commit was SVN r27789.	2013-01-11 16:24:56 +00:00
Brian Barrett	b817166072	Use a process name instead of a name list in bcol_basesmuma This commit was SVN r27779.	2013-01-09 16:43:49 +00:00
Joshua Ladd	77df51c516	Fixes the definition of the first fragment and does not assume that first frag has offset_into_user_buff equal to zero. This fix should be added to cmr:v1.7.1:reviewer=pasha This commit was SVN r27775.	2013-01-08 20:24:58 +00:00
Alex Mikheev	fe672f255f	request signal when sending over SRQ and number of SRQ sd_credits is 0 This commit was SVN r27767.	2013-01-08 14:00:29 +00:00
Samuel Gutierrez	c4acd20eb9	Backout r27739. This commit was SVN r27745. The following SVN revision numbers were found above: r27739 --> open-mpi/ompi@a159bfaf25	2013-01-05 01:54:23 +00:00
Nathan Hjelm	84e34ee0d7	Fix a bug in the uGNI btl that could cause certain descriptor callbacks to be called twice. There was a race condition in the eager get protocol where the RDMA complete message could be received before the local completion of the SMSG message that started the eager get protocol. cmr:v1.7 This commit was SVN r27740.	2013-01-03 23:11:13 +00:00
Samuel Gutierrez	a159bfaf25	sm BTL initialization via modex, as discussed at last year's meeting. This commit was SVN r27739.	2013-01-03 21:52:20 +00:00
Mike Dubman	889d46e966	support for FCA v3.0 and up This commit was SVN r27731.	2012-12-31 05:49:22 +00:00

4504 commits