openmpi

Author	SHA1	Message	Date
Vishwanath Venkatesan	55878674d7	1. Removing the allgather_array based on the flag UNIFORM FVIEW. This is not really an optimization. 2. Fixing some of the debug printf's; these are outdated. This commit was SVN r28591.	2013-06-05 21:30:15 +00:00
Jeff Squyres	713e3aa3db	Refs trac:3626: that ticket specifically refers to the v1.6 branch; this commit is the trunk version of what is needed for #3626. Add the "ignore_device" field to the INI file. This allows us to specifically list devices that should be ignored by the openib BTL (such as the Intel Phi, at least as of May 2013 -- see #3626). Also add the Intel Phi to the ini file, and set its ignore_device=1. Finally, add the concept of counting intentionally ignored verbs devices. Devices are ignored for one of two reasons: * If the number of allowed ports on that device is 0 (i.e., if if_include/if_exclude was set such that we're intentionally ignoring this device). * If the INI ignore_device field for this device is set to 1. Once we have the count of devices that were intentionally ignored, only show the "Hey, there's verbs devices that you're not using!" show_help message if there are devices that were ''unintentionally'' ignored. This commit was SVN r28589. The following Trac tickets were found above: Ticket 3626 --> https://svn.open-mpi.org/trac/ompi/ticket/3626	2013-06-05 12:12:09 +00:00
Jeff Squyres	3019b7a3f8	Oops! Remove duplicate registration. This commit was SVN r28588.	2013-06-05 11:55:19 +00:00
Jeff Squyres	1de00b17ad	Properly check the return status from registering the MCA params. This commit was SVN r28587.	2013-06-05 11:53:18 +00:00
Jeff Squyres	d692aba672	Remove the DR PML. It was abandoned long ago. It had a nice life, a few papers, and now a decent demise with respect. This commit was SVN r28582.	2013-06-04 19:36:16 +00:00
Edgar Gabriel	87b3782b7f	arghh, copy-and-paste error, status->_ucount has to be set to 0 not max_data for count=0. This commit was SVN r28576.	2013-05-30 22:00:29 +00:00
Edgar Gabriel	9daec82f17	- make a fileview of 0 bytes work in ompio - fixes the bug reported in ticket 3619 (which is already closed) also for ompio This commit was SVN r28575.	2013-05-30 21:33:13 +00:00
Rolf vandeVaart	3d1d158a80	Do not abort in BTL. Rather, callback into PML error function. Thanks George for review. This commit was SVN r28559.	2013-05-23 18:45:23 +00:00
Nathan Hjelm	721779d7ab	Per RFC: remove old MCA parameter system. This commit was SVN r28541.	2013-05-20 15:36:13 +00:00
Ralph Castain	889bf60c64	Fix bad merge This commit was SVN r28540.	2013-05-18 01:29:55 +00:00
Jeff Squyres	089c632cce	Remove a bunch of dead code: gcc 4.7 warns of set-but-unused variables. So get rid of them. This commit was SVN r28538.	2013-05-17 21:45:49 +00:00
Edgar Gabriel	1b1051da6c	fix a bug in the calculation of the explicit offset. Use the opportunity to clean up the code a bit. This commit was SVN r28537.	2013-05-17 20:22:00 +00:00
Ralph Castain	3e6e1046a3	fix a correctness issue by returning an error if waitall fails and invoking the mpi error handler cmr:v1.7.2:reviewer=jsquyres This commit was SVN r28533.	2013-05-16 15:04:37 +00:00
Rolf vandeVaart	91fdb423d7	Fix warning in CUDA-aware code. This commit was SVN r28511.	2013-05-14 21:04:15 +00:00
Rolf vandeVaart	52ebb0b17f	Change some opal_output to OPAL_OUTPUT per CMR review. This commit was SVN r28510.	2013-05-14 20:49:42 +00:00
Nathan Hjelm	32a8ff5255	btl/openib: bump up udcm priority This commit was SVN r28505.	2013-05-14 20:02:40 +00:00
Edgar Gabriel	d5cae9aced	- fix the mca stripe size and stripe depth parameter logic in the pvfs2 component - correctly recognize and handle the corresponding info objects. This commit was SVN r28497.	2013-05-14 16:11:39 +00:00
Yossi Etigin	64d98e0438	Fix data corruption in MXM by registering to OPAL memory release hooks and removing any mappings created by mxm This commit was SVN r28489.	2013-05-14 12:27:44 +00:00
Rolf vandeVaart	9d569f1487	Fix warning when compiling in CUDA aware code. This commit was SVN r28476.	2013-05-10 21:29:08 +00:00
Nathan Hjelm	422331b4da	btl/openib: fix unconnected datagram connection method (udcm) The primary issue with udcm is that the immediate data in message acks were often bogus. This caused the sender to keep trying even though a message was received and acked. The fix is to use the source LID and QP to determine which message is being acked. In most cases this should work well since only one message will be in flight to any peer. This commit was SVN r28444.	2013-05-03 17:11:38 +00:00
Jeff Squyres	c8258c06e2	In coll_sm, we alloc a huge chunk of shared memory, divvy it into lots of individual regions (each region is a multiple of page size in length), and each process claims its own regions by binding it to its local memory. Each process would end up membinding something like 16 individual regions in the overall shmem segment. There were two errors in this code relating to the memory affinity pinning. Some combination of these two errors would lead to kernel panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed shared memory (not posix or sysv shared memory, curiously enough): 1. The shared memory segment is initially divided into two regions: control and data. The control starts at the beginning of the shmem segment, the data starts after that. The data portion, unfortunately, was ''not'' aligned to a page. So all the multiple-of-page-size regions that we divvy up were also not aligned on page boundaries. And therefore all the regions we tried to membind were not on page boundaries. The solution was to ensure that the data portion started on a page boundary. Then all of the individual regions were on page boundaries, too. That being said, in my tests, Linux mbind() fails gracefully when the address is not on a page boundary. So I'm not sure how this worked at all / led to a kernel panic... 2. There was some bad pointer math that resulted in membinding regions larger than they should have been, resulting in region overlaps. There were definitely overlaps between regions in the same process; it's likely that there were overlaps between regions of multiple processes, too -- I'm not sure (and don't care to figure out :-) ). The solution was to fix the pointer math so that each region membinds exactly only itself and no neighboring/overlapping regions. cmr:v1.7.2:reviewer=samuel This commit was SVN r28442.	2013-05-03 12:49:35 +00:00
Alex Mikheev	9e2fdc7d56	- correction of r28440 This commit was SVN r28441. The following SVN revision numbers were found above: r28440 --> open-mpi/ompi@93ce233530	2013-05-02 12:52:58 +00:00
Alex Mikheev	93ce233530	- btl_openib: changed default SRQ settings: - increase number of wqe to minimize number of RNRs - it is better to have high watermark and post relatively small number of wqes - increased TX queue size This commit was SVN r28440.	2013-05-02 12:46:35 +00:00
Alex Mikheev	f76680fbd0	- btl_openib: fix total registered memory calculation for ConnectIB and Ofed 2.0 This commit was SVN r28432.	2013-05-01 13:39:29 +00:00
Jeff Squyres	d92a8e01f8	Use the _SAFE list traversal macro so that we can remove each item from the list (just for good measure), and then free() it (without using _SAFE, we were accessing memory that was just free()'d to get to the next item). Also be a little more thorough -- DESTRUCT the list when we're all done. This commit was SVN r28429.	2013-05-01 12:26:16 +00:00
George Bosilca	8b0335380a	Fix the error messages to reference the correct function. This commit was SVN r28425.	2013-04-30 23:26:03 +00:00
George Bosilca	6a75c84fa8	Remove useless define. This commit was SVN r28424.	2013-04-30 23:24:59 +00:00
Ralph Castain	9de82aba55	Revert r28417 - given the non-standard way vprotocol is implemented, I see no way to use the framework verbosity here. Best to just leave it alone as those who use it know what they need to do to get debug output This commit was SVN r28418. The following SVN revision numbers were found above: r28417 --> open-mpi/ompi@b00de5be8b	2013-04-30 16:37:17 +00:00
Nathan Hjelm	b00de5be8b	vprotocol: remove the old output and use the framework output This commit was SVN r28417.	2013-04-30 15:21:42 +00:00
Ralph Castain	ceb4061214	Fix BTL_VERBOSE - when the MCA param change was committed, it left the base verbosity variable declared so things compiled. Sadly, the verbosity was now being set to a new variable, so debug never was output. This commit was SVN r28414.	2013-04-30 01:15:52 +00:00
Nathan Hjelm	f384263de7	btl/openib: fix typo This commit was SVN r28413.	2013-04-29 22:21:25 +00:00
Ralph Castain	5d7a93c032	Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed. Interesting how many includes had to be fixed here and there to fill in missing dependencies :-) This commit was SVN r28411.	2013-04-29 17:02:37 +00:00
Ralph Castain	8996ecb128	Add missing include This commit was SVN r28405.	2013-04-27 00:09:36 +00:00
Jeff Squyres	f55cea1a5b	If there are no BTLs, do ''not'' actually shut down the fd listener, because a) it may still be needed to shut down the CPCs, and b) it will be shut down during component_close(). This commit was SVN r28402.	2013-04-26 15:31:50 +00:00
Jeff Squyres	99b7a0f20d	Remove unused variables. This commit was SVN r28401.	2013-04-26 15:29:42 +00:00
Vishwanath Venkatesan	c902624b59	Using ompi_type_destroy to free ompi_datatype. This had to be updated in all the collective algorithms. Hopefully this will fix all warnings. This commit was SVN r28385.	2013-04-24 19:27:26 +00:00
Nathan Hjelm	2edff7f784	btl/openib: don't free string handled by MCA variable system This commit was SVN r28383.	2013-04-24 18:59:18 +00:00
Alex Margolin	aebd794bf6	Fixed macro definition order in MXM component headers This commit was SVN r28378.	2013-04-24 16:51:43 +00:00
Vishwanath Venkatesan	bba4a93f63	Got this wrong while replacing MPI function with OMPI functions. Fixed it now. This commit was SVN r28350.	2013-04-22 19:58:25 +00:00
Rolf vandeVaart	5e1dde419c	Fix some compile errors in CUDA-aware code that has crept in. This commit was SVN r28346.	2013-04-18 15:34:16 +00:00
Vishwanath Venkatesan	53753622d4	Changing some of the MPI_ functions to ompi_ equivalents. This commit was SVN r28342.	2013-04-17 21:06:36 +00:00
Alex Margolin	0ab7675019	Fix MXM connection establishment flow This commit was SVN r28329.	2013-04-12 16:37:42 +00:00
Steve Wise	134baaf2fa	Add Chelsio T5 device. This fixes trac:3552 and should be added to cmr:v1.6:reviewer=jsquyres and cmr:v1.7:reviewer=jsquyres This commit was SVN r28327. The following Trac tickets were found above: Ticket 3552 --> https://svn.open-mpi.org/trac/ompi/ticket/3552	2013-04-11 19:30:53 +00:00
George Bosilca	2d33c9ee39	Stop complaining about an overwritten default parameter. This commit was SVN r28322.	2013-04-10 19:44:37 +00:00
Jeff Squyres	8405975bf6	Be a little more conservative about initializing devices and modules (i.e., ensure that more data items get zeroed out/set to NULL) so that if something goes wrong during initialization, we don't try to clean up something that isn't there (and segv). The chance of this happening on the trunk is very low (and will also be low once the verbs improvements are brought over to v1.7). But it can actually happen in the v1.6 branch (e.g., if no CPC is available, we'll try to get the length of the endpoints list, but the endpoints list is NULL). Hence, even though the real goal is to get this functionality over to v1.6, I figured I'd commit to the trunk/CMR to v1.7 just to try to keep commonality in the openib between all three where possible. This commit was SVN r28317.	2013-04-09 21:55:31 +00:00
Ralph Castain	45af6cf59e	The move of the orte_db framework to opal required that we create an opaque opal_identifier_t type as OPAL cannot know anything about the ORTE process name. However, passing a value down to opal and then having the db components reference it causes alignment issues on Solaris Sparc platforms. So pass the pointer instead and do the old "memcpy" trick to avoid the problem. This commit was SVN r28308.	2013-04-08 23:34:16 +00:00
Nathan Hjelm	4e95d691a7	pml/ob1: do not reset the convertor if one was not created (size = 0). This macro is only used on the failure path so the additional if statement should not have any effect on performance. cmr:v1.7 This commit was SVN r28292.	2013-04-05 01:40:11 +00:00
Pavel Shamis	fed6e60131	Fixing OpenIB BTL compilation failure for cases when BTL_OPENIB_MALLOC_HOOKS_ENABLED is disabled. This commit was SVN r28290.	2013-04-04 20:17:18 +00:00
Pavel Shamis	aa1f5697b4	In order to prevent name conflicts in XRC (MOFED) enabled mode OFACM's ib_address_t was renamed to ofacm_ib_address_t This commit was SVN r28289.	2013-04-04 20:02:17 +00:00
Nathan Hjelm	e8d9944456	sbgp/ibnet: fix param -> var update errors This commit was SVN r28284.	2013-04-03 20:17:18 +00:00
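
The two coll_sm fixes described in r28442 (start the data portion on a page boundary, and make each region cover exactly itself) can be illustrated with a small sketch. This is not the Open MPI code: the names align_up, segment_layout, make_layout, and region_offset are hypothetical, and the sketch assumes the page size is a power of two and that every region is a whole number of pages.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `page_size`.
   Assumption: page_size is a nonzero power of two. */
static size_t align_up(size_t offset, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (offset + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical layout of the shared segment: a control portion at the
   start, followed by a data portion split into equal-sized regions. */
struct segment_layout {
    size_t data_offset;  /* page-aligned start of the data portion */
    size_t region_size;  /* bytes per region, a multiple of the page size */
};

/* Build the layout so that the data portion begins on a page boundary
   (the first fix described in the commit message). */
static struct segment_layout make_layout(size_t control_size,
                                         size_t pages_per_region,
                                         size_t page_size)
{
    struct segment_layout layout;
    layout.data_offset = align_up(control_size, page_size);
    layout.region_size = pages_per_region * page_size;
    return layout;
}

/* Offset of region `i` within the segment. Region i spans
   [offset, offset + region_size), so it ends exactly where region
   i + 1 begins and never overlaps a neighbor (the second fix). */
static size_t region_offset(const struct segment_layout *layout, size_t i)
{
    return layout->data_offset + i * layout->region_size;
}
```

With a 100-byte control portion and 4096-byte pages, the data portion would start at offset 4096 rather than 100, and a caller binding region i would pass the segment base plus region_offset(&layout, i) together with a length of exactly layout.region_size.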


4196 Commits