openmpi

Author	SHA1	Message	Date
Ralph Castain	c52b94af8b	Revert r28453 and r28452 - wrong fix This commit was SVN r28454. The following SVN revision numbers were found above: r28452 --> open-mpi/ompi@756ee4b5e0 r28453 --> open-mpi/ompi@6da24143a2	2013-05-06 21:52:17 +00:00
Ralph Castain	6da24143a2	Minor performance improvement This commit was SVN r28453.	2013-05-06 20:27:16 +00:00
Ralph Castain	756ee4b5e0	Update the rml_uri for each proc so debuggers can attach This commit was SVN r28452.	2013-05-06 20:18:14 +00:00
Ralph Castain	707d0e653a	Must use equal and not & comparison for mapping directives This commit was SVN r28451.	2013-05-06 15:07:12 +00:00
Ralph Castain	a0a6412545	Do a little cleanup on abnormal termination procedure - don't keep submitting forced exit events (one will do), no need to reset the abnormal termination pipe event in orterun, etc. This commit was SVN r28450.	2013-05-05 17:39:45 +00:00
Ralph Castain	12969cec81	Update orte_progress_threads configure option - no longer need to test for --enable-event-threads This commit was SVN r28449.	2013-05-05 14:48:35 +00:00
Ralph Castain	850dbe77ec	Update platform files This commit was SVN r28448.	2013-05-05 14:35:13 +00:00
Ralph Castain	ae68a953f4	Sigh - one more place This commit was SVN r28447.	2013-05-05 00:25:14 +00:00
Ralph Castain	fb2a694587	Fix print This commit was SVN r28446.	2013-05-04 22:37:34 +00:00
Ralph Castain	27e3e382d5	No need for ORTE tools to use orte progress thread This commit was SVN r28445.	2013-05-04 21:13:20 +00:00
Nathan Hjelm	422331b4da	btl/openib: fix unconnected datagram connection method (udcm) The primary issue with udcm is that the immediate data in message acks were often bogus. This caused the sender to keep trying even though a message was received and acked. The fix is to use the source LID and QP to determine which message is being acked. In most cases this should work well since only one message will be in flight to any peer. This commit was SVN r28444.	2013-05-03 17:11:38 +00:00
Ralph Castain	527ea1d090	Per the RFC, always enable libevent thread support. This commit was SVN r28443.	2013-05-03 15:39:05 +00:00
Jeff Squyres	c8258c06e2	In coll_sm, we alloc a huge chunk of shared memory, divvy it into lots of individual regions (each region is a multiple of page size in length), and each process claims its own regions by binding it to its local memory. Each process would end up membining something like 16 individual regions in the overall shmem segment. There were two errors in this code relating to the memory affinity pinning. Some combination of these two errors would lead to kernel panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed shared memory (not posix or sysv shared memory, curiously enough): 1. The shared memory segment is initially divided into two regions: control and data. The control starts at the beginning of the shmem segment, the data starts after that. The data portion, unfortunately, was ''not'' aligned to a page. So all the multiple-of-page-size regions that we divvy up were also not aligned on page boundaries. And therefore all the regions we tried to membind were not on page boundaries. The solution was to ensure that the data portion started on a page boundary. Then all of the individual regions were on page boundaries, too. That being said, in my tests, Linux mbind() fails gracefully when the address is not on a page boundary. So I'm not sure how this worked at all / led to a kernel panic... 2. There was some bad pointer math that resulted in membinding regions larger than they should have been, resulting in region overlaps. There were definitely overlaps between regions in the same process; it's likely that there were overlaps between regions of multiple processes, too -- I'm not sure (and don't care to figure out :-) ). The solution was to fix the pointer math so that each region membinds exactly only itself and no neighboring/overlapping regions. cmr:v1.7.2:reviewer=samuel This commit was SVN r28442.	2013-05-03 12:49:35 +00:00
Alex Mikheev	9e2fdc7d56	- correction of r28440 This commit was SVN r28441. The following SVN revision numbers were found above: r28440 --> open-mpi/ompi@93ce233530	2013-05-02 12:52:58 +00:00
Alex Mikheev	93ce233530	- btl_openib: changed default SRQ settings: - increase number of wqe to minimize number of RNRs - it is better to have high watermark and post relatively small number of wqes - increased TX queue size This commit was SVN r28440.	2013-05-02 12:46:35 +00:00
Jeff Squyres	52fd270663	Implement MPI-2.2 functionality of deleting attributes on MPI_COMM_SELF in reverse order during MPI_FINALIZE (well, actually, ''all'' attributes are now deleted in reverse order whenever a communicator is destructed). Also revamped a few things in the MPI attribute implementation: * use a One Big Lock philosophy for making the implementation thread safe (vs. the pair of locks we were using before). One Big Lock is quite a bit simpler and has fewer corner cases; the code for attributes is still complicated, but is definitely less complex than it used to be. * The COPY_ATTR_CALLBACKS and DELETE_ATTR_CALLBACKS macros no longer return; they simply set a value if something went wrong. Then we check this value after the macros complete. This simplifies unlocking, etc. * Added write barriers right before releasing locks to ensure memory consistency. * Fixed a bunch of typos in comments, and some indenting. Many thanks to KAWASHIMA Takahiro who contributed the original patch for attribute destruction ordering, and who helped test/debug/evolve the patch to its final form. Fixes trac:3123. cmr:v1.7.2:reviewer=bosilca This commit was SVN r28439. The following Trac tickets were found above: Ticket 3123 --> https://svn.open-mpi.org/trac/ompi/ticket/3123	2013-05-02 12:32:21 +00:00
Jeff Squyres	42a9a4c62c	After examining a '''lot''' of MTT output with Ralph, fix the cause of many, many MTT timeouts when running jobs under SLURM: send the right command at the end to cause remote orteds to shut down. This commit was SVN r28438.	2013-05-02 00:23:53 +00:00
Nathan Hjelm	4990412d0b	undo accidental commit This commit was SVN r28436.	2013-05-01 16:12:10 +00:00
Nathan Hjelm	d3727680a5	import This commit was SVN r28435.	2013-05-01 16:01:48 +00:00
Alex Mikheev	f76680fbd0	- btl_openib: fix total registered memory calculation for ConnectIB and Ofed 2.0 This commit was SVN r28432.	2013-05-01 13:39:29 +00:00
George Bosilca	2331000d63	Correctly handle the invalid status for null and inactive requests. This patch fixes trac:3475. CMR v1.6, v1.7 This commit was SVN r28431. The following Trac tickets were found above: Ticket 3475 --> https://svn.open-mpi.org/trac/ompi/ticket/3475	2013-05-01 12:55:24 +00:00
Jeff Squyres	eeb1d83c1d	Don't assign the status if MPI_STATUS_IGNORE is passed in. Thanks to Lisandro Dalcin for finding the issue. This commit was SVN r28430.	2013-05-01 12:32:58 +00:00
Jeff Squyres	d92a8e01f8	Use the _SAFE list traversal macro so that we can remove each item from the list (just for good measure), and then free() it (without using _SAFE, we were accessing memory that was just free()'d to get to the next item). Also be a little more thorough -- DESTRUCT the list when we're all done. This commit was SVN r28429.	2013-05-01 12:26:16 +00:00
George Bosilca	1169ebdff8	Indentation. This commit was SVN r28426.	2013-04-30 23:26:23 +00:00
George Bosilca	8b0335380a	Fix the error messages to reference the correct function. This commit was SVN r28425.	2013-04-30 23:26:03 +00:00
George Bosilca	6a75c84fa8	Remove useless define. This commit was SVN r28424.	2013-04-30 23:24:59 +00:00
George Bosilca	92aeefebac	The constructor and destructor are not publicly visible functions. Fix the indentation. This commit was SVN r28423.	2013-04-30 23:23:57 +00:00
Nathan Hjelm	75cc04faa6	Fix typo in check for mpi_leave_pinned vs mpi_leave_pinned_pipeline. This commit was SVN r28421.	2013-04-30 20:08:32 +00:00
Ralph Castain	9de82aba55	Revert r28417 - given the non-standard way vprotocol is implemented, I see no way to use the framework verbosity here. Best to just leave it alone as those who use it know what they need to do to get debug output This commit was SVN r28418. The following SVN revision numbers were found above: r28417 --> open-mpi/ompi@b00de5be8b	2013-04-30 16:37:17 +00:00
Nathan Hjelm	b00de5be8b	vprotocol: remove the old output and use the framework output This commit was SVN r28417.	2013-04-30 15:21:42 +00:00
Ralph Castain	ceb4061214	Fix BTL_VERBOSE - when the MCA param change was committed, it left the base verbosity variable declared so things compiled. Sadly, the verbosity was now being set to a new variable, so debug never was output. This commit was SVN r28414.	2013-04-30 01:15:52 +00:00
Nathan Hjelm	f384263de7	btl/openib: fix typo This commit was SVN r28413.	2013-04-29 22:21:25 +00:00
Ralph Castain	4c0dcb1aa2	Update ignores and remove build product This commit was SVN r28412.	2013-04-29 19:02:03 +00:00
Ralph Castain	5d7a93c032	Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed. Interesting how many includes had to be fixed here and there to fill in missing dependencies :-) This commit was SVN r28411.	2013-04-29 17:02:37 +00:00
Ralph Castain	3052acd968	Fix minor typo This commit was SVN r28410.	2013-04-29 17:02:11 +00:00
Ralph Castain	bd83de0b7f	Fix an obvious typo - it was set to default to true when instantiated. This commit was SVN r28407.	2013-04-27 00:12:10 +00:00
Ralph Castain	700034cda3	Update platform files This commit was SVN r28406.	2013-04-27 00:09:58 +00:00
Ralph Castain	8996ecb128	Add missing include This commit was SVN r28405.	2013-04-27 00:09:36 +00:00
Ralph Castain	3818e88365	Remove and ignore build products This commit was SVN r28404.	2013-04-27 00:07:18 +00:00
Jeff Squyres	c9c6ced1c9	Use some handy shell scripts from W Spector to s/ierr/ierror/ in the mpi module. This commit was SVN r28403.	2013-04-26 22:07:42 +00:00
Jeff Squyres	f55cea1a5b	If there are no BTLs, do ''not'' actually shut down the fd listener, because a) it may still be needed to shut down the CPCs, and b) it will be shut down during component_close(). This commit was SVN r28402.	2013-04-26 15:31:50 +00:00
Jeff Squyres	99b7a0f20d	Remove unused variables. This commit was SVN r28401.	2013-04-26 15:29:42 +00:00
Jeff Squyres	5bf9fffacd	As initially reported by Eric Chamberland in http://www.open-mpi.org/community/lists/users/2013/04/21689.php, the assert in opal_datatype_is_contiguous_memory_layout() is not always correct -- he supplied a test case where it was not valid, essentially: 1. Call MPI_Type_create_indexed_block(0, ..., &newtype) and commit newtype 1. Call MPI_Type_create_resized(newtype, 0, nonzero_value, &resized) and commit resized 1. Call MPI_File_set_view with resized This will eventually call opal_datatype_is_contiguous_memory_layout(), and the assert will fail. After some consultation with George, it was determined that the assert() is basically good, but it needs to also check for (count != 0). This commit was SVN r28398.	2013-04-25 20:54:25 +00:00
Ralph Castain	b73f25e839	Add a function to return the kernel index of the corresponding interface from an IPv4/6 string or hostname This commit was SVN r28397.	2013-04-25 19:40:34 +00:00
Ralph Castain	c081a520a3	Fix --without-hwloc This commit was SVN r28396.	2013-04-25 19:13:56 +00:00
Ralph Castain	cef639f578	Ahem....cleanup a copy/paste error in naming of these functions This commit was SVN r28395.	2013-04-25 15:21:53 +00:00
Ralph Castain	8fd3c86e06	Per Geoffroy Vallee, use the OPAL constant This commit was SVN r28394.	2013-04-25 14:18:18 +00:00
Ralph Castain	ef1b87dbf5	Update ignores This commit was SVN r28392.	2013-04-25 00:45:23 +00:00
Ralph Castain	3a354c4ea3	Cleanup the verbose output channel name This commit was SVN r28391.	2013-04-24 23:44:02 +00:00
Ralph Castain	c5e1a7dc65	fix typo This commit was SVN r28390.	2013-04-24 23:37:59 +00:00
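Commit 707d0e653a above notes that mapping directives must be compared with equality rather than `&`. The following is a minimal sketch of why that matters, assuming the directive values are numbered sequentially rather than being single-bit flags. The enum names and values are hypothetical and are not Open MPI's actual constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical directive values, numbered sequentially rather than as
 * single-bit flags. These are NOT Open MPI's actual constants. */
enum { MAP_BYNODE = 1, MAP_BYSLOT = 2, MAP_BYSOCKET = 3 };

/* Bitwise test: reports a match whenever the two values share any bit,
 * so it can give false positives for sequentially numbered values. */
static bool is_policy_bitwise(int policy, int wanted)
{
    return (policy & wanted) != 0;
}

/* Equality test: reports a match only for the exact directive. */
static bool is_policy_equal(int policy, int wanted)
{
    return policy == wanted;
}
```

With these assumed values, `MAP_BYSOCKET` (3) shares a bit with `MAP_BYNODE` (1), so the bitwise test would treat a by-socket policy as by-node, while the equality test would not.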

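The first fix described in commit c8258c06e2 above (start the data portion on a page boundary so that every multiple-of-page-size region is also page-aligned) can be sketched as follows. This is an illustration under stated assumptions, not the coll_sm code: the helper names `align_up` and `region_offset` and the layout parameters are hypothetical, and the page size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `page_size`.
 * Assumes `page_size` is a nonzero power of two. */
static size_t align_up(size_t offset, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (offset + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical layout helper: byte offset of region `i` within a segment
 * whose control area occupies the first `control_size` bytes. The data
 * area starts at the first page boundary at or after the control area. */
static size_t region_offset(size_t control_size, size_t page_size,
                            size_t pages_per_region, size_t i)
{
    size_t data_start = align_up(control_size, page_size);
    return data_start + i * pages_per_region * page_size;
}
```

Because the data area starts on a page boundary and each region spans a whole number of pages, every offset returned by `region_offset` is a multiple of the page size, and consecutive regions do not overlap.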

18374 commits