openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	cf9e144f05	silence warnings gcc 3.4.3 on solaris 10 issues some warnings cmr=v1.8.2:reviewer=hjelmn This commit was SVN r32500.	2014-08-11 07:36:46 +00:00
Gilles Gouaillardet	b565e69b86	check-help-strings cleanup This commit was SVN r32491.	2014-08-11 03:19:57 +00:00
Ralph Castain	daeb9b6c4f	Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain. Thanks to Gilles for pointing out some of the discrepancies. This commit was SVN r32398.	2014-08-01 14:44:11 +00:00
Ralph Castain	309d75dadc	Add missing ampersand - function call required a pointer, not the name itself This commit was SVN r32357.	2014-07-30 14:48:20 +00:00
Gilles Gouaillardet	b95537376f	bcol/basesmuma: fix parameter order Ref: #4815 This commit was SVN r32353.	2014-07-30 05:38:53 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL. All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
George Bosilca	843ef1fcb0	ompi_mpi_abort had one extra argument that was never used. Clean it up. This commit was SVN r32124.	2014-07-03 00:34:44 +00:00
Ralph Castain	f3cb124e50	Revert r32082 and r32070 - the developer's conference has decided to go a different direction on the threaded progress effort. This will involve some degree of prototyping to understand the tradeoffs prior to making a final design decision, and so we'll hold off on the final change until that is completed. This commit was SVN r32089. The following SVN revision numbers were found above: r32070 --> open-mpi/ompi@12d92d0c22 r32082 --> open-mpi/ompi@aa6438ef7a	2014-06-25 20:43:28 +00:00
Ralph Castain	12d92d0c22	Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS This commit was SVN r32070.	2014-06-24 17:05:11 +00:00
Nathan Hjelm	2614dfc4bf	bcol/basesmuma: fix remaining memory leaks in basesmuma We were still leaking 1) file descriptors for data files, and 2) some control files. I fixed both of these leaks and everything is looking good. This should fix the bug where we are running out of file descriptors when running the loop_spawn test. I also took the opportunity to refactor the code a bit to make the mapping/unmapping simpler. This should help avoid these sorts of issues in the future. Depends on #4678 cmr=v1.8.2:reviewer=manjugv This commit was SVN r31893.	2014-05-27 18:40:41 +00:00
Gilles Gouaillardet	d04db1e213	Fix mmap flags in bcol_basesmuma_smcm_reg_mmap if in_ptr is NULL, the MAP_FIXED flag cannot be passed to mmap this caused a hang in topology/cart and topology/sub from ibm test suite on trunk. cmr=v1.8.2:reviewer=hjelmn This commit was SVN r31890.	2014-05-26 07:18:31 +00:00
Nathan Hjelm	22e59b056a	bcol/basesmuma: fix leak in basesmuma code Basesmuma was vallocing space for control data then mmapping over that data. Nothing in the code suggests any need for mmapping a specific address so I did the following to remove the leak: - Removed the valloc of the buffer space - ftruncate the mmaped file to ensure there is sufficient memory to allocate space for the control data. Ideally this code should be using opal/shmem but that is a larger change. Keeping it simple for now. cmr=v1.8.2:reviewer=manjugv This commit was SVN r31822.	2014-05-19 15:21:58 +00:00
Nathan Hjelm	55f0dcb81a	Add netpatterns_cleanup_narray_knomial_tree function to cleanup after netpatterns_setup_narray_knomial_tree. Fix a bug in ptpcoll that caused memory allocated by netpatterns_setup_narray_knomial_tree to leak. cmr=v1.8.2:reviewer=manjugv This commit was SVN r31781.	2014-05-15 17:36:26 +00:00
Nathan Hjelm	ddd501c0d9	bcol/base: cleanup code and fix memory leak The items in the available bcol list were getting leaked. This commit fixes this leak. I also cleaned up the code a bit. This includes making use of the opal_argv_free function. cmr=v1.8.2:reviewer=manjugv This commit was SVN r31744.	2014-05-13 21:22:18 +00:00
Nathan Hjelm	c13c21d476	basesmuma: clean up the setup code and ensure mapped files are unmapped We were leaking file descriptors when coll/ml was in use. It turns out this was because basesmuma was failing to unmap files it had previously mapped. This commit cleans up the setup code to ensure that we only attempt to map the control files once per module and then ensures the files are unmapped when the module is released. cmr=v1.8.2:reviewer=manjugv This commit was SVN r31737.	2014-05-13 17:00:31 +00:00
Nathan Hjelm	9e3a0d7b7a	basesmuma: modify the minimum size for the large fan-in fan-out allreduce algorithm Per suggestion from Manju make sure there isn't a gap in the size ranges for the available algorithms. cmr=v1.8.2:ticket=trac:4437:reviewer=ompi-rm1.8 This commit was SVN r31728. The following Trac tickets were found above: Ticket 4437 --> https://svn.open-mpi.org/trac/ompi/ticket/4437	2014-05-13 14:56:21 +00:00
Ralph Castain	a8e2d6c3a6	The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature: top_ompi_srcdir -> OMPI_TOP_SRCDIR top_ompi_builddir -> OMPI_TOP_BUILDDIR We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers. Only thing left is ompilibdir being treated similar to what we did for srcdir/builddir. Coming soon. This commit was SVN r31678.	2014-05-07 21:48:53 +00:00
Nathan Hjelm	e963869fdf	bcol/basesmuma: close mmapped file descriptor Not closing this file descriptor will cause us to leak file descriptors. It is safe to close the file after it has been mmapped. cmr=v1.8.2:reviewer=manjugv This commit was SVN r31579.	2014-04-30 22:28:08 +00:00
Nathan Hjelm	a03b11c20e	bcol/basesmuma: fix broken allgather algorithm The algorithm was failing ibm/collective/allgather and iallgather. I cleaned up the code to eliminate duplicate code paths and tracked the issue down to an error in the way extra nodes in the knomial exchange are handled. The new code is more compact and has been tested with up to 64 ranks with the ibm test suite. cmr=v1.8.1:reviewer=manjugv This commit was SVN r31419.	2014-04-16 22:43:52 +00:00
Nathan Hjelm	bd3b550c6d	coll/ml: fix leaks Thanks to ggouaillardet for finding and fixing these issues. Closes trac:4460 cmr=v1.8.1:reviewer=manjugv This commit was SVN r31264. The following Trac tickets were found above: Ticket 4460 --> https://svn.open-mpi.org/trac/ompi/ticket/4460	2014-03-27 23:25:31 +00:00
Nathan Hjelm	b3bb90cf2d	Do not include inttypes.h directly in Open MPI. Use opal_stdint.h instead. This commit should finish the work started for #869. Closing that ticket with this commit. Closes trac:869 cmr=v1.8.1:reviewer=jsquyres This commit was SVN r31257. The following Trac tickets were found above: Ticket 869 --> https://svn.open-mpi.org/trac/ompi/ticket/869	2014-03-27 17:56:00 +00:00
Nathan Hjelm	128cfe0a39	coll/ml: cleanup tabs, indentation, and trailing whitespace in bcol_basesmuma_bcast.c This commit was SVN r31192.	2014-03-21 21:54:48 +00:00
Nathan Hjelm	d241f95af1	squash into previous. fix coll ml bcast This commit was SVN r31191.	2014-03-21 21:54:41 +00:00
Nathan Hjelm	6740813c27	bcol/basesmuma: fix selection of coll/ml when only using local procs When we are only using local ranks basesmuma needs to provide an allreduce function for both large and small message or else the coll/ml selection logic will fail. In the future this logic should probably be updated to just disable allreduce in coll/ml instead of disabling coll/ml. For now it should be correct to say the basesmuma allgather works for larger messages. cmr=v1.8:reviewer=manjugv This commit was SVN r31190.	2014-03-21 21:54:35 +00:00
Nathan Hjelm	22f64bb62b	Addendum to r31096. Up basesmuma algorithm limits to 1M. After discussion with Manju we decided to update these the process count limits of the shared memory collectives to an arbitrarily large number. cmr=v1.7.5:ticket=trac:4405 This commit was SVN r31126. The following SVN revision numbers were found above: r31096 --> open-mpi/ompi@3f469d08e7 The following Trac tickets were found above: Ticket 4405 --> https://svn.open-mpi.org/trac/ompi/ticket/4405	2014-03-18 21:25:49 +00:00
Nathan Hjelm	3f469d08e7	coll/ml: increase the number of allowed processes in a local reduce and add checks to see if the bcol module can support allreduce. cmr=v1.7.5:reviewer=manjugv This commit was SVN r31096.	2014-03-17 23:10:19 +00:00
Pavel Shamis	fba1edbf14	Removing ml include from bcol_ptpcoll.h. It is not really required. This commit was SVN r31095.	2014-03-17 22:58:40 +00:00
Ralph Castain	cd72aa9b66	Per Dave's comment, bzero has portability issues and little advantage over a simple memset. So let's use the safer solution. cmr=v1.7.5:reviewer=dgoodell:subject=replace bzero with memset This commit was SVN r31055.	2014-03-12 22:55:47 +00:00
Ralph Castain	ebd8e545c0	Silence warning cmr=v1.8:reviewer=hjelmn This commit was SVN r31005.	2014-03-11 21:59:17 +00:00
Nathan Hjelm	51c5daf1b4	bcol/basesmuma: initialize module with all 0's to fix segmentation faults in module destructor. cmr=v1.7.5:reviewer=manjugv This commit was SVN r30978.	2014-03-10 20:42:47 +00:00
Nathan Hjelm	579a2d10cc	bcol/ptpcoll: initialize all pointers in the module to NULL to avoid possible problems when the module is being destructed. References #4331 cmr=v1.7.5:reviewer=manjugv This commit was SVN r30964.	2014-03-07 21:16:20 +00:00
Nathan Hjelm	cb6670b340	bcol/basesmuma: use framework output for error message and fix a rounding error. cmr=v1.7.5:reviewer=pasha This commit was SVN r30929.	2014-03-04 16:55:57 +00:00
Manjunath Gorentla Venkata	38e5a753dd	basemuma bcol : fixing warnings This commit was SVN r30784.	2014-02-20 18:30:53 +00:00
Nathan Hjelm	6dd29a05f1	basesmuma: Fix typos in r30627 cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30651. The following SVN revision numbers were found above: r30627 --> open-mpi/ompi@98ad6b3d1e The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-02-10 16:15:37 +00:00
Nathan Hjelm	98ad6b3d1e	bcol/basesmuma: fix initialization on 32-bit platforms The initialization code did several allgathers on void *'s using MPI_LONG_LONG_INT. This will produce the wrong result on 32-bit platforms. Instead use MPI_BYTE with count = sizeof (void *). cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30627. The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-02-08 00:00:30 +00:00
Nathan Hjelm	77869c3232	bcol/basesmuma: fix several bugs in the basesmuma code Found two bugs in basesmuma: - Release all resources when tearing down the bcol module. - Always call the allreduce in the smcm code. We do not know beforehand whether all procs have all the files mapped. cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30623. The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-02-07 21:39:24 +00:00
Pavel Shamis	3a683419c5	Fixing broken dependency between ML/BCOLS This is hot-fix patch for the issue reported by Ralph. In future we plan to restructure ml data structure layout. Tested by Nathan. cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30619. The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-02-07 19:15:45 +00:00
Ralph Castain	74d3393a4f	Revert r30600, r30602-30604 as the first one broke the tarball and the others couldn't fix it This commit was SVN r30605. The following SVN revision numbers were found above: r30600 --> open-mpi/ompi@7d2c4cb468 r30602 --> open-mpi/ompi@9e751a0302 r30604 --> open-mpi/ompi@3012c280cf Revision number ranges (suitable for "git log"): r30602-30604 --> open-mpi/ompi@9e751a03^..3012c280	2014-02-07 04:38:06 +00:00
Ralph Castain	3012c280cf	I surrender - this code is just too interbred with other components for me to clean up, so turn it off for now This commit was SVN r30604.	2014-02-07 04:16:21 +00:00
Ralph Castain	3954311bac	We have rules about not cross-integrating components, even across frameworks - please follow them. This commit was SVN r30603.	2014-02-07 03:46:45 +00:00
Ralph Castain	9e751a0302	You absolutely, positively cannot include a header file from a component in the base functions! This commit was SVN r30602.	2014-02-07 03:27:06 +00:00
Nathan Hjelm	12f0bf9488	basesmuma: missed a couple of MB references cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30538. The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-02-03 18:19:53 +00:00
Nathan Hjelm	64321acc22	basesmuma: do not call MB directly opal does not always define MB. It is recommended that opal_atomic_[rw]mb is called instead. We will need to address the cases where these functions are no-ops on weak-memory ordered cpus. cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30534. The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-02-03 17:01:57 +00:00
Nathan Hjelm	c2b061cc84	basesmuma: clean up code Several changes are contained in this commit: - Clean up tabs and trailing whitespaces - Use consistent indentation in changed files - Remove unused code. None of the removed code will ever have been used in a trunk build. - Clean up the smcm code quite a bit - Do not fflush stderr and use opal_output instead of fprintf. These changes have been tested on Cray XE-6 and PSM systems. cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30533. The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-02-03 17:01:46 +00:00
Nathan Hjelm	1ae39753dc	bcol/basesmuma: check the return code of bcol_basesmuma_smcm_allgather_connection. Fixes a segmentation fault found by the bogus intercomm_create test. cmr=v1.7.4:review=manjugv This commit was SVN r30527.	2014-01-31 22:20:25 +00:00
Nathan Hjelm	afae924e29	coll/ml: fix some warnings and the spelling of indices This commit fixes one warning that should have caused coll/ml to segfault on reduce. The fix should be correct but we will continue to investigate. cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30477. The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-01-29 18:44:21 +00:00
Ralph Castain	b32556e6dc	Fixes trac:4143 After IM with Nathan, apply patch from ticket after verification by Paul Hargrove that it fixes the problem on non-x86 32-bit platforms Verified by Paul, RM-approved cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r30411. The following Trac tickets were found above: Ticket 4143 --> https://svn.open-mpi.org/trac/ompi/ticket/4143	2014-01-24 17:56:52 +00:00
Nathan Hjelm	2435057a57	ignore the iboffload component for now. This commit was SVN r30398.	2014-01-23 16:06:21 +00:00
Nathan Hjelm	82d996fb76	coll/ml: cleanup some merge related errors cmr=v1.7.5:ticket=trac:4158 This commit was SVN r30366. The following Trac tickets were found above: Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158	2014-01-22 16:48:09 +00:00
Nathan Hjelm	1a021b8f2d	coll/ml: add support for blocking and non-blocking allreduce, reduce, and allgather. The new collectives provide a significant performance increase over tuned for small and medium messages. We are initially setting the priority lower than tuned until this has had some time to soak in the trunk. Please set coll_ml_priority to 90 for MTT runs. Credit for this work goes to Manjunath Gorentla Venkata (ORNL), Pavel Shamis (ORNL), and Nathan Hjelm (LANL). Commit details (for reference): Import ORNL's collectives for MPI_Allreduce, MPI_Reduce, and MPI_Allgather. We need to take the basesmuma header into account when calculating the ptpcoll small message thresholds. Add a define to bcol.h indicating the maximum header size so we can take the header into account while not making ptpcoll dependent on information from basesmuma. This resolves an issue with allreduce where ptpcoll overwrites the header of the next buffer in the basesmuma bank. Fix reduce and make a sequential collective launcher in coll_ml_inlines.h The root calculation for reduce was wrong for any root != 0. There are four possibilities for the root: - The root is not the current process but is in the current hierarchy. In this case the root is the index of the global root as specified in the root vector. - The root is not the current process and is not in the next level of the hierarchy. In this case 0 must be the local root since this process will never communicate with the real root. - The root is not the current process but will be in next level of the hierarchy. In this case the current process must be the root. - I am the root. The root is my index. Tested with IMB which rotates the root on every call to MPI_Reduce. Consider IMB the reproducer for the issue this commit solves. Make the bcast algorithm decision an enumerated variable Resolve various assert failures when destructing coll ml requests.
Two issues: - Always reset the request to be invalid before returning it to the free list. This will avoid an assert in ompi_request_t's destructor. OMPI_REQUEST_FINI does this (and also releases the fortran handle index). - Never explicitly construct or destruct the superclass of an opal object. This screws up the class function tables and will cause either an assert failure or a segmentation fault when destructing coll ml requests. Cleanup allgather. I removed the duplicate non-blocking and blocking functions and modeled the cleanup after what I found in allreduce. Also cleaned up the code somewhat. Don't bother copying from the send to the receive buffer in bcol_basesmuma_allreduce_intra_fanin_fanout if the pointers are the same. This eliminates a warning about memcpy and aliasing and avoids an unnecessary call to memcpy. Always call CHECK_AND_RELEASE on memsync collectives. There was a call to OBJ_RELEASE on the collective communicator but because CHECK_AND_RECYLCE was never called there was no matching call to OBJ_RELEASE. This caused coll ml to leak communicators. Make allreduce use the sequential collective launcher in coll_ml_inlines.h Just launch the next collective in the component progress. I am a little unsure about this patch. There appears to be some sort of race between collectives that causes buffer exhaustion in some cases (IMB Allreduce is a reproducer). Changing progress to only launch the next bcol seems to resolve the issue but might not be the best fix. Note that I see little to no performance penalty for this change. Fix allreduce when there are extra sources. There was an issue with the buffer offset calculation when there are extra sources. In the case of extra sources == 1 the offset was set to buffer_size (just past the header of the next buffer). I adjusted the buffer size to take into account the maximum header size (see the earlier commit that added this) and simplified the offset calculation. Make reduce/allreduce non-blocking.
This is required for MPI_Comm_idup to work correctly. This has been tested with various layouts using the ibm testsuite and imb and appears to have the same performance as the old blocking version. Fix allgather for non-contiguous layouts and simplify parsing the topology. Some things in this patch: - There were several comments to the effect that level 0 of the hierarchy MUST contain all of the ranks. At least one function made this assumption but it was not true. I changed the sbgp components and the coll ml initialization code to enforce this requirement. - Ensure that hierarchy level 0 has the ranks in the correct scatter gather order. This removes the need for a separate sort list and fixes the offset calculation for allgather. - There were several passes over the hierarchy to determine properties of the hierarchy. I eliminated these extra passes and the memory allocation associated with them and calculate the tree properties on the fly. The same DFS recursion also handles the re-order of level 0. All these changes have been verified with MPI_Allreduce, MPI_Reduce, and MPI_Allgather. All functions now pass all IBM/Open MPI, and IMB tests. coll/ml: correct pointer usage for MPI_BOTTOM Since contiguous datatypes are copied via memcpy (bypassing the convertor) we need to adjust for the lb of the datatype. This corrects problems found testing code that uses MPI_BOTTOM (NULL) as the send pointer. Add fallback collectives for allreduce and reduce. cmr=v1.7.5:reviewer=pasha This commit was SVN r30363.	2014-01-22 15:39:19 +00:00


89 commits