openmpi

Автор	SHA1	Сообщение	Дата
Jeff Squyres	c8258c06e2	In coll_sm, we alloc a huge chunk of shared memory, divvy it into lots of individual regions (each region is a multiple of page size in length), and each process claims its own regions by binding it to its local memory. Each process would end up membining something like 16 individual regions in the overall shmem segment. There were two errors in this code relating to the memory affinity pinning. Some combination of these two errors would lead to kernel panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed shared memory (not posix or sysv shared memory, curiously enough): 1. The shared memory segment is initially divided into two regions: control and data. The control starts at the beginning of the shmem segment, the data starts after that. The data portion, unfortunately, was ''not'' aligned to a page. So all the multiple-of-page-size regions that we divvy up were also not alined on page boundaries. And therefore all the regions we tried to membind were not on page boundaries. The solution was to ensure that the data portion started on a page boundary. Then all of the individual regions were on page boundaries, too. That being said, in my tests, Linux mbind() fails gracefully when the address is not on a page boundary. So I'm not sure how this worked at all / led to a kernel panic... 2. There was some bad pointer math that resulted in membinding regions larger than they should have been, resulting in region overlaps. There were definitely overlaps between regions in the same process; it's likely that there were overlaps between regions of multiple processes, too -- I'm not sure (and don't care to figure out :-) ). The solution was to fix the pointer math so that each region membinds exactly only itself and no neighboring/overlapping regions. cmr:v1.7.2:reviewer=samuel This commit was SVN r28442.	2013-05-03 12:49:35 +00:00
Nathan Hjelm	9d4a26f47d	Update OMPI frameworks to use the MCA framework system. Notes: - This commit also eliminates the need for an available components list in use in several frameworks. None of the code in question was making use of the priority field of the priority component list item so these extra lists were removed. - Cleaned up selection code in several frameworks to sort lists using opal_list_sort. - Cleans up the ompi/orte-info functions. Expose the functions that construct the list of params so they can be used elsewhere. patches for mtl/portals4 from brian missed a few output variables in openib This commit was SVN r28241.	2013-03-27 21:17:31 +00:00
Brian Barrett	f42783ae1a	Move the RTE framework change into the trunk. With this change, all non-CR runtime code goes through one of the rte, dpm, or pubsub frameworks. This commit was SVN r27934.	2013-01-27 23:25:10 +00:00
Ralph Castain	54db4c35eb	Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two compnents not to build if hwloc is disabled: ompi/mca/sbgp/basesmsocket orte/mca/rmaps/lama Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files. This commit was SVN r27377.	2012-09-26 23:24:27 +00:00
George Bosilca	63278df92d	Prevent the coll SM from looking for information about remote procs during the init phase. This information is only available at a later stage. This commit was SVN r26746.	2012-07-04 21:15:40 +00:00
Josh Hursey	28681deffa	Backout the ORCA commit. :( There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk. This commit was SVN r26676.	2012-06-27 01:28:28 +00:00
Josh Hursey	542330e3a7	Commit of ORCA: Open MPI Runtime Collaborative Abstraction This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI. The project is described on the wiki: https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition And on this email thread: http://www.open-mpi.org/community/lists/devel/2012/06/11109.php This commit was SVN r26670.	2012-06-26 21:42:16 +00:00
Jeff Squyres	2ba10c37fe	Per RFC, bring in the following changes: * Remove paffinity, maffinity, and carto frameworks -- they've been wholly replaced by hwloc. * Move ompi_mpi_init() affinity-setting/checking code down to ORTE. * Update sm, smcuda, wv, and openib components to no longer use carto. Instead, use hwloc data. There are still optimizations possible in the sm/smcuda BTLs (i.e., making multiple mpools). Also, the old carto-based code found out how many NUMA nodes were ''available'' -- not how many were used ''in this job''. The new hwloc-using code computes the same value -- it was not updated to calculate how many NUMA nodes are used ''by this job.'' * Note that I cannot compile the smcuda and wv BTLs -- I ''think'' they're right, but they need to be verified by their owners. * The openib component now does a bunch of stuff to figure out where "near" OpenFabrics devices are. '''THIS IS A CHANGE IN DEFAULT BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors (I do not have a NUMA machine with an OpenFabrics device that is a non-uniform distance from multiple different NUMA nodes). * Completely rewrite the OMPI_Affinity_str() routine from the "affinity" mpiext extension. This extension now understands hyperthreads; the output format of it has changed a bit to reflect this new information. * Bunches of minor changes around the code base to update names/types from maffinity/paffinity-based names to hwloc-based names. * Add some helper functions into the hwloc base, mainly having to do with the fact that we have the hwloc data reporting ''all'' topology information, but sometimes you really only want the (online \| available) data. This commit was SVN r26391.	2012-05-07 14:52:54 +00:00
Jeff Squyres	b2b781e537	Fix a few miscelaneous memory leaks. This commit was SVN r24865.	2011-07-08 16:39:58 +00:00
Samuel Gutierrez	81f38b258a	commit of new shared memory backing facility framework (shmem) and its components. This commit was SVN r24795.	2011-06-21 15:41:57 +00:00
Samuel Gutierrez	2fb7c344fc	Added a new System V (sysv) shared memory component for Open MPI. Configure Option: --enable-sysv MCA Parameter: mpi_common_sm mpi_common_sm accepts a comma delimited list of: [sysv],mmap (order dependent). The first component that is successfully selected is used. For example, -mca mpi_common_sm sysv,mmap will first try sysv. If sysv is not successfully selected, then mmap will be used. mmap will be used if mpi_common_sm is not provided. Notes: Please make certain that your system's shmmax limit, or equivalent, is larger than mpool_sm_min_size. Otherwise, shmget may fail. This commit was SVN r23260.	2010-06-09 16:58:52 +00:00
Jeff Squyres	181331d65e	Very minor nits/updates. This commit was SVN r22977.	2010-04-15 14:44:55 +00:00
Jeff Squyres	0f8ac9223f	Refs trac:2023, #2027 . This commit does a bunch of things: * Address all remaining code review items from CMR #2023: * Defer mmap setup to be lazy; only set it up the first time we invoke a collective. In this way, we don't penalize apps that make lots of communicators but don't invoke collectives on them (per #2027). * Remove the extra assignments of mca_coll_sm_one (fixing a convertor count setup that was the real problem). * Remove another extra/unnecessary assignment. * Increase libevent polling frequency when using the RML to bootstrap mmap'ed memory. * Fix a minor procs-related memory leak in btl_sm. * Commit a datatype fix that George and I discovered along the way to fixing the coll sm. * Improve error messages when mmap fails, potentially trying to de-alloc any allocated memory when that happens. * Fix a previously-unnoticed confusion between extent and true_extent in coll sm reduce. This commit was SVN r22049. The following Trac tickets were found above: Ticket 2023 --> https://svn.open-mpi.org/trac/ompi/ticket/2023	2009-10-02 17:13:56 +00:00
Jeff Squyres	1ef988c3d9	A slight optimization: no longer call sched_yield() when polling for shmem progress (or the Windows equiv). Instead, poll hard on the condition, but periocially call opal_progress(). This allows badly-formed apps (e.g., the ibm test communicator/bsend_free) to actually complete. To be clear, there are far too many apps out there that assume that MPI collectives will actually progress the rest of MPI. I don't like putting in a feature to enable broken apps, but I have a dim recollection of this issue coming up before (apps "hanging" when testing the sm coll because they assumed that calling collectives would trigger other MPI progress). Rather than have people claim that OMPI is broken, I prefer to put in this "workaround". :-( Indeed, the bsend_free test ''may'' be coded that way for exactly that reason...? I don't remember offhand... This commit was SVN r21984.	2009-09-21 22:20:44 +00:00
Jeff Squyres	533633b8cb	Fixes trac:1988. The little bug that turned out to be huge. Yoinks. * Various cosmetic/style updates in the btl sm * Clean up concept of mpool module (I think that code was written way back when the concept of "modules" was fuzzy) * Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to fix potential segv's when mmap'ed regions were at different addresses in different processes (thanks Tim!). * Change sm coll to no longer use mpool as its main source of shmem; rather, just mmap its own segment (because it's fixed size -- there was nothing to be gained by using mpool; shedding the use of mpool saved a lot of complexity in the sm coll setup). This effectively made Tim's fixes moot (because now everything is an offset into the mmap that is computed locally; there are no global pointers). :-) * Slightly updated common/sm to allow making mmap's for a specific set of procs (vs. ''all'' procs in the process). This potentially allows for same-host-inter-proc mmaps -- yay! * Fixed many, many things in the coll sm (particularly in reduce): * Fixed handling of MPI_IN_PLACE in reduce and allreduce * Fixed handling of non-contiguous datatypes in reduce * Changed the order of reductions to go from process (n-1)'s data to process 0's data, because that's how all other OMPI coll components work * Fixed lots of usage of ddt functions * When using a non-contiguous datatype, if the root process is not (n-1), now we used a 2nd convertor to copy from shmem to the rbuf (saves a memory copy vs. what was done before) * Lots and lots of little cleanups, clarifications, and minor optimizations (although still more could be done -- e.g., I think the use of write memory barriers is fairly sub-optimal; they could be ganged together at the root, for example) I'm marking this as "fixes trac:1988" and closing the ticket; if something is still broken, we can re-open the ticket. This commit was SVN r21967. The following Trac tickets were found above: Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988	2009-09-15 00:25:21 +00:00
Rainer Keller	225a1d6d8e	- For memcpy and memset need string.h This commit was SVN r21259.	2009-05-21 22:36:06 +00:00
Rainer Keller	221fb9dbca	... Delayed due to notifier commits earlier this day ... - Delete unnecessary header files using contrib/check_unnecessary_headers.sh after applying patches, that include headers, being "lost" due to inclusion in one of the now deleted headers... In total 817 files are touched. In ompi/mpi/c/ header files are moved up into the actual c-file, where necessary (these are the only additional #include), otherwise it is only deletions of #include (apart from the above additions required due to notifier...) - To get different MCAs (OpenIB, TM, ALPS), an earlier version was successfully compiled (yesterday) on: Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled This commit was SVN r21096.	2009-04-29 01:32:14 +00:00
Rainer Keller	ec0ed48718	- Revert r20739 This commit was SVN r20742. The following SVN revision numbers were found above: r20739 --> open-mpi/ompi@781caee0b6	2009-03-05 21:56:03 +00:00
Rainer Keller	781caee0b6	- First of two or three patches, in orte/util/proc_info.h: Adapt orte_process_info to orte_proc_info, and change orte_proc_info() to orte_proc_info_init(). - Compiled on linux-x86-64 - Discussed with Ralph This commit was SVN r20739.	2009-03-05 20:36:44 +00:00
Tim Mattox	9b83df22ec	Fix some "is proc on local node?" logic that got accidentally flipped by r20496 for the sm BTL, openib BTL on iWarp, and the sm & sm2 coll modules. This commit was SVN r20515. The following SVN revision numbers were found above: r20496 --> open-mpi/ompi@4cdf91a8d4	2009-02-11 15:02:38 +00:00
Ralph Castain	4cdf91a8d4	Per the RFC, extend the current use of the ompi_proc_t flags field (without changing the field itself). The prior ompi_proc_t structure had a uint8_t flag field in it, where only one bit was used to flag that a proc was "local". In that context, "local" was constrained to mean "local to this node". This commit provides a greater degree of granularity on the term "local", to include tests to see if the proc is on the same socket, PC board, node, switch, CU (computing unit), and cluster. Add #define's to designate which bits stand for which local condition. This was added to the OPAL layer to avoid conflicting with the proposed movement of the BTLs. To make it easier to use, a set of macros have been defined - e.g., OPAL_PROC_ON_LOCAL_SOCKET - that test the specific bit. These can be used in the code base to clearly indicate which sense of locality is being considered. All locations in the code base that looked at the current proc_t field have been changed to use the new macros. Also modify the orte_ess modules so that each returns a uint8_t (to match the ompi_proc_t field) that contains a complete description of the locality of this proc. Obviously, not all environments will be capable of providing such detailed info. Thus, getting a "false" from a test for "on_local_socket" may simply indicate a lack of knowledge. This commit was SVN r20496.	2009-02-10 02:20:16 +00:00
George Bosilca	a6e3a47102	Fix typo. This commit was SVN r19312.	2008-08-17 20:08:38 +00:00
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Rolf vandeVaart	18879285c7	Fix the selection logic to prevent memory leaks. More work may be done in the priority logic but for now we just fix the leaks and preserve current behavior. This commit fixes trac:1307. This commit was SVN r18504. The following Trac tickets were found above: Ticket 1307 --> https://svn.open-mpi.org/trac/ompi/ticket/1307	2008-05-27 14:16:39 +00:00
Tim Mattox	0215474cb8	Fix two bugs in coll_sm_module.c from bit-rot: Fixed a selection bug, and removed a bogus "free(proc)" call which ultimately caused MPI_Finalize to crash. This commit was SVN r18235.	2008-04-22 18:41:21 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
George Bosilca	c755938eb0	Coverty: release the temporary buffer on error. This commit was SVN r16104.	2007-09-12 17:45:12 +00:00
Brian Barrett	af4e86c25f	Update collectives selection logic to allow for multiple components to be used at nce (up to one unique collective module per collective function). Matches r15795:15921 of the tmp/bwb-coll-select branch This commit was SVN r15924. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15795 r15921	2007-08-19 03:37:49 +00:00
Mohamad Chaarawi	59a7bf8a9f	Merging in the Sparse Groups.. This commit includes config changes.. This commit was SVN r15764.	2007-08-04 00:41:26 +00:00
Sven Stork	855434de59	- fixes several coverty issues - add missing initialisation for variables - use strncpy instead of strcpy This commit was SVN r15683.	2007-07-30 14:44:37 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
George Bosilca	126a68dc9a	Big datatype commit. Remove all unused features of the datatype engine. As the memory allocation logic is completely done outside the data-type engine (in the PML) there is no need for any special case inside the data-type engine. There is less arguments for the ompi_convertor_pack and ompi_convertor_unpack as well (the last field free_after is not required anymore as there is no memory allocated in the engine itself). This change affect all components using datatypes. I test most of them, but it might happens that I miss some ... If it's the case please let me know (don't shoot the pianist!!). This commit was SVN r12331.	2006-10-26 23:11:26 +00:00
George Bosilca	3f0a7cad9e	The last patch for Windows support. Mostly casting and conversion to C++ friendly headers. This commit was SVN r11400.	2006-08-24 16:38:08 +00:00
Brian Barrett	4c101c6394	* rename the collectives sm bootstrap area to be consistent with other shared memory segments * make sure to properly unlink the collectives sm bootstrap area at shutdown * Add missing / in the path for the mpool shared memory segment * make sure to release the common_mmap structure in the SM btl after unlinking the file during shutdown This commit was SVN r10886.	2006-07-19 20:55:29 +00:00
George Bosilca	29219ee57d	Thanks to Gleb now we are able to call the schduler on Windows. Instead of using sched_yield, we use our friend SwitchToThread. This commit was SVN r9671.	2006-04-20 19:56:50 +00:00
George Bosilca	479d510eaf	Use the common SM component to unmap the shared memory file. This commit was SVN r8623.	2005-12-31 15:07:48 +00:00
George Bosilca	6f45b6175a	Header protection. This commit was SVN r8441.	2005-12-10 22:11:10 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Jeff Squyres	b22fab2826	Fix for a bug Galen noticed yesterday -- make the shared memory only be allocated the first time a sm coll is selected for a communicator, not before. This commit was SVN r7647.	2005-10-06 13:17:27 +00:00
Jeff Squyres	b17c4334c4	- Remove all vestigates of using the built-in mcb_tree from the reduce_inorder() function -- we don't use the tree at all. - Add more relevant "volatile"'s for the control buffers in the fragment mpool (and associated casts where necessary) This commit was SVN r7616.	2005-10-04 14:52:59 +00:00
Jeff Squyres	c7fe54ba44	- Remove some silly compiler warnings - Move the "process 0" logic out of the main loop in reduce to make the code a bit less complex (at the price of slight code duplication, but it iss now significantly easier to read) - Fix problem with uniquenes guarantee in the bootstrap mpool -- using the CID alone was not sufficient enough to guarantee uniquenes; now use (CID, rank 0 process name) tuple to check for uniqueness - Made a few debugging help changes in coll_sm.h; especially helps debugging on uniprocessors This commit was SVN r7599.	2005-10-03 21:34:58 +00:00
Jeff Squyres	934caaf449	Fix at least one segv; use the right number of segments (i.e., the number o segments in the fragment pool, not in the bootstrap pool) This commit was SVN r7565.	2005-09-30 18:01:15 +00:00
Jeff Squyres	068b9c72a2	Bunches of changes - remove redundant OBJ_CONSTRUCT in bcast - fix up some macros in coll_sm.h - check to ensure that if there are too many processes in the communicator (i.e., if we couldn't fit a flag for each of them in the control segment), then fail selection - setup the in_use flags properly - adapt to new mpool API - first working copy of reduce -- not tree-baed (but still NUMA-aware), and only processes in order from process 0 to process N-1 -- do not have a tree-based and/or commutative version yet (i.e., process the results in whatever order they arrive) Reduce now passes the new ibm reduce_big.c test. Woo hoo! Time to declare success for the evening (and run the intel test tomorrow). This commit was SVN r7379.	2005-09-15 02:18:16 +00:00
Jeff Squyres	9302f924ea	simplify the bcast code by taking abstract actions and making them macros -- will help with the other algorithms This commit was SVN r7214.	2005-09-07 13:33:43 +00:00
Jeff Squyres	7bab4ed269	bunches of updates - finally added "in use" flags -- one flag protects a set of segments - these flags now used in bcast to protect (for example) when a message is so long that the root loops around the segments and has to re-use old segments -- now it knows that it has to wait until the non-root processes have finished with that set of segments before it can start using them - implement allreduce as a reduce followed by a bcast (per discussion with rich) - removed some redundant data on various data structures - implemented query MCA param ("coll_sm_shared_mem_used_data") that tells you how much shared memory will be used for a given set of MCA params (e.g., number of segments, etc.). For example: ompi_info --mca coll_sm_info_num_procs 4 --param coll sm \| \ grep shared_mem_used_data tells you that for the default MCA param values (as of r7172), for 4 processes, sm will use 548864 bytes of shared memory for its data transfer section - remove a bunch of .c files from the Makefile.am that aren't implemented yet (i.e., all they do is return ERR_NOT_IMPLEMENTED) Now on to the big Altix to test that this stuff really works... This commit was SVN r7205. The following SVN revision numbers were found above: r7172 --> open-mpi/ompi@bc72a7722b	2005-09-06 21:41:55 +00:00
Jeff Squyres	bc72a7722b	Updates: - bcast now works properly for root!=0 and multi-fragment messages - destroy mpool when communicator is destroyed Still need to implement: - "in use" flags for groups of fragments so that "wrapping around" in the data segment doesn't overwrite not-yet-read data - ensure that shared memory isn't removed before all processes have finished with it (e.g., during COMM_FREE) This commit was SVN r7172.	2005-09-03 11:49:46 +00:00
Jeff Squyres	6ef4805729	Tree-based barrier and broadcast seem to be working. Now on to reduce / allreduce... This commit was SVN r7149.	2005-09-02 12:57:47 +00:00
Jeff Squyres	ea45b150b6	Now pre-compute some things rather than compute them during every barrier This commit was SVN r6988.	2005-08-23 22:02:28 +00:00
Jeff Squyres	31065f1cc0	First cut of sm coll component infrastrcutre (this is what took so much time) and somewhat-lame implementation of barrier (need to precompute some more stuff rather than calculate it every time). Checkpointing so I can try this on another machine... This commit was SVN r6985.	2005-08-23 21:22:00 +00:00

1 2

53 Коммитов