openmpi

Author	SHA1	Message	Date
Lenny Verkhovsky	a4ae241769	replaced __FUNCTION__ with __func__. This commit was SVN r21956.	2009-09-09 12:02:45 +00:00
Lenny Verkhovsky	130d15384f	fixed error message. Thanks to Arthur Huillet. This commit was SVN r21952.	2009-09-08 15:36:37 +00:00
Brian Barrett	468bb42f83	Per discussion in ticket #2009, temporarily disable the block CID allocation algorithms until they properly reuse CIDs. This commit was SVN r21879.	2009-08-25 15:13:31 +00:00
Rainer Keller	5a4c6434e7	- Single request, so use a single wait. Technically we should have used MPI_STATUSES_IGNORE anyhow (however, both MPI_STATUS_IGNORE and MPI_STATUSES_IGNORE are ((MPI_Status *) 0) ;-). This commit was SVN r21757.	2009-08-04 14:51:15 +00:00
Ralph Castain	3561880546	Silence compiler warning about comparing signed and unsigned values. This commit was SVN r21637.	2009-07-11 18:36:43 +00:00
Edgar Gabriel	b6f292f794	add a uint8_t to the startup modex, which allows us to recognize whether different processes have requested different levels of thread support. This verification is restricted to MPI_COMM_WORLD. If one or more processes have requested support for MPI_THREAD_MULTIPLE, the cid selection algorithm will fall back to the original, thread-safe approach; otherwise, it uses the block algorithm. For dynamic communicators, we now always fall back to the original algorithm. This has been tested for homogeneous and heterogeneous settings for MCW. However, I could not yet test the dynamic comm scenario for technical reasons, which is why I am not closing ticket 1949 yet. This commit was SVN r21613.	2009-07-07 18:32:14 +00:00
Rainer Keller	b572dc3591	- As discussed, revert r21330: Fortran configure info should not end up in OPAL. - Will post an updated patch for the OMPI_ALIGNMENT_ parts (within C). This commit was SVN r21342. The following SVN revision numbers were found above: r21330 --> open-mpi/ompi@95596d1814	2009-06-01 19:02:34 +00:00
Rainer Keller	95596d1814	- Move alignment and size output generated by configure tests into the OPAL namespace, eliminating cases like opal/util/arch.c testing for ompi_fortran_logical_t. As this is processor- and compiler-related information (e.g., does the compiler/architecture support REAL*16?), it should have been on the OPAL layer. - Unify f77 code to use MPI_Flogical instead of opal_fortran_logical_t. - Tested locally (Linux/x86-64) with the mpich and intel test suites, but would like to get this weekend's MTT output. - PLEASE NOTE: configure-internal macro names and ompi_cv_ variables have not been changed, so that external platform files (not in contrib/) still work. This commit was SVN r21330.	2009-05-30 15:54:29 +00:00
Edgar Gabriel	d93def71ea	second part of the 'running out of cids problem', this time focusing on what happens when hierarch is used. Two major items: - modify the comm_activate step to take an additional argument indicating whether the new communicator has to go through the collective selection step. This is sometimes not required (e.g. when a process calls MPI_COMM_SPLIT with color=MPI_UNDEFINED), and it contributed significantly to the exhaustion of cids. - when freeing a communicator, check whether we can reuse the block of cids assigned to that comm. This only works if the current front of the cid assignment (cid_block_start) is right after the block of cids assigned to this comm. Fixes trac:1904. Fixes trac:1926. This commit was SVN r21296. The following Trac tickets were found above: Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904 Ticket 1926 --> https://svn.open-mpi.org/trac/ompi/ticket/1926	2009-05-27 15:21:07 +00:00
Greg Koenig	60485ff95f	This is a very large change to rename several #define values from OMPI_* to OPAL_*. This allows the opal layer to be used more independently of the rest of ompi. NOTE: 9 "svn mv" operations immediately follow this commit. This commit was SVN r21180.	2009-05-06 20:11:28 +00:00
Shiqing Fan	387ee0ad29	fix a type cast. This commit was SVN r21161.	2009-05-05 13:51:02 +00:00
Edgar Gabriel	338b136c28	adding a feature which tries to reuse a block of cids assigned to a communicator. This works if all processes agree that all communicators utilizing the cids in the block have been freed. If they don't, they assign a new block of cids. This fixes the application scenario reported this week; in fact, the test successfully creates 100,000 communicators without exceeding a cid of 20. The fix also keeps the main property of the algorithm (namely using a single Allreduce operation to get a new block) and does not modify the communicator structure. This commit was SVN r21142.	2009-05-02 18:03:57 +00:00
Brian Barrett	77cf736f48	Make the max_contextid field match the type of the cid in the communicator. Refs trac:1904. This commit was SVN r21141. The following Trac tickets were found above: Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904	2009-05-01 21:11:59 +00:00
Rainer Keller	221fb9dbca	... Delayed due to notifier commits earlier today ... - Delete unnecessary header files using contrib/check_unnecessary_headers.sh, after applying patches that add back headers which would otherwise be "lost" because they were only included via one of the now-deleted headers. In total, 817 files are touched. In ompi/mpi/c/, header files are moved up into the actual .c file where necessary (these are the only additional #includes); otherwise, there are only deletions of #include (apart from the above additions required due to notifier...). - To cover different MCA components (OpenIB, TM, ALPS), an earlier version was successfully compiled (yesterday) on: Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled; Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled; Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled. This commit was SVN r21096.	2009-04-29 01:32:14 +00:00
George Bosilca	b7c1ae4f76	Nothing important, just indentation. This commit was SVN r20919.	2009-04-01 15:27:16 +00:00
Rainer Keller	6f808d9b05	Preparation work for another commit (after RFC): - This patch solely _adds_ required headers and is rather localized. The next patch (after RFC) removes many headers (based on a script). - ompi/communicator/communicator.h: For sources that use ompi_mpi_comm_world, don't require them to include "mpi.h" - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs #include "ompi/mca/topo/topo.h" - ompi/errhandler/errhandler_predefined.h: ompi/communicator/communicator.h depends on this header file! To prevent recursion, just have fwd declarations. #include "ompi/types.h" for fwd declarations of the main structs. - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and ompi_rb_tree_t, so include the proper classes - ompi/mca/op/op.h: Op is pretty self-contained: Nobody up to now has done #include "opal/class/opal_object.h" - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/pml/base/base.h: We use opal_lists - ompi/mca/pml/dr/pml_dr_vfrag.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/pml/ob1/pml_ob1_hdr.h: #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t - opal/dss/dss_unpack.c: #include "opal/types.h" - opal/mca/base/base.h: #include "opal/util/cmd_line.h" for opal_cmd_line_t - orte/mca/oob/tcp/oob_tcp.c: #include "opal/types.h" for opal_socklen_t - orte/mca/oob/tcp/oob_tcp.h: #include "opal/threads/threads.h" for opal_thread_t - orte/mca/oob/tcp/oob_tcp_msg.c: #include "opal/types.h" - orte/mca/oob/tcp/oob_tcp_peer.c: #include "opal/types.h" for opal_socklen_t - orte/mca/oob/tcp/oob_tcp_send.c: #include "opal/types.h" - orte/mca/plm/base/plm_base_proxy.c: #include "orte/util/name_fns.h" for ORTE_NAME_PRINT - orte/mca/rml/base/rml_base_receive.c: #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE - orte/mca/rml/oob/rml_oob_recv.c: #include "opal/types.h" for ompi_iov_base_ptr_t
- orte/mca/rml/oob/rml_oob_send.c: #include "opal/types.h" for ompi_iov_base_ptr_t - orte/runtime/orte_data_server.c: #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE - orte/runtime/orte_globals.h: #include "orte/util/name_fns.h" for ORTE_NAME_PRINT. Tested on Linux/x86-64. This commit was SVN r20817.	2009-03-17 21:34:30 +00:00
Terry Dontje	9215100ac4	Increase communicator padding to accommodate the larger ppc lock structures. This commit was SVN r20728.	2009-03-04 19:54:58 +00:00
Rainer Keller	fd28b392bf	- An intrusive commit yet again (sorry): with the separation, we get bitten by headers that depend on having already included the corresponding [opal|orte|ompi]_config.h header. When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC are missed. Used a script to add the corresponding header in front of all following #includes (taking care of possible #ifdef HAVE_...). - Including some minor cleanups to: - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H - ompi/mca/btl/btl.h -- include _after_ #ifndef MCA_BTL_H - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for orte/util/output.h - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h - ompi/mca/btl/btl.h -- reorder to fit - ompi/mca/bml/bml.h -- reorder to fit - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit - ompi/request/request.h -- additionally need ompi/constants.h - Tested on linux/x86-64. This commit was SVN r20720.	2009-03-04 15:35:54 +00:00
Terry Dontje	0178b6c45f	Added padding to predefined handle structures to maintain compatibility from one library version to the next. This commit was SVN r20627.	2009-02-24 17:17:33 +00:00
Rainer Keller	d81443cc5a	- On the way to splitting out the BTLs and lessening the dependency on orte: often, orte/util/show_help.h is included although none of its functionality is required; what is actually needed is most often opal_output.h or orte/mca/rml/rml_types.h. Please see orte_show_help_replacement.sh, committed next. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration actually showed two missing #include "orte/util/show_help.h" in orte/mca/odls/base/odls_base_default_fns.c and in orte/tools/orte-top/orte-top.c. Manually added these. Let's let MTT have the last word. This commit was SVN r20557.	2009-02-14 02:26:12 +00:00
Jeff Squyres	d1c6f3f89a	* Fix a truckload of Cisco copyrights to be the same as the rest of the code base. * Fix a few misspellings in other copyrights. This commit was SVN r20241.	2009-01-11 02:30:00 +00:00
Edgar Gabriel	b05393b363	now that we have the unexpected message queue for unknown communicators, there is no need to have this additional synchronization operation for multi-threaded communicator creations. This commit was SVN r20046.	2008-12-02 16:30:15 +00:00
George Bosilca	82d1d5d785	The patch for "Unexpected message queue for unknown CID's required" ticket #1460. I'm unable to split it into two parts (my patch and Edgar's), so I just updated the copyright information for both of us. What this patch does: - it uses the unexpected queue created by commit r19562 to dispatch the unexpected message to the right communicator (once this communicator is created and initialized). - delays the PML comm_add until we have the context_id for the new communicator. - only does the PML comm_add on processes that really belong to the new communicator. Please read the lengthy comment in the source code for the reason behind this. This commit was SVN r19929. The following SVN revision numbers were found above: r19562 --> open-mpi/ompi@acd3406aa7	2008-11-04 21:58:06 +00:00
Jeff Squyres	0ae2c27d3b	Ensure that the mutex is properly constructed/destructed. This commit was SVN r19527.	2008-09-09 12:57:45 +00:00
George Bosilca	bf25b3339d	Minor cleanup. This commit was SVN r19464.	2008-08-31 21:03:39 +00:00
Jeff Squyres	008fa8c5cc	Fixes trac:1236, #1237. * Various changes to enable 0-dimensional cartesian communicators: * Set various mtc_* members to NULL when there are 0 dimensions (and don't bother trying to memcpy these arrays when duplicating the communicator -- because they're NULL) * adjust topo_base_cart_sub to correctly handle 0 dimensions (simplified it a bit) * adjust a few error codes to return ERR_OUT_OF_RESOURCE * adjust error checking of CART_CREATE, CART_RANK * Allow MPI_GRAPH_CREATE to accept 0 == nnodes. * Bump reported MPI version in mpi.h to 2.1. This commit was SVN r19461. The following Trac tickets were found above: Ticket 1236 --> https://svn.open-mpi.org/trac/ompi/ticket/1236	2008-08-31 19:31:10 +00:00
Rainer Keller	9cc83d7414	- Fix the freeing of already-allocated buffers if one allocation fails. Fixes Coverity CID 291 & CID 292. - Adjust the rc for other functions as well. This commit was SVN r19232.	2008-08-11 09:43:01 +00:00
Jeff Squyres	765749209f	Fix CID 413: possible uninitialized variable. This commit was SVN r19176.	2008-08-06 12:25:56 +00:00
Edgar Gabriel	1adb3a6cda	Fixes trac:1408. The optimization that was introduced a year ago for saving a collective synchronization step for certain communicator creation functions has to be disabled for now. The bug has been exposed by the hierarch module, but could appear as well for inter-communicator creations. The problem is that, within a communicator creation step, we invoke a comm_dup (for intercomm_create) or other collective operations (in case of hierarch) before all processes have been synchronized. This led to the "Dropped message for non-existant communicators" error. This commit disables the optimization without removing it from the code base. In theory, it can be enabled again as soon as we have the unexpected message queues for unknown cids, which, if I remember correctly, are required anyway for the multi-threaded scenarios and potentially for fault tolerance. Before moving the patch to 1.3, I would like to let it soak for a couple of days on trunk. Please note that my 2nd comment on ticket #1408 was only partially correct, since the order of activating the communicator and querying the collective module had already been changed earlier. This commit was SVN r19139. The following Trac tickets were found above: Ticket 1408 --> https://svn.open-mpi.org/trac/ompi/ticket/1408	2008-08-04 14:55:09 +00:00
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400. * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA framework versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy. This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Jeff Squyres	90576a435b	Fixes trac:1345. The issue is that the field mca_topo_base_comm_t->mtc_periods_or_edges has a different length, depending on whether the communicator is a graph or a cart. One of the comm dup functions always assumed that it was the length required by graph comms, which could lead to badness in some cases. This commit makes the length of that field on a comm dup be the proper length and copies the data over appropriately. I also changed the syntax of the ompi_comm_copy_topo() function to use shorter pointer notation; it made the code much easier to read and fix. This commit was SVN r18752. The following Trac tickets were found above: Ticket 1345 --> https://svn.open-mpi.org/trac/ompi/ticket/1345	2008-06-26 16:59:31 +00:00
Ralph Castain	0532d799d6	Complete implementation of the --without-rte-support configure option. This has been tested on RedStorm in collaboration with Brian. Also includes some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it; this is still to be completed. This commit was SVN r18664.	2008-06-18 03:15:56 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we have already done the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do something similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component written for the filter framework creates an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd-party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done so that configure.m4 searches for an appropriate XML library, links it in, and uses it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas on how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
George Bosilca	a00ca20446	More cleanups. This commit was SVN r18069.	2008-04-02 06:38:33 +00:00
Jeff Squyres	597266fdec	Present state of MPI debugger work: * New/improved bootstrapping technique for DLLs * First cut of the MPI handle debugging interface. It is still evolving, but the interface is getting more stable. * Some minor bugs were fixed in the unity topo component (brought to light because of the new MPI handle debugging stuff). Fixes trac:1209. This commit was SVN r17730. The following Trac tickets were found above: Ticket 1209 --> https://svn.open-mpi.org/trac/ompi/ticket/1209	2008-03-05 12:22:34 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. This remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer. This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Shiqing Fan	54c7b71cfd	Use the correct way of including memchecker.h, which will work with '--with-devel-headers'. This commit was SVN r17435.	2008-02-12 18:01:17 +00:00
Shiqing Fan	f5792bbda5	merging the memchecker into trunk. This commit was SVN r17424.	2008-02-12 08:46:27 +00:00
George Bosilca	906e8bf1d1	Replace the ompi_pointer_array with opal_pointer_array. As the next step (sometime after the merge with the ORTE branch), the opal_pointer_array will become the only pointer_array implementation (the orte_pointer_array will be removed). This commit was SVN r17007.	2007-12-21 06:02:00 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTLs are not active at that point. 
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTLs are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE neither tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16 bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32 bits. Someday, if necessary, someone can add yet another option to increase them to 64 bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. 
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k * 50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files. This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Tim Prins	34966edaf1	remove unneeded and never-initialized lock. The orte_ns.assign_tag function does all the locking we need for us. This commit was SVN r16299.	2007-10-02 14:22:29 +00:00
Tim Prins	1d1d0f6d4c	Fix segfault when user provides a working directory for comm_spawn. Thanks to Murat Knecht for reporting this and suggesting a fix. This commit was SVN r16266.	2007-09-27 23:30:40 +00:00
Tim Prins	4033a40e4e	Coding standards... This commit was SVN r16118.	2007-09-13 14:00:59 +00:00
George Bosilca	2e46809995	Only release the comm_reg if we have one. This commit was SVN r16093.	2007-09-11 17:59:40 +00:00
Gleb Natapov	e82a6eec27	Restore check for lowest id. It prevents a livelock situation if multiple threads are inside the function and they fail to obtain a new cid the first time around. This commit was SVN r16090.	2007-09-11 15:32:46 +00:00
Gleb Natapov	58a018c16d	The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. As far as I can see, there is nothing in the algorithm that prevents it from running simultaneously for different communicators, but ompi_comm_unregister_cid() assumes that it is always called for a communicator with the lowest cid, and this is not always the case. This patch removes the bogus lowest cid check and fixes ompi_comm_register_cid() to properly remove the cid from the list. This commit was SVN r16088.	2007-09-11 13:23:46 +00:00
Shiqing Fan	b1250eba3a	- Some more to be exported. This commit was SVN r16023.	2007-08-30 15:13:08 +00:00
Jeff Squyres	18db56e270	Fix Coverity defect 675: possible NULL dereference in an error condition. This commit was SVN r15957.	2007-08-25 12:18:55 +00:00
Rainer Keller	b385f8a790	- ompi_comm_set(): PML add_comm may return something != OMPI_SUCCESS Use OMPI_SUCCESS throughout. - ompi_comm_allocate(): Initialize new_comm=NULL to get rid of warnings. This commit was SVN r15948.	2007-08-23 07:40:40 +00:00
Brian Barrett	af4e86c25f	Update collectives selection logic to allow for multiple components to be used at once (up to one unique collective module per collective function). Matches r15795:15921 of the tmp/bwb-coll-select branch This commit was SVN r15924. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15795 r15921	2007-08-19 03:37:49 +00:00
Edgar Gabriel	0684002812	fixes: 1127 fix some of the multi-threading problems for the cid allocation. Two bugs specifically: - since we do not have a queue for incoming fragments of unknown cid, we need to synchronize all processes before exiting the communicator creation. This synchronization was/is located in comm_activate, which was however too late for the multi-threaded case. Thus, for multi-threaded scenarios we are now synchronizing 'before' we allow another thread to enter the cid-allocation loop. - for synchronization, we used for the sake of simplicity allreduce operations. It turns out that these operations interfered with the allreductions in the cid-allocation routine, which led to nonsense results in the cid-allocation and potentially to endless loops. Multi-threaded communicator creation seems to work now, is however still 'very very' slow. I think, the busy wait of threads is killing the performance of the active threads in the cid allocation. But this is another topic. This commit was SVN r15910.	2007-08-17 16:15:26 +00:00
Tim Prins	5a795128af	Change it so that different components in orte use unique rml tags This commit was SVN r15881.	2007-08-16 14:02:35 +00:00
Mohamad Chaarawi	59a7bf8a9f	Merging in the Sparse Groups.. This commit includes config changes.. This commit was SVN r15764.	2007-08-04 00:41:26 +00:00
Sven Stork	6c8d921a76	- coverity found dead code, but it's a typo This commit was SVN r15686.	2007-07-30 15:41:41 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and 15520 so that I can get the RML / OOB changes to cleanly apply This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Sven Stork	2ab401dc3c	- export required symbols used by OSC This commit was SVN r15476.	2007-07-18 11:51:52 +00:00
George Bosilca	e782da00e0	Don't allow the same communicator to be used in a multi-threaded build by several threads to create new communicators. There is nothing in the standard about threading and communicator functions, but as they include collective communications I expect the same rules have to be applied. As such, on an incorrect MPI program we deadlock (!). This commit was SVN r15456.	2007-07-17 00:33:27 +00:00
Brian Barrett	cb2bc19f07	add accessor function for getting ompi_communicator_t* -> cid mapping, since we already have a function for getting cid -> ompi_communicator_t* mapping This commit was SVN r15364.	2007-07-11 17:14:57 +00:00
Brian Barrett	1d02b9e7b5	Fix a bunch of issues exposed by Ken Cain in getting Open MPI to work with VxWorks. Still some issues remaining, I'm sure. Refs trac:1010 This commit was SVN r15320. The following Trac tickets were found above: Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010	2007-07-10 03:46:57 +00:00
Brian Barrett	84d1512fba	Add the potential for doing some basic error checking on mutexes during single threaded builds. In its default configuration, all this does is ensure that there's at least a good chance of threads building based on non-threaded development (since the variable names will be checked). There is also code to make sure that a "mutex" is never "double locked" when using the conditional macro mutex operations. This is off by default because there are a number of places in both ORTE and OMPI where this alarm spews mega bytes of errors on a simple test. So we have some work to do on our path towards thread support. Also removed the macro versions of the non-conditional thread locks, as the only places they were used, the author of the code intended to use the conditional thread locks. So now you have upper-case macros for conditional thread locks and lowercase functions for non-conditional locks. Simple, right? :). This commit was SVN r15011.	2007-06-12 16:25:26 +00:00
Brian Barrett	508da4e959	OS X apparently really doesn't like shared libraries with unresolvable symbols in them and environ is defined only in the final application (probably in crt1.o). Apple provides a function for getting at the environment, so use that instead if it's available. This commit was SVN r14857.	2007-06-05 03:03:59 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
Ralph Castain	7d0f51e6b9	Begin setting up for a change to the OOB information passing functionality - this is totally transparent at the moment (need to change computers). This commit was SVN r14510.	2007-04-25 17:36:26 +00:00
Tim Prins	f0e6a28a1f	pedantic indentation... This commit was SVN r14251.	2007-04-06 19:18:31 +00:00
Edgar Gabriel	4d2b3e859d	fix the indenting from tabs to spaces :-) This commit was SVN r14211.	2007-04-03 21:33:44 +00:00
Edgar Gabriel	188f770d94	ok, increase the reference count on ompi_mpi_group_null twice when creating ompi_mpi_comm_null, since the destructor of ompi_mpi_comm_null will decrease the reference counter of ompi_mpi_group_null twice according to the last fix of Mohamad. Added also a lengthy comment in ompi_comm_finalize about why we do not decrease the reference counters for ompi_mpi_comm_null, ompi_mpi_group_null etc. for the parent communicator, although we do increase it in ompi_comm_init This commit was SVN r14210.	2007-04-03 21:16:26 +00:00
Mohamad Chaarawi	0e98bf2ac6	quick fix for the cart create problem caused by the previous memory leak fix This commit was SVN r14195.	2007-04-02 19:06:52 +00:00
Mohamad Chaarawi	8f4f992bfc	fixed the memory leak problem by decrementing the ref count on the remote group in case of Intra communicators. This needs to go in V1.2. We will file a move request on monday.. This commit was SVN r14179.	2007-03-30 19:30:40 +00:00
Mohamad Chaarawi	bfaf9d4a12	Added new module for intercomm collectives. This will require an autogen. This commit was SVN r14149.	2007-03-27 02:06:42 +00:00
Mohamad Chaarawi	cae083dec6	replaced the old CID allocation algorithm with the blocked algorithm. The impact on the communicator directory is still not great since the interface for allocating a Cid has not changed.. This commit was SVN r12836.	2006-12-12 22:01:39 +00:00
Brian Barrett	98884e45e4	Clean up the way procs are added to the global process list after MPI_INIT: * Do not add new procs to the global list during modex callback or when sharing orte names during accept/connect. For modex, we cache the modex info for later, in case that proc ever does get added to the global proc list. For accept/connect orte name exchange between the roots, we only need the orte name, so no need to add a proc structure anyway. The procs will be added to the global process list during the proc exchange later in the wireup process * Rename proc_get_namebuf and proc_get_proclist to proc_pack and proc_unpack and extend them to include all information needed to build that proc struct on a remote node (which includes ORTE name, architecture, and hostname). Change unpack to call pml_add_procs for the entire list of new procs at once, rather than one at a time. * Remove ompi_proc_find_and_add from the public proc interface and make it a private function. This function would add a half-created proc to the global proc list, so making it harder to call is a good thing. This means that there's only two ways to add new procs into the global proc list at this time: During MPI_INIT via the call to ompi_proc_init, where my job is added to the list and via ompi_proc_unpack using a buffer from a packed proc list sent to us by someone else. Currently, this is enough to implement MPI semantics. We can extend the interface more if we like, but that may require HNP communication to get the remote proc information and I wanted to avoid that if at all possible. Refs trac:564 This commit was SVN r12798. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-07 19:56:54 +00:00
Brian Barrett	b07dfa7841	* remove unused variable in ompi_comm_get_rprocs * don't load data into a buffer until we have the data, as the data contains some header information needed to properly load the data This commit was SVN r12792.	2006-12-07 16:19:44 +00:00
Brian Barrett	33320b7165	Rework the opal_progress interface to better support dynamic processes and at the same time, remove some of the MPI-related options from OPAL: - provide mechanism to change at runtime whether sched_yield() should be called when the progress engine is idle - provide mechanism for changing the rate at which the event engine is called when there are "no" users of the event engine (ie, when using MPI but not TCP) - fix some function names in the progress engine to better match their intended use (and remove MPI naming scheme) - remove progress_mpi_enable / progress_mpi_disable because we can now use the functions to set the sched_yield and tick rate interfaces - rename opal_progress_events() to opal_progress_set_event_flag() because the first really isn't descriptive of what the function does and I always got confused by it This commit was SVN r12645.	2006-11-22 02:06:52 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
Ralph Castain	9204747930	Add timing info to comm_spawn - timing collected and reported when OMPI_MCA_ompi_timing = 1 (or something other than zero). This commit was SVN r12381.	2006-10-31 23:32:39 +00:00
George Bosilca	06563b5dec	Last set of explicit conversions. We are now close to the zero warnings on all platforms. The only exceptions (and I will not deal with them anytime soon) are on Windows: - the write functions which require the length to be an int when it's a size_t on all UNIX variants. - all iovec manipulation functions where the iov_len is again an int when it's a size_t on most of the UNIXes. As these only happen on Windows, I think we're set for now :) This commit was SVN r12215.	2006-10-20 03:57:44 +00:00
Ralph Castain	d0eb7d7216	Complete the attribute management functions. Modify the mapper to better bookmark its stopping place each time, and to pick up the next time from there. This needs to be validated on a multi-node system. Fix a major memory corruption problem in the registry put/get functions that was doing multiple free's. Not sure how valgrind missed this one, though it only occurred in specific circumstances (such as comm_spawn). This commit was SVN r12179.	2006-10-18 20:02:16 +00:00
Ralph Castain	f4a458532b	This doesn't totally resolve the comm_spawn problem, but it helps a little. I'll continue working on it and hope to resolve it completely shortly. The issue primarily centers on where to start mapping the child job's processes, and how to deal with oversubscription that might result. At the moment, I am trying to resolve the first issue first (hey, that even sounds right!). This change does a couple of things: 1. Since the USE_PARENT_ALLOC attribute is a directive regarding allocation of resources to a job, it more properly should be an attribute of the RAS. Change the name to reflect that and move the attribute define to the ras_types.h file. 2. Add the attributes list to the RMAPS map_job interface. This provides us with the desired flexibility to dynamically specify directives for mapping. The system will - in the absence of any attribute-based directive - default to the values provided in the MCA parameters (either from environment or command-line interface). This commit was SVN r12164.	2006-10-18 14:01:44 +00:00
Ralph Castain	13227e36ab	This commit looks a lot bigger than it is, so relax :-) Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off. To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place. I used those capabilities in two places: 1. Added an attribute list to the rmgr.spawn interface. 2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h). So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms. This commit was SVN r12138.	2006-10-17 16:06:17 +00:00
Ralph Castain	1f7a5da3ce	Bring singleton comm_spawn online. This commit was SVN r12081.	2006-10-10 23:59:48 +00:00
Edgar Gabriel	ec55acd8f4	orte_rml.send_buffer returns the number of bytes sent or a negative value if something went wrong. A positive number > 0 is however a correct value (in contrast to orte_rml.recv_buffer, which really returns ORTE_SUCCESS or an error code). Note: this part of the code is correct on 1.1 and 1.2 branch, no need to move this change patch to the release branches. This commit was SVN r11897.	2006-09-29 20:28:45 +00:00
George Bosilca	645790dd9c	Pedantic... This commit was SVN r11731.	2006-09-20 22:20:10 +00:00
George Bosilca	688a16ea78	A long time waiting patch. Get rid of the comm->c_pml_procs. It was (and that was long ago) supposed to be used as a cache for accessing the PML procs. But in all of the PMLs the PML proc contain only one field i.e. a pointer to the ompi_proc. This pointer can be accessed using the c_remote_group easily. Therefore, there is no meaning of keeping the PML procs around. Slim fast commit ... This commit was SVN r11730.	2006-09-20 22:14:46 +00:00
George Bosilca	20459bd982	Remove the HIDDEN flag. It is not used anywhere. This commit was SVN r11729.	2006-09-20 20:57:10 +00:00
Ralph Castain	0ad0d84afd	Add two new API functions to the RMGR, and modify the "spawn" API to support the enhanced MPI-2 functionality. No implementation backs these new APIs - just placeholders for now. This commit was SVN r11699.	2006-09-19 01:45:05 +00:00
Ralph Castain	37dfdb76eb	Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. This commit was SVN r11661.	2006-09-14 21:29:51 +00:00
George Bosilca	3f0a7cad9e	The last patch for Windows support. Mostly casting and conversion to C++ friendly headers. This commit was SVN r11400.	2006-08-24 16:38:08 +00:00
Ralph Castain	6d27fee3a2	Silence Cyrador...who had a valid complaint. This commit was SVN r11282.	2006-08-21 14:26:11 +00:00
Ralph Castain	6bf06d4602	Fix connect-accept by cleaning up two minor bugs. This commit was SVN r11260.	2006-08-18 21:12:03 +00:00
Ralph Castain	8c7f0ed9ae	Change the SOH to the new State Monitoring and Reporting (SMR) framework. New API's will be appearing in the new framework shortly - this just gets the name change into the system. Other changes: 1. Remove the old xcpu components as they are not functional. 2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one. This will require an autogen/configure, I'm afraid. This commit was SVN r11228.	2006-08-16 16:35:09 +00:00
Ralph Castain	5dfd54c778	With the branch to 1.2 made.... Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced). Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up). I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t). In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but... Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems. This commit was SVN r11204.	2006-08-15 19:54:10 +00:00
Ralph Castain	62e70e6b3a	Enable the use of "prefix" for comm_spawn child processes. With this patch: 1. comm_spawn processes by default will inherit the "--prefix" from their parent job. Thus, the "--prefix" provided on the command line will be propagated automatically to any children. 2. application programs can override the default by providing their own "ompi_prefix" in the MPI_Info parameter passed to comm_spawn This commit was SVN r11143.	2006-08-09 20:48:51 +00:00
Jeff Squyres	7f372b4e1f	No functional changes -- only re-indent some portions of the code to make it consistent with the indenting in the rest of the file (otherwise it was quite difficult to understand -- saw this while I was reviewing 11039). This commit was SVN r11042.	2006-07-28 15:47:16 +00:00
David Daniel	45894aecee	Adding support for MPI_Comm_spawn() to use the 'host' key in an MPI_Info object if provided. The associated value is a comma-separated list of hosts -- which must be in the initial allocation -- and is used to populate the application context map. This commit was SVN r11039.	2006-07-27 23:45:33 +00:00
Jeff Squyres	942f9e8f8d	Fixes for ticket:14. Lengthy discussion is on that ticket and in a comment in ompi_comm_invalid() in source:/trunk/ompi/communicator/communicator.h. Short version: - ompi_comm_invalid() returns TRUE for MPI_COMM_NULL - therefore MPI_COMM_C2F needs to explicitly check for MPI_COMM_NULL (because it uses ompi_comm_invalid()) - make ~20 MPI functions only call ompi_comm_invalid() instead of calling ompi_comm_invalid() and checking for MPI_COMM_NULL (~40 MPI functions already only called ompi_comm_invalid() -- we should be consistent) - similar issue for ompi_win_invalid(), so I added a cross-referencing comment in win.h and fixed MPI_WIN_SET_NAME to only call ompi_win_invalid() (and not check for MPI_WIN_NULL) This commit was SVN r9970.	2006-05-18 18:05:46 +00:00
Edgar Gabriel	8c49f14dce	fix a bug in the intercomm-split allgather emulation function. This commit was SVN r9806.	2006-05-03 21:41:10 +00:00
193 commits