openmpi

Author	SHA1	Message	Date
Jeff Squyres	d1c6f3f89a	* Fix a truckload of Cisco copyrights to be the same as the rest of the code base. * Fix a few misspellings in other copyrights. This commit was SVN r20241.	2009-01-11 02:30:00 +00:00
Jeff Squyres	a9850c96c5	Cosmetic change. This commit was SVN r20203.	2009-01-05 19:07:06 +00:00
Jeff Squyres	611ebeab33	Cosmetic: expunge some more old 2-space-indent code (re-indent with "indent(1)"). This commit was SVN r20179.	2009-01-02 12:55:17 +00:00
Nysal Jan	ee8ec6f6b5	Remove dead/redundant code. Minimize the number of calloc invocations. This commit was SVN r20121.	2008-12-12 10:55:50 +00:00
Shiqing Fan	d06604c258	Get rid of the compiler warning message when --enable-picky is used. Do the checks according to inter/intracommunicator flags. This commit was SVN r20063.	2008-12-03 17:44:21 +00:00
Shiqing Fan	abd21b6d17	- An update for memchecker : 1. fix a bug in pml_ob1_recvreq/sendreq.c, buffer was made defined where the request has already been released. 2. complete memchecker support for collective functions. 3. change the wrongly spelled function name of memchecker, i.e. '_isaddressible' should be '_isaddressable' This commit was SVN r20043.	2008-11-27 16:34:02 +00:00
George Bosilca	82d1d5d785	The patch for "Unexpected message queue for unknown CID's required" ticket #1460. I'm unable to split it in two parts (my patch and Edgar's), so I just updated the copyright information for both of us. What this patch does: - it uses the unexpected queue created by commit r19562 to dispatch the unexpected message to the right communicator (once this communicator is created and initialized). - delay the PML comm_add until we have the context_id for the new communicator. - only do the PML comm_add on processes that really belong to the new communicator. Please read the lengthy comment in the source code for the reason behind this. This commit was SVN r19929. The following SVN revision numbers were found above: r19562 --> open-mpi/ompi@acd3406aa7	2008-11-04 21:58:06 +00:00
Jeff Squyres	57a3dce9ba	LANL noticed that calling MPI_ABORT invokes opal_output(0, ...) unconditionally, which can result in a flood of messages to the user if all MPI processes invoke abort. Additionally, some users were confused because they saw the MPI_ABORT opal_output() messages from ''some'' MPI processes, but not ''all'' of them (despite the fact that every MPI process supposedly invoked MPI_ABORT). The reason is that calling MPI_ABORT triggers ORTE to kill all MPI processes, so it's a race condition as to whether a) all MPI processes actually invoke MPI_ABORT, and/or b) whether every process is able to opal_output() before they are killed. This commit does two simple things: * Now use orte_show_help() for the MPI_ABORT message, so they are aggregated. * Add a note in the message that calling MPI_ABORT kills all processes, so you might not see all output, yadda yadda yadda. This commit was SVN r19735.	2008-10-14 19:23:03 +00:00
Jeff Squyres	d0a8be6d2f	Fix CID 1117: ensure return values are checked. This commit was SVN r19583.	2008-09-19 13:27:30 +00:00
Nysal Jan	4b68803260	Should be coords(i) >= dims(i) Refs trac:1463 This commit was SVN r19500. The following Trac tickets were found above: Ticket 1463 --> https://svn.open-mpi.org/trac/ompi/ticket/1463	2008-09-05 04:20:48 +00:00
Jeff Squyres	9a98423bbc	[Re-]Fix #1463 with a little thing that I like to call "the right way". Don't modify coords in the top-level API function because coords is an IN variable. Instead, as Nysal noted, the real cause of the problem was a missing ! down in topo_base_cart_rank.c. Put a comment down in topo_base_cart_rank.c explaining what's going on so that the code is not so cryptic. Refs trac:1363. This commit was SVN r19487. The following Trac tickets were found above: Ticket 1363 --> https://svn.open-mpi.org/trac/ompi/ticket/1363	2008-09-03 08:24:27 +00:00
Jeff Squyres	008fa8c5cc	Fixes trac:1236, #1237 . * Various changes to enable 0-dimensional cartesian communicators: * Set various mtc_* members to NULL when there are 0 dimensions (and don't bother trying to memcpy these arrays when duplicating the communicator -- because they're NULL) * adjust topo_base_cart_sub to correctly handle 0 dimensions (simplified it a bit) * adjust a few error codes to return ERR_OUT_OF_RESOURCE * adjust error checking of CART_CREATE, CART_RANK * Allow MPI_GRAPH_CREATE to accept 0 == nnodes. * Bump reported MPI version in mpi.h to 2.1 This commit was SVN r19461. The following Trac tickets were found above: Ticket 1236 --> https://svn.open-mpi.org/trac/ompi/ticket/1236	2008-08-31 19:31:10 +00:00
Jeff Squyres	59cb626b7c	Fixes trac:1463: ensure periodic dimensions are handled properly for MPI_CART_RANK. This commit was SVN r19459. The following Trac tickets were found above: Ticket 1463 --> https://svn.open-mpi.org/trac/ompi/ticket/1463	2008-08-31 18:39:05 +00:00
George Bosilca	697dc524c1	Deal with tickets #1239 and #712. This will upgrade the Open MPI support for the F90 type create functions to the requirements of the MPI 2.1 standard. Advice to implementors. An application may often repeat a call to MPI_TYPE_CREATE_F90_xxxx with the same combination of (xxxx,p,r). The application is not allowed to free the returned predefined, unnamed datatype handles. To prevent the creation of a potentially huge amount of handles, the MPI implementation should return the same datatype handle for the same (REAL/COMPLEX/INTEGER,p,r) combination. Checking for the combination (p,r) in the preceding call to MPI_TYPE_CREATE_F90_xxxx and using a hash-table to find formerly generated handles should limit the overhead of finding a previously generated datatype with same combination of (xxxx,p,r). (End of advice to implementors.) This commit fixes trac:1239, and #712. This commit was SVN r19458. The following Trac tickets were found above: Ticket 1239 --> https://svn.open-mpi.org/trac/ompi/ticket/1239	2008-08-31 18:36:32 +00:00
Jeff Squyres	93746cd594	Fixed CID 807: Remove unused variable This commit was SVN r19239.	2008-08-11 20:50:09 +00:00
Rolf vandeVaart	e105b3f254	Finish work related to ticket #1392 where the versions were bumped from v1.0.0 to v2.0.0. This change fixed #1439. This commit was SVN r19175.	2008-08-06 12:16:54 +00:00
Rainer Keller	82580701fb	- We may know the *_name is < MPI_MAX_OBJECT_NAME; Prevent does not. Fix Coverity issues CID1068 and CID1069 This commit was SVN r19167.	2008-08-06 07:59:59 +00:00
Ralph Castain	a0ae63f19e	Ensure we call close_port after comm_spawn[_multiple]. Clean out the port name in close_port. This commit was SVN r19068.	2008-07-28 16:40:11 +00:00
Jeff Squyres	74aa9689e4	From an initial patch from George, update all the set/get errhandler functions to use atomics in order to be thread safe. This commit was SVN r18807.	2008-07-03 19:28:02 +00:00
Jeff Squyres	51d833e8d1	Minor fixes and comment clarifications for MPI-2.1-mandated handling of strings. We mostly did the Right Things already; I simplified the code a bit and also made us not write more characters in the C bindings than we're supposed to (per language in the MPI-2.1 spec). Fixes trac:1238. This commit was SVN r18705. The following Trac tickets were found above: Ticket 1238 --> https://svn.open-mpi.org/trac/ompi/ticket/1238	2008-06-21 19:33:47 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Ralph Castain	6ddcce4085	Apply a patch from Edgar to fix the Intercomm MTT tests. Fixes ticket #1332 This commit was SVN r18591.	2008-06-05 12:53:12 +00:00
Rolf vandeVaart	0d8faf7559	Fix the fix for ticket #1298 . Thanks George for pointing it out. This commit was SVN r18488.	2008-05-23 13:33:38 +00:00
Rolf vandeVaart	8c3b31b181	Need to properly handle zero-length scatters and gathers on intercommunicators. Add a check for the MPI_ROOT and MPI_PROC_NULL processes so they do not enter collective module when count=0. This commit was SVN r18481.	2008-05-22 19:09:43 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we have already done the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to have configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Rainer Keller	4b89706dfe	- Properly check for valid output parameters... This commit was SVN r18419.	2008-05-09 08:39:24 +00:00
Shiqing Fan	8088ec8bce	More for non-blocking communication. This commit was SVN r18400.	2008-05-07 13:00:28 +00:00
Shiqing Fan	8393fb5d47	Use the new memchecker_call function for memory checking of non-blocking communication. This commit was SVN r18399.	2008-05-07 12:28:51 +00:00
Shiqing Fan	f35a06119c	Use the memchecker_convertor_call function instead of the old one. Move the function to a place where we can use the convertor. This commit was SVN r18370.	2008-05-05 13:57:27 +00:00
Terry Dontje	8dd0421015	Moved ident lines to ompi_mpi_init.c and created new ompi_version_string variable. This commit was SVN r18345.	2008-05-01 15:06:10 +00:00
Josh Hursey	cc83d41ad9	Merge in tmp/jjh-scratch {{{ svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch . }}} Contains: * Primarily a fix for a user-reported problem where a cached file descriptor is causing a SIGPIPE on restart. * Clean up some small memory leaks from using mca_base_param_env_var() - Thanks Jeff * Clean up ORTE FT tool compilation in non-FT builds - Thanks Tim P. * Clean up the MPI interface with a misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry * Some other sundry cleanup items all dealing with C/R functionality in the trunk. This commit was SVN r18241.	2008-04-23 00:17:12 +00:00
Tim Prins	b2acb51d04	make comm_join work again. Allocate memory to the correct pointer. This commit was SVN r18186.	2008-04-17 11:56:53 +00:00
Ralph Castain	7b91f8baff	Clean up and fix bugs in the MPI dynamics section. Modify the dpm API so it properly takes ports instead of process names (as correctly identified by Aurelien). Fix race conditions in the use of ompi-server. Fix incompatibilities between the MPI bindings and the dpm implementation that could cause segfaults due to uninitialized memory. Fix the ompi-server -h cmd line option so it actually tells you something! Add two new testing codes to the orte/test/mpi area: accept and connect. This commit was SVN r18176.	2008-04-16 14:27:42 +00:00
Aurelien Bouteiller	921a6ce3d4	Processes with different jobids can now connect/accept to each other. This commit was SVN r18134.	2008-04-11 15:40:59 +00:00
Edgar Gabriel	5989fa570c	Sorry, the previous commit was in the wrong directory. This is the real fix (have to undo 1822). The verification of recvcount==0 and rank == root was breaking inter-communicator scatter, since the root (root==MPI_ROOT) might very well have recvcount=0. The same fix has been applied to gather.c, just the other way round. Fixes the bug reported on the mailing list by Martin Audet. If there is a 1.2.7, this fix might be worth porting over. Please note that while the test works now for basic and for inter, we get a 0-byte malloc warning from the inter module, which we still have to fix in a separate patch. This commit was SVN r18123.	2008-04-10 15:03:14 +00:00
Rainer Keller	334b64e760	- Coverity issue CID 35: Event var_deref_op: Variable "requests" tracked as NULL was dereferenced. Only check requests[i] for NULL, if requests is != NULL itself. This commit was SVN r17973.	2008-03-26 08:19:55 +00:00
Rainer Keller	56f3d59f2a	- Coverity issues 939, 940, 941: Event uninit_use_in_call: Using uninitialized value "tag" in call to function "(ompi_dpm).connect_accept" and others The tag is set and used in get_rport only on root... This commit was SVN r17972.	2008-03-26 08:09:11 +00:00
Ralph Castain	90107f3c14	Fix an issue with comm_spawn over who sent/recv first in the modex. The modex assumes that the first name on the list is the "root" that will serve as the allgather collector/distributor. The dpm was putting that entity last, which forced us to pre-inform the parent procs of the child proc's contact info since the parent was trying to send to the child. Clarify the setting of send_first in the mpi bindings (trivial, i know, but helpful) Remove the extra xcast of child contact info to the parent job. This commit was SVN r17952.	2008-03-25 14:57:34 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Josh Hursey	134684d096	A compiler warning fix. This commit was SVN r17539.	2008-02-21 14:28:08 +00:00
Josh Hursey	99144db970	Improve checkpoint/restart support by allowing a checkpoint to progress when the process is not in the MPI library. This involves creating a separate thread for polling for a checkpoint request. This thread is active when the MPI process is not in the MPI library, and paused when the MPI process is in the library. Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI. Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave. Fixed the configure check for {{{--enable-ft-thread}}} so it has a known dependency on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}). Added a line for Checkpoint/Restart support to {{{ompi_info}}}. Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime. There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime. This commit was SVN r17516.	2008-02-19 22:15:52 +00:00
Rainer Keller	9cd2c6f48b	- Instead of calling RUNNING_ON_VALGRIND, implement a specific function, thereby removing the bogus requirement on valgrind/valgrind.h though... - Call the specific function runindebugger() before doing expensive checks on each component of the struct. - Get rid of void* warnings. This commit was SVN r17438.	2008-02-12 20:37:51 +00:00
Shiqing Fan	54c7b71cfd	Use the correct way of including memchecker.h, which will work with '--with-devel-headers'. This commit was SVN r17435.	2008-02-12 18:01:17 +00:00
Shiqing Fan	f5792bbda5	merging the memchecker into trunk. This commit was SVN r17424.	2008-02-12 08:46:27 +00:00
Dan Lacher	98f70d6318	Convert the C++ Comm, Datatype and Win keyval creation and intercept callbacks to not use the STL, as well as removing the STL use from the error handler routines. This was removing the STL from the C++ bindings (Solaris has 2 versions of the STL; if OMPI uses one and an MPI application wants to use another, Bad Things happen). The main idea is to wrap up the C++ callback function pointers and the user's extra_state into our own struct that is passed as the extra_state to the C keyval registration along with the intercept routines in intercepts.cc. When the C++ intercepts are activated, they unwrap the user's callback and extra state and call them. This commit was SVN r17409.	2008-02-10 19:29:25 +00:00
George Bosilca	13de3420ab	As the receive buffer is only significant at root, limit the check only where it makes sense. This commit was SVN r17366.	2008-02-04 01:44:41 +00:00
Rainer Keller	2b4975de8e	- In case of MPI_REQUEST_NULL, set *status to the empty status by copying the structure.
Before (fields assigned one by one):
psendrecv.c:81
 4e7: cmpl $0x0,0x34(%ebp)
 4eb: je 51e <PMPI_Sendrecv+0x51e>
psendrecv.c:85
 4ed: mov 0x34(%ebp),%eax
 4f0: movl $0xfffffffe,(%eax)
psendrecv.c:86
 4f6: mov 0x34(%ebp),%eax
 4f9: movl $0xffffffff,0x4(%eax)
psendrecv.c:87
 500: mov 0x34(%ebp),%eax
 503: movl $0x0,0x8(%eax)
psendrecv.c:88
 50a: mov 0x34(%ebp),%eax
 50d: movl $0x0,0xc(%eax)
psendrecv.c:89
 514: mov 0x34(%ebp),%eax
 517: movl $0x0,0x10(%eax)
psendrecv.c:91
After (structure copy):
 4e7: cmpl $0x0,0x34(%ebp)
 4eb: je 517 <PMPI_Sendrecv+0x517>
 4ed: mov 0x34(%ebp),%edx
 4f0: mov 0x38,%eax
 4f5: mov %eax,(%edx)
 4f7: mov 0x3c,%eax
 4fc: mov %eax,0x4(%edx)
 4ff: mov 0x40,%eax
 504: mov %eax,0x8(%edx)
 507: mov 0x44,%eax
 50c: mov %eax,0xc(%edx)
 50f: mov 0x48,%eax
 514: mov %eax,0x10(%edx)
This commit was SVN r17230.	2008-01-25 12:58:59 +00:00
George Bosilca	25814c07e0	Update the checks in the reduce family collectives. This commit was SVN r17096.	2008-01-09 20:40:57 +00:00
George Bosilca	906e8bf1d1	Replace the ompi_pointer_array with opal_pointer_array. In the next step (sometime after the merge with the ORTE branch), the opal_pointer_array will become the only pointer_array implementation (the orte_pointer_array will be removed). This commit was SVN r17007.	2007-12-21 06:02:00 +00:00
Jeff Squyres	b9106a0d25	Back out r16836 and put in a big comment why. This commit was SVN r16872. The following SVN revision numbers were found above: r16836 --> open-mpi/ompi@6b9048fc6d	2007-12-06 18:45:21 +00:00


240 commits