openmpi

Author	SHA1	Message	Date
Rainer Keller	b99e5a71d1	- Help message in case of MPI-application with two init or calling init functions after finalize. This commit was SVN r12858.	2006-12-14 19:58:04 +00:00
Brad Benton	18da4c40d3	Set the QP's static rate from the associated MCA parameter, rather than just defaulting to 0. Fixes trac:675 This commit was SVN r12855. The following Trac tickets were found above: Ticket 675 --> https://svn.open-mpi.org/trac/ompi/ticket/675	2006-12-14 19:42:24 +00:00
Rolf vandeVaart	c51c36c4a2	These changes fix two issues. 1. For OS's without the dirent.d_type field, we were potentially not initializing a filename string. This could result in a directory not being cleaned up. 2. Potential memory leaks in filename strings that were allocated. Refs trac:678 This commit was SVN r12853. The following Trac tickets were found above: Ticket 678 --> https://svn.open-mpi.org/trac/ompi/ticket/678	2006-12-14 18:27:27 +00:00
Brian Barrett	38c2e43ac2	Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues... This commit was SVN r12852.	2006-12-14 18:20:43 +00:00
Jeff Squyres	0ca8cb35b7	Fixes trac:366 Add ability for ini files to recognize "use_eager_rdma" flag. Set the default to "no" (because we should assume that HCAs cannot support the property necessary for using RDMA for eager messages -- that the last byte of the message is guaranteed to be written to memory last -- unless proven otherwise. For example, iWARP cards apparently do not provide this guarantee), and then set all Mellanox and IBM HCAs to override the default to enable this behavior on these cards. This commit was SVN r12851. The following Trac tickets were found above: Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366	2006-12-14 15:52:13 +00:00
Dan Lacher	e3f749acc4	Ticket: #673 Submitted by: Dan Lacher This commit was SVN r12844.	2006-12-13 20:01:16 +00:00
Ralph Castain	7b8f445e13	Modify the "--display-map-at-launch" option to just "--display-map". Now that we have a "--do-not-launch" option, the "-at-launch" part of the display-map option was confusing. "--display-map" displays the resulting process map before we launch anyway, so this is clearer. This commit was SVN r12840.	2006-12-13 13:49:15 +00:00
Ralph Castain	82946cb220	Add a new option to orterun: "--do-not-launch" directs the system to do the allocation, map, job setup, etc., but don't actually launch the job. This lets us test all the setup portions of the code. Also, take the first step in updating how we handle mca params in ORTE - bring it closer to how it is done in the other two layers. Much more work to be done here. This commit was SVN r12838.	2006-12-13 04:51:38 +00:00
George Bosilca	80bc0c8868	Allow the MX to survive if we are unable to connect to a peer. The PML will try to find another route. This commit was SVN r12837.	2006-12-13 01:12:07 +00:00
Mohamad Chaarawi	cae083dec6	Replaced the old CID allocation algorithm with the blocked algorithm. The impact in the communicator directory is still not great, since the interface for allocating a CID has not changed. This commit was SVN r12836.	2006-12-12 22:01:39 +00:00
Brian Barrett	10af8ab454	Corrections for when threading is enabled. Refs trac:564 This commit was SVN r12830. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-12 18:48:42 +00:00
Ralph Castain	3b064a624e	For convenience, revise the orte_job_map_t object so it includes the vpid start/range values, the number of nodes, and the number of processes on each node. These values are all used in various places in the code base - we currently re-compute them multiple times. Since these values do not change and are already being computed by the RMAPS framework, we might as well just save them for re-use. This commit was SVN r12829.	2006-12-12 16:07:23 +00:00
Brad Benton	337116d5fd	Added IBM eHCA vendor and part id info This commit was SVN r12827.	2006-12-12 14:12:39 +00:00
Ralph Castain	28ce8e5e5e	Extend the mpirun options to support "--npernode N". This option tells the system to spawn N procs/node across all nodes in the allocation. If N is greater than the number of allocated slots, then the usual oversubscription logic will apply (i.e., the system will error out if oversubscription is not allowed, otherwise it will run with the sched_yield set to non-aggressive behavior). In "--npernode" operation, the "-np" command line parameter is ignored. This commit was SVN r12826.	2006-12-12 00:54:05 +00:00
Brian Barrett	cf196ce420	Instead of an unknown proc list that requires ownership transfer of data (which, in turn, requires a complex series of locks to be held during the transfer), use a modex backing store with backpointers from the proc to the backing store. The proc structures no longer own the modex data, which greatly simplifies locking when an unknown proc suddenly becomes known. Refs trac:564 This commit was SVN r12822. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-11 21:27:30 +00:00
Ralph Castain	8314e8dbb9	Modify the pernode option so it can accept a request for the number of processes to be launched. We now check three use-cases for pernode: 1. no -np provided - put one proc/node across all allocated nodes 2. -np N provided, N > #nodes - we print a pretty error message and exit 3. -np N provided, N <= #nodes - put one proc/node across N nodes I also added a new orte constant (ORTE_ERR_SILENT) that allows us to pass up the chain that an error was encountered, but NOT print ORTE_ERROR_LOG messages. This is intended to be used for cases where the error we encounter is NOT an orte error, but rather is one associated with incorrect user input (e.g., the preceding case 2). In such cases, there is no point in printing an ORTE_ERROR_LOG chain of messages as it isn't an orte error. This commit was SVN r12821.	2006-12-11 18:07:07 +00:00
Ralph Castain	0a5d41857a	Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects. I found only two places that were looking at the tokens: 1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message. 2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was always the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes). This commit was SVN r12813.	2006-12-09 23:10:25 +00:00
Jeff Squyres	e70ef98ea6	Update the help message to be a bit more specific and refer to the web FAQ. This commit was SVN r12812.	2006-12-09 15:13:03 +00:00
Jeff Squyres	c7282855e7	Fixes trac:659 This commit fixes several aspects regarding MPI conformance of requests. * Eliminate the last argument of ompi_errhandler_request_invoke(); we ''always'' want to invoke the back-end exception handler with the real error code. * Make it clear in comments that we only invoke the ''first'' exception in a given array of requests, even if there's more than one request with a non-MPI_SUCCESS value for MPI_ERROR. * Defer the freeing of requests upon exception in the back-end functions to MPI_WAIT* and MPI_TEST* until later; the requests are kept so that we know what handler to invoke when we actually invoke the exception. After figuring that out, ''then'' we free requests with pending exceptions on them. * Clean up return codes from the back-end MPI_TEST* and MPI_WAIT* functions. * Slightly modify ompi_errcode_get_mpi_code() to return unity if it receives an MPI error code (vs. an OMPI error code). This commit was SVN r12810. The following Trac tickets were found above: Ticket 659 --> https://svn.open-mpi.org/trac/ompi/ticket/659	2006-12-09 14:20:08 +00:00
Ralph Castain	58569546ed	Fix the fix to remove compiler warning - an incorrect "\" was placed in the command string. This commit was SVN r12805.	2006-12-08 04:17:38 +00:00
Jeff Squyres	568d20de37	Add Myricom to the aggregated list. This commit was SVN r12804.	2006-12-08 00:11:51 +00:00
Patrick Geoffray	58c6f8c8e1	Copyright update. Thanks to Jeff for reminding me. This commit was SVN r12803.	2006-12-07 23:55:00 +00:00
Patrick Geoffray	6e09b0c23f	lval is not defined when pval is assigned on 32 bit systems. this usually is ok on little-endian systems, as the upper 32 bits will likely be ignored, but on 32-bit big-endian systems, lval is complete junk. Use ival if 32 bit mode, lval if 64. Mixing of 32 and 64 bit architectures won't work without more changes. This commit was SVN r12802.	2006-12-07 23:34:04 +00:00
Brian Barrett	98884e45e4	Clean up the way procs are added to the global process list after MPI_INIT: * Do not add new procs to the global list during modex callback or when sharing orte names during accept/connect. For modex, we cache the modex info for later, in case that proc ever does get added to the global proc list. For accept/connect orte name exchange between the roots, we only need the orte name, so no need to add a proc structure anyway. The procs will be added to the global process list during the proc exchange later in the wireup process * Rename proc_get_namebuf and proc_get_proclist to proc_pack and proc_unpack and extend them to include all information needed to build that proc struct on a remote node (which includes ORTE name, architecture, and hostname). Change unpack to call pml_add_procs for the entire list of new procs at once, rather than one at a time. * Remove ompi_proc_find_and_add from the public proc interface and make it a private function. This function would add a half-created proc to the global proc list, so making it harder to call is a good thing. This means that there's only two ways to add new procs into the global proc list at this time: During MPI_INIT via the call to ompi_proc_init, where my job is added to the list and via ompi_proc_unpack using a buffer from a packed proc list sent to us by someone else. Currently, this is enough to implement MPI semantics. We can extend the interface more if we like, but that may require HNP communication to get the remote proc information and I wanted to avoid that if at all possible. Refs trac:564 This commit was SVN r12798. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-07 19:56:54 +00:00
Brian Barrett	b07dfa7841	* remove unused variable in ompi_comm_get_rprocs * don't load data into a buffer until we have the data, as the data contains some header information needed to properly load the data This commit was SVN r12792.	2006-12-07 16:19:44 +00:00
Sven Stork	78173a697a	Replace the test operation "-e" with "-r" to improve portability. Refs: #392 This commit was SVN r12790.	2006-12-07 12:14:40 +00:00
Ralph Castain	62d7826e01	Helps if we total up the correct field to get the total number of slots in the universe This commit was SVN r12789.	2006-12-07 03:17:12 +00:00
Ralph Castain	a1153fdc8f	Eliminate virtually all of the attribute_predefined data from the STG1 message. We now compute the total number of slots allocated to us and save that in the registry - the attributed_predefined then retrieves it via the STG1 message. The app_num is passed via the process_info structure, which gets the value from the ODLS in the environment. Obviously, people like bproc will have to get the app_num via another avenue...but that's a problem for another day. Several options are easily available. This commit was SVN r12788.	2006-12-07 03:11:20 +00:00
Brian Barrett	41a70a8f01	indent, this time with the right coding standards... This commit was SVN r12787.	2006-12-07 00:24:01 +00:00
Brian Barrett	f9ec8d6f2a	reindent file to make it easier to deal with... This commit was SVN r12786.	2006-12-07 00:21:25 +00:00
Brian Barrett	8f68764e5e	A number of heterogeneous fixes for the dss with the new buffer options: * When using the load/unload interface, stash away the current buffer type so that it can be properly unpacked on the receiving side if the buffer type is other than the receiver default * Include type information for unsized types (bool, int, size_t, pid_t) so that they can be properly unpacked by the receiver in the heterogeneous case. * Restore the NON_DESC type as the default for optimized builds, since it looks like this fixes the known issues with the non-described buffers Refs trac:587 This commit was SVN r12784. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2006-12-06 23:19:06 +00:00
Brian Barrett	cfeac5581a	temporarily always use described buffers as the non-described causes all kinds of problems for heterogeneous environments This commit was SVN r12783.	2006-12-06 20:22:31 +00:00
Ralph Castain	d4bd60c9fe	Restore the paffinity capability, along with all the required logic to ensure we "do the right thing" when the user gives us inaccurate information about the number of slots on a remote node. This commit was SVN r12780.	2006-12-06 15:59:34 +00:00
Ralph Castain	b1e16fffac	Add the C++ doo-hicky stuff around the odls framework definitions just in case somebody, somewhere, on some remote planet where only goats can feed needs it. This commit was SVN r12777.	2006-12-06 13:58:04 +00:00
Ralph Castain	8ca415a0c5	Remove duplicate orte_odls declaration This commit was SVN r12776.	2006-12-06 13:44:41 +00:00
Jeff Squyres	122b8553ef	Sync against 1.1 tree This commit was SVN r12767.	2006-12-05 19:15:56 +00:00
Edgar Gabriel	1359ba9b13	Rewriting much of the errorcode and errorclass code, since - we have to be able to attach a string to an error class, not just to an error code - according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated every time you add a new code or a new class. Thus, you have to have a single list for both. Thus, we got rid of the error_class structure. In the error-code structure, we can distinguish whether we are dealing with an error code or an error class by looking at the err->code element of the structure. In case its value is MPI_UNDEFINED, the corresponding entry is a class, else it is an error code. All predefined error codes have the code and the class field set to the same value. The test MPI_Add_error_class1 passes now. Fixes trac:418 This commit was SVN r12764. The following Trac tickets were found above: Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418	2006-12-05 19:07:02 +00:00
Jeff Squyres	73696be5ac	Add bullets about library name changes (because some users are undoubtedly not using the wrapper compilers) and one-sided support fixes. This commit was SVN r12763.	2006-12-05 18:41:40 +00:00
Brian Barrett	6f8b366acb	Rename liborte to libopen-rte and libopal to libopen-pal per telecon today and bug #632. Refs trac:632 This commit was SVN r12762. The following Trac tickets were found above: Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632	2006-12-05 18:27:24 +00:00
Dan Lacher	ba16157f7e	Removed local zone only requirement from Solaris packages Submitted by: Dan Lacher Reviewed by: Rolf vandeVaart This commit was SVN r12759.	2006-12-05 14:37:44 +00:00
Tim Prins	08d5ca821f	Don't get the node architecture when using the LoadLeveler RAS. It is slow (about a second for ~300 nodes) and we don't even use the value. This commit was SVN r12758.	2006-12-05 13:47:53 +00:00
Ralph Castain	eb941d8ae2	Fix a bug that declared a node as "oversubscribed" a little early during the mapper procedure. This only affected the mapping procedure, and only if you had set the "--no-oversubscribe" flag. Kudos to Tim Prins for finding it. This commit was SVN r12757.	2006-12-05 13:04:27 +00:00
George Bosilca	6f28bcdc21	Remove the last set of compiler warnings from the precondition file. This commit was SVN r12753.	2006-12-04 21:45:57 +00:00
Brian Barrett	441432950f	Merge in changes from the bwb-heterogeneous temp branch (r12491 - r12714) for supporting compilers / architectures with different padding rules. This commit was SVN r12749. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r12491 r12714	2006-12-04 20:11:42 +00:00
Brian Barrett	d64fa194f1	Instead of continually screwing around with different format strings to make this warning-proof, loop over the uint64_ts as an array of integers and use %x. The final string is just as random and formatted exactly the same, so we're all good in that department. Refs trac:655 This commit was SVN r12742. The following Trac tickets were found above: Ticket 655 --> https://svn.open-mpi.org/trac/ompi/ticket/655	2006-12-04 18:07:24 +00:00
Gleb Natapov	f0132b2499	Provide parameters in a correct order (processor/oversubscribed was swapped). This commit was SVN r12737.	2006-12-04 12:55:45 +00:00
Rainer Keller	d078bb3e8a	- Revert changes and include pointers to discussion. This commit was SVN r12736.	2006-12-03 17:05:15 +00:00
Rainer Keller	6f8f28f40f	- Get rid of inline definition, otherwise static-compilation fails. This commit was SVN r12735.	2006-12-03 14:52:17 +00:00
Rainer Keller	e61dd8722e	- Silence compiler on ORTE_TRANSPORT_KEY_FMT, it is fixed to llx - No functional changes, just indentation and corrections to error output. This commit was SVN r12734.	2006-12-03 13:59:23 +00:00
Gleb Natapov	30ca7457b4	Some BTLs (e.g., TCP) can report put/get completion before the data actually hits the buffer on the other side. For this kind of BTL we need to send the FIN through the same BTL that the PUT was performed with, so that the network handles ordering for us. If we use another BTL, the receiver can get the FIN before the data hits the buffer and complete the request prematurely. We mark such problematic BTLs with the MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA is really fake, because the real one guarantees that the sender will see the completion only after the receiver's NIC has confirmed that all the data was received). This commit was SVN r12732.	2006-12-03 10:12:09 +00:00


8784 commits