openmpi

Author	SHA1	Message	Date
Brian Barrett	bc6cec346f	Print out the description of the signal from mpirun when a proc was aborted by a signal if we have strsignal() This commit was SVN r12888.	2006-12-17 20:01:11 +00:00
Brian Barrett	299fc7149e	Argh! screwed up a merge. Fix compile error Refs trac:538 This commit was SVN r12887. The following Trac tickets were found above: Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538	2006-12-17 19:50:20 +00:00
Brian Barrett	edbce8bfec	Don't get the hostname from the environment as SLURM doesn't update that environment variable, so it's not so useful (arg!). Instead, get the hostname during opal_init(). Don't want to call gethostname() during the signal handler. While we're at it, only print the machine name so that the output isn't so wide Refs trac:538 This commit was SVN r12886. The following Trac tickets were found above: Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538	2006-12-17 19:48:19 +00:00
Brian Barrett	f28391f9ae	Hide the fact we're not printing two levels of trace Refs trac:538 This commit was SVN r12885. The following Trac tickets were found above: Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538	2006-12-17 19:31:08 +00:00
Brian Barrett	5fb6183e64	* Write out stackframe number when using backtrace_buffer() * Print out header that a signal was received Refs trac:538 This commit was SVN r12884. The following Trac tickets were found above: Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538	2006-12-17 19:27:57 +00:00
Brian Barrett	b34042a887	Changes to the information printed when a signal occurs: * Have darwin backtrace code return an error when buffer() is called, since it is not implemented * Print out hostname & pid when giving signal information * If backtrace_buffer() is implemented, use that instead of backtrace_print() and prefix stacktrace with the hostname * Make the signal information printed be more user friendly * If we're using the backtrace_buffer() code, don't print the last two functions (which will be show_stackframe() then backtrace_buffer()) so that users won't keep thinking the error occurred inside Open MPI (sneaky, yes...) Refs trac:538 This commit was SVN r12883. The following Trac tickets were found above: Ticket 538 --> https://svn.open-mpi.org/trac/ompi/ticket/538	2006-12-17 19:14:13 +00:00
Brian Barrett	2206fc3a45	Ignore the usual files... This commit was SVN r12881.	2006-12-17 17:31:41 +00:00
Brian Barrett	c554638446	Support systems without malloc.h or posix_memalign (ie, pretty much every one we support that isn't Linux) This commit was SVN r12880.	2006-12-17 17:28:59 +00:00
Brian Barrett	b448b4e47e	More heterogeneous fixes. Don't set reachability bit on a remote proc if the remote architecture differs from the local architecture and the btl doesn't support heterogeneous transport. Refs trac:587 This commit was SVN r12879. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2006-12-17 17:27:08 +00:00
Gleb Natapov	190e7a27cd	Merge with gleb-mpool branch. All RDMA components use same mpool now (rdma). udapl/openib/vapi/gm mpools are deprecated. The rdma mpool has a parameter that limits its size: mpool_rdma_rcache_size_limit (default is 0 - unlimited). This commit was SVN r12878.	2006-12-17 12:26:41 +00:00
Brian Barrett	f1fdd7c041	Handle case where remote process is of different architecture than the local process when creating a datatype from an internal description. Refs trac:640 This commit was SVN r12877. The following Trac tickets were found above: Ticket 640 --> https://svn.open-mpi.org/trac/ompi/ticket/640	2006-12-17 04:39:16 +00:00
Brian Barrett	0653dc3f24	Pad headers to eliminate heterogeneous issues. Add conversion functions for switching endianness of headers. Galen is going to add the code to use the endian stuff... This commit was SVN r12876.	2006-12-17 00:50:59 +00:00
Ralph Castain	a0ef517550	Fix some errors in the bproc components that prevented compiling. Thought I had already done this, but either those changes were lost when I did the merge, or my old man's memory is fading.... Whaz-at??? :-) This commit was SVN r12874.	2006-12-15 19:40:04 +00:00
Brian Barrett	01e8fc5f91	Redo of r12871, without the preconnect code change: Move the req_mtl structure back to the end of each of the structures in the CM PML. The req_mtl structure is cast into a mtl__request_structure for each MTL, which is larger than the req_mtl itself. The cast will cause the _request to overwrite parts of the heavy requests if the req_mtl isn't the LAST thing on each structure (hence the comment). This was moved as an optimization at some point, which caused buffer sends to fail... Refs trac:669 This commit was SVN r12873. The following SVN revision numbers were found above: r12871 --> open-mpi/ompi@597598b712 The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:54:14 +00:00
Brian Barrett	bdf0b231b2	Undo r12871, as it contained some code in ompi/runtime that shouldn't have been committed Refs trac:669 This commit was SVN r12872. The following SVN revision numbers were found above: r12871 --> open-mpi/ompi@597598b712 The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:52:13 +00:00
Brian Barrett	597598b712	Move the req_mtl structure back to the end of each of the structures in the CM PML. The req_mtl structure is cast into a mtl__request_structure for each MTL, which is larger than the req_mtl itself. The cast will cause the _request to overwrite parts of the heavy requests if the req_mtl isn't the LAST thing on each structure (hence the comment). This was moved as an optimization at some point, which caused buffer sends to fail... Refs trac:669 This commit was SVN r12871. The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:46:53 +00:00
Ralph Castain	1e1d0e8a89	Set the app_num attribute into the process environment so we pick it up on the other end This commit was SVN r12868.	2006-12-15 16:43:52 +00:00
Ralph Castain	677d1260aa	cleanup nicely if we don't launch This commit was SVN r12867.	2006-12-15 14:03:53 +00:00
Ralph Castain	cbb660504c	Retain the ability to run valgrind on the bproc launcher - do not call bproc_version if "nolaunch" is specified. This commit was SVN r12866.	2006-12-15 14:01:21 +00:00
Ralph Castain	64ec238b7b	Repair support for Bproc 4 on 64-bit systems. Update the SMR framework to actually support the begin_monitoring API. Implement the get/set_node_state APIs. This commit was SVN r12864.	2006-12-15 02:34:14 +00:00
Rainer Keller	b99e5a71d1	- Help message in case of MPI-application with two init or calling init functions after finalize. This commit was SVN r12858.	2006-12-14 19:58:04 +00:00
Brad Benton	18da4c40d3	Set the QP's static rate from the associated MCA parameter, rather than just defaulting to 0. Fixes trac:675 This commit was SVN r12855. The following Trac tickets were found above: Ticket 675 --> https://svn.open-mpi.org/trac/ompi/ticket/675	2006-12-14 19:42:24 +00:00
Rolf vandeVaart	c51c36c4a2	These changes fix two issues. 1. For OS's without the dirent.d_type field, we were potentially not initializing a filename string. This could result in a directory not being cleaned up. 2. Potential memory leaks in filename strings that were allocated. Refs trac:678 This commit was SVN r12853. The following Trac tickets were found above: Ticket 678 --> https://svn.open-mpi.org/trac/ompi/ticket/678	2006-12-14 18:27:27 +00:00
Brian Barrett	38c2e43ac2	Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues... This commit was SVN r12852.	2006-12-14 18:20:43 +00:00
Jeff Squyres	0ca8cb35b7	Fixes trac:366 Add ability for ini files to recognize "use_eager_rdma" flag. Set the default to "no" (because we should assume that HCAs cannot support the property necessary for using RDMA for eager messages -- that the last byte of the message is guaranteed to be written to memory last -- unless proven otherwise. For example, iWARP cards apparently do not provide this guarantee), and then set all Mellanox and IBM HCAs to override the default to enable this behavior on these cards. This commit was SVN r12851. The following Trac tickets were found above: Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366	2006-12-14 15:52:13 +00:00
Dan Lacher	e3f749acc4	Ticket: #673 Submitted by: Dan Lacher This commit was SVN r12844.	2006-12-13 20:01:16 +00:00
Ralph Castain	7b8f445e13	Modify the "--display-map-at-launch" option to just "--display-map". Now that we have a "--do-not-launch" option, the "-at-launch" part of the display-map option was confusing. "--display-map" displays the resulting process map before we launch anyway, so this is clearer. This commit was SVN r12840.	2006-12-13 13:49:15 +00:00
Ralph Castain	82946cb220	Add a new option to orterun: "--do-not-launch" directs the system to do the allocation, map, job setup, etc., but don't actually launch the job. This lets us test all the setup portions of the code. Also, take the first step in updating how we handle mca params in ORTE - bring it closer to how it is done in the other two layers. Much more work to be done here. This commit was SVN r12838.	2006-12-13 04:51:38 +00:00
George Bosilca	80bc0c8868	Allow the MX to survive if we are unable to connect to a peer. The PML will try to find another route. This commit was SVN r12837.	2006-12-13 01:12:07 +00:00
Mohamad Chaarawi	cae083dec6	Replaced the old CID allocation algorithm with the blocked algorithm. The impact in the communicator directory is still not great since the interface for allocating a CID has not changed. This commit was SVN r12836.	2006-12-12 22:01:39 +00:00
Brian Barrett	10af8ab454	Corrections for when threading is enabled. Refs trac:564 This commit was SVN r12830. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-12 18:48:42 +00:00
Ralph Castain	3b064a624e	For convenience, revise the orte_job_map_t object so it includes the vpid start/range values, the number of nodes, and the number of processes on each node. These values are all used in various places in the code base - we currently re-compute them multiple times. Since these values do not change and are already being computed by the RMAPS framework, we might as well just save them for re-use. This commit was SVN r12829.	2006-12-12 16:07:23 +00:00
Brad Benton	337116d5fd	Added IBM eHCA vendor and part id info This commit was SVN r12827.	2006-12-12 14:12:39 +00:00
Ralph Castain	28ce8e5e5e	Extend the mpirun options to support "--npernode N". This option tells the system to spawn N procs/node across all nodes in the allocation. If N is greater than the number of allocated slots, then the usual oversubscription logic will apply (i.e., the system will error out if oversubscription is not allowed, otherwise it will run with the sched_yield set to non-aggressive behavior). In "--npernode" operation, the "-np" command line parameter is ignored. This commit was SVN r12826.	2006-12-12 00:54:05 +00:00
Brian Barrett	cf196ce420	Instead of an unknown proc list that requires ownership transfer of data (which, in turn, requires a complex series of locks to be held during the transfer), use a modex backing store with backpointers from the proc to the backing store. The proc structures no longer own the modex data, which greatly simplifies locking when an unknown proc suddenly becomes known. Refs trac:564 This commit was SVN r12822. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-11 21:27:30 +00:00
Ralph Castain	8314e8dbb9	Modify the pernode option so it can accept a request for the number of processes to be launched. We now check three use-cases for pernode: 1. no -np provided - put one proc/node across all allocated nodes 2. -np N provided, N > #nodes - we print a pretty error message and exit 3. -np N provided, N <= #nodes - put one proc/node across N nodes I also added a new orte constant (ORTE_ERR_SILENT) that allows us to pass up the chain that an error was encountered, but NOT print ORTE_ERROR_LOG messages. This is intended to be used for cases where the error we encounter is NOT an orte error, but rather is one associated with incorrect user input (e.g., the preceding case 2). In such cases, there is no point in printing an ORTE_ERROR_LOG chain of messages as it isn't an orte error. This commit was SVN r12821.	2006-12-11 18:07:07 +00:00
Ralph Castain	0a5d41857a	Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects. I found only two places that were looking at the tokens: 1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message. 2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was always the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes). This commit was SVN r12813.	2006-12-09 23:10:25 +00:00
Jeff Squyres	e70ef98ea6	Update the help message to be a bit more specific and refer to the web FAQ. This commit was SVN r12812.	2006-12-09 15:13:03 +00:00
Jeff Squyres	c7282855e7	Fixes trac:659 This commit fixes several aspects regarding MPI conformance of requests. * Eliminate the last argument of ompi_errhandler_request_invoke(); we ''always'' want to invoke the back-end exception handler with the real error code. * Make it clear in comments that we only invoke the ''first'' exception in a given array of requests, even if there's more than one request with a non-MPI_SUCCESS value for MPI_ERROR. * Defer the freeing of requests upon exception in the back-end functions to MPI_WAIT* and MPI_TEST* until later; the requests are kept so that we know what handler to invoke when we actually invoke the exception. After figuring that out, ''then'' we free requests with pending exceptions on them. * Clean up return codes from the back-end MPI_TEST* and MPI_WAIT* functions. * Slightly modify ompi_errcode_get_mpi_code() to return unity if it receives an MPI error code (vs. an OMPI error code). This commit was SVN r12810. The following Trac tickets were found above: Ticket 659 --> https://svn.open-mpi.org/trac/ompi/ticket/659	2006-12-09 14:20:08 +00:00
Ralph Castain	58569546ed	Fix the fix to remove compiler warning - an incorrect "\" was placed in the command string. This commit was SVN r12805.	2006-12-08 04:17:38 +00:00
Jeff Squyres	568d20de37	Add Myricom to the aggregated list. This commit was SVN r12804.	2006-12-08 00:11:51 +00:00
Patrick Geoffray	58c6f8c8e1	Copyright update. Thanks to Jeff for reminding me. This commit was SVN r12803.	2006-12-07 23:55:00 +00:00
Patrick Geoffray	6e09b0c23f	lval is not defined when pval is assigned on 32 bit systems. this usually is ok on little-endian systems, as the upper 32 bits will likely be ignored, but on 32-bit big-endian systems, lval is complete junk. Use ival if 32 bit mode, lval if 64. Mixing of 32 and 64 bit architectures won't work without more changes. This commit was SVN r12802.	2006-12-07 23:34:04 +00:00
Brian Barrett	98884e45e4	Clean up the way procs are added to the global process list after MPI_INIT: * Do not add new procs to the global list during modex callback or when sharing orte names during accept/connect. For modex, we cache the modex info for later, in case that proc ever does get added to the global proc list. For accept/connect orte name exchange between the roots, we only need the orte name, so no need to add a proc structure anyway. The procs will be added to the global process list during the proc exchange later in the wireup process * Rename proc_get_namebuf and proc_get_proclist to proc_pack and proc_unpack and extend them to include all information needed to build that proc struct on a remote node (which includes ORTE name, architecture, and hostname). Change unpack to call pml_add_procs for the entire list of new procs at once, rather than one at a time. * Remove ompi_proc_find_and_add from the public proc interface and make it a private function. This function would add a half-created proc to the global proc list, so making it harder to call is a good thing. This means that there's only two ways to add new procs into the global proc list at this time: During MPI_INIT via the call to ompi_proc_init, where my job is added to the list and via ompi_proc_unpack using a buffer from a packed proc list sent to us by someone else. Currently, this is enough to implement MPI semantics. We can extend the interface more if we like, but that may require HNP communication to get the remote proc information and I wanted to avoid that if at all possible. Refs trac:564 This commit was SVN r12798. The following Trac tickets were found above: Ticket 564 --> https://svn.open-mpi.org/trac/ompi/ticket/564	2006-12-07 19:56:54 +00:00
Brian Barrett	b07dfa7841	* remove unused variable in ompi_comm_get_rprocs * don't load data into a buffer until we have the data, as the data contains some header information needed to properly load the data This commit was SVN r12792.	2006-12-07 16:19:44 +00:00
Sven Stork	78173a697a	Replace the test operation "-e" with "-r" to improve portability. Refs: #392 This commit was SVN r12790.	2006-12-07 12:14:40 +00:00
Ralph Castain	62d7826e01	Helps if we total up the correct field to get the total number of slots in the universe This commit was SVN r12789.	2006-12-07 03:17:12 +00:00
Ralph Castain	a1153fdc8f	Eliminate virtually all of the attribute_predefined data from the STG1 message. We now compute the total number of slots allocated to us and save that in the registry - the attribute_predefined then retrieves it via the STG1 message. The app_num is passed via the process_info structure, which gets the value from the ODLS in the environment. Obviously, people like bproc will have to get the app_num via another avenue...but that's a problem for another day. Several options are easily available. This commit was SVN r12788.	2006-12-07 03:11:20 +00:00
Brian Barrett	41a70a8f01	indent, this time with the right coding standards... This commit was SVN r12787.	2006-12-07 00:24:01 +00:00
Brian Barrett	f9ec8d6f2a	reindent file to make it easier to deal with... This commit was SVN r12786.	2006-12-07 00:21:25 +00:00
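Commits 597598b712 and 01e8fc5f91 hinge on a C layout rule: a member that is later cast to a larger type has to be the last member of its enclosing struct, and the allocation must leave room for that larger type. The sketch below illustrates the rule under stated assumptions; every type and function name in it (`example_pml_request_t`, `base_mtl_request_t`, `example_mtl_request_t`, `alloc_pml_request`) is hypothetical and is not Open MPI's actual CM PML or MTL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic MTL request embedded in every PML request. */
typedef struct {
    int completed;
} base_mtl_request_t;

/* Hypothetical MTL-specific request: it begins with the generic part and
   appends private state, so it is larger than base_mtl_request_t. */
typedef struct {
    base_mtl_request_t super;
    char private_state[64];
} example_mtl_request_t;

/* Hypothetical PML request. The MTL casts &req_mtl to its own, larger
   request type, so req_mtl must be the LAST member: any field placed
   after it would share bytes with the MTL's private state. */
typedef struct {
    int   pml_state;
    void *user_buffer;
    base_mtl_request_t req_mtl; /* must stay last */
} example_pml_request_t;

/* Allocate a zeroed PML request with enough trailing space for the full
   MTL-specific request that starts at req_mtl. */
static example_pml_request_t *alloc_pml_request(size_t mtl_request_size) {
    assert(mtl_request_size >= sizeof(base_mtl_request_t));
    size_t size = offsetof(example_pml_request_t, req_mtl) + mtl_request_size;
    if (size < sizeof(example_pml_request_t)) {
        size = sizeof(example_pml_request_t);
    }
    return calloc(1, size);
}
```

If `req_mtl` were moved ahead of `user_buffer`, the MTL's private state would overlap `user_buffer`, which is the kind of corruption the commit message associates with the buffered-send failures.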
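Commit 6e09b0c23f describes storing a pointer through `pval` and reading it back through the wider `lval`, which leaves half of `lval` undefined on 32-bit systems. A minimal C sketch of the fix the commit describes, assuming a hypothetical union with `ival`, `lval`, and `pval` members (the real Open MPI type is not shown in this log): read the value back through `ival` when pointers are 32 bits wide and through `lval` when they are 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical union standing in for the value holder in the commit. */
typedef union {
    int32_t ival;
    int64_t lval;
    void   *pval;
} example_value_t;

/* Store a pointer. On a 32-bit system this writes only 4 of the 8 bytes,
   so the other half of lval is left undefined. */
static void store_pointer(example_value_t *v, void *p) {
    v->pval = p;
}

/* Read the pointer back as an integer through the member whose width
   matches a pointer: ival on 32-bit, lval on 64-bit. Reading lval on a
   32-bit big-endian host would put the pointer bits in the high-order
   half and fill the low-order half with junk. */
static int64_t pointer_as_integer(const example_value_t *v) {
    assert(sizeof(void *) == sizeof(int32_t) || sizeof(void *) == sizeof(int64_t));
    if (sizeof(void *) == sizeof(int32_t)) {
        return (int64_t)(uint32_t)v->ival;
    }
    return v->lval;
}
```

As the commit notes, this only keeps each process self-consistent; mixing 32-bit and 64-bit processes would need further changes.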
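Commit 8304e8dbb9's sibling entry, 8314e8dbb9, lists three use-cases for the `pernode` option. Its decision logic can be sketched in C as follows; the function `pernode_plan` and the `EXAMPLE_*` return codes are hypothetical stand-ins (the silent code is loosely modeled on the `ORTE_ERR_SILENT` constant that commit introduces), not ORTE's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical return codes; EXAMPLE_ERR_SILENT plays the role the commit
   gives ORTE_ERR_SILENT: an error passed up without an error-log chain. */
enum { EXAMPLE_SUCCESS = 0, EXAMPLE_ERR_SILENT = -1 };

/* Decide how many nodes receive one process under pernode.
   np_given:   whether the user passed -np
   np:         the -np value (ignored if np_given is false)
   num_nodes:  number of nodes in the allocation
   nodes_used: out-parameter, number of nodes that get one process */
static int pernode_plan(bool np_given, int np, int num_nodes, int *nodes_used) {
    assert(nodes_used != NULL);
    if (!np_given) {
        /* Case 1: one process on every allocated node. */
        *nodes_used = num_nodes;
        return EXAMPLE_SUCCESS;
    }
    if (np > num_nodes) {
        /* Case 2: bad user input; print a friendly message, fail silently. */
        fprintf(stderr, "pernode: requested %d processes but only %d nodes are allocated\n",
                np, num_nodes);
        return EXAMPLE_ERR_SILENT;
    }
    /* Case 3: one process on each of np nodes. */
    *nodes_used = np;
    return EXAMPLE_SUCCESS;
}
```

Returning a distinct silent code for case 2 follows the commit's reasoning: the failure is incorrect user input rather than an ORTE error, so the caller can exit after the message instead of printing an ORTE_ERROR_LOG chain.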

8804 Commits