openmpi

Author	SHA1	Message	Date
Josh Hursey	6309047e63	pedantic cleanup This commit was SVN r10728.	2006-07-11 13:43:50 +00:00
Ralph Castain	ae222cca5b	Include the help file so it can be accessed This commit was SVN r10725.	2006-07-11 12:15:25 +00:00
George Bosilca	b3e5c658d2	Add the correct include file. This commit was SVN r10721.	2006-07-11 05:50:15 +00:00
George Bosilca	e2ebd1efcc	Protect header file. This commit was SVN r10720.	2006-07-11 05:38:12 +00:00
George Bosilca	ee6fab783d	SwitchToThread is not defined by any library. Not even by the kernel32.lib as noted in the MSDN documentation. At least not on my WinXP Pro box. This commit was SVN r10719.	2006-07-11 05:36:04 +00:00
George Bosilca	47eef2e002	Use Windows specific functions to parse the list of files in a directory. This commit was SVN r10718.	2006-07-11 05:28:31 +00:00
George Bosilca	a8a2a60cc5	Nothing relevant, only indentation. This commit was SVN r10717.	2006-07-11 05:27:17 +00:00
George Bosilca	523b6dcbe8	Protect the header files. Remove the directory using the OPAL function. This commit was SVN r10716.	2006-07-11 05:25:41 +00:00
George Bosilca	94f6cb3765	There is no SIG_USR1 and SIG_USR2 on windows. This commit was SVN r10715.	2006-07-11 05:24:08 +00:00
George Bosilca	3075225ea3	Remove the executable extension if any. This commit was SVN r10714.	2006-07-11 05:19:54 +00:00
Ralph Castain	6129a5a887	Enable -host support for "mpirun a.out". You can now execute on all slots on specified nodes within your overall allocation. This commit was SVN r10713.	2006-07-11 02:59:23 +00:00
George Bosilca	a9df5035f9	Remove unused variable. This commit was SVN r10712.	2006-07-11 00:30:51 +00:00
George Bosilca	14b3f141db	Nothing relevant !!! This commit was SVN r10711.	2006-07-11 00:30:26 +00:00
George Bosilca	5666bece6a	Don't update the base pointer when we finish unpacking a partial data. This commit was SVN r10710.	2006-07-11 00:22:58 +00:00
Galen Shipman	68ae99123d	fix bsend completion.. This commit was SVN r10709.	2006-07-10 22:27:32 +00:00
Ralph Castain	febc143d8c	Per LANL's stated need, add functionality that runs a.out across ALL available process slots if no num_proc is specified on the command line. However, please note the following limitation: we ONLY allow ONE application to be specified on the command line when this feature is invoked. If multiple apps are specified, the user MUST also specify the number to be launched for each and every one of them. Update the help text to report errors when not following that rule. Also updated the RMAPS help text to reflect the reorganization of some of the round-robin code into the base. The new functionality has been tested under Mac OS-X and on Odin using an MPI program. Both byslot and bynode mapping have been checked and verified. Operational support for other systems needs to be verified - I respectfully request people's help in doing so. This commit was SVN r10708.	2006-07-10 21:25:33 +00:00
Galen Shipman	9a1221bf7d	fix buffered sends (don't use blocking sends!) removed inaccurate comment.. This commit was SVN r10703.	2006-07-10 16:11:14 +00:00
Ralph Castain	3d220cbd48	This patch fixes several issues relating to comm_spawn and N1GE. In particular, it does the following: 1. Modifies the RAS framework so it correctly stores and retrieves the actual slots in use, not just those that were allocated. Although the RAS node structure had storage for the number of slots in use, it turned out that the base function for storing and retrieving that information ignored what was in the field and simply set it equal to the number of slots allocated. This has now been fixed. 2. Modified the RMAPS framework so it updates the registry with the actual number of slots used by the mapping. Note that daemons are still NOT counted in this process as daemons are NOT mapped at this time. This will be fixed in 2.0, but will not be addressed in 1.x. 3. Added a new MCA parameter "rmaps_base_no_oversubscribe" that tells the system not to oversubscribe nodes even if the underlying environment permits it. The default is to oversubscribe if needed and the underlying environment permits it. I'm sure someone may argue "why would a user do that?", but it turns out that (looking ahead to dynamic resource reservations) sometimes users won't know how many nodes or slots they've been given in advance - this just allows them to say "hey, I'd rather not run if I didn't get enough". 4. Reorganizes the RMAPS framework to more easily support multiple components. A lot of the logic in the round_robin mapper was very valuable to any component - this has been moved to the base so others can take advantage of it. 5. Added a new test program "hello_nodename" - just does "hello_world" but also prints out the name of the node it is on. 6. Made the orte_ras_node_t object a full ORTE data type so it can more easily be copied, packed, etc. This proved helpful for the RMAPS code reorganization and might be of use elsewhere too. This commit was SVN r10697.	2006-07-10 14:10:21 +00:00
Josh Hursey	c38c47a4f5	Fix some unreachable statements. Caught by a nightly build. This commit was SVN r10696.	2006-07-10 13:32:31 +00:00
Brian Barrett	83df0ab0a4	* s/LAM\/MPI/Open MPI/g This commit was SVN r10693.	2006-07-09 03:41:39 +00:00
Brian Barrett	41e144c879	Fix for ticket #92, bproc stdin being borked. The problem was that we were using a pty for everything, which drops all buffered data on the floor when close() is called on the daemon side, meaning EOF has some issues. Instead, do the same thing we do for other starters that use the fork() pls -- use a pipe/fifo for stdin and stderr and a pty for stdout. This is good enough for what we need and avoids most of the issues with ptys. This commit was SVN r10692.	2006-07-08 21:18:24 +00:00
Andrew Friedley	b7e0484c37	Give up on dat_ep_query() and instead manually send our address information across the wire after connection establishment. I've introduced a race condition - seeing occasional LOCAL_LENGTH errors on the receive side. I think I'm mixing up eager/max somehow - will look at it more on monday. This commit was SVN r10690.	2006-07-07 21:48:16 +00:00
Josh Hursey	13f1f4d86e	fix a typo when checking the return code This commit was SVN r10686.	2006-07-06 20:49:09 +00:00
Galen Shipman	5085061475	don't call unpack when we received directly into the user buffer.. the convertor doesn't handle it properly continue peeking until we don't get anything else.. close the endpoint before closing the library.. add a blocking send that uses mx_test .. This commit was SVN r10684.	2006-07-06 19:54:13 +00:00
Ralph Castain	bc7690bcb0	Fix the bproc allocator. This is just a bandaid for 1.x that will be fixed more thoroughly in 2.0. Basically, the problem was that the allocator was grabbing everything on the cluster for which the user had access privilege. Thus, if a user had two sessions operable, each with its own allocation, mpirun in each session would grab both sets of nodes and use them. Not very polite. This commit was SVN r10683.	2006-07-06 18:31:14 +00:00
Brian Barrett	cba9b1e6b7	* the Portals MTL is now stable enough to not have it ompi ignored This commit was SVN r10682.	2006-07-06 18:26:48 +00:00
Brian Barrett	58ce434292	* remove the broken, defunct portals PML. Not needed anymore, since we can do the same basic thing with the MTL design This commit was SVN r10681.	2006-07-06 18:24:08 +00:00
George Bosilca	476c9e64df	Don't keep multiple copies of the datatype and count. The only one we really need is the one provided by the user. For the buffered send the real datatype used for the communication is always MPI_BYTE and the count can be retrieved from the req_bytes_packed field. This will decrease the size of the request by one pointer and one size_t (8 bytes or 16 bytes depending on the architecture). This commit was SVN r10680.	2006-07-06 17:58:25 +00:00
Brian Barrett	b7b93e48f5	* can definitely be optimized more, but add code for calling send for MTL components that have a blocking send implementation This commit was SVN r10679.	2006-07-06 16:37:59 +00:00
Brian Barrett	ef6b7e170f	* make mtl datatype wrapper code inline functions This commit was SVN r10678.	2006-07-06 15:58:07 +00:00
Galen Shipman	2217fd4003	reset receive request convertor for persistent requests We can always call unpack.. This commit was SVN r10677.	2006-07-06 15:13:26 +00:00
Brian Barrett	ef8c6a249b	* Fix up some direct-calling issues for the PML/MTL This commit was SVN r10676.	2006-07-06 15:12:38 +00:00
Brian Barrett	95118f83f6	* complete all outstanding Portals events before shutting down * Remove all knowledge of PML requests from the Portals MTL This commit was SVN r10675.	2006-07-06 14:33:29 +00:00
Brian Barrett	26eee59032	* turns out that you should only call bsend_request_alloc or bsend_request_init, but not both. Otherwise, you don't free some buffer space and end up leaking buffers and ending in badness * since you only call alloc() or init(), but not both, need to restore reference counting in init() This commit was SVN r10674.	2006-07-06 14:02:51 +00:00
Jeff Squyres	3d5d0959fa	Remove unused variable, and therefore silence a compiler warning. This commit was SVN r10673.	2006-07-06 10:44:04 +00:00
Gleb Natapov	e05ec69dc4	print "flush error" only once. This commit was SVN r10672.	2006-07-06 08:03:01 +00:00
Gleb Natapov	9b0807e547	Put pending fragment on the right waiting list. This commit was SVN r10671.	2006-07-06 07:51:23 +00:00
George Bosilca	01a59d68da	Do not generate the XFER_BEGIN and XFER_END events if the length of the data is zero, for both the receives and the sends. This commit was SVN r10670.	2006-07-05 23:39:13 +00:00
Brian Barrett	c793ad0a3d	unpack the amount received, not the amount we had space to receive. This commit was SVN r10669.	2006-07-05 22:31:29 +00:00
Galen Shipman	c933c0f65f	unpack the length actually received, not the length posted.. This commit was SVN r10668.	2006-07-05 22:16:46 +00:00
Brian Barrett	3e29949cc8	* Fix shutdown code in utcp portals code * make all sends long sends for now in Portals MTL * More optimized match check This commit was SVN r10667.	2006-07-05 21:46:45 +00:00
Josh Hursey	b1da6f8bc4	A bit more cleanup for that last patch. * num_children should really be an int instead of size_t since 'size_t' is not signed and num_children can (in rare cases) drop below 0, and we don't want it to roll around to MAX_INT or some such. * I figured out that this problem only happened to me because I use the pls_fork_reap_timeout MCA parameter and thus the only time that the code in pls_fork_module.c to waitpid is executed is if this is not set to 0 (I had it set to 1 to give my procs time to exit). I adjusted the loop from while{...} to do{...}while; so that it is executed at least once for consistency. * de-register the SIGCHLD callback for the pid before we attempt to kill it, so that we don't leave the door open for both the waitpids (the one in the callback, and the one in this function) to race to see who can wait on the child. * Move the 'thread release' to outside the for loop for a bit of an optimization, and always set the value to 0 since we want to finish after this function. * Added a help message for the case when we can't send a kill() signal to the process. Should never happen, but all is possible in the wild wild west of HPC. This commit was SVN r10666.	2006-07-05 21:38:23 +00:00
Galen Shipman	fe480cd003	change mask bits and don't call convertor if we received directly into the user buffer.. This commit was SVN r10665.	2006-07-05 21:10:09 +00:00
Jeff Squyres	429c25095e	Fix for bug #176. * Fix for two problems introduced by r10661: 1. ensure to use the key ''after'' it is initialized (sigh). 2. handle the case where we free the attrkey before it is fully initialized (i.e., some other error causes us to free it). In this case, don't try to remove the key from the hash map, because it won't exist. * More accurate zeroing in the keyval constructor (ompi_attrkey_item_constructor) * Widen the scope of the alock such that the attrkey destructor does not need to acquire it. Instead, assume that the caller already has it. * Add a comment about why the keyval may get destroyed as the result of deleting an attribute (so that I don't have to figure it out again the next time I read this code :-) ) This commit was SVN r10664. The following SVN revision numbers were found above: r10661 --> open-mpi/ompi@fdba2c9df0	2006-07-05 20:23:08 +00:00
George Bosilca	6265625983	Generate the XFER_CONTINUE PERUSE event (or the receive) before unpacking the data. This commit was SVN r10663.	2006-07-05 19:45:00 +00:00
Josh Hursey	696bb4a0c0	A partial fix for the hanging orted bugs (Ticket #177). When we force an application to terminate (via CTRL-C to mpirun) we send an out-of-band message to the orted to reap its children. The fork PLS was doing an internal waitpid but never releasing or updating the information and signaling the condition variable. So the fork PLS callback for SIGCHLD registered with the event library and this waitpid are in a bit of a race to 'waitpid' for the children. Since the PLS callback was the only one that handled the signal properly when it 'won' then things were great -- as in the normal termination case. But when it 'lost' -- as in the abnormal termination case -- the orted never received the proper signal that its children had gone away. We want to preserve the internal fork PLS callback since it allows for a timeout while waiting for the child, which the event library won't do. This allows both to exist, and behave properly. This was introduced in r9068. The ticket is still open since the orteds still hang in other situations. This is a fix for one of the causes. This commit was SVN r10662. The following SVN revision numbers were found above: r9068 --> open-mpi/ompi@c2c2daa966	2006-07-05 19:37:29 +00:00
Jeff Squyres	fdba2c9df0	Per the analysis in bug #184, move some assignments around to effect thread safety. This is likely to be only the first of multiple steps for complete thread safety in the MPI attribute code. All tests [continue to] pass the intel and ibm attribute tests. Also renamed a variable from "attr" to "attrkey" to reflect that it's a keyval, not an attribute. This commit was SVN r10661.	2006-07-05 17:37:17 +00:00
Brian Barrett	4ee4acb6a6	* ignore some Cray-only code when not on the Cray machine This commit was SVN r10660.	2006-07-05 17:16:27 +00:00
Brian Barrett	043153dad3	* fix opal_list_item_t -> ompi_free_list_item_t type change This commit was SVN r10659.	2006-07-05 17:02:16 +00:00
Rainer Keller	23d3628691	- Declare and initialize the peruse_handle_list_lock This commit was SVN r10656.	2006-07-05 13:48:25 +00:00


7747 Commits