openmpi

Автор	SHA1	Сообщение	Дата
Jeff Squyres	0ffbc5506a	Add bmi into the SUBDIRS list. This commit was SVN r5928.	2005-06-02 20:26:38 +00:00
Tim Woodall	878490a405	return resources This commit was SVN r5927.	2005-06-02 19:51:51 +00:00
Tim Woodall	5a00e53ab1	init lock state This commit was SVN r5926.	2005-06-02 19:36:35 +00:00
Tim Woodall	e406a4b1aa	corrections for stdin - stdin shouldn't be using same pty as stdout - corrects duplicate output This commit was SVN r5925.	2005-06-02 19:17:32 +00:00
Tim Woodall	78d713b70a	misc fixes This commit was SVN r5924.	2005-06-02 17:42:53 +00:00
Tim Woodall	a5d13104f9	acks should now be handled correctly for stdin This commit was SVN r5923.	2005-06-02 17:06:33 +00:00
Tim Woodall	5728748980	handle local acks correctly This commit was SVN r5922.	2005-06-02 17:05:52 +00:00
Tim Woodall	e21ffa48e2	bug fix - generate a completion callback for local sends This commit was SVN r5921.	2005-06-02 17:03:08 +00:00
Brian Barrett	c073999c7f	* fix XGrid to match API change This commit was SVN r5917.	2005-06-02 01:43:05 +00:00
Brian Barrett	42f8ea9389	* when bringing over from branch, forgot to fix up for the local fork() setup. Now uses fork() instead of ssh if the target nodename is the same as the current nodename (which will happen if the user gave "localhost" or just the hostname without the domain) or if the target nodename is local according to ompi_ifislocal() (which will happen if the user gave a FQDN) This commit was SVN r5916.	2005-06-01 21:45:26 +00:00
Tim Woodall	e7332d0521	cleanup - support for sender side scheduling (non-rdma case) This commit was SVN r5915.	2005-06-01 21:09:43 +00:00
Brian Barrett	8b411a10be	- revert to resolving "localhost" to the contents of orte_system_info.nodename so that cleanup and the like occur correctly. Otherwise, the daemon on localhost and an MPI process can have different ideas on what the local nodename is, and that lead to all kinds of badness with both process killing and cleanup. Also fixes the annoying ssh keys problem when sshing to localhost. - modify the rsh pls to ssh to localhost if the target nodename is the same as orte_system_info.nodename AND is not resolvable (ie, ssh to would fail). Otherwise, ssh to nodename. This should work around the issues Ralph was seeing with ssh failing on his laptop (since the above change undid the previous fix to this problem). - Small change to ompi_ifislocal() to squelch a warning message about unresolvable hostnames when checking to see if a name is, in fact, resolvable. - Force ORTE process to have same nodename field as it's starting daemon (assuming it was started using the fork pls), so that the fork pls can properly kill the process, and cleanup its session directory on abnormal exit. This commit was SVN r5914.	2005-06-01 19:30:05 +00:00
Brian Barrett	465b54a3f0	* start of support for stdin forwarding. stdin is now forwarded to vpid 0 of the started job (which should be rank 0 of the started MPI job). Still some issues for Tim / Ralph to work out (below). Only works from MPI_Init onward. Remaining issues: - Need to move the orte_rmgr_urm_wireup_stdin() call from STG1 to when everyone sets LAUNCHED state. Tim/Ralph are going to look at adding this code - stdin frags are not properly acked, leading to some shutdown workarounds. Tim is going to look at this one. - Probably somehow related to the 2nd point, stdin text appears to be echoed by the IOF framework This commit was SVN r5913.	2005-06-01 19:23:23 +00:00
Galen Shipman	aaa236052d	changed function signitures to match the changes in mpool This commit was SVN r5911.	2005-06-01 15:25:17 +00:00
Tim Woodall	4ce8f91b6a	updates to bmi and pml I/F This commit was SVN r5910.	2005-06-01 14:34:22 +00:00
Jeff Squyres	19b4479a0a	Patch for systems with broken Fortran compilers (e.g., OS X Tiger [10.4] with gfortran 4.0) who need to be able to add flags to compile simple Fortran executables that use libc routines. Notably, for Tiger with gfortran 4.0 installed, you'll need to: ./configure F77=gfortran FC=gfortran LIBS=-lSystemStubs This commit was SVN r5909.	2005-06-01 10:53:44 +00:00
Jeff Squyres	346921e9e7	Add Makefile.am (and related support) for dynamic builds of bmi components. This commit was SVN r5908.	2005-06-01 09:31:08 +00:00
Tim Woodall	3e07a64ade	dont allow synchronous request to complete until ack is received This commit was SVN r5907.	2005-05-31 21:56:43 +00:00
Galen Shipman	2b2b8fa283	fixed mpool_base calls to include the mpool module. This commit was SVN r5905.	2005-05-31 20:34:03 +00:00
George Bosilca	a285ecce5e	PID's should be of type pid_t and should use the GPR union member "pid", not "size". This commit was SVN r5904.	2005-05-31 19:25:42 +00:00
Tim Prins	75b0b519d8	- Added functionality to MPI_Alloc_mem and MPI_Free_mem so that they call the memory pool to do special memory allocations, and extended the mpool so that it will do the allocations and keep tack of them in a tree. Currently, if you pass MPI_INFO_NULL to MPI_Alloc_mem, we will try to allocate the memory and register it with as many mpools as possible. Alternatively, one can pass an info object with the names of the mpools as keys, and from these we decide which mpools to register the new memory with. - fixed some comments in the allocator and fixed a minor bug - extended the red black tree test and made a minor correction This commit was SVN r5902.	2005-05-31 19:07:27 +00:00
Galen Shipman	459be82daa	Removed generated file .. This commit was SVN r5900.	2005-05-31 17:45:13 +00:00
George Bosilca	dba4d91d96	strcmp is defined on string.h on Linux so we have to include it. This commit was SVN r5898.	2005-05-31 17:41:37 +00:00
Galen Shipman	4c208f7964	Common source files used by mpool and bmi This commit was SVN r5897.	2005-05-31 17:09:55 +00:00
Galen Shipman	5ccaaf55e2	Initial checkin of VAPI allocator. This commit was SVN r5896.	2005-05-31 17:08:41 +00:00
Galen Shipman	f16f9703a5	Modified the mpool and allocator to allow user defined data to be passed in and out of the mpool allocate functions, this is necessary if we use the mpool to allocate IB registered memory as need to pass in the hca handle and pass out the memory region handle. This commit was SVN r5895.	2005-05-31 17:06:55 +00:00
Jeff Squyres	c80f54052e	(copied from a mail that has a lengthy description of this commit) I spoke with Tim about this the other day -- he gave me the green light to go ahead with this, but it turned into a bigger job than I thought it would be. I revamped how the default RAS scheduling and round_robin RMAPS mapping occurs. The previous algorithms were pretty brain dead, and ignored the "slots" and "max_slots" tokens in hostfiles. I considered this a big enough problem to fix it for the beta (because there is currently no way to control where processes are launched on SMPs). There's still some more bells and whistles that I'd like to implement, but there's no hurry, and they can go on the trunk at any time. My patches below are for what I considered "essential", and do the following: - honor the "slots" and "max-slots" tokens in the hostfile (and all their synonyms), meaning that we allocate/map until we fill slots, and if there are still more processes to allocate/map, we keep going until we fill max-slots (i.e., only oversubscribe a node if we have to). - offer two different algorithms, currently supported by two new options to orterun. Remember that there are two parts here -- slot allocation and process mapping. Slot allocation controls how many processes we'll be running on a node. After that decision has been made, process mapping effectively controls where the ranks of MPI_COMM_WORLD (MCW) are placed. Some of the examples given below don't make sense unless you remember that there is a difference between the two (which makes total sense, but you have to think about it in terms of both things): 1. "-bynode": allocates/maps one process per node in a round-robin fashion until all slots on the node are taken. If we still have more processes after all slots are taken, then keep going until all max-slots are taken. Examples: - The hostfile: eddie slots=2 max-slots=4 vogon slots=4 max-slots=8 - orterun -bynode -np 6 -hostfile hostfile a.out eddie: MCW ranks 0, 2 vogon: MCW ranks 1, 3, 4, 5 - orterun -bynode -np 8 -hostfile hostfile a.out eddie: MCW ranks 0, 2, 4 vogon: MCW ranks 1, 3, 5, 6, 7 -> the algorithm oversubscribes all nodes "equally" (until each node's max_slots is hit, of course) - orterun -bynode -np 12 -hostfile hostfile a.out eddie: MCW ranks 0, 2, 4, 6 vogon: MCW ranks 1, 3, 5, 7, 8, 9, 10, 11 2. "-byslot" (this is the default if you don't specify -bynode): greedily takes all available slots on a node for a job before moving on to the next node. If we still have processes to allocate/schedule, then oversubscribe all nodes equally (i.e., go round robin on all nodes until each node's max_slots is hit). Examples: - The hostfile eddie slots=2 max-slots=4 vogon slots=4 max-slots=8 - orterun -np 6 -hostfile hostfile a.out eddie: MCW ranks 0, 1 vogon: MCW ranks 2, 3, 4, 5 - orterun -np 8 -hostfile hostfile a.out eddie: MCW ranks 0, 1, 2 vogon: MCW ranks 3, 4, 5, 6, 7 -> the algorithm oversubscribes all nodes "equally" (until max_slots is hit) - orterun -np 12 -hostfile hostfile a.out eddie: MCW ranks 0, 1, 2, 3 vogon: MCW ranks 4, 5, 6, 7, 8, 9, 10, 11 The above examples are fairly contrived, and it's not clear from them that you can get different allocation answers in all cases (the mapping differences are obvious). Consider the following allocation example: - The hostfile eddie count=4 vogon count=4 earth count=4 deep-thought count=4 - orterun -np 8 -hostfile hostfile a.out eddie: 4 slots will be allocated vogon: 4 slots will be allocated earth: no slots allocated deep-thought: no slots allocated - orterun -bynode -np 8 -hostfile hostfile a.out eddie: 2 slots will be allocated vogon: 2 slots will be allocated earth: 2 slots will be allocated deep-thought: 2 slots will be allocated This commit was SVN r5894.	2005-05-31 16:36:53 +00:00
Jeff Squyres	497580441d	Per MPI-2:3.1, MPI_GET_VERSION can be called before MPI_INIT, so remove the MPI_ERR_INIT_FINALIZE() macro. Also check to see how we invoke the errhandler if an error occurs (i.e., the action depends on whether we're between MPI_INIT and MPI_FINALIZE or not). This commit was SVN r5891.	2005-05-31 16:30:34 +00:00
Jeff Squyres	9c2554d8ce	Since we allow the following keys: cpu count slots We should allow max versions of all of those, rather than just slots-max (and its variations). This commit was SVN r5889.	2005-05-27 17:28:30 +00:00
Jeff Squyres	843cd2dbac	More work with Ralph -- now we think we have it right. Here's the additions from his previous commit: - Properly propagate error upwards if we have a losthost+other_node error - Added logic to handle multiple instances of the same hostname - Added logic to properly increment the slot count for multiple instances. For example, a hostfile with: foo.example.com foo.example.com slots=4 foo.example.com slots=8 would result in a single host with a slot count of 13 (i.e., if no slot count is specified, 1 is assumed) - Revised the localhost logic a bit -- some cases are ok (e.g., specifying localhost multiple times is ok, as long as there are no other hosts) This commit was SVN r5886.	2005-05-26 21:44:45 +00:00
George Bosilca	fa8889bafa	little buggy thing ... hunted for hours ... The problem was that the displacement was increased even when the current memcpy completly succeed. It not a problem for most of the cases ... except when we completly finish a data. This commit was SVN r5885.	2005-05-26 21:44:24 +00:00
George Bosilca	0fbf302080	More output for debug. This commit was SVN r5883.	2005-05-26 21:42:15 +00:00
George Bosilca	cd84c1cb10	Replace TAB with spaces. This commit was SVN r5882.	2005-05-26 21:41:51 +00:00
Ralph Castain	cff4dcebc1	Short explanation: fix how we handle the "localhost" entry in the hostfile so that the Mac (and other multi-NIC systems) will work. Long explanation: Jeff and I spent some time chasing this down today (mostly Jeff), and found that the Mac was having problems with the replacement of "localhost" with the local nodename when we read the hostfile. Jeff then found that the Linux documentation specifically warns about the vaguery of the value returned for "nodename" (see the man page for uname for details). Sooo....when we replaced "localhost" with the local "nodename", the system couldn't figure out what node we were referring to when we tried to launch. Solution (borrowed from LAM): if the user includes "localhost" in the hostfile, then we do NOT allow any other entries in the hostfile - the presence of another entry will generate an error message and cause mpirun to gracefully exit. Obviously, then, if "localhost" is specified in the hostfile, then we are running the application locally. This commit was SVN r5881.	2005-05-26 19:43:21 +00:00
George Bosilca	08cff446f2	Few improvements: - creating the stack work now even for contiguous data (with gaps around) and independing on the fragment size. - add a TYPE argument to the PUSH_STACK macro. It's too obscure to explain it here :) - in dt_add we avoid surrounding a datatype with loops if we can handle it by increasing the count of the datatype (only if the datatype contain one type element and if the extent match). But it's enough to speed up a lot the packing/unpacking of all composed predefined datatypes (line MPI_COMPLEX and co.). - in dt_module.c improve the handling of the flags for all composed predefined datatypes. There is still something to do for the Fortran datatypes but it will be on the next commit. This commit was SVN r5879.	2005-05-26 17:32:18 +00:00
Jeff Squyres	6781900f98	(copied from an e-mail, just so that I don't have to re-type the entire explanation ;-) ) Our Abaqus friends just pointed out another bug to me. We have the "-x" option to orterun to export environment variables to newly-started processes. However, it doesn't work if the environment variable is already set in the target environment. For example: mpirun -x LD_LIBRARY_PATH -np 2 a.out The app context correctly contains LD_LIBRARY_PATH and its value, and that app context correctly propagates out to the orted and is present when we fork/exec a.out. However, if LD_LIBRARY_PATH is already set in the newly-started process' environment, the fork pls won't override it with the value from the app context. It really only has to do with the ordering of arguments in ompi_environ_merge() -- when merging to env arrays together, we "prefer" one set to the other if there are duplicate names. I think that if the user wants to override variables (even variables like LD_LIBRARY_PATH), we should let them -- it's necessary for some applications (like in Abaqus' case). If they screw it up, it's their fault (e.g., setting some LD_LIBRARY_PATH that won't work). That being said, we should not allow them to override specific MCA parameters that are necessary for startup -- that's easy to accomplish by setting up that stuff after we merge in the context app environment. Also note that I am only speaking about the fork pls here -- so this only applies to started ORTE job processes, not the orted. So an easy re-order to do the following: env_copy = merge(environ and context->app) ompi_setenv(...MCA params necessary for startup..., env_copy) execve(..., env_copy) does what we want. This commit was SVN r5878.	2005-05-26 15:57:48 +00:00
Jeff Squyres	0fb6121bfd	After yet another round of discussions about why these classes are split between OMPI and ORTE, added a lengthy comment to ompi_bitmap.h explaining the reason why (and how it would be fine to re-merge them -- if someone has the time) and references to it from all the other relevant .h files. This commit was SVN r5876.	2005-05-26 13:12:11 +00:00
Jeff Squyres	84e70e279c	Remove bad free (doxy docs say that freeing the result of ompi_cmd_line_get_param() is a Bad Thing) that causes seg faults. This commit was SVN r5873.	2005-05-26 02:44:09 +00:00
George Bosilca	5789cbe9b9	A much better version, which stop in the right place even if the data representation contain several loops. The displacement if correctly computed now. This commit was SVN r5872.	2005-05-25 22:33:44 +00:00
George Bosilca	59961a0adf	Correctly set the flags for predefined composed datatypes. This commit was SVN r5871.	2005-05-25 22:32:16 +00:00
Jeff Squyres	f16e86ec81	Fix logic mistake with the OMPI_WANT_CXX_BINDINGS macro -- it's not enough that it's defined; it must also be true before we include the C++ bindings header files. This commit was SVN r5868.	2005-05-25 18:29:00 +00:00
Ralph Castain	80ba76e486	Checkpoint This commit was SVN r5866.	2005-05-25 16:29:01 +00:00
Ralph Castain	05a1982853	Fix a bug in the new gpr match check function that caused some containers to be mismatched. Add the logic to properly assign new cellid's to hosts read in by the hostfile component. However, don't turn it on yet. It seems that the code base has (unfortunately) assumed that cellid is always zero. When I turn on the cellid capability, the system "hangs" whenever the cellid is non-zero. I'll have to chase that problem down. For now, I've turned "off" the cellid assignment in the hostfile component. This commit was SVN r5865.	2005-05-25 16:23:13 +00:00
Galen Shipman	ddc19805ab	Initial commit of ib bmi This commit was SVN r5864.	2005-05-25 15:27:33 +00:00
Jeff Squyres	92f34e848c	Fix the OMPI-specific ROMIO routine that sets the count element in an MPI_Status. This commit was SVN r5862.	2005-05-25 09:09:14 +00:00
George Bosilca	5029060d85	More information on debug mode, and one less compiler warning. This commit was SVN r5859.	2005-05-24 20:30:48 +00:00
George Bosilca	db7e06380d	This was supposed to be a debug message. This commit was SVN r5855.	2005-05-24 20:20:17 +00:00
George Bosilca	a29529c036	Correctly create the data for the get_content and get_envelope functions. This commit was SVN r5854.	2005-05-24 20:15:44 +00:00
George Bosilca	4925edb374	Do not believe the upper level about the total size to unpack. Check to insure we respect the boundaries of the data as see by the user. This commit was SVN r5853.	2005-05-24 19:54:46 +00:00
George Bosilca	15c3baaf3b	Always set the bConverted to zero when we attach a new datatype. This commit was SVN r5851.	2005-05-24 19:30:25 +00:00

... 2 3 4 5 6 ...

4708 Коммитов