openmpi

Author	SHA1	Message	Date
Ralph Castain	02f6e6ab3e	Slight touchup to make it pretty This commit was SVN r14734.	2007-05-23 16:39:18 +00:00
Ralph Castain	3fc227286f	Be sure to NULL terminate the list of keys... This commit was SVN r14733.	2007-05-23 16:35:03 +00:00
Ralph Castain	cffea274b3	Fix the hostfile system. Enable the hostfile RDS to properly setup the name service. Modify the name service so that a caller can provide a valid cellid and update its site info for later retrieval. This commit was SVN r14732.	2007-05-23 16:31:44 +00:00
Sven Stork	ed72cbcaec	- Fix a deadlock problem for threaded builds. We have to release the lock before we wait for the callback, because the callback will try to lock the lock again. (shows up in debug+threaded build) This commit was SVN r14731.	2007-05-23 16:11:50 +00:00
Sven Stork	fc932f1fb4	- changes to get the tests running with visibility enabled This commit was SVN r14730.	2007-05-23 15:02:36 +00:00
Josh Hursey	a010ff6e6a	Some updates from the interface change to orte_init This commit was SVN r14729.	2007-05-23 14:44:23 +00:00
Ralph Castain	e6ff7757ab	Modify the new DSS xfer and copy functions so they only xfer/copy the unpacked portion of a buffer's payload. This allows for more rapid transfer of data during message relay without requiring any knowledge of what is in the buffer. Begin work on restoring binomial message distribution method. This commit was SVN r14728.	2007-05-23 14:06:32 +00:00
George Bosilca	b2e805db61	Nothing relevant. Indentation, typos, change PTL to BTL. This commit was SVN r14727.	2007-05-23 14:03:52 +00:00
George Bosilca	50b26ebb6a	Allow the ompi_ddt_init and ompi_ddt_finalize to be visible even when the visibility feature is on. This commit was SVN r14726.	2007-05-23 14:02:08 +00:00
Sven Stork	88f0845c44	- let the pt2pt component compile with threads enabled This commit was SVN r14725.	2007-05-23 12:56:34 +00:00
Adrian Knoth	b2219710d4	Cosmetics. Changed function name in error output. This commit was SVN r14724.	2007-05-23 10:05:18 +00:00
Adrian Knoth	52f3540a9c	As of r14652, use autoconf 2.60 and automake 1.10 This commit was SVN r14723. The following SVN revision numbers were found above: r14652 --> open-mpi/ompi@21e00f6f0c	2007-05-23 10:02:43 +00:00
Ralph Castain	b582d98d4a	Fix singleton comm_spawn so it can see available resources This commit was SVN r14719.	2007-05-22 13:29:07 +00:00
Ralph Castain	5b0abf520b	Don't update our own contact info This commit was SVN r14718.	2007-05-22 13:28:23 +00:00
Brian Barrett	38eab3613b	* Fix race condition with the pending_{in,out} variables -- if we're going to do while(...) { } then we can't change the variables in the ... atomically, but should do it while holding the module lock. * Fix dumb communicator creation error when we don't create the progress stuff (because a window already exists), where we would accidentally jump to the error case. This commit was SVN r14715.	2007-05-21 20:53:02 +00:00
Ralph Castain	677eb5e4bc	Ensure the singleton wakes up when comm_spawn fails This commit was SVN r14714.	2007-05-21 20:13:31 +00:00
Ralph Castain	b771cfcce3	Fix compile problem This commit was SVN r14713.	2007-05-21 20:11:03 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
Ralph Castain	c9c2922706	Drat - commit missing file. This commit was SVN r14710.	2007-05-21 17:20:56 +00:00
Brian Barrett	0e9e0c518a	Fix a couple more progress thread related issues... This commit was SVN r14708.	2007-05-21 16:06:14 +00:00
Ralph Castain	3288ce0462	Cleanup the SDS components and move some common code to the base. Modify the seed and singleton SDS code so it returns the right local rank and num_local_procs. This commit was SVN r14707.	2007-05-21 14:57:58 +00:00
Ralph Castain	d9acc93efa	Compute and pass the local_rank and local number of procs (in that proc's job) on the node. To be precise, given this hypothetical launching pattern: host1: vpids 0, 2, 4, 6 host2: vpids 1, 3, 5, 7 The local_rank for these procs would be: host1: vpids 0->local_rank 0, v2->lr1, v4->lr2, v6->lr3 host2: vpids 1->local_rank 0, v3->lr1, v5->lr2, v7->lr3 and the number of local procs on each node would be four. If vpid=0 then does a comm_spawn of one process on host1, the values of the parent job would remain unchanged. The local_rank of the child process would be 0 and its num_local_procs would be 1 since it is in a separate jobid. I have verified this functionality for the rsh case - need to verify that slurm and other cases also get the right values. Some consolidation of common code is probably going to occur in the SDS components to make this simpler and more maintainable in the future. This commit was SVN r14706.	2007-05-21 14:30:10 +00:00
Pavel Shamis	5ceaa605d7	Adding new vendor_part_id for Mellanox Hermon HCA This commit was SVN r14705.	2007-05-21 13:33:54 +00:00
Brian Barrett	1191677b76	Fix dumb threads-related compile issues This commit was SVN r14704.	2007-05-21 03:23:58 +00:00
Brian Barrett	2b4b754925	Some much needed cleanup of the point-to-point one-sided component... * Combine polling of the long requests and buffer requests into one type, and in one place * Associate the list of requests to poll with the component, not the individual modules * add progress thread that sits on the OMPI request structure and wakes up at the appropriate time to poll the message list. Not the best, but without some asynch notification from the PML that a given set of requests has completed, there isn't much better * Instead of calling opal_progress() all over the place, move to using the condition variables like the rest of the project. Has the advantage of moving it slightly further along in the becoming thread safe thing * Fix a problem with the passive side of unlock where it could go recursive and cause all kinds of problems, especially when progress threads are used. Instead, have two parts of passive unlock -- one to start the unlock, and another to complete the lock and send the ack back. The data moving code trips the second at the right time. This commit was SVN r14703.	2007-05-21 02:21:25 +00:00
Brian Barrett	77a80b2612	sockaddr_storage check needs sys/socket.h This commit was SVN r14702.	2007-05-20 21:54:43 +00:00
Jeff Squyres	97248d6bc6	Add another test to check multiple, concurrent COMM_SPAWN's. This commit was SVN r14701.	2007-05-19 19:02:24 +00:00
Jeff Squyres	47ba3db3b8	Add a simple MPI_COMM_SPAWN_MULTIPLE test. This commit was SVN r14700.	2007-05-19 02:30:53 +00:00
Ralph Castain	fa5a40070d	Test the return status code from comm_dyn_start_processes - if we see an error, then let's report it and not continue on with the comm_spawn procedure! This commit was SVN r14699.	2007-05-18 20:22:32 +00:00
Ralph Castain	180c96bb8f	Clear an erroneous error message pending a more complete fix This commit was SVN r14698.	2007-05-18 14:44:27 +00:00
Tim Mattox	a65d1d6acc	Add the first NEWS entry for 1.2.3 This commit was SVN r14696.	2007-05-18 14:34:11 +00:00
Ralph Castain	6ef39ef525	Extend fix in r14693 to the comm_spawn case This commit was SVN r14694. The following SVN revision numbers were found above: r14693 --> open-mpi/ompi@75d51812a3	2007-05-18 13:34:26 +00:00
Ralph Castain	75d51812a3	Fix the app-failed-to-start capability that was broken by r14554 (holding the caller in rmgr.spawn until the application - as opposed to just the orteds - have started). Allow the rmgr.spawn function to return if the app terminates, correctly handling its return status code to show abnormal termination. Modify orterun to correctly handle the returned status code so it doesn't enter a conditioned wait if the app fails to start since it will never wakeup if it does. This commit was SVN r14693. The following SVN revision numbers were found above: r14554 --> open-mpi/ompi@4510b42638	2007-05-18 13:29:11 +00:00
Donald Kerr	23280bd7da	remove an assignment which is not required This commit was SVN r14692.	2007-05-18 01:33:02 +00:00
Donald Kerr	588d5bd6a9	clean up compile warnings This commit was SVN r14691.	2007-05-17 23:37:47 +00:00
George Bosilca	7738079ab9	Remove unused variable. This commit was SVN r14689.	2007-05-17 20:01:30 +00:00
Brian Barrett	6b4264f178	* forgot to fix the BSD and Sun cases for finding IPv6 addresses This commit was SVN r14688.	2007-05-17 16:21:44 +00:00
Gleb Natapov	b2c8fcdbab	Forgot to add file in r14681. This commit was SVN r14682. The following SVN revision numbers were found above: r14681 --> open-mpi/ompi@3ebaff8dfe	2007-05-17 08:41:01 +00:00
Gleb Natapov	3ebaff8dfe	Implement new BTL parameters: We eagerly send data up to btl_*_eager_limit with the match. Upon ACK of the MATCH we start using send/receives of size btl_*_max_send_size up to the btl_*_rdma_pipeline_offset. After the btl_*_rdma_pipeline_offset we begin using RDMA writes of size btl_*_rdma_pipeline_frag_size. Now, on a per message basis we only use the above protocol if the message is larger than btl_*_min_rdma_pipeline_size. btl_*_eager_limit -> same; btl_*_max_send_size -> same; btl_*_rdma_pipeline_offset -> btl_*_min_rdma_size; btl_*_rdma_pipeline_frag_size -> btl_*_max_rdma_size; btl_*_min_rdma_pipeline_size is new. This patch also moves all BTL common parameters initialisation into btl_base_mca.c file. This commit was SVN r14681.	2007-05-17 07:54:27 +00:00
Brian Barrett	33a5758521	Some IPv6 improvements: * Move ipv6comat.h code into opal_config_bottom.h and change into some more intelligent testing of structures * Change opal's if interface to use sockaddr instead of sockaddr_storage, as the RFCs suggest we do * Move the networking code in opal that isn't directly related to if detection into net.h * Add quicky function to get the port out of either a sockaddr_in or sockaddr_in6, saving a bunch of code in the oob. * Update TCP oob and btl with new interface This commit was SVN r14679.	2007-05-17 01:17:59 +00:00
Donald Kerr	c40307fd27	add user warning message to inform when udapl btl is no longer able to register memory This commit was SVN r14678.	2007-05-16 21:04:50 +00:00
Jeff Squyres	af7f56c179	Really. We can spell his name right. Really. :-) This commit was SVN r14677.	2007-05-16 20:30:06 +00:00
Brian Barrett	7708c4f887	Don't complain about unsupported protocols. Needs to be made better, but this will quit the whining from platforms where the kernel doesn't have IPv6 support. This commit was SVN r14676.	2007-05-16 20:11:47 +00:00
Galen Shipman	542937ee2f	ompi running on bproc again. This commit was SVN r14675.	2007-05-16 19:55:43 +00:00
Sven Stork	22af6d38e6	- UNexport symbols that shouldn't be needed outside the libraries - replace #if/#endif with BEGIN/END_C_DECLS - reformatting This commit was SVN r14669.	2007-05-16 15:46:52 +00:00
Sven Stork	bd29eb9bd1	- backout commit r14667, because internal functionality shouldn't be exported. NOTE: if visibility is enabled "make check" will fail This commit was SVN r14668. The following SVN revision numbers were found above: r14667 --> open-mpi/ompi@1f526a95e9	2007-05-16 15:43:44 +00:00
Sven Stork	1f526a95e9	- we need to export these internal symbols because the tests in test/memory need them. This commit was SVN r14667.	2007-05-16 15:14:31 +00:00
Sven Stork	a97e65e7f7	- export mmap/munmap so the memory manager can intercept these calls This commit was SVN r14665.	2007-05-16 13:40:33 +00:00
Gleb Natapov	61e889a1d9	Fix breakage of GM by r13921. On receive GM provides only buffer pointer without any context so we need to save a context somewhere so it can be retrieved given only buffer pointer. This patch saves context (pointer to frag) just before start of a buffer so it can be easily retrieved. This commit was SVN r14664. The following SVN revision numbers were found above: r13921 --> open-mpi/ompi@90fb58de4f	2007-05-16 12:20:58 +00:00
Brian Barrett	ce4e24c399	need to empty location of libltdl.la so that we don't try to build it when we're not using dlopen This commit was SVN r14662.	2007-05-15 23:43:38 +00:00


9925 Commits