openmpi

Author	SHA1	Message	Date
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Gleb Natapov	1dc1ee3998	Send control credit message over "eager rdma" channel if possible. This commit was SVN r14032.	2007-03-14 14:38:56 +00:00
Gleb Natapov	1f3ac2d7ae	Hold pointers to free_max/free_eager lists in array indexed by priority. This eliminates a couple of ifs from the fast path. This commit was SVN r14031.	2007-03-14 14:36:03 +00:00
Gleb Natapov	8607957df9	Get rid of remaining _hp/_lp stuff. Consolidate HP/LP QP creation code. This commit was SVN r14030.	2007-03-14 14:33:24 +00:00
Rolf vandeVaart	42168575fd	Fix for the special case where np=2 and the sendbuf is set to MPI_IN_PLACE. In that case, sendcount and sendtype are not valid and we need to use recvcount and recvtype. This commit fixes trac:943. Reviewed by Jelena Pjesivac-Grbovic. This commit was SVN r14022. The following Trac tickets were found above: Ticket 943 --> https://svn.open-mpi.org/trac/ompi/ticket/943	2007-03-13 19:01:20 +00:00
Galen Shipman	8253d83410	make btl template compile again This commit was SVN r13990.	2007-03-08 21:58:26 +00:00
Galen Shipman	67ba5264f6	ORTE_NAME_ARGS casts to long, not unsigned long. This commit was SVN r13988.	2007-03-08 21:42:29 +00:00
Galen Shipman	8072dd344c	use %ld instead of %d as ORTE_NAME_ARGS does casting to long not unsigned long This commit was SVN r13987.	2007-03-08 21:41:39 +00:00
Bill D'Amico	53d434d6ab	Fix warnings when building with UDAPL - minor formatting errors. This commit was SVN r13971.	2007-03-08 18:39:40 +00:00
Jelena Pjesivac-Grbovic	9780a000ba	Cleanup of generic reduce function and possible (low probability) bug fix. - fixing line lengths and some of the comments - possible bug fix (but I do not think we exposed it in any tests so far) temporary buffers were allocated as multiples of extent instead of true_extent + (count -1) * extent. Everything is still passing Intel tests over tcp and btl mx up to 64 nodes. This commit was SVN r13956.	2007-03-08 00:54:52 +00:00
Jelena Pjesivac-Grbovic	57cbafafd5	Clean up of generic broadcast function: removing unnecessary statements and improving comments. This commit was SVN r13955.	2007-03-07 21:59:53 +00:00
Rolf vandeVaart	333357f4cc	This fixes the initialization of the usable size of the shared memory. The original code was not compensating for the space used by the header. When memory got tight, the allocator would return a pointer to memory that did not exist resulting in a SEGV for the application. This is a partial fix for ticket #929. Reviewed by Rich Graham. This commit was SVN r13950.	2007-03-07 13:28:06 +00:00
Jelena Pjesivac-Grbovic	0c07654c30	Updating reduce_scatter decision function based on MX results up to 64 nodes and both 1ppn and 2ppn configurations. This commit was SVN r13945.	2007-03-07 00:38:33 +00:00
Gleb Natapov	40501f8274	Amend IB parameter checking. This commit was SVN r13936.	2007-03-06 13:05:12 +00:00
Brian Barrett	9660bb6ccc	These symbols aren't actually created in ROMIO with Open MPI's configure, so no need to have them in here. This commit was SVN r13933.	2007-03-05 22:55:17 +00:00
Jelena Pjesivac-Grbovic	e5ed167a6e	Adding tuned version of reduce_scatter implementation. Currently 3 algorithms are available: - non-overlapping, reduce + scatterv, (works for non-commutative operations) - recursive halving algorithm (copied from basic module) - ring algorithm (similar to allreduce ring, for large messages) This commit was SVN r13929.	2007-03-05 20:40:39 +00:00
Gleb Natapov	be018944d2	Clean up circular buffer implementation. Get rid of _same_base_address() functions by pre-calculating everything in advance. This commit was SVN r13923.	2007-03-05 14:27:26 +00:00
Gleb Natapov	8078ae5977	Optimize sm communication. Pass message type (MCA_BTL_SM_FRAG_ACK/ MCA_BTL_SM_FRAG_SEND) and status success/fail in low bits of pointers we are passing through circular buffer. The rank that receives ACK doesn't need to look into data it received and this is a big win since this data is not in the cache of the rank's CPU. (Note that we can use low bits of pointers because free_list always returns pointers aligned at least to cache line size). This commit was SVN r13922.	2007-03-05 14:24:09 +00:00
Gleb Natapov	90fb58de4f	When frags are allocated from mpool by free_list the frag structure is also allocated from mpool memory (which is registered memory for RDMA transports). This is not a problem for small jobs, but for a big number of ranks the amount of wasted memory is big. This commit was SVN r13921.	2007-03-05 14:17:50 +00:00
Rich Graham	e932d9a695	macro variable has same name as one of the parameters passed to the macro. Typo - most likely cut and paste error. This commit was SVN r13918.	2007-03-04 23:31:07 +00:00
Li-Ta Lo	196e2a86bb	added binomial tree based scatter, passed IBM and intel tests This commit was SVN r13906.	2007-03-02 23:19:02 +00:00
Li-Ta Lo	11c94cbe76	eliminated the use of MPI_Get_count This commit was SVN r13904.	2007-03-02 22:57:50 +00:00
Li-Ta Lo	3765e19d15	added ASCII graph for the topologies This commit was SVN r13892.	2007-03-02 17:17:14 +00:00
Li-Ta Lo	bd75f2f162	change ALLGATHER to GATHER This commit was SVN r13891.	2007-03-02 17:02:29 +00:00
Josh Hursey	0404444dbe	* Added 2 new MCA parameters - mca_base_param_file_prefix (Default: NULL) This is the fullname of the "-am" mpirun option. Used to specify a ':' separated list of AMCA parameter set files. - mca_base_param_file_path (Default: $SYSCONFDIR/amca-param-sets/:$CWD) The path to search for AMCA files with relative paths. A warning will be printed if the AMCA file cannot be found. * Added a new function "mca_base_param_recache_files" that re-reads the file configurations. This is used internally to help bootstrap the MCA system. * Added a new orterun/mpirun command line option '-am' that aliases for the mca_base_param_file_prefix MCA parameter * Exposed the opal_path_access function as it is generally useful in other places in the code. * New function "opal_cmd_line_make_opt_mca" which will allow you to append a new command line option with MCA parameter identifiers to set at the same time. Previously this could only be done at command line declaration time. * Added a new directory under the $pkgdatadir named "amca-param-sets" where all the 'shipped with' Open MPI AMCA parameter sets are placed. This is the first place to search for AMCA sets with relative paths. * An example.conf AMCA parameter set file is located in contrib/amca-param-sets/. * Jeff Squyres contributed an OpenIB AMCA set for benchmarking. Note: You will need to autogen with this commit as it adds a configure param. Sorry :( This commit was SVN r13867.	2007-03-01 13:39:20 +00:00
Tim Mattox	ec82d01555	Add a missing extern keyword that prevented compilation on OS X. This commit was SVN r13853.	2007-02-28 20:26:34 +00:00
Gleb Natapov	2b6cbd6299	Separate frag lists for RDMA descriptors to two, one for src descriptors and another for dst descriptors. This provides a partial solution to the OB1 protocol deadlock problem. We can limit the number of RDMA descriptors (by setting btl_openib_free_list_max to something different from -1) and if we are lucky enough to hit this limit before we fail to register more memory the protocol will not deadlock. When we had only one list for src/dst descriptors we deadlocked when we reached max limit for the list. This commit was SVN r13844.	2007-02-28 13:43:38 +00:00
Sven Stork	870740efe2	- proper export symbols that are required by other components. This commit was SVN r13841.	2007-02-28 12:51:55 +00:00
Rainer Keller	0889ebd59f	- Eliminate warnings, that PGI-6.2.5 issues with -Minform=inform This commit was SVN r13840.	2007-02-28 08:36:34 +00:00
Li-Ta Lo	c5d8c221b0	added binomial tree based Gather algorithm, passed IBM and Intel tests This commit was SVN r13835.	2007-02-28 01:11:01 +00:00
Jelena Pjesivac-Grbovic	627533fe4a	Adding segmented ring algorithm for Allreduce for commutative operations. Algorithm allows user to specify the segment size to be used for computation/communication overlap. The additional memory requirement for the algorithm is 2 x segment size. It performed well for (really) large message sizes over MX and it passed intel Allreduce_c and Allreduce_loc_c tests. This commit was SVN r13832.	2007-02-27 20:32:30 +00:00
Sven Stork	d8a369936e	- Fix more symbols that should be exported. This commit was SVN r13824.	2007-02-27 15:17:17 +00:00
George Bosilca	bec20422ee	Remove the warnings about printf data-type mismatch. This commit was SVN r13804.	2007-02-26 22:20:35 +00:00
Brian Barrett	6d70f5fbe0	don't define malloc and friends in opal_config, as it causes problems when we later include malloc.h This commit was SVN r13803.	2007-02-26 21:34:48 +00:00
Li-Ta Lo	c860bd1be5	fixed a typo in the comment This commit was SVN r13802.	2007-02-26 19:20:46 +00:00
Li-Ta Lo	73a73b1c78	added ASCII graph on reduce_log_intra This commit was SVN r13801.	2007-02-26 19:15:37 +00:00
Pavel Shamis	6fe84f581b	mpool_base_module_destroy was removing all modules from a list instead of removing specific one. Fixing the bug. This commit was SVN r13795.	2007-02-26 16:25:20 +00:00
Brian Barrett	d9e0e80190	Make some debugging output only looked at when debugging is enabled This commit was SVN r13777.	2007-02-25 01:03:19 +00:00
Bill D'Amico	db1c2a58c4	Removed cruft - unused variables causing warnings during OMPI build. This commit was SVN r13772.	2007-02-23 18:55:41 +00:00
Tim Prins	f35f67ed1c	(very) minor correction to helpfile This commit was SVN r13758.	2007-02-22 16:02:12 +00:00
Ron Brightwell	e15e85a0b6	Fix a problem with long unexpected messages that was causing hangs. Long unexpected messages were not generating PUT_START events because the MD for long unexpected messages was configured to ignore start events. When a long unexpected message arrived, it traversed the match list, and ended up in the long unexpected MD. As the long message is being consumed, the code called PtlMDUpdate() to look for the message, but there was no event that indicated that it had arrived. So, the update succeeded. Once the long unexpected message was consumed, the PUT_END event showed up in the event queue -- except the code wasn't looking for it anymore. The PUT_START events exist specifically to handle ordering between short and long unexpected messages, so PUT_START events can't be ignored on long unexpected messages. Modified the code to generate PUT_START events for both long and short unexpected messages and handle matching up START and END events appropriately. This commit was SVN r13746.	2007-02-21 21:59:48 +00:00
Li-Ta Lo	049921a5ec	the temporary buffer is not needed for the MPI_IN_PLACE cases if the underlying Gather is implemented correctly This commit was SVN r13740.	2007-02-21 20:39:56 +00:00
Jelena Pjesivac-Grbovic	36156f39c2	Modification to allreduce ring algorithm: - the block sizes are computed in a more uniform way. The first k blocks may be 1 element larger than the remaining blocks. The algorithm passed Intel Allreduce_c and Allreduce_loc_c tests, and IMB-3.2 Allreduce, over TCP and both btl and mtl MX (up to 128 processes). The algorithm still only supports commutative operations. This commit was SVN r13738.	2007-02-21 19:30:08 +00:00
Josh Hursey	c573171b7d	Mostly a cleanup commit. - Implement the BML/r2 finalize function - Cleanup the btl close routine - Wire up a pml_base_verbose MCA parameter so you can actually watch the PML selection logic if you really want to. - Fix a potential segfault in the selection logic. ompi_pointer_array_get_item() may return NULL, so we have to check for it This commit was SVN r13734. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2007-02-21 16:18:43 +00:00
Jelena Pjesivac-Grbovic	b608887466	Adding variant of linear alltoall algorithm where the number of outstanding requests can be limited using mca parameters. The implementation passed Intel, IMB-3.2, and mpi_test_suite tests over TCP and MX up to 128 processes (64 nodes), on both 32-bit and 64-bit machines. It is not activated by default, but it should be useful for really large communicator sizes. This commit was SVN r13720.	2007-02-20 04:25:00 +00:00
Jeff Squyres	f820e44112	Remove a gcc-ism from the code (defining an anonymous union in the middle of a struct). Now we properly define and name the union outside the struct and simply create an instance of it inside the struct. This commit was SVN r13709.	2007-02-19 18:21:57 +00:00
George Bosilca	020b8ade70	A slightly better fix for the data mismatch compiler complaints. This commit was SVN r13695.	2007-02-17 05:23:57 +00:00
Jelena Pjesivac-Grbovic	d2d02642ca	Removing compilation warnings about the output format. This commit was SVN r13693.	2007-02-16 23:32:47 +00:00
Rich Graham	b925d6588d	add some missing error checking - thanks to Ron B. This commit was SVN r13692.	2007-02-16 22:19:24 +00:00
George Bosilca	04138c23af	No more warnings. This commit was SVN r13683.	2007-02-16 16:25:58 +00:00
Pavel Shamis	edeab0e912	Adding Mellanox Technologies copyright to files touched by Mellanox. This commit was SVN r13669.	2007-02-15 18:03:20 +00:00
Jelena Pjesivac-Grbovic	e532b928af	Adding segmented binary reduce algorithm which works with non-commutative operations. Implementation passed intel: MPI_Reduce_c , MPI_Reduce_loc_c, and MPI_Reduce_user_c tests over TCP, BTL MX, and MTL MX, as well as, mpi_test_suite Reduce tests (up to 64 nodes). The algorithm is still not activated by decision function (will be in the near future). This commit was SVN r13657.	2007-02-14 22:38:38 +00:00
Pavel Shamis	2483cefc57	Additional check if descriptor is NULL. It prevents mca_pml_dr_sendreq_cleanup_active failure on segfault. This commit was SVN r13647.	2007-02-14 10:43:43 +00:00
Brian Barrett	c00d841741	Fix hang on Cray machine introduced with r13582. The modex will never fire when on the Cray machine (aka when the NULL GPR is in use). This commit was SVN r13638. The following SVN revision numbers were found above: r13582 --> open-mpi/ompi@041beeb1b6	2007-02-13 18:34:03 +00:00
Gleb Natapov	4d4b0a022a	Add error callback to sm BTL. Call it when allocation of the initial circular buffer fails. If cb is already allocated, but it is full and allocation of additional cb fails, we spin waiting for receiver to free space in existing cb. This commit was SVN r13635.	2007-02-13 12:01:36 +00:00
George Bosilca	2e042c91cf	Once we compute the local offset use it (instead of the global one). This commit was SVN r13634.	2007-02-13 09:34:04 +00:00
George Bosilca	22eca30b45	One less compiler warning. This commit was SVN r13633.	2007-02-13 09:32:57 +00:00
Gleb Natapov	1033002595	Fix memory leak. Free allocated descriptor if operation cannot proceed. This commit was SVN r13610.	2007-02-12 09:47:51 +00:00
Jelena Pjesivac-Grbovic	b52dc9e427	Modifying fixed decision function for reduce to utilize linear algorithm only for really small communicator sizes. This commit was SVN r13597.	2007-02-10 00:31:10 +00:00
Brian Barrett	041beeb1b6	Share currently selected PML in the modex information, then check whenever adding new procs that the remote proc's pml is the same as our local pml. Turns the hangs from mismatched PMLs into an abort, which is better, I think. This commit was SVN r13582.	2007-02-09 16:38:16 +00:00
Galen Shipman	f98a442c82	Fix a problem in the selection logic for MX. Basically we need to be able to open MTL MX and BTL MX and initialize them at the same time. The problem is that both call mx_init and mx_finalize, solution is to add an external entity that does the init and finalize (based on ref counting). This commit was SVN r13576.	2007-02-09 03:19:38 +00:00
Jelena Pjesivac-Grbovic	6efca498ec	Fixes trac:692 in trunk: receive buffer in MPI_Reduce operation is no longer overwritten on non-root nodes. This commit was SVN r13538. The following Trac tickets were found above: Ticket 692 --> https://svn.open-mpi.org/trac/ompi/ticket/692	2007-02-07 18:57:03 +00:00
Josh Hursey	90f449f675	fix a typo that got in there This commit was SVN r13523.	2007-02-06 20:56:48 +00:00
Jeff Squyres	c91fcd7fbd	Fix a bunch of minor typos submitted by Bernhard Fischer. This commit was SVN r13505.	2007-02-06 12:00:30 +00:00
Brian Barrett	09cc9e4941	properly compute starting offset -- the lb will be included in the offset, so we don't need both. Refs trac:864 This commit was SVN r13494. The following Trac tickets were found above: Ticket 864 --> https://svn.open-mpi.org/trac/ompi/ticket/864	2007-02-05 18:12:18 +00:00
Galen Shipman	ec610a9e65	spread priorities out a bit.. This commit was SVN r13487.	2007-02-04 00:55:25 +00:00
Galen Shipman	ddf08cb0b3	woops.. This commit was SVN r13482.	2007-02-03 02:32:00 +00:00
Galen Shipman	a94101fa62	mostly another hack around for PML selection, allows CM to select itself if an MTL is available, if not OB1 is used. Still prevents DR and OB1 from stomping on each other though. This commit was SVN r13481.	2007-02-03 02:01:18 +00:00
Christian Bell	e04c55af00	Fixes to psm mtl following a more comprehensive testing of intel tests. This commit was SVN r13471.	2007-02-02 21:55:04 +00:00
George Bosilca	0ff2115964	Other warnings are now silenced. This commit was SVN r13462.	2007-02-02 06:47:35 +00:00
Jelena Pjesivac-Grbovic	e193d625bc	Bugfix for ring allreduce algorithm. The step used to iterate through buffer was function of true_extent instead of extent. This may or may not solve ticket #689 because I am still getting failures over btl mx, but I cannot reproduce failures over mtl mx nor tcp. This commit was SVN r13459.	2007-02-02 02:44:16 +00:00
George Bosilca	1c7c39b32b	I missed these warnings on my last commit. This commit was SVN r13431.	2007-02-01 19:34:21 +00:00
George Bosilca	79ea6d471b	Even less warnings. This commit was SVN r13429.	2007-02-01 19:27:11 +00:00
George Bosilca	56ffbfc5ff	Get rid of the warnings in the Open IB BTL. This commit was SVN r13424.	2007-02-01 19:07:04 +00:00
George Bosilca	b611e6d7dc	Less warnings. This commit was SVN r13419.	2007-02-01 17:51:43 +00:00
George Bosilca	6ef3917741	Allow the user to specify the bandwidth and latency for the MX device. This commit was SVN r13418.	2007-02-01 17:51:00 +00:00
Brian Barrett	58b325b03f	Two changes to improve the sm situation with spawn: * have the mpool size be based on MCW, not num procs in other jobs we know about. Solves the problem of the spawned job having a much bigger than needed sm file * Can't assume that "me" is in the list of procs passed to addprocs, so need to use slightly different logic and not go through all of add procs unless there's a proc in my job that isn't me. This seems to greatly improve the situation, although there still seems to be more of a slowdown through MPI_INIT for the children (if there are more than one child) than MPI_INIT for the parent if there are 'n' children compared to 'n' parents. Hopefully that made sense ;) This commit was SVN r13417.	2007-02-01 17:18:35 +00:00
Brian Barrett	a0b40ce45a	Fix race condition in setting MPI_ERROR -- with buffered sends, the request can complete before the operation, meaning that a bogus MPI_ERROR is read This commit was SVN r13401.	2007-01-31 21:40:14 +00:00
Brian Barrett	039a3d8c17	add comment about why there's no status update here, since I always forget This commit was SVN r13400.	2007-01-31 21:39:20 +00:00
Brian Barrett	846eed84f1	When receiving a message, need to account for the fact that the displacement of the first entry might not be the start of the user's buffer. This is similar to what ompi_convertor_unpack does. This is the solution for the test case attached to ticket #690. Refs trac:690 This commit was SVN r13397. The following Trac tickets were found above: Ticket 690 --> https://svn.open-mpi.org/trac/ompi/ticket/690	2007-01-31 18:18:19 +00:00
Brian Barrett	65b07140c0	clean up some of the printf warnings caused by the attribute code This commit was SVN r13395.	2007-01-31 17:11:06 +00:00
George Bosilca	a02d1c7c8d	No more warnings. This commit was SVN r13382.	2007-01-31 04:27:41 +00:00
Brian Barrett	ee753694e0	Print out the memlock limit when we can't allocate memory This commit was SVN r13372.	2007-01-30 21:22:56 +00:00
Rainer Keller	061ba05439	- Fixes uncovered with the format attribute to opal_output and opal_output_verbose This commit was SVN r13371.	2007-01-30 20:56:31 +00:00
Jeff Squyres	86f8c66a27	Turns out that the leave_pinned stuff isn't used in these BTLs at all. So just remove it. This commit was SVN r13360.	2007-01-30 15:39:49 +00:00
Rainer Keller	3669e8921e	- Fix further compiler warnings regarding initialization and shadowing variables. This commit was SVN r13358.	2007-01-30 06:34:38 +00:00
Jeff Squyres	c9f072b84f	Strike down a few more stray places that were registering mpi_leave_pinned and replace them with the one central global variable. This commit was SVN r13349.	2007-01-29 20:24:31 +00:00
Brian Barrett	93a2f31932	Use a recursive halving communication algorithm similar to the one used by MPICH2 for "small" commutative operations in the reduce_scatter basic implementation. "small" is currently pretty big, as it doesn't take much to beat reduce/scatterv. Need to do much more than this for better all around performance of MPI_Reduce_scatter, but this was enough to solve the problems I was having. This commit was SVN r13348.	2007-01-29 19:29:35 +00:00
Rainer Keller	ca35881cd0	- Minor bugfixes and removed compiler warnings This commit was SVN r13343.	2007-01-28 19:52:09 +00:00
Jelena Pjesivac-Grbovic	33dcb4f810	Minor change to linear alltoall algorithm: - post isends in reverse order of posting irecvs. if the messages arrive approximately in order, this should minimize the time spent in matching the requests. I did not see any performance difference over MX up to 64 nodes, but the change makes sense and may have some impact when we have (many) more nodes. This commit was SVN r13337.	2007-01-26 21:59:31 +00:00
Brian Barrett	385a435813	Start long message send as soon as possible, to minimize ack time for the receive, greatly increasing mid-range bandwidth This commit was SVN r13317.	2007-01-25 23:07:03 +00:00
Rich Graham	1c20feb52b	Take into account constants that in the cray headers are defined differently than in the portals spec. This commit was SVN r13311.	2007-01-25 18:32:47 +00:00
Jeff Squyres	7b6ed64c7b	Add in the hostname to the BTL_* output macros so that you can tell on which node an event occurred. This commit was SVN r13302.	2007-01-25 14:02:54 +00:00
Jeff Squyres	6fea000e5f	Oops -- get the right function name (copy-n-paste error). This commit was SVN r13290.	2007-01-24 22:31:13 +00:00
Jeff Squyres	6b69ea664d	Make a much, much better error message for a not-uncommon failure scenario (user/sysadmin forgot to set the memlock limits high enough). This commit was SVN r13289.	2007-01-24 22:25:40 +00:00
Patrick Geoffray	b252cb82c8	oops, ".", not "->", copy error... This commit was SVN r13287.	2007-01-24 19:16:46 +00:00
Patrick Geoffray	d58f6b2451	Free memory in synchronous send case if free_after requires it. Fixes memory leak using synchronous sends and custom data types. This commit was SVN r13286.	2007-01-24 19:10:38 +00:00
George Bosilca	d19a4f4740	Cast it to make cl happy. This commit was SVN r13267.	2007-01-24 00:51:01 +00:00
George Bosilca	790f175d4e	Explicit conversions to make the code Windows friendly. This commit was SVN r13266.	2007-01-24 00:50:24 +00:00
George Bosilca	a4488ff8d2	Add explicit conversions. This commit was SVN r13265.	2007-01-24 00:49:08 +00:00

1637 Commits