openmpi

Author	SHA1	Message	Date
Nysal Jan	857c32784e	Fix detection of fd_mask This commit was SVN r24320.	2011-01-28 06:20:32 +00:00
George Bosilca	d457338f66	Force mips2 asm acceptance before sc and ll. This commit was SVN r24319.	2011-01-27 22:42:26 +00:00
Nathan Hjelm	2605fc6a54	actually need pml = csum for these This commit was SVN r24318.	2011-01-27 20:44:13 +00:00
Josh Hursey	8ec85c6b8f	Fixes the C/R Automatic Recovery feature when the HNP is also hosting processes locally. I want to thank Hugo Meyer for reporting this/these bugs. Notes: * Moved over a patch from the stabilization branch that makes sure we close the peer socket in the OOB TCP component fully during shutdown (after the de-registration sync). It also ensures that we free the rml_uri only after we are done communicating with the peer (in the odls_base deregister sync operation). * When an error is detected while delivering messages, we really want to bail out of the loop since the error manager is likely mutating the orte_local_children data structure, so it is no longer safe to iterate over in the orte_odls_base_default_deliver_message() function. * When the HNP is hosting processes make sure it accounts for processes that may have failed locally in the ErrMgr HNP component by decrementing the num_local_procs. This makes it match the orted ErrMgr component accounting. This is what was causing the modex to fail (the number of participants was wrong on a rolling recovery). * The crmig and autor features of the hnp ErrMgr component now check for the jobid from both the 'job' parameter and from the process name (since one may be there and not the other). This caused some additional error messages during startup. * If we fail to migrate (e.g., due to invalid node specification), print only the error message, not the error and success messages. This can be misleading. This commit was SVN r24317.	2011-01-27 20:40:23 +00:00
Jeff Squyres	5bc2ad2b44	Fix some deprecated notices to refer to the correct new function names This commit was SVN r24313.	2011-01-27 19:55:42 +00:00
Jeff Squyres	6c8de8fb76	Bump up to hwloc 1.1.1 This commit was SVN r24312.	2011-01-26 23:20:26 +00:00
Jeff Squyres	511f87665b	Fixes trac:2680: Add ARM support. This commit was SVN r24308. The following Trac tickets were found above: Ticket 2680 --> https://svn.open-mpi.org/trac/ompi/ticket/2680	2011-01-26 17:22:44 +00:00
Josh Hursey	81fd41f811	Return an informative error message if the user requests a migration of a job that is not capable of it. C/R Functionality cleanup This commit was SVN r24307.	2011-01-26 15:36:34 +00:00
Josh Hursey	8f45fcb429	More fixes for the C/R support. Fixes a couple bugs with the migration and autor features. The C/R functionality should be fully working now. * Fix the checkpoint-restart-checkpoint case which would previously reject the checkpoint of the newly restarted process. Re-enabling checkpointing once the application has fully restarted fixes this issue (make sure to set is_app_checkpointable to true on restart confirmation). * In the case of an invalid checkpoint, do not try to access the SStore datastore as it will be using a dummy handler, and return NULL strings. mpirun was segfaulting in the error case because it was trying to convert the seq_num from a string to an integer. * Make sure to initialize the timer event in the Automatic Recovery section of the HNP errmgr, per the libevent update. This caused a segfault when attempting to recover a failed process. * If ompi-checkpoint loses connection to the HNP/mpirun the TCP socket will fail and call the ErrMgr update_state function. This commit adds a dummy function {{{orte_errmgr_base_update_state()}}} that will prevent the ompi-checkpoint command from segfaulting in this error scenario. This commit was SVN r24306.	2011-01-26 14:56:35 +00:00
Nathan Hjelm	8a3179cdcb	removed c99 test code This commit was SVN r24297.	2011-01-25 23:02:35 +00:00
Josh Hursey	66af515061	Fix C/R functionality with the new libtool. This fixes the case where the restarted process cannot be checkpointed or finalized. Short Version: Event engine needs to be flushed so it does not use old/stale file descriptors. Long Version: The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive. After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote into the new fd the event engine was stealing the first byte (shakes fist at event engine) before the recv() could be posted. The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem. A few other minor things: * Add a check to make sure the event engine is balanced in its init/finalize * Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is). This commit was SVN r24296.	2011-01-25 22:43:47 +00:00
Josh Hursey	e4d13d338f	Fix a couple of compiler warnings This commit was SVN r24295.	2011-01-25 22:22:32 +00:00
Nysal Jan	72ba038309	Add workaround for a Libtool (<2.2.8) bug concerning IBM xlf compilers This commit was SVN r24294.	2011-01-25 09:53:34 +00:00
George Bosilca	09f645f9a9	There is no need for the byte variable. This commit was SVN r24293.	2011-01-24 22:41:04 +00:00
Jeff Squyres	30e164e246	Fix all the problems with "make distcheck" caused by the new ROMIO import so that we can finally get a trunk nightly tarball! This commit was SVN r24292.	2011-01-24 21:10:14 +00:00
Nathan Hjelm	2ca55d54f7	use AC_PROG_CC_C99 to find flags to turn on c99 support. remove if mtt fails because of this. This commit was SVN r24291.	2011-01-24 15:54:52 +00:00
Jeff Squyres	afa654746c	Somehow this has been sitting, uncommitted, in a local checkout since last December. :-( Add new MCA param: maffinity_libnuma_policy. Thanks to David Singleton for the suggestion. Here's the help text about it: {{{ MCA maffinity: parameter "maffinity_libnuma_policy" (current value: <loose>, data source: default value) Binding policy that determines what happens if memory is unavailable on the local NUMA node. A value of "strict" means that the memory allocation will fail; a value of "loose" means that the memory allocation will spill over to another NUMA node. }}} This commit was SVN r24290.	2011-01-24 14:39:16 +00:00
Jeff Squyres	272fe89252	Update svn:ignore This commit was SVN r24289.	2011-01-24 14:15:24 +00:00
Jeff Squyres	1ea62f3bf6	Add svn:ignore This commit was SVN r24288.	2011-01-24 14:15:07 +00:00
Jeff Squyres	700d601dfc	Also need to check the "flag" value, because if flag!=true, then the value of "local_spawn" (and "non_mpi") is not set by ompi_info_get*(). This commit was SVN r24286.	2011-01-22 16:27:58 +00:00
Jeff Squyres	89fb26eb1c	Add missing line continuation character to prevent a Makefile syntax error This commit was SVN r24285.	2011-01-22 11:13:28 +00:00
Rolf vandeVaart	8171370287	Fix typo which broke builds when configured with hetero and debug. This commit was SVN r24283.	2011-01-21 17:10:09 +00:00
Abhishek Kulkarni	a1090575c2	Nitpick: Get rid of a redundant OPAL_SOS_GET_ERROR_CODE. This commit was SVN r24282.	2011-01-20 23:48:11 +00:00
Abhishek Kulkarni	3243b16bb3	Decode SOS error code before checking it with the native error code. This commit was SVN r24281.	2011-01-20 23:21:38 +00:00
Abhishek Kulkarni	45a53b4f7a	Add a missing call to opal_sos_finalize in opal_finalize_util. This commit was SVN r24280.	2011-01-20 23:18:02 +00:00
George Bosilca	fc9133cc7f	Correctly initialize the convertor to be used. Don't forget to initialize the OPAL datatype module. This commit was SVN r24279.	2011-01-20 20:05:21 +00:00
Samuel Gutierrez	2574e18de4	update LANL's tlcc and rr-class platform files This commit was SVN r24278.	2011-01-20 18:59:37 +00:00
Rolf vandeVaart	6a5ad29c36	Update configure command since it changed. This commit was SVN r24275.	2011-01-20 14:42:12 +00:00
Sylvain Jeaugey	46b711e164	Fixes trac:1888 introduced by r24264 : make Romio autogen.sh executable. This commit was SVN r24272. The following SVN revision numbers were found above: r24264 --> open-mpi/ompi@0e921bba7f The following Trac tickets were found above: Ticket 1888 --> https://svn.open-mpi.org/trac/ompi/ticket/1888	2011-01-20 09:20:34 +00:00
Nathan Hjelm	e2126512a9	test c99 struct initialization with mtt. remove on jan 20, 2011 This commit was SVN r24271.	2011-01-19 22:21:21 +00:00
Rolf vandeVaart	acd38ff746	Final changes from jsquyres review. Moved configure code from upper level into btl configure.m4. Changed prefix from "OMPI" to "BTL" in preprocessor macro. Add an mca param that shows it has been configured in. This commit was SVN r24270.	2011-01-19 20:58:22 +00:00
Brian Barrett	4859bb82e2	* Update to support direct call * Add missing cancel (not that it does anything useful) * Fix bug in opal_output call This commit was SVN r24269.	2011-01-19 20:49:28 +00:00
Brian Barrett	8f6a19b0fc	export component/module interface so that direct call works again This commit was SVN r24268.	2011-01-19 20:47:17 +00:00
Brian Barrett	b98afd298b	update to remove unneeded fields This commit was SVN r24267.	2011-01-19 20:46:06 +00:00
Rolf vandeVaart	f22f76a6ff	Add byte swapping macro for failover control message per jsquyres review. This commit was SVN r24266.	2011-01-19 19:58:35 +00:00
Rolf vandeVaart	e75b86d3ab	Fix some issues from jsquyres review. 1. Use asprintf instead of snprintf 2. Return remote_proc where possible. 3. Remove dead code. 4. Fix two comment typos. This commit was SVN r24265.	2011-01-19 16:09:17 +00:00
Sylvain Jeaugey	0e921bba7f	Romio Refresh from mpich2-1.3.1. Work by Pascal Deveze, tested through bitbucket by Jeff Squyres (https://bitbucket.org/devezep/new-romio-for-openmpi ). This commit was SVN r24264.	2011-01-19 15:55:10 +00:00
Shiqing Fan	b2f3a5b7c2	Correctly check system specific datatypes on Windows. This commit was SVN r24257.	2011-01-18 09:40:58 +00:00
Jeff Squyres	189b541dbd	Add a proper help message for the mca_verbose MCA param (and shuffle the code to be slightly more efficient). This commit was SVN r24256.	2011-01-14 20:18:06 +00:00
Abhishek Kulkarni	fd7ef7a1f1	Fixes broken trunk compile: call process status notify only when ft-enable-cr is selected. This commit was SVN r24255.	2011-01-14 18:37:07 +00:00
Jeff Squyres	32e722e4d9	Add first bullets about 1.5.2. This commit was SVN r24254.	2011-01-14 15:15:23 +00:00
George Bosilca	29c7f2fba5	Update the tests to match the new datatype engine. This commit was SVN r24252.	2011-01-14 07:58:50 +00:00
Abhishek Kulkarni	87d2c9b31d	A few fault tolerance updates related to the CIFTS project (http://www.mcs.anl.gov/research/cifts/) * Improve the FTB notifier to publish (C/R, process/communication failure) events to the FTB with the OMPI jobid as the associated payload. * Add notifier calls for C/R events and process status events in SnapC and ErrMgr components. * Fix a bug where the SnapC states and process states collide before being thrown out over the notifier. This commit was SVN r24251.	2011-01-13 20:13:49 +00:00
George Bosilca	5390fd6f33	Reshape the datatype engine. The basic types are built down in OPAL. MPI types are either direct links to these basic predefined types, or a combination of them. Anyway, the first items in the datatype list belong to OPAL, the second round are MPI datatypes created by composing basic OPAL datatypes, and the last batch are mapped datatypes (direct correspondence between an OMPI datatype and an OPAL one such as int -> int32_t). Modify the op to fit this new scheme. This commit was SVN r24247.	2011-01-13 06:08:54 +00:00
Ralph Castain	b09f57b03d	Update the multicast subsystem - ported from Cisco branch This commit was SVN r24246.	2011-01-13 01:54:05 +00:00
Terry Dontje	f3aaa885a3	corrected a couple places in orte where it said cpu_model when it should have been cpu_type. This commit was SVN r24221.	2011-01-11 19:56:26 +00:00
Terry Dontje	56c03a3853	removing a file I should not have added This commit was SVN r24220.	2011-01-11 19:02:08 +00:00
Terry Dontje	a374661ead	add configure.params to solaris sysinfo module to allow it to be built This commit was SVN r24219.	2011-01-11 18:31:55 +00:00
Jeff Squyres	cd8f12d8e5	Remove a few useless files that were missed last night. This commit was SVN r24218.	2011-01-11 14:15:31 +00:00
Jeff Squyres	54cb4eb2b5	Merge over new version of hwloc 1.1 from the vendor branch. Update the module to use the new hwloc bitmap API (the cpuset API is both klunkier and deprecated), which simplified a few things. This commit was SVN r24217.	2011-01-11 01:41:10 +00:00

15496 commits