openmpi

Author	SHA1	Message	Date
Jeff Squyres	0af7ac53f2	Fixes trac:1392, #1400 * add "register" function to mca_base_component_t * converted coll:basic and paffinity:linux and paffinity:solaris to use this function * we'll convert the rest over time (I'll file a ticket once all this is committed) * add 32 bytes of "reserved" space to the end of mca_base_component_t and mca_base_component_data_2_0_0_t to make future upgrades [slightly] easier * new mca_base_component_t size: 196 bytes * new mca_base_component_data_2_0_0_t size: 36 bytes * MCA base version bumped to v2.0 * '''We now refuse to load components that are not MCA v2.0.x''' * all MCA frameworks versions bumped to v2.0 * be a little more explicit about version numbers in the MCA base * add big comment in mca.h about versioning philosophy This commit was SVN r19073. The following Trac tickets were found above: Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392	2008-07-28 22:40:57 +00:00
Adrian Knoth	5096512c3a	Cosmetics, only typos. This commit was SVN r19061.	2008-07-28 13:33:08 +00:00
Jeff Squyres	4d034383d9	Apply patch from Ralf W. to remove a non-portable use of ==. This commit was SVN r19046.	2008-07-26 12:36:24 +00:00
Jeff Squyres	92c10cd187	Remove some old cruft from Makefile.am's -- likely the result of copying some old Makefile.am a long time ago. This commit was SVN r19043.	2008-07-26 00:27:42 +00:00
Josh Hursey	ca43968418	Fix a deadlock scenario when registering deprecated MCA parameters. The internal loop uses the 'item' variable that is used by the outer loop as well. So when the outer loop checks the value of 'item' it will never equal the end of the list since it no longer references the same list. Kinda found by MTT. MTT calls 'ompi_info --all --parsable' and it was livelocked and had to be killed by hand. I'm going to push this one to Jeff to push to v1.3 since he did the original implementation and should check this code. This commit was SVN r19014.	2008-07-24 15:51:54 +00:00
Ralph Castain	fdb2408bf2	Rename the osx paffinity component the "posix" component since it really has nothing osx specific in it - it is just a generic posix call to determine #processors. Set the priority low so that both linux and solaris components override it if they build. It shouldn't build in Windows at all. Modify the odls to remove a (size_t) typecast in front of the num_processors variable just in case it is returned negative. This usually is accompanied by an opal_error, so this shouldn't make any difference - but it is more technically correct. This commit was SVN r19008.	2008-07-24 01:54:51 +00:00
Jeff Squyres	1fd5b0402a	Refs trac:1250 * Fix linux paffinity component to make a "best" guess when PLPA can't find topology information in the Linux kernel. That is, if PLPA can't tell us the max_processor_id, just assume that it's the same as the number of processors. If you have a more complex system than that (e.g., you have holes in your available processor IDs), you'll likely be running a Linux kernel that supports the topology information, and this problem won't happen. * Make sure to convert the return codes from PLPA to OPAL_ERR* codes. This commit was SVN r19001. The following Trac tickets were found above: Ticket 1250 --> https://svn.open-mpi.org/trac/ompi/ticket/1250	2008-07-23 15:47:43 +00:00
Shiqing Fan	5f021e47a9	- Add support for get_processor_info in windows paffinity module. This commit was SVN r18992.	2008-07-23 07:59:03 +00:00
Ralph Castain	83e7c19d33	Remove deprecated function - this was incorporated into the paffinity framework a long time ago. Fortunately, nobody was actually using it! This commit was SVN r18990.	2008-07-23 03:43:31 +00:00
Ralph Castain	f32e24ab86	Move the POSIX-specific code out of the paffinity base. Add support for OSX in its own component. For now, hide the OSX component with .ompi_ignore so only I can see it until I can ensure that it doesn't inadvertently interfere with Linux and Solaris support. This clears the conflict with Windows. This commit was SVN r18989.	2008-07-23 03:29:43 +00:00
Ralph Castain	28ca14297c	Add minimal support (#processors only) for OSX and other systems that don't have paffinity modules. This commit was SVN r18959.	2008-07-21 16:54:14 +00:00
George Bosilca	4f9ea0155b	Remove 2 compiler warnings. This commit was SVN r18956.	2008-07-21 12:55:40 +00:00
Shiqing Fan	54e93ff9d3	- This fix replaces r18899, which actually was not correct. - Revert the $2, which was correct. - It fixes the problem that the memchecker valgrind component could be compiled and is required, but could not be selected. This commit was SVN r18906. The following SVN revision numbers were found above: r18899 --> open-mpi/ompi@0b1b96b598	2008-07-14 13:06:09 +00:00
Jeff Squyres	cb36782310	Make this parameter visible to users; it was a mistake/typo to make it hidden. This commit was SVN r18902.	2008-07-14 11:21:52 +00:00
Lenny Verkhovsky	a812324963	Fixing "paffinity_base_slot_list" environment This commit was SVN r18900.	2008-07-14 07:10:50 +00:00
Shiqing Fan	0b1b96b598	Fix the bug in memchecker/valgrind/configure.m4, which wrongly reset the CPPFLAG. This commit was SVN r18899.	2008-07-13 18:03:02 +00:00
Jeff Squyres	583bf425c0	Fixes trac:1383: Short version: remove opal_paffinity_alone and restore mpi_paffinity_alone. ORTE makes various information available for the MPI layer to decide what it wants to do in terms of processor affinity. Details: * remove opal_paffinity_alone MCA param; restore mpi_paffinity_alone MCA param * move opal_paffinity_slot_list param registration to paffinity base * ompi_mpi_init() calls opal_paffinity_base_slot_list_set(); if that succeeds use that. If no slot list was set, see if mpi_paffinity_alone was set. If so, bind this process to its Node Local Rank (NLR). The NLR is the ORTE-maintained slot ID; if you COMM_SPAWN to a host in this ORTE universe that already has procs on it, the NLR for the new job will start at N (not 0). So this is slightly better than mpi_paffinity_alone in the v1.2 series. * If a slot list is specified and mpi_paffinity_alone is set, we display an error and abort. * Remove calls from rmaps/rank_file component to register and lookup opal_paffinity mca params. * Remove code in orte/odls that set affinities - instead, have them just pass a slot_list if it exists. * Cleanup the orte/odls code that determined oversubscribed/want_processor as these were just opposites of each other. This commit was SVN r18874. The following Trac tickets were found above: Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383	2008-07-10 21:12:45 +00:00
Jeff Squyres	7b2612696c	Remove all the keyval stuff from the MCA parameter functionality. The meat of it was commented out long ago, anyway (because of the way it was written, it violates OPAL<->OMPI abstraction barriers); we never ended up using the MPI keyval MCA parameter stuff. So just delete it. This commit was SVN r18860.	2008-07-10 01:52:51 +00:00
Jeff Squyres	49be4b1e45	Fixes trac:1383 Lenny and I went back and forth on whether we should simply register another "mpi_paffinity_alone" MCA param and then try to figure out which one was set in ompi_mpi_init, but there was difficulty in figuring out what to do. So it seemed like the Right Thing to do was to implement what was committed in r18770; then we could tell where MCA parameters were set from and you could do Better Things (this is also useful in the openib BTL, where parameters can be set either via MCA parameter or via an INI file). But after that was done, it seemed only a few steps further to actually implement two new features in the MCA params area: * Synonyms (where one MCA param name is a synonym for another) * Allow MCA params and/or their synonyms to be marked as "deprecated" (printing out warnings if they are used) These features have actually long been discussed/desired, and I had some time in airports and airplanes recently where I could work on this stuff on a standalone laptop. So I did it. :-) This commit introduces these two new features, and then uses them to register mpi_paffinity_alone as a non-deprecated synonym for opal_paffinity_alone. A few other random points in this commit: * Add a few error checks for conditions that were not checked before * Correct some comments in mca_base_params.h * Add a few comments in strategic places * ompi_info now prints additional information: * for any MCA parameter that has synonyms, it lists all the synonyms * synonyms are also output as 1st-class MCA params, but with an additional attribute indicating that they have a "parent" * all MCA param names (both "real" and "synonym") will output an attribute indicating whether they are deprecated or not. A synonym is deprecated if it itself is marked as deprecated (via the mca_base_param_regist_syn() or mca_base_param_register_syn_name() functions) or if its "parent" MCA parameter is deprecated This commit was SVN r18859.
The following SVN revision numbers were found above: r18770 --> open-mpi/ompi@8efe67e08c The following Trac tickets were found above: Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383	2008-07-10 01:44:51 +00:00
Jeff Squyres	480c17c332	Fix a minor memory leak. This commit was SVN r18857.	2008-07-10 00:37:08 +00:00
Josh Hursey	c4035d848f	This commit fixes runs when there is no available CRS component (BLCR is unavailable, and SELF is deactivated). Previously the run would fail out of MPI_INIT since the OPAL CRS framework could not select a component. This is because the framework did not recognize the 'none' component as a full component because it was part of crs/base. I promoted the ''none'' component to a full component, and updated the other components to reflect this code movement. The ''none'' component is the default component unless the user requests '''-am ft-enable-cr''' to auto-select a component. There is an MCA parameter to show a warning if the application requested an FT enabled job, but the ''none'' component was selected ({{{crs_none_select_warning}}}). This temporarily fixes the problem mentioned in r18739. The full fix will entail working on ticket #1291. Thanks to Ethan from Sun for finding this bug. This commit was SVN r18840. The following SVN revision numbers were found above: r18739 --> open-mpi/ompi@a003fa7a50	2008-07-08 20:04:39 +00:00
Josh Hursey	22f4c829ba	cleanup BLCR configure so --without-blcr works correctly This commit was SVN r18825.	2008-07-08 02:48:20 +00:00
Lenny Verkhovsky	1ed465326b	Change of name conventions in carto NODE -> EDGE CONNECTION -> BRANCH SLOT -> SOCKET. This commit was SVN r18799.	2008-07-03 14:19:16 +00:00
Lenny Verkhovsky	ba1fa73881	Fix: select Maffinity only if Paffinity is selected. This commit was SVN r18797.	2008-07-03 13:39:34 +00:00
Jeff Squyres	8efe67e08c	Improvements to the MCA param system: allow querying to find out where an MCA parameter's value came from. Note that the actual value of the parameter is irrelevant. For example, if a value was specified in an MCA parameter file that happened to have the same default value that was specified when the parameter was registered, the returned location will indicate that the value was set from the file. Possible answers: * '''MCA_BASE_PARAM_SOURCE_DEFAULT:''' no user-specified values were found, so the default value was used * '''MCA_BASE_PARAM_SOURCE_ENV:''' the value came from the environment (which also means the mpirun/orterun command line!) * '''MCA_BASE_PARAM_SOURCE_FILE:''' the value came from a file (or the Windows registry) * '''MCA_BASE_PARAM_SOURCE_KEYVAL:''' the value came from a keyval (can currently never happen) * '''MCA_BASE_PARAM_SOURCE_OVERRIDE:''' the value came from an MCA param API "set" function This commit was SVN r18770.	2008-06-28 15:13:25 +00:00
Jeff Squyres	21c7d95109	Fixes trac:1365: if we're using !^ to negate module inclusion, then don't bother to check to see whether they exist or not. Specifically, this will not cause an error: {{{ shell$ mpirun --mca btl ^does_not_exist ... }}} but neither will this: {{{ shell$ mpirun --mca btl ^sm ... }}} (where the sm BTL ''does'' exist) This commit was SVN r18760. The following Trac tickets were found above: Ticket 1365 --> https://svn.open-mpi.org/trac/ompi/ticket/1365	2008-06-27 19:42:08 +00:00
Ralph Castain	830ea9dfe6	Reconnect the opal dss debug envar with the debug output This commit was SVN r18759.	2008-06-27 19:29:18 +00:00
Shiqing Fan	d129578694	Small fix for including unistd.h header file. This commit was SVN r18758.	2008-06-27 16:25:31 +00:00
Josh Hursey	a003fa7a50	C/R fix for broken CRS component selection resulting from r18707. Make sure that if we ask for the 'none' component (which is not a 'real' component, but a component in crs/base) then we do not fail out of the box when using tools. We check for the {{{OPAL_ERR_NOT_FOUND}}} error. Also make sure that component_open() returns {{{OPAL_ERR_NOT_FOUND}}} when it cannot find a value instead of {{{OPAL_ERROR}}} which means something quite a bit different. C/R is working but the tools still print the warning below every time they are run: {{{ -------------------------------------------------------------------------- A requested component was not found, or was unable to be opened. This means that this component is either not installed or is unable to be used on your system (e.g., sometimes this means that shared libraries that the component requires are unable to be found/loaded). Note that Open MPI stopped checking at the first component that it did not find. Host: odin.cs.indiana.edu Framework: crs Component: none -------------------------------------------------------------------------- }}} I'll have to figure out a work around for this warning (maybe work on the {{{MCA_NULL}}} Ticket #1291). This commit was SVN r18739. The following SVN revision numbers were found above: r18707 --> open-mpi/ompi@bdaaf01d8a	2008-06-25 14:55:09 +00:00
George Bosilca	2bc52a87d2	Related to my previous commit. The Sicortex is a MIPS machine, so allow the assembly to understand this. This commit was SVN r18732.	2008-06-25 03:09:02 +00:00
George Bosilca	872d957550	Allow Open MPI to configure correctly on the Sicortex machine. This commit was SVN r18731.	2008-06-25 03:07:53 +00:00
Brian Barrett	e9c50a29ba	Some (rare) platforms only have a #define for htonl and friends, but not anything in libc. Which causes an incorrect answer for AC_CHECK_FUNCS. Work around that by also checking for the #define. This commit was SVN r18730.	2008-06-24 23:20:25 +00:00
Brian Barrett	e7a299d046	Add timer support for Catamount This commit was SVN r18729.	2008-06-24 22:13:34 +00:00
Rolf vandeVaart	95cd9758e5	Fix broken build on Solaris. This commit was SVN r18719.	2008-06-24 14:57:12 +00:00
Ralph Castain	f70b7e51ce	Fix a missing header file and ensure we use a portable name for a system limit This commit was SVN r18712.	2008-06-23 22:32:26 +00:00
Jeff Squyres	bdaaf01d8a	Fixes trac:1338: Have the MCA base specifically check for all requested components. If they are not found / able to be opened, a warning will be printed and the mca_base_component_find() will return OPAL_ERR_NOT_FOUND. It is the upper-layer's responsibility to handle this error appropriately. This commit was SVN r18707. The following Trac tickets were found above: Ticket 1338 --> https://svn.open-mpi.org/trac/ompi/ticket/1338	2008-06-23 16:14:05 +00:00
Ralph Castain	ccbf194e8f	Visibility fix This commit was SVN r18687.	2008-06-19 19:08:08 +00:00
Ralph Castain	26c9ad5799	Clean-up the DSS API to remove two functions that are supposed to be used solely internally to the DSS. These were likely exposed because we need to call them when packing/unpacking declared types, but this means that developers may accidentally use the wrong functions, causing the DSS buffer to get confused. Instead, return the system to the way it used to work and hide those functions. This commit was SVN r18684.	2008-06-19 18:46:25 +00:00
Pak Lui	188c8bce5d	Fix the SEGV when module_get finds that no proc is bound. Also make no-intr available for processor binding. This commit was SVN r18671.	2008-06-18 16:03:08 +00:00
George Bosilca	f97a728dc6	Don't cast the int32_t pointer into a long pointer. This doesn't work on 64-bit architectures. This commit was SVN r18667.	2008-06-18 08:33:58 +00:00
Ralph Castain	0532d799d6	Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm. Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed. This commit was SVN r18664.	2008-06-18 03:15:56 +00:00
Jeff Squyres	16b2a50543	Slight clarification of help message. This commit was SVN r18661.	2008-06-17 11:25:32 +00:00
Jeff Squyres	c1d1ffbc56	Fix compile problems on systems with older versions of libnuma (that don't have MPOL_MF_MOVE). I know that this is a configure change in the middle of the US workday, but this compile problem is preventing work on several kinds of systems (e.g., RHEL4). This commit was SVN r18659.	2008-06-16 17:26:42 +00:00
Lenny Verkhovsky	dee2f1d175	Adding new functionality to Maffinity component to support NUMA awareness This commit was SVN r18657.	2008-06-15 07:27:29 +00:00
Brian Barrett	7712b07ac4	Add perl based wrapper compilers for cross-compile environments. The default is still to use the C based wrapper compilers (which have many more features and are more well tested). The Perl compilers are enabled with the option --enable-script-wrapper-compilers, which also ignores the option --disable-binaries (ie --enable-script-wrapper-compilers --disable-binaries will result in perl-based wrapper compilers being installed, but no other binaries being installed). This commit was SVN r18655.	2008-06-13 22:52:25 +00:00
Brian Barrett	79ad6d983e	- The ptmalloc2 memory manager component is now by default built as a standalone library named libopenmpi-malloc. Users wanting to use leave_pinned with ptmalloc2 will now need to link the library into their application explicitly. All other users will use the libc-provided allocator instead of Open MPI's ptmalloc2. This change may be overridden with the configure option enable-ptmalloc2-internal - The leave_pinned options will now default to using mallopt on Linux in the cases where ptmalloc2 was not linked in. mallopt will also only be available if munmap can be intercepted (the default whenever Open MPI is not compiled with --without-memory-manager). - Open MPI will now complain and refuse to use leave_pinned if no memory intercept / mallopt option is available. This commit was SVN r18654.	2008-06-13 22:32:49 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Ralph Castain	e1e224b81a	Silence a couple of minor compiler warnings This commit was SVN r18617.	2008-06-09 12:57:41 +00:00
Ralph Castain	7bee71aa59	Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNP's, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit. Add a new function to opal_progress that tells us our recursion depth to support that solution. Yes, I know this sounds picky, but good ol' Jeff managed to make it happen by driving his cluster near to death... Also ensure that we declare "failed" for the daemon job when daemons fail instead of the application job. This is important so that orte knows that it cannot use xcast to tell daemons to "exit", nor should it expect all daemons to respond. Otherwise, it is possible to hang. After lots of testing, decide to default (again) to slurm detecting failed orteds. This proved necessary to avoid rather annoying hangs that were difficult to recover from. There are conditions where slurm will fail to launch all daemons (slurm folks are working on it), and yet again, good ol' Jeff managed to find both of them. Thank you Jeff! :-/ This commit was SVN r18611.	2008-06-06 19:36:27 +00:00
George Bosilca	b2aa751c28	Remove a race condition in the threaded mode. As a callback is allowed to modify the callback array (add or remove), make sure we don't call the same callback twice if it gets removed in another thread. This commit was SVN r18608.	2008-06-06 15:54:40 +00:00
Josh Hursey	1de50b523c	Fix some Coverity 'Event set_but_not_used' highlights. Thanks to Jeff for bringing them to my attention. This commit was SVN r18606.	2008-06-06 14:38:41 +00:00
Jeff Squyres	12a3fe57e1	As pointed out by Ralf W. (http://www.open-mpi.org/community/lists/devel/2008/06/4095.php), these dependencies don't need to be here. This commit was SVN r18603.	2008-06-06 01:20:47 +00:00
Jeff Squyres	b123629e6a	Fix CIDs 458, 716, 717: ensure that strings are long enough to always be properly \0 terminated. This commit was SVN r18602.	2008-06-06 00:59:08 +00:00
Jeff Squyres	e2b08aaca4	Fix bad free's found in CID 707 and CID 708. This commit was SVN r18600.	2008-06-05 20:49:33 +00:00
Lenny Verkhovsky	a8b5dcb204	Added more output info about socket:core pair in paffinity / rankfile components This commit was SVN r18589.	2008-06-05 10:28:44 +00:00
Ralph Castain	ca91ec525b	Add a suffix to the opal_output stream descriptor object - we can now output both a prefix and a suffix for a given stream. Default the suffix to NULL. Remove lingering references to a filtering system as this will no longer be implemented. This commit was SVN r18586.	2008-06-04 20:52:20 +00:00
Josh Hursey	78f14b5255	Fix the none.checkpoint command. orte-checkpoint/orte-restart do not seem to totally like orte_output so revert them to opal_output for now. Since we have no need for the additional complexity of orte_output we can drop it for now and revisit this if anyone needs it later. It seems that if you set the verbose level on an output handle then try to call a normal orte_output() on it then the message will not be printed. This is the same for opal_output, and seems incorrect to me because it stops some error messages from being printed out if you do not directly specify opal_output(0, ...). Maybe someone should take a look at this. orte-checkpoint would segv if passed an incorrect PID. Fixed the return code so it errors out properly. Thanks to Eric Roman for bringing this to my attention. This commit was SVN r18583.	2008-06-04 14:44:11 +00:00
Jeff Squyres	530a15baa4	Fix cross-compiling scenario with valgrind.m4. This commit was SVN r18579.	2008-06-04 11:58:41 +00:00
Shiqing Fan	2dc812f720	Clean configure.m4 of memchecker/valgrind. If Valgrind is requested but wrong version is supplied, print error messages and stop. Save the CPPFLAGS in opal_memchecker_valgrind_CPPFLAGS, which could be used in Makefile.am. Many thanks to Jeff. This commit was SVN r18573.	2008-06-04 11:46:50 +00:00
Ralph Castain	9927b2445c	Remove the filter framework - the xml support will have to be provided in a different manner that will be implemented shortly This commit was SVN r18572.	2008-06-04 09:04:51 +00:00
Jeff Squyres	75a97ebbf0	Many thanks to Ralf W. for finding a subtle bug in these Makefile.am's that can sometimes cause problems with "make -j [N>1] install". Ensure to make the target directory before we copy stuff into it -- read the thread starting here for more details: http://www.open-mpi.org/community/lists/devel/2008/06/4080.php This commit was SVN r18570.	2008-06-04 01:28:03 +00:00
Jeff Squyres	3b568d4b14	Remove an old attempt to understand the tradeoffs with using GNU libc's malloc_hooks functionality, which turned out to be totally unusable in practice. I think we just always forgot to remove them. This commit was SVN r18547.	2008-05-30 00:11:12 +00:00
Shiqing Fan	b67a1244b6	Some small fixes. This commit was SVN r18541.	2008-05-29 15:05:28 +00:00
Jeff Squyres	ed5bc2cd08	Per http://www.open-mpi.org/community/lists/devel/2008/05/4057.php , remove the darwin memory hooks component This commit was SVN r18531.	2008-05-28 23:50:53 +00:00
Sharon Melamed	64fe554b8e	Fix bug in carto component select. After the insertion of mca_base_select the carto file component was never selected. This commit was SVN r18496.	2008-05-26 12:52:41 +00:00
Jeff Squyres	d45cb82ecc	Fix two bugs in PLPA: 1. If we don't have the topology information, don't bother trying to create cross-referencing information 1. Ensure to only check for valid processor ID's This commit was SVN r18462.	2008-05-20 12:57:12 +00:00
Terry Dontje	ef7ac86929	created opal_version_string and orte_version_string to match the ompi changes made in r18345 for ompi_version_string. This was done per request from Jeff Squyres to maintain consistency and to remove some warnings caused by the non-use of some static const char. This commit was SVN r18461. The following SVN revision numbers were found above: r18345 --> open-mpi/ompi@8dd0421015	2008-05-20 12:13:19 +00:00
Jeff Squyres	ea1582856f	Clarify some messages, move AC_ARG_WITH outside of the conditional This commit was SVN r18459.	2008-05-19 23:13:31 +00:00
Jeff Squyres	d12b21e21b	Ensure that if an error occurs, we actually return that error rather than an undefined value (which could be 0/OPAL_SUCCESS). This commit was SVN r18452.	2008-05-19 11:57:44 +00:00
Terry Dontje	517abf9b09	This commit fixes trac:1288. This commit was SVN r18441. The following Trac tickets were found above: Ticket 1288 --> https://svn.open-mpi.org/trac/ompi/ticket/1288	2008-05-15 17:40:08 +00:00
Jeff Squyres	fb17097de4	Make ompi_info correctly display "filter" components This commit was SVN r18435.	2008-05-13 20:56:20 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active.
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Josh Hursey	c70ba283b8	Fix a warning, and some return codes. Thanks to Jeff for pointing this out to me. This commit was SVN r18430.	2008-05-13 13:10:16 +00:00
Josh Hursey	4236255700	Add the framework name to the verbose message for improved debugging. Also set the 'best_priority' to the smallest 32 bit integer possible so negaive priority component can be selected if they are the highest ranking component available. This commit was SVN r18427.	2008-05-12 14:07:37 +00:00
Rainer Keller	b0cbeb0b41	- Add detection of __attribute__((hot)) and __attribute__((cold)) to allow explicit grouping of hot functions into similar code sections upon link-time. Should decrease TLB misses (iff the code- section is really too large)... Candidates for __opal_attribute_hot__ are MPI_Isend MPI_Irecv, MPI_Wait, MPI_Waitall Candidates for __opal_attribute_cold__ are MPI_Init, MPI_Finalize and MPI_Abort... This commit was SVN r18421.	2008-05-10 10:38:51 +00:00
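A rough sketch of how such attribute macros might be used is shown below. The macro names `EXAMPLE_ATTRIBUTE_HOT` and `EXAMPLE_ATTRIBUTE_COLD` and both functions are hypothetical; the sketch assumes a GCC-compatible compiler that accepts the attributes, whereas the commit relies on a configure-time check to decide that.

```c
/* Assumption: any compiler defining __GNUC__ accepts hot/cold.
 * A real build would decide this from a configure test instead. */
#if defined(__GNUC__)
#  define EXAMPLE_ATTRIBUTE_HOT  __attribute__((hot))
#  define EXAMPLE_ATTRIBUTE_COLD __attribute__((cold))
#else
#  define EXAMPLE_ATTRIBUTE_HOT
#  define EXAMPLE_ATTRIBUTE_COLD
#endif

static int example_initialized = 0;

/* Called once at startup: a candidate for the "cold" section. */
EXAMPLE_ATTRIBUTE_COLD int example_init(void)
{
    example_initialized = 1;
    return 0;
}

/* Called on the critical path: a candidate for the "hot" section.
 * Here it only sums the buffer so that the sketch has observable output. */
EXAMPLE_ATTRIBUTE_HOT int example_send(const int *buf, int count)
{
    int sum = 0;
    for (int i = 0; i < count; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

The attributes do not change behavior; they only hint to the compiler and linker where to place the functions.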
Josh Hursey	9b0cd5b02a	Remove the 'include' check from mca_base_select. include/exclude is handled by the mca_base_open functionality and it is redundant (and wrong) to check this in the select function. Thanks to Pak Lui for bringing this to my attention. This commit was SVN r18418.	2008-05-08 23:41:07 +00:00
Josh Hursey	da2f1c58e2	Some checkpoint/restart cleanup. * Remove the opal_only option. This was suffering from bit rot, and no one uses it. It can be added back fairly easily if wanted. * Cleanup metadata interactions at the local level. * Touch up some of the INC functionality (fix typos and a minor ordering issue) This commit was SVN r18416.	2008-05-08 18:47:47 +00:00
Josh Hursey	8739edc580	Fix a couple of OPAL_DECLSPEC missing from r18407 This commit was SVN r18415. The following SVN revision numbers were found above: r18407 --> open-mpi/ompi@7c7b9b0486	2008-05-08 18:44:23 +00:00
George Bosilca	fe495e429a	Completely remove the kqueue support on Mac OS X. Remove the test from kqueue that tries to detect whether kqueue might work with ptys. This commit was SVN r18411.	2008-05-08 02:33:23 +00:00
Ralph Castain	7c7b9b0486	Do a little cleanup on the opal graph class and opal carto framework to conform to OMPI naming conventions and avoid potential conflict with user applications - no change in functionality, passes carto test program This commit was SVN r18407.	2008-05-07 19:33:49 +00:00
Josh Hursey	9971bc9d95	Merge in the mca_base_select changes per RFC: http://www.open-mpi.org/community/lists/devel/2008/04/3779.php {{{ svn merge -r 18276:18380 https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play . }}} Any components not in the trunk, but in one of the affected frameworks must be updated. Contact the list, look at the RFC, or look at the diff for how to do this. Sorry for the early commit of this, but I wanted to get it in today (per RFC) and didn't know if I would have a chance later today. This commit was SVN r18381.	2008-05-06 18:08:45 +00:00
Aurelien Bouteiller	c06620ad70	Add a const to the parameters of opal_dss_compare. This commit was SVN r18374.	2008-05-05 19:12:01 +00:00
Brad Penoff	4f104ba5d1	Add header for FreeBSD. This commit was SVN r18366.	2008-05-03 23:07:45 +00:00
George Bosilca	f5dfc005a4	Only check for /proc/cpuinfo if we are on a supported architecture. This commit was SVN r18331.	2008-04-29 22:36:18 +00:00
George Bosilca	465f690f90	We need to force the compiler to preprocess these files as some of them use #include. The standard way is to rename the file to .S instead of .s. This commit was SVN r18290.	2008-04-24 21:40:40 +00:00
Josh Hursey	2c736873bb	Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors. The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit ORTE would shutdown and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using has been recycled so it closed it on restart. This caused the ORTE level to break as the newly created socket descriptor was closed without its knowledge. The fix is to have the OMPI level shutdown tcp connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit. Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it. * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level. * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components. * Update ft_event functions in PML and BML to handle the new restart state. * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging. This commit was SVN r18276.	2008-04-24 17:54:22 +00:00
Shiqing Fan	4a9787979e	When valgrind is not available or it is deselected (--without-valgrind, --with-valgrind=no), don't compile this component, continue without abortion. This commit was SVN r18243.	2008-04-23 11:50:42 +00:00
Josh Hursey	cc83d41ad9	Merge in tmp/jjh-scratch {{{ svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch . }}} Contains: * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart. * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P. * Cleanup mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry * Some other sundry cleanup items all dealing with C/R functionality in the trunk. This commit was SVN r18241.	2008-04-23 00:17:12 +00:00
Jeff Squyres	db2695ccab	Make the symbols be visible. This commit was SVN r18201.	2008-04-18 00:26:17 +00:00
Ralph Castain	fa082cafa9	Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex. Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer. This commit was SVN r18198.	2008-04-17 20:43:56 +00:00
George Bosilca	01148b77dc	Generate the help message for the available event ops. Now the list only contains the ones that are compiled on the current ompi. This commit was SVN r18196.	2008-04-17 18:16:54 +00:00
Ralph Castain	e7487ad533	Implement the seq rmaps module that sequentially maps process ranks to a list of hosts in a hostfile. Restore the "do-not-launch" functionality so users can test a mapping without launching it. Add a "do-not-resolve" cmd line flag to mpirun so the opal/util/if.c code does not attempt to resolve network addresses, thus enabling a user to test a hostfile mapping without hanging on network resolve requests. Add a function to hostfile to generate an ordered list of host names from a hostfile This commit was SVN r18190.	2008-04-17 13:50:59 +00:00
Shiqing Fan	49fbc4e795	These functions should always have a return value. This commit was SVN r18174.	2008-04-16 13:54:15 +00:00
George Bosilca	b359d84661	Use the correct prefix. This commit was SVN r18048.	2008-03-31 21:42:59 +00:00
George Bosilca	be2454e0c5	Default the temporary directory to /tmp if no special environment variables are set. This commit was SVN r18046.	2008-03-31 20:15:49 +00:00
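As a rough illustration of the fallback described in this commit, the following C sketch picks the first non-empty candidate and otherwise returns /tmp. The function names are hypothetical, and the environment variables consulted (TMPDIR, TEMP, TMP) are an assumption; the commit message does not say which variables the real code checks.

```c
#include <stdlib.h>
#include <string.h>

/* Return the first non-empty candidate, or "/tmp" if none is set. */
static const char *example_pick_tmp(const char *const *candidates, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (NULL != candidates[i] && strlen(candidates[i]) > 0) {
            return candidates[i];
        }
    }
    return "/tmp";
}

/* Assumption: TMPDIR, TEMP and TMP are the variables of interest. */
static const char *example_tmp_directory(void)
{
    const char *candidates[3];

    candidates[0] = getenv("TMPDIR");
    candidates[1] = getenv("TEMP");
    candidates[2] = getenv("TMP");
    return example_pick_tmp(candidates, 3);
}
```

Separating the selection logic from the getenv() calls keeps the fallback testable without modifying the process environment.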
George Bosilca	ee784b601e	For consistency reasons always use opal_home_directory and opal_tmp_directory. This commit was SVN r18043.	2008-03-31 18:13:41 +00:00
George Bosilca	60111ce66d	Few less warnings. This commit was SVN r18025.	2008-03-30 19:06:49 +00:00
Lenny Verkhovsky	fa6a084d33	added opal/mca/paffinity/base/paffinity_base_service.c with paffinity functions This commit was SVN r18020.	2008-03-30 12:01:02 +00:00
Lenny Verkhovsky	7e45d7e134	Few updates due to RMAPS rank_file component changes 1. applied prefix rule to functions and variables of RMAPS rank_file component 2. cleaned ompi_mpi_init.c from paffinity code 3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file 4. added opal_paffinity_slot_list mca parameter This commit was SVN r18019.	2008-03-30 11:52:11 +00:00
Shiqing Fan	f82092566f	We don't have inttypes.h on Windows, and some types are redefined. This commit was SVN r18010.	2008-03-28 17:33:54 +00:00
Shiqing Fan	aaf2730fab	Winsock2.h also has definition for timeval and so on, it conflicts with our own definitions. This commit was SVN r18009.	2008-03-28 17:30:33 +00:00
Jeff Squyres	6ea36061cf	Fix typo found by Pak. This commit was SVN r18000.	2008-03-27 23:04:17 +00:00
Jeff Squyres	c06f7c3992	Fixes trac:1254: ensure that evport.c is in the distribution tarball. This commit was SVN r17989. The following Trac tickets were found above: Ticket 1254 --> https://svn.open-mpi.org/trac/ompi/ticket/1254	2008-03-27 16:40:55 +00:00
Sharon Melamed	afa98f92e8	Changed the for loop to a while loop so I could release the edge without conflicting with get next. This commit was SVN r17979.	2008-03-26 14:45:45 +00:00
Jeff Squyres	33c09b30c2	Patch from George: ensure that we don't overwrite timer_linux_happy improperly when checking the host type. This commit was SVN r17975.	2008-03-26 11:22:57 +00:00
George Bosilca	4a5431ef11	Remove the event-config.h file, it is never used. Correct the include logic that protect the headers. It's amazing that this didn't bite us yet ... This commit was SVN r17971.	2008-03-26 03:33:43 +00:00
George Bosilca	64bc580c78	Use evutil_timercmp instead of timercmp to take advantage of the fallback installed in evutil.h. This commit was SVN r17968.	2008-03-25 23:54:30 +00:00
George Bosilca	2e46a53b0a	Avoid strcpy if it's not really required. This commit was SVN r17962.	2008-03-25 22:40:20 +00:00
George Bosilca	028c7391d3	Coverity fix: Replace strcpy by strncpy. This commit was SVN r17961.	2008-03-25 22:39:24 +00:00
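The kind of replacement named in this commit typically looks like the sketch below. `example_copy()` is a hypothetical helper, not code from the tree; it adds the explicit terminator that strncpy() does not write when the source fills the buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Bounded copy: never writes more than dst_size bytes and always leaves
 * dst NUL-terminated (strncpy alone does not guarantee termination when
 * src is at least dst_size - 1 characters long).
 */
static void example_copy(char *dst, size_t dst_size, const char *src)
{
    if (0 == dst_size) {
        return;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

An over-long source is silently truncated here; callers that must detect truncation would need to compare strlen(src) against dst_size first.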
George Bosilca	6717b2dc75	Add the Solaris evport to the list of available event subsystems. This commit was SVN r17958.	2008-03-25 18:00:40 +00:00
Jeff Squyres	763218e754	Fix #1253 : default libevent to use select/poll and only use the other mechanisms (such as epoll) if someone (ompi_mpi_init()) requests otherwise. See big comment in opal/event/event.c for a full explanation. This commit was SVN r17956.	2008-03-25 17:18:17 +00:00
George Bosilca	03c10e2a85	Add the Solaris evport support. This commit was SVN r17954.	2008-03-25 16:44:27 +00:00
George Bosilca	9222ea0d0a	Cast the uintptr_t to int when playing with fds. This commit was SVN r17925.	2008-03-23 18:16:29 +00:00
Jeff Squyres	8239e40607	Add header for OS X. This commit was SVN r17924.	2008-03-23 12:57:57 +00:00
Jeff Squyres	314ab2c6e7	Update internal libevent to upstream (v1.4.2-rc + OMPI changes). Greatly reduce the number of "foo" -> "opal_foo" symbol renames in the libevent source, and instead greatly expand the event_rename.h file that uses preprocessor macros to make all public symbols be "opal_foo". This commit was SVN r17923.	2008-03-23 12:33:04 +00:00
Jeff Squyres	dee561d29e	Per recent off-list discussions about the build system, I have done some cleanups and standardizations in the various /tools// Makefile.am files. This commit: * Somewhat simplify the tool Makefile.am's * Makes the tool Makefile.am's consistent with each other (do similar actions in similar ways) * Update the tool Makefile.am's to remove old kruft that was required by older versions of AM (trunk requires AM >=1.10) This commit was SVN r17921.	2008-03-22 02:04:05 +00:00
Jeff Squyres	05a7b1ed55	Remove svn:executable from these files. This commit was SVN r17918.	2008-03-21 21:16:11 +00:00
Jeff Squyres	a4ec8a9d53	Spring cleaning -- no one is using this stuff; remove it from the tree. This commit was SVN r17913.	2008-03-21 17:14:42 +00:00
Jeff Squyres	e0fb3957cb	Patch from Brian: * The opal_sys_timer_get_cycles() call was implemented for Sparc v9 using inline assembly, but not in the assembly files. This would only currently matter on Linux Sparc systems using a compiler that didn't support inline assembly (not many of those), but it should be there for completion. * The linux timer component would always build on non-Alpha platforms, rather than only building on platforms where opal_sys_timer_get_cycles() was implemented. This would only matter on a very narrow set of platforms that we don't really support, but still, it could be more right. We now only build the component on platforms where we have the assembly call to get the cycle counter. * Added a comment to opal/sys/timer.h to note that the linux timer component needed to be updated if another platform was added. This should be harmless to commit. It will only really change behaviors on platforms we don't have assembly support for, which currently won't make it through configure. It really only matters when (if?) we support atomic operations through libatomic_ops. This commit was SVN r17887.	2008-03-20 00:29:36 +00:00
George Bosilca	3997639ec6	Hide what should be hidden, and expose the others. Plus some indentation. This commit was SVN r17856.	2008-03-18 03:00:08 +00:00
Jeff Squyres	f443644bfe	From Brian B.: This commit lowers the priority of the darwin backtrace component below that of the ''execinfo'' and ''stackprint'' components, which will cause OS X Leopard to use the ''execinfo'' component. execinfo utilizes a public API for printing the stacktrace. The ''darwin'' component uses some evil hacks and a not-so supported package from Apple to print the stack trace. This commit was SVN r17840.	2008-03-17 13:39:25 +00:00
Jeff Squyres	9b18b0e9c6	Fix visibility symbols on OS X This commit was SVN r17838.	2008-03-17 13:18:12 +00:00
George Bosilca	210631962c	Add two convenience functions in order to make sure we get these environment variables in a consistent manner. These functions retrieve the user and the temporary directories (based on the system). This commit was SVN r17815.	2008-03-13 17:56:44 +00:00
Jon Mason	2e8a316ae6	opal_evtimer_initialized is missing the opening '(' This commit was SVN r17814.	2008-03-12 20:33:22 +00:00
Sharon Melamed	4a8e2a2648	Remove status check from carto initiation. This commit was SVN r17812.	2008-03-12 08:55:28 +00:00
George Bosilca	4267f2b967	This symbol has to be visible. This commit was SVN r17793.	2008-03-08 23:53:17 +00:00
Rainer Keller	32dcd9e551	- Adding #include <stdbool.h> with protection in r17488 and r17504 seemed to be the right thing(tm), but broke the Sun Studio C++ compiler under Linux (ticket 747). This patch should allow inclusion into C and C++ from other header files without problems. This commit was SVN r17792. The following SVN revision numbers were found above: r17488 --> open-mpi/ompi@d53131f261 r17504 --> open-mpi/ompi@b22e8e7567	2008-03-08 12:53:10 +00:00
Josh Hursey	aaff245271	A couple verbose additions. Poll the event engine while waiting for the named pipe. This commit was SVN r17787.	2008-03-07 21:10:14 +00:00
Jeff Squyres	b2ed2b95aa	Fix filename so that the help file can be found. This commit was SVN r17759.	2008-03-06 14:44:47 +00:00
Rolf vandeVaart	91af56db00	Fix a few typos so this compiles on Solaris. Remove some trailing spaces. This commit was SVN r17746.	2008-03-05 20:16:00 +00:00
Aurelien Bouteiller	c280b81e40	Revert the last patch. Still some warning should be issued on ia32 architectures. Looking for a fix. This commit was SVN r17745.	2008-03-05 17:20:11 +00:00
Josh Hursey	612ebdc2ac	Cleanup some symbol visibility issues. This commit was SVN r17733.	2008-03-05 13:59:25 +00:00
Josh Hursey	3b4073e32c	This commit fixes the checkpoint/restart functionality on the trunk. Included in this commit are: * Extension to the ESS framework to support C/R * Fixed support for {{{snapc_base_establish_global_snapshot_dir}}} * Fixed FileM support * Misc. minor code modifications There are some outstanding visibility issues that I want to fix next. This commit was SVN r17725.	2008-03-05 04:57:23 +00:00
Jeff Squyres	ea5c0cb4a2	Now that the nightly tarball has safely been made, let's try this commit again. Remove the svn:ignore from problematic directories and try a merge from /tmp-public/plpa-merge-area2. This commit was SVN r17718.	2008-03-05 02:45:15 +00:00
Jeff Squyres	8189fcc7d5	Back out r17702; it went very badly. This commit was SVN r17704. The following SVN revision numbers were found above: r17702 --> open-mpi/ompi@3df754ebd7	2008-03-05 00:42:39 +00:00
Jeff Squyres	3df754ebd7	Bring over PLPA v1.1 from /tmp-public/plpa-v1.1 branch. This commit was SVN r17702.	2008-03-05 00:16:49 +00:00
Aurelien Bouteiller	284115208c	Try to blindly solve warning about size_t printf format, as I can't reproduce the warning on my machines. This commit was SVN r17701.	2008-03-04 22:30:35 +00:00
Tim Prins	824c298abf	Move the carto finalize from the util finalize to the main finalize where it belongs. Otherwise, the modules are unloaded by the mca before we try to do carto_finalize, and bad things happen. This commit was SVN r17665.	2008-02-29 12:49:04 +00:00
Tim Prins	84b2099fe8	Remove the now-unused orte_value_array. As this is the last 'class' split between orte and ompi, remove the big comment about the split in ompi_bitmap. Also, update some properties (source files should not be executable...), and remove a couple unneeded inclusions of orte_proc_table.h This commit was SVN r17655.	2008-02-28 21:39:42 +00:00
Tim Prins	2e1bda6d23	Remove the now-unused arithmetic interface to the dss This commit was SVN r17654.	2008-02-28 21:36:51 +00:00
Ralph Castain	8d819cf3d3	Move carto open/close/finalize to opal layer so that ORTE can get access to topo info. This will be used to support a topo grpcomm that optimizes communications in non-uniform topologies like RR. This commit was SVN r17652.	2008-02-28 21:04:30 +00:00
Ralph Castain	5e6928d710	Cleanup recursions in ORTE caused by processing recv'd messages that can cause the system to take action resulting in receipt of another message. Basically, the method employed here is to have a recv create a zero-time timer event that causes the event library to execute a function that processes the message once the recv returns. Thus, any action taken as a result of processing the message occur outside of a recv. Created two new macros to assist: ORTE_MESSAGE_EVENT: creates the zero-time event, passing info in a new orte_message_event_t object ORTE_PROGRESSED_WAIT: while waiting for specified conditions, just calls progress so messages can be recv'd. Also fixed the failed_launch function as we no longer block in the orted callback function. Updated the error messages to reflect revision. No change in API to this function, but PLM "owners" may want to check their internal error messages to avoid duplication and excessive output. This has been tested on Mac, TM, and SLURM. This commit was SVN r17647.	2008-02-28 19:58:32 +00:00
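The deferral idea in this commit (the receive callback only records the message, and the handler runs later from the event loop) can be sketched in plain C as follows. This is a simplified stand-in that uses a fixed-size queue instead of zero-time timer events; all names are hypothetical and none of them is the ORTE_MESSAGE_EVENT implementation.

```c
#include <stddef.h>

#define EXAMPLE_MAX_PENDING 16

typedef struct {
    int tag;
} example_message_t;

static example_message_t example_pending[EXAMPLE_MAX_PENDING];
static size_t example_pending_count = 0;
static int example_processed_count = 0;

/* Message handler. In real code this might send messages that trigger
 * further receives, which is why it must not run inside the receive. */
static void example_process(const example_message_t *msg)
{
    (void)msg;
    ++example_processed_count;
}

/* Receive callback: only queues the message and returns immediately.
 * Returns 0 on success, -1 if the queue is full. */
static int example_recv(int tag)
{
    if (example_pending_count >= EXAMPLE_MAX_PENDING) {
        return -1;
    }
    example_pending[example_pending_count++].tag = tag;
    return 0;
}

/* One pass of the event loop: run the deferred handlers outside of any
 * receive. Messages queued by a handler are picked up in the same pass. */
static void example_progress(void)
{
    for (size_t i = 0; i < example_pending_count; ++i) {
        example_process(&example_pending[i]);
    }
    example_pending_count = 0;
}
```

Because `example_recv()` never calls the handler directly, a handler that causes another receive cannot recurse into the receive path.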
George Bosilca	9d421bea2a	Replace all occurrences of orte_pointer_array by opal_pointer_array. Remove the implementation of orte_pointer_array. This commit was SVN r17636.	2008-02-28 05:32:23 +00:00
George Bosilca	f256dd6010	Don't free the node2_name; it is not yet set at this point. This commit was SVN r17634.	2008-02-28 05:17:20 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Aurelien Bouteiller	6ea23283a8	Added a PRIsize_t constant to help printing size_t without having to cast them to long long explicitly everywhere. This commit was SVN r17626.	2008-02-27 19:38:14 +00:00
Josh Hursey	5e0d17ec99	Forgot a case in which we should check if the checkpoint is ready during the threaded CR builds. MTT caught this by running the IU FT CR test 'inflight' which under certain timing scenarios will trigger this. This commit was SVN r17538.	2008-02-21 13:34:27 +00:00
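A format constant of this kind is used like the PRI macros from inttypes.h. In the sketch below, the fallback definition of `PRIsize_t` as "zu" is an assumption that holds for C99 printf implementations; the actual definition in the tree may differ per platform. `example_format_size()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: a C99 printf that understands the 'z' length modifier. */
#ifndef PRIsize_t
#  define PRIsize_t "zu"
#endif

/* Format a size_t without casting it to long long first.
 * Returns the number of characters snprintf would have written. */
static int example_format_size(char *buf, size_t buf_size, size_t value)
{
    return snprintf(buf, buf_size, "%" PRIsize_t " bytes", value);
}
```

String-literal concatenation turns `"%" PRIsize_t " bytes"` into a single format string at compile time, so the call site needs no cast.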
Josh Hursey	a169575ab2	A quick fix for opal only apps (really this time) This commit was SVN r17537.	2008-02-20 22:33:42 +00:00
Josh Hursey	ad9fbf2a92	a fix for opal only apps This commit was SVN r17536.	2008-02-20 21:17:08 +00:00
Josh Hursey	99144db970	Improve checkpoint/restart support by allowing a checkpoint to progress when the process is not in the MPI library. This involves creating a separate thread for polling for a checkpoint request. This thread is active when the MPI process is not in the MPI library, and paused when the MPI process is in the library. Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI. Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave. Fixed the configure check for {{{--enable-ft-thread}}} so it has a known dependence on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}). Added a line for Checkpoint/Restart support to {{{ompi_info}}}. Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime. There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime. This commit was SVN r17516.	2008-02-19 22:15:52 +00:00
Rainer Keller	b22e8e7567	- Need stdbool.h if included in userland This commit was SVN r17504.	2008-02-19 00:39:48 +00:00
Rainer Keller	d53131f261	- Need stdbool.h if included in userland; additionally protect stdbool / stdarg.h This commit was SVN r17488.	2008-02-18 08:11:57 +00:00
Aurelien Bouteiller	e7aaf6aa67	Patch to introduce PRI printf constants on architectures that do not provide C99 inttypes.h. Mainly useful on Windows, but might also prove helpful to deal with all the size_t and other size-changing datatypes that used to be cast to long long in printf/opal_output to avoid warnings. This commit was SVN r17451.	2008-02-14 03:31:49 +00:00
Josh Hursey	95c31388e1	It was observed that the component constraint logic is currently only used by the checkpoint/restart feature. Other constraints could be enforced here, but at the moment it is only the checkpointable constraint. So this commit just removes this logic from non-c/r builds. If someone wanted to add a new constraint in the future then there is a comment in the code that directs them a bit. This commit was SVN r17447.	2008-02-13 19:26:25 +00:00
Sharon Melamed	5b2dab2439	Reverted commit # r17443 This commit was SVN r17446. The following SVN revision numbers were found above: r17443 --> open-mpi/ompi@88ce5a2b73	2008-02-13 14:07:12 +00:00
Sharon Melamed	88ce5a2b73	Replaced PLPA to the latest PLPA (plpa-1.1a3r123) This commit was SVN r17443.	2008-02-13 13:09:11 +00:00
Rainer Keller	9cd2c6f48b	- Instead of calling RUNNING_ON_VALGRIND, implement specific function, thereby removing bogus requirement on valgrind/valgrind.h dough... - Call specific function runindebugger() before doing expensive checks on each component of struct. - Get rid of void* warnings.. This commit was SVN r17438.	2008-02-12 20:37:51 +00:00
Rainer Keller	7621800477	- Fix and add comments -- output full name for pd - Protect argument in macro... This commit was SVN r17434.	2008-02-12 16:59:59 +00:00
Rainer Keller	b20f434306	- really minor fix in comment. This commit was SVN r17433.	2008-02-12 16:54:27 +00:00
Shiqing Fan	f5792bbda5	merging the memchecker into trunk. This commit was SVN r17424.	2008-02-12 08:46:27 +00:00
Sharon Melamed	51f8308c68	Added Bi-Directional connection in the carto file. This commit was SVN r17393.	2008-02-07 09:51:19 +00:00
Sharon Melamed	c9f80caf7c	fixed a printing bug in case the carto file is not found. This commit was SVN r17392.	2008-02-07 09:02:23 +00:00
Sharon Melamed	98e8de264d	Wrapped the carto API in carto_base_wrapers.c This commit was SVN r17380.	2008-02-05 19:29:16 +00:00
Sharon Melamed	9ef46de2f5	Added proper wrapping to the new paffinity APIs This commit was SVN r17379.	2008-02-05 17:37:17 +00:00
Pak Lui	6900fe36c2	Restore the solaris paffinity with an older but working implementation with processor_bind() instead of the pset_*() implementation that is commented out. There's also a fix for allowing some Sun platforms which have non-contiguous CPU IDs to do processor binding. This commit was SVN r17309.	2008-01-29 16:09:56 +00:00
Ralph Castain	71378305ed	The static-components.h file should never be under svn control - it is dynamically generated during build. Update properties to ignore that file. Update properties to ignore the carto_file_lex.c file since that is also dynamically generated. Update the build-hgignore.pl to properly disregard DS_Store files This commit was SVN r17301.	2008-01-29 14:18:00 +00:00
Sharon Melamed	3374d56739	This file was added to the carto tree by mistake. this file is supposed to be generated by lex. This commit was SVN r17257.	2008-01-27 09:09:55 +00:00
George Bosilca	fc4bb9c87e	Update the generated file. This one was generated using a very recent version of flex (2.5.33). This commit was SVN r17253.	2008-01-26 20:22:57 +00:00
George Bosilca	7dddbe5e29	Protect the system headers. This commit was SVN r17252.	2008-01-26 18:54:27 +00:00
Jeff Squyres	3f94d6a494	Properly qualify the filename. #$%@#%#@!!! This commit was SVN r17229.	2008-01-25 12:04:35 +00:00
George Bosilca	ddcfc78f52	Add the missing header to the header list. This commit was SVN r17222.	2008-01-25 02:28:16 +00:00
George Bosilca	f7e8fda58b	Remove the dependencies on the libopen-pal. Add the visibility attributes. This commit was SVN r17220.	2008-01-25 00:33:55 +00:00
George Bosilca	7b1132b623	Remove some warnings about uninitialized variables (the code was correct but the compilers are not yet that smart). Add the dependency to output.h in order to be able to use opal_output. This commit was SVN r17195.	2008-01-24 00:39:24 +00:00
Sharon Melamed	025b68becf	Move the carto framework to the trunk. This commit was SVN r17177.	2008-01-23 09:20:34 +00:00
Sharon Melamed	526a12620d	Expanded the paffinity interface. Added: map_to_processor_id, map_to_socket_core, max_processor_id, max_socket, max_core. On operating systems other than Linux, those functions will return OPAL_ERR_NOT_SUPPORTED. --This Line, and those below, will be ignored-- M paffinity/linux/paffinity_linux_module.c M paffinity/paffinity.h M paffinity/base/base.h M paffinity/base/paffinity_base_wrappers.c M paffinity/windows/paffinity_windows_module.c M paffinity/solaris/paffinity_solaris_module.c This commit was SVN r17173.	2008-01-22 07:22:24 +00:00
Adrian Knoth	601fb4389d	Cosmetics for r17150. Closes trac:1201 This commit was SVN r17151. The following SVN revision numbers were found above: r17150 --> open-mpi/ompi@4b50f02126 The following Trac tickets were found above: Ticket 1201 --> https://svn.open-mpi.org/trac/ompi/ticket/1201	2008-01-17 12:29:12 +00:00
Adrian Knoth	4b50f02126	Only free res iff it's been allocated before. Re #1201 This patch fixes the segfault, so closing the ticket might be possible. It's a very conservative patch. Perhaps the freeaddrinfo spec says that it will never allocate res in case of errors, but for now, I neither have the spec nor the will to rely on it. This commit was SVN r17150.	2008-01-17 10:01:52 +00:00
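The conservative pattern described in this commit (free `res` only if it was actually allocated) can be sketched as follows. `example_resolve_numeric()` is a hypothetical function; it uses AI_NUMERICHOST so that the sketch performs no DNS lookup, which is a choice made for this illustration and not taken from the patch.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Parse a numeric address; returns the getaddrinfo() result code. */
static int example_resolve_numeric(const char *host)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;   /* stays NULL unless getaddrinfo sets it */
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;   /* no name resolution */

    rc = getaddrinfo(host, NULL, &hints, &res);

    /* Do not rely on what the error path did with res: free only if set. */
    if (NULL != res) {
        freeaddrinfo(res);
    }
    return rc;
}
```

Initializing `res` to NULL before the call is what makes the check meaningful on the error path.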
Jeff Squyres	cc3805d861	Because opal_list is used in the C++ bindings, where not having "const" in the argument creates [correct] warnings (because __FILE__ is a (const char)). Plus, opal_object.cls_init_file_name is already (const char). This commit was SVN r17145.	2008-01-15 23:50:30 +00:00
George Bosilca	7b0e295057	Fix a small memory leak. This commit was SVN r17095.	2008-01-09 20:37:02 +00:00
Gleb Natapov	09de1da7ee	Undefine MORECORE_CANNOT_TRIM. We don't call free() from the callback any more. This commit was SVN r17065.	2008-01-08 10:08:35 +00:00
George Bosilca	3d387bdab9	Add defines for the INT16 min and max value. This commit was SVN r17052.	2008-01-04 23:09:31 +00:00
Jeff Squyres	95fa693273	In r17007, ompi_pointer_array.c the logic from the ompi_pointer_array.c:ompi_pointer_array_set_item() was slightly changed such that the "find the next open slot when the requested index was already open" logic was no longer right -- since the new lowest_free value is not set until ''after'' we look for the next open slot, we need to start searching for the new lowest_free slot at the (index+1) position (not the index position). This commit was SVN r17021. The following SVN revision numbers were found above: r17007 --> open-mpi/ompi@906e8bf1d1	2007-12-21 20:19:55 +00:00
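The search described in this commit can be illustrated with a simplified pointer array. The structure and function below are hypothetical and much smaller than the real ompi_pointer_array code (no resizing, no locking, no removal); they only show why the scan for the next open slot begins at index+1.

```c
#include <stddef.h>

/* Simplified stand-in: a fixed-size table that tracks its lowest free slot. */
typedef struct {
    void **slots;
    size_t size;
    size_t lowest_free;   /* equals size when the table is full */
} example_pointer_array_t;

/*
 * Store a non-NULL value at index (assumed to be < size). If that slot was
 * the lowest free one, look for the next free slot starting at index + 1:
 * the slot at index has just been taken, so it cannot be the answer.
 */
static void example_set_item(example_pointer_array_t *table,
                             size_t index, void *value)
{
    table->slots[index] = value;

    if (index == table->lowest_free) {
        size_t i = index + 1;
        while (i < table->size && NULL != table->slots[i]) {
            ++i;
        }
        table->lowest_free = i;
    }
}
```

Filling a slot above `lowest_free` leaves the marker untouched, so the scan only runs when the lowest free slot itself is consumed.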
Ralph Castain	401dc49686	Cleanup compiler warnings about comparing signed and unsigned values This commit was SVN r17011.	2007-12-21 14:22:27 +00:00
George Bosilca	906e8bf1d1	Replace the ompi_pointer_array with opal_pointer_array. In the next step (sometime after the merge with the ORTE branch), the opal_pointer_array will become the only pointer_array implementation (the orte_pointer_array will be removed). This commit was SVN r17007.	2007-12-21 06:02:00 +00:00
Jeff Squyres	a1b0914037	Fix prototypes for platforms that fall back to the inline C versions of opal_atomic_[add\|sub]_[32\|64]. This commit was SVN r17005.	2007-12-20 22:13:25 +00:00
Ethan Mallove	2b48f42637	Mark XLC atomics as non-inline. This commit was SVN r16989.	2007-12-18 16:18:49 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion discussion about linkers, have all OPAL, ORTE, and OMPI components '''not''' link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Ethan Mallove	a20a1a806a	Rework of r16807. For opal atomics: * Conditionalize around `static inline` using `OPAL_HAVE_INLINE_ATOMIC` macros * Remove redundant `opal_atomic*` prototypes (they belong in the top-level `sys/atomic.h`) This commit was SVN r16957. The following SVN revision numbers were found above: r16807 --> open-mpi/ompi@b7c885247a	2007-12-14 15:11:35 +00:00
Jon Mason	d77c2430c0	Fix minor spelling error This commit was SVN r16936.	2007-12-11 20:11:03 +00:00
Terry Dontje	351117a254	This commit fixes trac:747 This commit was SVN r16892. The following Trac tickets were found above: Ticket 747 --> https://svn.open-mpi.org/trac/ompi/ticket/747	2007-12-07 15:56:07 +00:00
Jeff Squyres	00131df353	Fix typo in incorrect variable name; only noticed now because someone actually compiled on a system without syslog support (Brian B.). :-) This commit was SVN r16863.	2007-12-06 11:36:44 +00:00
Ethan Mallove	58bcf14f8b	Back r16807 out of sys/atomic.h. This commit was SVN r16825. The following SVN revision numbers were found above: r16807 --> open-mpi/ompi@b7c885247a	2007-12-03 19:32:43 +00:00
Josh Hursey	27c9016b93	sleep -> usleep so we can be a bit more eager when waiting for events to finish. Still working on solutions that do not involve sleeping, but this will do for now. This commit was SVN r16824.	2007-12-03 19:27:32 +00:00
Ethan Mallove	b7c885247a	* Typo: change `__volatile` to `__volatile__`. Some compilers (e.g., gcc) are indifferent about this, while others are more particular (e.g., Sun Studio 12). * Typo: `asms.s` to `asm.s` * Eliminate "foo is multiply-defined" linker errors on Solaris by making the declarations in `opal/sys/atomic.h` agree with their corresponding definitions (use `static inline` in both places). This commit was SVN r16807.	2007-11-30 17:59:12 +00:00
Josh Hursey	bbef304f04	Convert the runtime version checks to be configure time checks (As they should have been from the start). This should fix the nightly build. This commit was SVN r16706.	2007-11-09 06:13:40 +00:00
Josh Hursey	287ca882d3	Only process a checkpoint request from BLCR if this process was the one requesting it. This commit adds a bit of error checking to keep us from participating in a checkpoint that we did not initiate and therefore are not ready for. Thanks to Paul Hargrove and Eric Roman for their help with this. This commit was SVN r16694.	2007-11-08 14:37:11 +00:00
Jeff Squyres	714b409595	Fix an uninitialized variable in the error case. Thanks to Ake Sandgren for pointing out the mistake. This commit was SVN r16682.	2007-11-07 01:52:23 +00:00
Rainer Keller	37c1b6a67e	- As with rev16656, value is not modified. Get rid of compiler warning from g++ - trunk This commit was SVN r16670.	2007-11-06 10:56:06 +00:00
Rainer Keller	9045c5a6f1	- Value pointed to is not modified (file-name / FILE-macro), getting rid of compiler-warning when compiled with trunk of g++: when doing --enable-debug: ../../../../orte/class/orte_pointer_array.h:128: warning: deprecated conversion from string constant to 'char*' This commit was SVN r16656.	2007-11-05 13:03:35 +00:00
Ethan Mallove	005652c9d4	* Embed ident strings into the Open MPI libraries using one of the following methods (in order of precedence): 1. #pragma ident <ident string> (e.g., Intel and Sun) 1. #ident <ident string> (e.g., GCC) 1. static const char ident[] = <ident string> (all others) By default, the ident string used is the standard Open MPI version string. Only the following libraries will get the embedded version strings (e.g., DSOs will not): * libmpi.so * libmpi_cxx.so * libmpi_f77.so * libopen-pal.so * libopen-rte.so * Added two new configure options: * `--with-package-name="STRING"` (defaults to "Open MPI username@hostname Distribution"). `STRING` is displayed by `ompi_info` next to the "Package" heading. * `--with-ident-string="STRING"` (defaults to the standard Open MPI version string - e.g., X.Y.Zr######). `%VERSION%` will expand to the Open MPI version string if it is supplied to this configure option. This commit was SVN r16644.	2007-11-03 02:40:22 +00:00
Jeff Squyres	dd27622814	Fix fd leak noted by Paul Hargrove. http://www.open-mpi.org/community/lists/devel/2007/10/2493.php This commit was SVN r16564.	2007-10-25 16:03:21 +00:00
Josh Hursey	0bf61a1b84	Move in some accumulated small features and minor bug fixes for C/R support. {{{ svn merge -r 16447:16475 https://svn.open-mpi.org/svn/ompi/tmp/jjh-fgs . }}} This commit was SVN r16478.	2007-10-17 13:47:36 +00:00
Tim Prins	12d3ad4c5c	remove unused and outdated opal message buffer code This commit was SVN r16436.	2007-10-11 22:09:01 +00:00
Josh Hursey	06a30e7f3a	Add a quick check to make sure the BLCR being used has a working cr_request. If it doesn't (version < 0.6.0), then fall back to fork/exec of the cr_checkpoint command. This commit was SVN r16400.	2007-10-09 13:51:28 +00:00
Josh Hursey	7437f37e96	This commit contains the following: * Fix some missing includes in a few places. * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR. * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version. * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled. * Fix the placement of a checkpoint request check in MPI_Init * Pull the OPAL notification mechanism into the SnapC framework. * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing, the Local coordinator in the orted does this directly. * The Local and Application coordinator talk together bypassing the OPAL notification mechanism. * Optimized the Local <-> App Coordinator communication. * Improved the structure used to track vpid_snapshots in the local coord. * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint. This commit was SVN r16389.	2007-10-08 20:53:02 +00:00
Torsten Hoefler	e985812e1f	fixing a comment to be more detailed about opal_output_open functionality ... This commit was SVN r16370.	2007-10-06 17:33:57 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. 
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. 
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Josh Hursey	e10f476c87	Bring over the jjh-filem branch which contains a non-blocking FileM interface and implementation. This has shown drastic performance benefit when transferring many files at roughly the same time. I tested this for many different filem operations and everything was working fine. Let me know if you have any problems with this functionality. Some Notes: - opal-checkpoint now has a 'quiet' flag to keep it from being too verbose. - FileM RSH component is fully non-blocking. - FileM RSH component has incoming connection throttling since by default ssh only allows 10 concurrent scp connections to any single host. This default can be adjusted via an MCA parameter. {{{-mca filem_rsh_max_incomming 10}}} - There is an MCA parameter for max outgoing connections, but it is currently not implemented. If someone needs it then it should not be hard to implement. {{{-mca filem_rsh_max_outgoing 10}}} - Changed the FileM request structure so that it is a bit more explicit and flexible. - Moved the 'preload-binary' and 'preload-files' functionality into odls/base allowing for code reuse in the 'process' and 'default' ODLS components. - Fixed a bug in the process name resolution which broke the 'preload-*' functionality due to GPR table structure changes. - The FileM RSH component might be able to see even more speedup from using a thread pool to operate on the work_pool structures, but that is for future work. - Added a 'opal-show-help' file to ODLS Base This commit was SVN r16252.	2007-09-27 13:13:29 +00:00
Tim Prins	e25bb7f187	Some platforms (such as FreeBSD) need libutil.h included for openpty. Thanks to Karol Mroz for pointing this out. This commit was SVN r16163.	2007-09-19 21:59:22 +00:00
George Bosilca	d1364c53de	Don't allocate the temporary buffer on the stack. It takes way too much space. This commit was SVN r16127.	2007-09-14 02:09:38 +00:00
George Bosilca	2c8c75ef94	Coverity blame list: - Remove memory leaks - uninitialized return This commit was SVN r16126.	2007-09-14 02:08:37 +00:00
George Bosilca	921d79c2b8	Remove a few memory leaks. Close the files when we're done with them. This commit was SVN r16125.	2007-09-14 02:06:26 +00:00
George Bosilca	41ed50f901	Use secure version of strncpy and strncat. Release the temporary resources on error. This commit was SVN r16124.	2007-09-14 02:04:34 +00:00
George Bosilca	61989cc4d4	Don't hardcode the length, there is an argument for that. Don't do the NULL check as we already know that tmp cannot be NULL. This commit was SVN r16123.	2007-09-14 02:02:03 +00:00
Josh Hursey	b4735c9719	Remove an old workaround in which we had to 'mv' the checkpoint file after it was taken from the $CWD to the storage directory. Now we just store directly to the storage directory which can reduce NFS traffic if working in that mode. A slight performance boost, but at the point you are using NFS you are paying a penalty anyway. Now you just don't have to pay it twice :) This commit was SVN r16099.	2007-09-12 15:03:21 +00:00
Gleb Natapov	140dce7614	Fix ABA problem in atomic_lifo code. This is a temporary solution for now. We are looking for a better one. This commit was SVN r16091.	2007-09-11 15:40:30 +00:00
Shiqing Fan	a389e61330	- Add some type casts, required by MS compiler. This commit was SVN r16085.	2007-09-11 09:32:11 +00:00
Gleb Natapov	febdade113	Make non threaded OPAL_ATOMIC_CMPSET macros work correctly. This commit was SVN r16071.	2007-09-09 08:00:16 +00:00
Jeff Squyres	3653bfcbe7	This function returns void. This commit was SVN r15934.	2007-08-20 13:12:38 +00:00
Brian Barrett	2b8af283de	Add ability to completely turn off MPI one-sided support, so that users can experiment with using ROMIO directly. This commit was SVN r15922.	2007-08-18 21:35:51 +00:00
Josh Hursey	729c63cf9d	Fix invalid MCA 'base' names so they appear in ompi_info. A subset of this patch needs to be applied to v1.2 Refs trac:928 This commit was SVN r15918. The following Trac tickets were found above: Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928	2007-08-18 03:05:45 +00:00
Brian Barrett	2d4918b09d	Support versions of the Libtool 2.1a snapshots after the lt_dladvise code was brought in. This supersedes the GLOBL patch that we had been using with Libtool 2.1a versions prior to the lt_dladvise code. Autogen tries to figure out which version you're on, so either will now work with the trunk. This commit was SVN r15903.	2007-08-17 04:08:23 +00:00
Brian Barrett	20fe0952f7	compare should compare the framework names as well. Fixes a potential bug in the modex component compare code (thanks to Tim P. for finding the problem) This commit was SVN r15885.	2007-08-16 16:51:41 +00:00
Adrian Knoth	3115816733	Poor off-by-one line error. This now really builds on kFreeBSD. Re #1105 This commit was SVN r15842.	2007-08-13 19:00:18 +00:00
Tim Prins	188771901d	Fix typo. This commit was SVN r15802.	2007-08-08 14:37:50 +00:00
Sven Stork	f22ab47f84	- one more required symbol This commit was SVN r15801.	2007-08-08 13:02:10 +00:00
Sven Stork	3c753a4cf7	- export required symbol This commit was SVN r15800.	2007-08-08 12:57:53 +00:00
Brian Barrett	a48f07b1d9	If we don't have event ops, we don't have a current_base, so don't dereference the pointer (fixes a segfault Josh was seeing). This commit was SVN r15796.	2007-08-07 17:09:54 +00:00
Sven Stork	3a640603a4	- remove wrong va_end This commit was SVN r15789.	2007-08-07 13:32:05 +00:00
Sven Stork	5e257fadbd	- add missing va_end This commit was SVN r15788.	2007-08-07 12:25:20 +00:00
George Bosilca	31dfa5592e	Few clean-ups, few indentations. Nothing really important. This commit was SVN r15767.	2007-08-04 00:44:23 +00:00
George Bosilca	629bacbb07	Don't include the atomic header file, if we're building a non threaded version. This commit was SVN r15766.	2007-08-04 00:43:15 +00:00
George Bosilca	e2f6d69669	Only use one va_list, as it seems that only one is allowed. This commit was SVN r15765.	2007-08-04 00:41:26 +00:00
Josh Hursey	6248b2bb51	Whoops. Make sure to include the opal_output header. This commit was SVN r15755.	2007-08-03 19:20:23 +00:00
Josh Hursey	dc9644a2c2	Add a bit of error output so the user can figure out what went wrong when we cannot create a directory. This commit was SVN r15754.	2007-08-03 19:08:48 +00:00
Brian Barrett	951755f9fb	no need to call gethostname twice to determine if a process is local This commit was SVN r15742.	2007-08-02 16:25:25 +00:00
Gleb Natapov	dd8b0c925f	Add OPAL_ATOMIC_CMPSET macros that become non-atomic when there is only one thread. This commit was SVN r15720.	2007-08-01 12:13:34 +00:00
Gleb Natapov	072ebf0fb3	Add new opal_argv_split_with_empty() function. opal_argv_split() function doesn't include empty string in the argv array if there are two delimiters in a row in an input string. This commit was SVN r15718.	2007-08-01 12:08:11 +00:00
George Bosilca	d52d21fae8	Don't forget to include the header file in the sources list. This commit was SVN r15711.	2007-07-31 18:40:31 +00:00
Shiqing Fan	0f468f3668	- Remove the solution and project files, will commit them later. This commit was SVN r15705.	2007-07-31 17:07:02 +00:00
Sven Stork	4c5836c2ee	- add missing va_end found by coverity This commit was SVN r15689.	2007-07-30 16:08:18 +00:00
Sven Stork	71915f269c	- more coverity fixes - use strncpy - comparing NULL against an array which is statically inside the structure will always be true This commit was SVN r15684.	2007-07-30 15:19:54 +00:00
Shiqing Fan	4d7b349cdb	- Add VC8 solution and project files. - If one wants to use this solution, remember to unload the project 'orte-restart' which is currently not working for Windows. This commit was SVN r15680.	2007-07-30 11:05:34 +00:00
Josh Hursey	fb90a75fc9	A fix so that 'self' only compiles if --enable-dlopen (common case). This is because internally 'self' uses dlopen to look at the application running to determine if it can/should be used or not. This commit was SVN r15673.	2007-07-29 17:40:17 +00:00
George Bosilca	8350ba19db	Add protections against the shlwapi.h header file. This commit was SVN r15597.	2007-07-25 03:49:31 +00:00
George Bosilca	0158806e4c	Add the missing return. This commit was SVN r15596.	2007-07-25 03:48:04 +00:00
Adrian Knoth	e6345aeac6	Fixes for building on kFreeBSD. Re #1105 This commit was SVN r15592.	2007-07-24 23:19:45 +00:00
Brian Barrett	de2c4deeda	Fix deadlock in thread case exposed by ORTE message model -- if we are in a callback from the event library and post an RML receive, we'll deadlock because the event library wouldn't be entered until the event library was not already entered. Now just protect data structures (which we were basically already doing) instead of code, like good threading people ;). This commit was SVN r15585.	2007-07-24 19:10:19 +00:00
Brian Barrett	c3be7376c5	* Mark some of the structures passed into the if and net code as const when they actually are const. * Remove some dead code from the no IP support case * Add doxygen comment for opal_net_get_port() This commit was SVN r15547.	2007-07-22 19:19:01 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00

1224 commits