openmpi

Author	SHA1	Message	Date
Josh Hursey	9a31060b6d	Fix r10725 so that the trunk builds again. This commit was SVN r10733. The following SVN revision numbers were found above: r10725 --> open-mpi/ompi@ae222cca5b	2006-07-11 14:48:31 +00:00
Ralph Castain	ae222cca5b	Include the help file so it can be accessed This commit was SVN r10725.	2006-07-11 12:15:25 +00:00
Ralph Castain	6129a5a887	Enable -host support for "mpirun a.out". You can now execute on all slots on specified nodes within your overall allocation. This commit was SVN r10713.	2006-07-11 02:59:23 +00:00
Ralph Castain	febc143d8c	Per LANL's stated need, add functionality that runs a.out across ALL available process slots if no num_proc is specified on the command line. However, please note the following limitation: we ONLY allow ONE application to be specified on the command line when this feature is invoked. If multiple apps are specified, the user MUST also specify the number to be launched for each and every one of them. Update the help text to report errors when not following that rule. Also updated the RMAPS help text to reflect the reorganization of some of the round-robin code into the base. The new functionality has been tested under Mac OS-X and on Odin using an MPI program. Both byslot and bynode mapping have been checked and verified. Operational support for other systems needs to be verified - I respectfully request people's help in doing so. This commit was SVN r10708.	2006-07-10 21:25:33 +00:00
Ralph Castain	3d220cbd48	This patch fixes several issues relating to comm_spawn and N1GE. In particular, it does the following: 1. Modifies the RAS framework so it correctly stores and retrieves the actual slots in use, not just those that were allocated. Although the RAS node structure had storage for the number of slots in use, it turned out that the base function for storing and retrieving that information ignored what was in the field and simply set it equal to the number of slots allocated. This has now been fixed. 2. Modified the RMAPS framework so it updates the registry with the actual number of slots used by the mapping. Note that daemons are still NOT counted in this process as daemons are NOT mapped at this time. This will be fixed in 2.0, but will not be addressed in 1.x. 3. Added a new MCA parameter "rmaps_base_no_oversubscribe" that tells the system not to oversubscribe nodes even if the underlying environment permits it. The default is to oversubscribe if needed and the underlying environment permits it. I'm sure someone may argue "why would a user do that?", but it turns out that (looking ahead to dynamic resource reservations) sometimes users won't know how many nodes or slots they've been given in advance - this just allows them to say "hey, I'd rather not run if I didn't get enough". 4. Reorganizes the RMAPS framework to more easily support multiple components. A lot of the logic in the round_robin mapper was very valuable to any component - this has been moved to the base so others can take advantage of it. 5. Added a new test program "hello_nodename" - just does "hello_world" but also prints out the name of the node it is on. 6. Made the orte_ras_node_t object a full ORTE data type so it can more easily be copied, packed, etc. This proved helpful for the RMAPS code reorganization and might be of use elsewhere too. This commit was SVN r10697.	2006-07-10 14:10:21 +00:00
Jeff Squyres	538965aeb0	Final merge of stuff from /tmp/tm-stuff tree (merged through /tmp/tm-merge). Validated by RHC. Summary: - Add --nolocal (and -nolocal) options to orterun - Make some scalability improvements to the tm pls This commit was SVN r10651.	2006-07-04 20:12:35 +00:00
Josh Hursey	58110f9fc9	Fixes Ticket #125 for both the trunk and v1.1 branch. This commit will apply cleanly to the v1.1 branch, and should be moved over once I get someone to verify it. The problem is outlined in the bug. The fix was to move the setting of the app context index (idx) before we put it in the GPR so that it is propagated to the GPR. The reason this hasn't bitten us before is that we init app->idx to 0, which is correct most of the time. The exception is MPI_Comm_spawn_multiple, where we put in more than one app context and thus care about correct indexing. This was causing memory corruption down the line by overrunning the mapping array. This commit also puts in a check to make sure that we error out if we ever try to do that again. This commit was SVN r10380.	2006-06-15 22:14:07 +00:00
Josh Hursey	2f20a38c98	This is a fix for bug Ticket #27. We were stuck in an infinite loop inside the rmaps round_robin component when the user specified a host, then oversubscribed it. Instead of returning an error, we looped forever. For example: $ cat hostfile A slots=2 max-slots=2 B slots=2 max-slots=2 $ mpirun -np 3 --hostfile hostfile --host B <hang> The loop would not terminate because both host A and B are in the 'nodes' structure as they are both allocated to the job. However, after allocating 2 slots to host B, we remove it from the node list, leaving us with a 'nodes' structure with just A in it. Since we can't use host A, we keep looping here until we find a node that we can use. This patch checks for the situation where rmaps is looping over the list a second time without having found a node during the first pass. In that case we know that there are no nodes left to use, so we have a resource allocation error and should return to the user. This patch should be moved to all of the release branches. This commit was SVN r10131.	2006-05-31 03:42:01 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Ralph Castain	1abe8ef368	Well, it certainly helps triggers to fire if the respective responsible routines adjust the counters! The INIT counter is supposed to be adjusted when the processes are mapped - this is now done correctly. The LAUNCHED counter is supposed to be adjusted when the pls sets the process pid info into the registry and changes the state to LAUNCHED. This could probably be changed to have that function use the set_proc_soh API, but this fixes the problem for now. Thanks to Brian for finding that the triggers were not being fired. This commit was SVN r8948.	2006-02-09 15:39:06 +00:00
Ralph Castain	4b9f015c0b	Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list. This commit was SVN r8912.	2006-02-07 03:32:36 +00:00
George Bosilca	7d8d516a4a	A bunch of fixes for Windows support. - protection with __WINDOWS__ and not WIN32 or _WIN32 - protect all the headers This commit was SVN r8463.	2005-12-12 20:04:00 +00:00
Jeff Squyres	31336e4773	Add some missing headers / correct one installation directory This commit was SVN r8408.	2005-12-08 04:00:52 +00:00
Jeff Squyres	6fbd321442	Fix a bunch of install locations for header files This commit was SVN r8406.	2005-12-08 00:54:44 +00:00
Brian Barrett	20cea60b82	* fix "make distclean" error in PML * turns out (duh!) that there was a reason that the <projectdir>dir variable was set in the AM conditional. If not, stupid directories are created and not needed... duh. This commit was SVN r8205.	2005-11-20 07:41:09 +00:00
Brian Barrett	8faa1884f0	* The last of the build system optimizations. Combine the component and component/base Makefile.am files, reducing the time configure spends stamping out Makefiles at the end * Install base_impl.h file when devel-headers are being installed This commit was SVN r8200.	2005-11-20 01:03:01 +00:00
George Bosilca	c802d54696	The return type is an int. Casting it to a size_t before checking if it's bigger than zero leads to a true condition ... always ... This commit was SVN r8114.	2005-11-11 06:34:14 +00:00
Tim Woodall	7f20198d49	Filter the set of data returned to the daemons during startup using the new get_conditional command to improve scalability during launch This commit was SVN r8097.	2005-11-10 16:44:51 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Jeff Squyres	0379b27969	Add missing DESTRUCT This commit was SVN r7948.	2005-11-01 13:35:44 +00:00
Tim Woodall	e27dfb180d	yet another fix This commit was SVN r7941.	2005-10-31 21:59:14 +00:00
Tim Woodall	aa5b61e4f1	corrections for multiple app contexts This commit was SVN r7939.	2005-10-31 20:37:44 +00:00
Jeff Squyres	8503fce61b	Remove debugging message This commit was SVN r7924.	2005-10-28 18:53:20 +00:00
Tim Woodall	3fd351117a	removed debug This commit was SVN r7902.	2005-10-27 21:07:49 +00:00
Tim Woodall	793836da57	removed debug This commit was SVN r7897.	2005-10-27 17:10:49 +00:00
Tim Woodall	60754acae8	- modified rmaps data structures to point directly to ras node - modified rsh to NOT query for each nodes mapping, as all data is already available in the rmaps structures This commit was SVN r7894.	2005-10-27 17:04:10 +00:00
Brian Barrett	1302cb4072	The next in a long line of crazed build system changes from Brian. This was originally suggested by Ralf Wildenhues, to try to speed autogen, configure, and make (and possibly even make install). Use automake's include directive to drastically reduce the number of Makefile files (although the number of Makefile.am files is the same - most are just included in a top-level Makefile.am). Also use an Automake SUBDIRs feature to eliminate the dynamic-mca tree, which was no longer really needed. This makes adding a framework easier (since you don't have to remember the dynamic-mca tree) and makes building faster (as make doesn't have to recurse through the dynamic-mca tree) This commit was SVN r7777.	2005-10-17 00:21:10 +00:00
Josh Hursey	0f08e87a1f	Fixed a max_slots off by one problem that Brian highlighted. Also cleaned up the error message when allocating over the number of slots available. This commit was SVN r7715.	2005-10-12 02:09:56 +00:00
Josh Hursey	d5ebb5c46a	fix a compiler warning This commit was SVN r7674.	2005-10-08 17:03:12 +00:00
Jeff Squyres	0629cdc2d7	Bring back the changes from /tmp/jjhursey-rmaps. Specific merge command: svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps . (where "." is a trunk checkout) The logs from this branch are much more descriptive than I will put here (including a really long description from last night). Here's the short version: - fixed some broken implementations in ras and rmaps - "orterun --host ..." now works and has clearly defined semantics (this was the impetus for the branch and all these fixes -- LANL had a requirement for --host to work for 1.0) - there is still a little bit of cleanup left to do post-1.0 (we got correct functionality for 1.0 -- we did not fix bad implementations that still "work") - rds/hostfile and ras/hostfile handshaking - singleton node segment assignments in stage1 - remove the default hostfile (no need for it anymore with the localhost ras component) - clean up pls components to avoid duplicate ras mapping queries - [possible] -bynode/-byslot being specific to a single app context This commit was SVN r7664.	2005-10-07 22:24:52 +00:00
Andrew Friedley	82ee2933a5	- Add an opal_show_help() to the pls fork module to explain what went wrong when the execv to start the application fails. - Add a couple opal_show_help()'s to indicate when not enough slots/nodes are available to satisfy a request. This commit was SVN r7555.	2005-09-30 14:30:21 +00:00
Josh Hursey	a23370c007	Converted some MCA parameters from the old version to the new. Have the ras_base_schedule_policy MCA parameter working once again. Before, it would only do slot-based allocation, even if the MCA parameter was set properly. Currently you can specify to orterun a node allocation by either: -mca ras_base_schedule_policy node -bynode and slot allocation (which is the default) by: -mca ras_base_schedule_policy slot -byslot This commit was SVN r7513.	2005-09-27 02:54:15 +00:00
Andrew Friedley	555ae37255	Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well. This commit was SVN r7477.	2005-09-22 12:28:54 +00:00
Brian Barrett	ed56e743b7	* update configure.ac to use the modern version of AC_INIT and AM_INIT_AUTOMAKE, instead of the deprecated version. * Work around dumbness in modern AC_INIT that requires the version number to be set at autoconf time (instead of at configure time, as it was before). Set the version number, minus the subversion r number, at autoconf time. Override the internal variables to include the r number (if needed) at configure time. Basically, the right thing should always happen. The only place it might not is the version reported as part of configure --help, which will not have an r number. * Since AM_INIT_AUTOMAKE takes a list of options, no need to specify them in all the Makefile.am files. * Adds support for subdir-objects, meaning that object files are put in the directory containing source files, even if the Makefile.am is in another directory. This should start making it feasible to reduce the number of Makefile.am files we have in the tree, which will greatly reduce the time to run autogen and configure. This commit was SVN r7211.	2005-09-07 05:54:53 +00:00
Jeff Squyres	3962c53e2e	- Add to AM_CPPFLAGS $(OPAL_LTDL_CPPFLAGS) where necessary in order to add a -I to find the included ltdl.h (vs. a system-installed ltdl.h) - Clean up cruft in a bunch of Makefile.am's to remove now-unnecessary AM_CPPFLAGS settings to get static-components.h for each framework - Move the component_repository API functions out of opal/mca/base/base.h and into opal/mca/base/mca_base_component_repository.h in order to decrease unnecessary dependencies (e.g., before this, almost everything in the tree depended on ltdl.h, which is unnecessary -- only a small number of files really need ltdl.h) This commit was SVN r7127.	2005-09-01 12:16:36 +00:00
Brian Barrett	17c1bb355e	* more memory leak fixes - mainly string params not being freed at end of time * Added code to free dps structures at shutdown This commit was SVN r7043.	2005-08-26 02:08:23 +00:00
Rainer Keller	f52784bad3	- Just changes to comments, deletion of spaces to make diff smaller This commit was SVN r7030.	2005-08-25 15:42:41 +00:00
Jeff Squyres	cce0950df7	- change a bunch of OMPI_* constants to ORTE_* equivalents - change the framework opens to [mostly] use the new MCA param API - properly pass in framework debug output streams to the mca_base_component_open() function This commit was SVN r6888.	2005-08-15 18:25:35 +00:00
Josh Hursey	22c7f2b3e0	Quite a range of small changes. ns_replica.c - Removed the error logging since I use this function in orte_init_stage1 to check if we have created a cellid yet or not. ras_types.h & ras_base_node.h - This was an empty file. Moved the orte_ras_node_t from base/ras_base_node.h to this file. - Changed the name of orte_ras_base_node_t to orte_ras_node_t to match the naming mechanisms in place. ras.h - Exposed 2 functions: - node_insert: This takes a list of orte_ras_base_node_t's and places them in the Node Segment of the GPR. This is to be used in orte_init_stage1 for singleton processes, and the hostfile parsing (see rds_hostfile.c). This just puts in the appropriate API interface to keep from calling the orte_ras_base_node_insert function directly. - node_query: This is used in hostfile parsing. This just puts in the appropriate API interface to keep from calling the orte_ras_base_node_query function directly. - Touched all of the implemented components to add reference to these new function pointers ras_base_select.c & ras_base_open.c - Add and set the global module reference rds.h - Exposed 1 function: - store_resource: This stores a list of rds_cell_desc_t's to the Resource Segment. This is used in conjunction with the orte_ras.node_insert function in both the orte_init_stage1 for singleton processes and rds_hostfile.c rds_base_select.c & rds_base_open.c - Add and set the global module reference rds_hostfile.c - Added functionality to create a new cellid for each hostfile, placing each entry in the hostfile into the same cellid. Currently this is commented out with the cellid hard coded to 0, with the intention of taking this out once ORTE is able to handle multiple cellid's - Instead of just adding hosts to the Node Segment via a direct call to the ras_base_node_insert() function, first add the hosts to the Resource Segment of the GPR using the orte_rds.store_resource() function, then use the API version of orte_ras.node_insert() to store the hosts on the Node Segment. - Add 1 new function pointer to module as required by the API. rds_hostfile_component.c - Converted this to use the new MCA parameter registration orte_init_stage1.c - It is possible that a cellid was not created yet for the current environment. So I put in some logic to test if the cellid 0 existed. If it does then continue, otherwise create the cellid so we can properly interact with the GPR via the RDS. - For the singleton case we insert some 'dummy' data into the GPR. The RAS matches this logic, so I took out the duplicate GPR put logic, and replaced it with a call to the orte_ras.node_insert() function. - Further, before calling orte_ras.node_insert() in the singleton case, we also call orte_rds.store_resource() to add the singleton node to the Resource Segment. Console: - Added a bunch of new functions. Still experimenting with many aspects of the implementation. This is a checkpoint, and has very limited functionality. - Should not be considered stable at the moment. This commit was SVN r6813.	2005-08-11 19:51:50 +00:00
Jeff Squyres	f8fa8f4935	Fix a problem found by Tim Prins (patch also supplied by Tim P). From his e-mail: I ran into a small bug in rmaps_rr.c: map_app_by_slot which was triggered by using multiple app contexts. Basically, if not all the slots we allocated on a node were used by an app, we would automatically move onto the next node. This caused a problem with multiple app contexts: when the first app takes a partial allocation of a node, the second app would not be able to access these slots because we had already moved past the node, and the byslot routine does not wrap back around the list. This commit was SVN r6766.	2005-08-08 18:56:17 +00:00
Brian Barrett	0ae16f2ab7	* add local hook to remove static-components.h in distclean target. The files are generated by configure, and not part of the tarball, so distclean would be the right place to remove them. This commit was SVN r6390.	2005-07-08 13:54:12 +00:00
Jeff Squyres	ba99409628	Major simplifications to component versioning: - After long discussions and ruminations on how we run components in LAM/MPI, made the decision that, by default, all components included in Open MPI will use the version number of their parent project (i.e., OMPI or ORTE). They are certainly free to use a different number, but this simplification makes the common cases easy: - components are only released when the parent project is released - it is easy (trivial?) to distinguish which version of a component goes with which version of the parent project - removed all autogen/configure code for templating the version .h file in components - made all ORTE components use ORTE__VERSION for version numbers - made all OMPI components use OMPI__VERSION for version numbers - removed all VERSION files from components - configure now displays OPAL, ORTE, and OMPI version numbers - ditto for ompi_info - right now, faking it -- OPAL and ORTE and OMPI will always have the same version number (i.e., they all come from the same top-level VERSION file). But this paves the way for the Great Configure Reorganization, where, among other things, each project will have its own version number. So all in all, we went from a boatload of version numbers to [effectively] three. That's pretty good. :-) This commit was SVN r6344.	2005-07-04 20:12:36 +00:00
Jeff Squyres	6a9c9953bc	Remove a bunch of -I's that are no longer necessary with properly-prefixed static-component.h files. This commit was SVN r6342.	2005-07-04 18:24:58 +00:00
Brian Barrett	a13166b500	* rename ompi_output to opal_output This commit was SVN r6329.	2005-07-03 23:31:27 +00:00
Brian Barrett	761402f95f	* rename ompi_list to opal_list This commit was SVN r6322.	2005-07-03 16:22:16 +00:00
Brian Barrett	499e4de1e7	* rename ompi_object and ompi_class to opal_object and opal_class This commit was SVN r6321.	2005-07-03 16:06:07 +00:00
Jeff Squyres	282a8b5e8d	More orte Makefile.am updates This commit was SVN r6287.	2005-07-02 15:13:41 +00:00
Jeff Squyres	1b18979f79	Initial population of orte tree This commit was SVN r6266.	2005-07-02 13:42:54 +00:00


148 commits