openmpi

Author	SHA1	Message	Date
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer. This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Gleb Natapov	da3e69101d	Add missing include. This commit was SVN r17493.	2008-02-18 14:55:02 +00:00
Galen Shipman	18d1d3b408	Add ORTE ALPS support (Cray XT CNL) This commit was SVN r17482.	2008-02-17 19:29:06 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion and discussion about linkers, have all OPAL, ORTE, and OMPI components '''not''' link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Ralph Castain	3dbd4d9be7	Squeeeeeeze the launch message. This is the message sent to the daemons that provides all the data required for launching their local procs. In reorganizing the ODLS framework, I discovered that we were sending a significant amount of unnecessary and repeated data. This commit resolves this by: 1. taking advantage of the fact that we no longer create the launch message via a GPR trigger. In earlier times, we had the GPR create the launch message based on a subscription. In that mode of operation, we could not guarantee the order in which the data was stored in the message - hence, we had no choice but to parse the message in a loop that checked each value against a list of possible "keys" until the corresponding value was found. Now, however, we construct the message "by hand", so we know precisely what data is in each location in the message. Thus, we no longer need to send the character string "keys" for each data value any more. This represents a rather large savings in the message size - to give you an example, we typically would use a 30-char "key" for a 2-byte data value. As you can see, the overhead can become very large. 2. sending node-specific data only once. Again, because we used to construct the message via subscriptions that were done on a per-proc basis, the data for each node (e.g., the daemon's name, whether or not the node was oversubscribed) would be included in the data for each proc. Thus, the node-specific data was repeated for every proc. Now that we construct the message "by hand", there is no reason to do this any more. Instead, we can insert the data for a specific node only once, and then provide the per-proc data for that node. We therefore not only save all that extra data in the message, but we also only need to parse the per-node data once. The savings become significant at scale. 
Here is a comparison between the revised trunk and the trunk prior to this commit (all data was taken on odin, using openib, 64 nodes, unity message routing, tested with application consisting of mpi_init/mpi_barrier/mpi_finalize, all execution times given in seconds, all launch message sizes in bytes):
Per-node scaling, taken at 1ppn:
#nodes    original trunk     revised trunk
          time    size       time    size
1         0.10    819        0.09    564
2         0.14    1070       0.14    677
3         0.15    1321       0.14    790
4         0.15    1572       0.15    903
8         0.17    2576       0.20    1355
16        0.25    4584       0.21    2259
32        0.28    8600       0.27    4067
64        0.50    16632      0.39    7683
Per-proc scaling, taken at 64 nodes:
ppn       original trunk     revised trunk
          time    size       time    size
1         0.50    16669      0.40    7720
2         0.55    32733      0.54    11048
3         0.87    48797      0.81    14376
4         1.0     64861      0.85    17704
Condensing those numbers, it appears we gained:
per-node message size: 251 bytes/node -> 113 bytes/node
per-proc message size: 251 bytes/proc -> 52 bytes/proc
per-job message size: 568 bytes/job -> 399 bytes/job (job-specific data such as jobid, override oversubscribe flag, total #procs in job, total slots allocated)
The fact that the two pre-commit trunk numbers are the same confirms the fact that each proc was containing the node data as well. It isn't quite the 10x message reduction I had hoped to get, but it is significant and gives much better scaling. Note that the timing info was, as usual, pretty chaotic - the numbers cited here were typical across several runs taken after the initial one to avoid NFS file positioning influences. Also note that this commit removes the orte_process_info.vpid_start field and the handful of places that passed that useless value. By definition, all jobs start at vpid=0, so all we were doing is passing "0" around. In fact, many places simply hardwired it to "0" anyway rather than deal with it. This commit was SVN r16428.	2007-10-11 15:57:26 +00:00
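The per-node and per-proc figures quoted in r16428 can be re-derived from the message sizes in that commit. The sketch below is only a worked check of the arithmetic: the sizes are copied from the commit message, while the names `PER_NODE`, `PER_PROC`, and `bytes_per_unit` are made up for illustration. It does not attempt to re-derive the per-job figures.

```python
# Launch message sizes in bytes, copied from the tables in the commit message.
# Each entry maps a count to (original trunk size, revised trunk size).
PER_NODE = {1: (819, 564), 2: (1070, 677), 3: (1321, 790), 4: (1572, 903),
            8: (2576, 1355), 16: (4584, 2259), 32: (8600, 4067), 64: (16632, 7683)}
PER_PROC = {1: (16669, 7720), 2: (32733, 11048), 3: (48797, 14376), 4: (64861, 17704)}
NODES = 64  # the per-proc runs all used 64 nodes

ORIGINAL, REVISED = 0, 1  # column indices into the tuples above


def bytes_per_unit(table, column, units_per_step=1):
    """Average growth in message size per added unit, from the first to the last row."""
    first, last = min(table), max(table)
    growth = table[last][column] - table[first][column]
    return growth / ((last - first) * units_per_step)


# Per-node growth: one extra node (and its one proc) per step.
print("per node:", bytes_per_unit(PER_NODE, ORIGINAL), "->", bytes_per_unit(PER_NODE, REVISED))

# Per-proc growth: each extra ppn adds one proc on every one of the 64 nodes.
print("per proc:", bytes_per_unit(PER_PROC, ORIGINAL, NODES), "->",
      bytes_per_unit(PER_PROC, REVISED, NODES))
```

The slopes come out at 251 and 113 bytes per node, and 251 and 52 bytes per proc, matching the condensed figures in the commit message.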
Ralph Castain	53af94fd87	Modify the configure system so that gridengine support is only built in specific conditions: 1. --with-sge, always builds 2. --without-sge, never builds 3. if neither is specified, build if and only if either SGE_ROOT is set or "qrsh" is found in the path This commit was SVN r16422.	2007-10-10 21:39:16 +00:00
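The three gridengine build conditions in r16422 amount to a small decision function. The sketch below models that logic in Python for illustration only. The real check lives in the configure scripts, and the function name `should_build_gridengine` and its parameters are hypothetical. Treating "SGE_ROOT is set" as "the variable is present in the environment" is also an assumption.

```python
import os
import shutil
from typing import Mapping, Optional


def should_build_gridengine(with_sge: Optional[bool],
                            env: Mapping[str, str],
                            qrsh_found: bool) -> bool:
    """with_sge is True for --with-sge, False for --without-sge, None if neither was given."""
    if with_sge is not None:
        # An explicit flag always wins, in either direction.
        return with_sge
    # Neither flag given: build only if the environment looks like an SGE install.
    return "SGE_ROOT" in env or qrsh_found


# Example: evaluate the default case against the current environment.
print(should_build_gridengine(None, os.environ, shutil.which("qrsh") is not None))
```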
Brian Barrett	3a0067249c	The previous hack to deal with Libtool not speaking Objective C stopped working with Automake 1.10. This is a new hack, which should be much more flexible. The ras doesn't contain any Objective C, so remove the hack entirely from that Makefile.am. This commit was SVN r16269.	2007-09-30 03:40:25 +00:00
Jeff Squyres	f9b9beba77	Allow the LSF components to be shipped in the nightly tarball and open it up to others. This commit was SVN r16143.	2007-09-17 22:42:33 +00:00
Josh Hursey	729c63cf9d	Fix invalid MCA 'base' names so they appear in ompi_info. A subset of this patch needs to be applied to v1.2 Refs trac:928 This commit was SVN r15918. The following Trac tickets were found above: Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928	2007-08-18 03:05:45 +00:00
Jeff Squyres	75192de1fc	LSF support is now working. W00t! May be subject to a further tweak or two. * checking lsb_init() is not sufficient to know whether you're in an LSF job or not; you also need to check for environment variable markers * remove lots of debugging output * no need for the sds lsf to call lsb_init() * remove some slurm-like dead code and a copy-n-paste error in the sds lsf This commit was SVN r15644.	2007-07-26 18:49:29 +00:00
Ralph Castain	f219cc1e6e	A few changes to the lsf components - mostly cleanup, no major logic changes This commit was SVN r15563.	2007-07-23 18:38:36 +00:00
Jeff Squyres	2baa866026	Compiles to the new API, but doesn't quite work yet... This commit was SVN r15537.	2007-07-20 19:49:27 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and r15520 so that I can get the RML / OOB changes to cleanly apply This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find all outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Ralph Castain	d109e9a6f4	Roll in the Voltaire core/socket/etc process mapping implementation. Only change I made was to cleanup some of the diagnostic output in the odls_default component so it uses the -mca odls_base_verbose parameter. You will not see any impact from this change unless you use the syntax described in ticket #1023. I've tried as many of the RAS components as possible and saw no problem - there may be issues with other RAS components that would not compile on any of my systems. Anything that appears should be trivial to fix. This commit was SVN r15427.	2007-07-14 15:14:07 +00:00
Jeff Squyres	24a28494a6	Fix what looks like a cut-n-paste error. This will not cause everyone to run autogen because this component is .ompi_ignore'd for everyone except jsquyres and rhc. This commit was SVN r15401.	2007-07-13 14:47:03 +00:00
Jeff Squyres	b20248709a	Next round of LSF commits. Getting farther, but it still doesn't fully work yet (everything is still .ompi_ignore'ed for everyone). This commit was SVN r15398.	2007-07-13 11:57:17 +00:00
Jeff Squyres	4439734a8a	It compiles (too)! Getting a little further... This commit was SVN r15383.	2007-07-12 14:46:41 +00:00
Ralph Castain	39013e2a18	Clean up a couple of minor typos. Bring the new bproc-related RAS components online. This commit was SVN r15328.	2007-07-10 14:11:26 +00:00
Ralph Castain	a1bf04f39e	First cut at revamping bproc support to separate it out from LANL's configuration. First cut at adding support for LSF. Lots of ompi_ignores so only Jeff and I will see this stuff. This commit was SVN r15321.	2007-07-10 12:43:05 +00:00
Ralph Castain	684aa1bc9f	Since universe size now is an orte thing, we may as well give it some direct support. Create rmgr set/get functions so it becomes more obvious where this value is being defined and how to retrieve it. Modify the bproc pls to pass it to the app procs when launched. Modify one of the test programs to verify it has been correctly set. This commit was SVN r15266.	2007-07-02 16:45:40 +00:00
George Bosilca	715f6012cf	The DSS pack function can use the const attribute for the src field as it is never modified by the pack functions directly. Enforce it all over the code base. This commit was SVN r15026.	2007-06-12 22:47:14 +00:00
Jeff Squyres	51f286d737	Just like r14289 on the ORTE trunk: Per discussions with Brian and Ralph, make a slight correction in where components are installed. Use $pkglibdir, not $libdir/openmpi, so that when compiled in the orte trunk, components are installed to the right directory (because the component search path is checking $pkglibdir). This commit was SVN r14345. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r14289	2007-04-12 11:19:42 +00:00
Pak Lui	e9e8dc2765	* comment out unused code This commit was SVN r14297.	2007-04-10 22:38:34 +00:00
Tim Prins	2ffc02870d	Reduce the memory usage of the GPR: - Make it so that all the GPR pointer arrays are allocated initially at 16 elements instead of 512. This saves (on a 64 bit machine) approximately 4*(# procs + # nodes) KB. - Fix up the segment prealloc function so that preallocating an existing segment is not an error, and make the areas where we do large inserts use it. Fix the orte_pointer_array to efficiently implement setting its size. Before we just realloced the array one block at a time until the desired size was reached. Now we resize it all in one realloc. This commit was SVN r14264.	2007-04-09 00:40:15 +00:00
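Two points in r14264 can be illustrated with a little arithmetic: the saving from the smaller initial array size, and the difference between growing an array one block at a time and growing it in a single step. The sketch below is a Python model, not the orte_pointer_array code. The function names are hypothetical, and rounding the new size up to a whole number of blocks is an assumption about how a one-step resize might behave.

```python
POINTER_BYTES = 8                  # pointer size on a 64-bit machine
OLD_INITIAL, NEW_INITIAL = 512, 16  # initial element counts named in the commit

# Bytes saved per pointer array by starting at 16 elements instead of 512.
saved_per_array = (OLD_INITIAL - NEW_INITIAL) * POINTER_BYTES


def grow_block_at_a_time(size, target, block):
    """Old approach: extend by one block per realloc until the array is large enough."""
    reallocs = 0
    while size < target:
        size += block
        reallocs += 1
    return size, reallocs


def grow_in_one_step(size, target, block):
    """New approach: compute the final size up front and realloc once."""
    if size >= target:
        return size, 0
    blocks_needed = -(-(target - size) // block)  # ceiling division
    return size + blocks_needed * block, 1


print(saved_per_array)                      # bytes saved per array, just under 4 KB
print(grow_block_at_a_time(16, 1000, 16))   # (final size, number of reallocs)
print(grow_in_one_step(16, 1000, 16))
```

The per-array saving of 3968 bytes is consistent with the commit's estimate of about 4 KB times (# procs + # nodes) if there is roughly one such array per proc and per node, which is an inference rather than something the commit states.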
Tim Prins	df4c468bb4	fix some more minor memory leaks This commit was SVN r14260.	2007-04-07 18:41:16 +00:00
Tim Prins	9cb455272b	Fix a pile of memory leaks in ORTE. Fix a major memory leak in the SLURM RAS, and cleanup a bit of code there. This commit was SVN r14164.	2007-03-29 00:50:56 +00:00
Jeff Squyres	2105f444ec	Add missing header file This commit was SVN r14129.	2007-03-23 00:47:30 +00:00
Sven Stork	6111ca1152	- Let's try to detect the default nodefile directory because it can differ between sites. If we cannot detect the default then we fall back to the hard coded path. This commit was SVN r14121.	2007-03-22 15:26:16 +00:00
Pak Lui	803655b555	* incorporated some of Jeff's comment regarding this fix. This commit was SVN r14070.	2007-03-19 21:59:48 +00:00
Pak Lui	da4d41e0e7	* fixed the missing fclose and eliminate the call to get_slot_count since it is not needed This commit was SVN r14066.	2007-03-19 17:47:30 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Ralph Castain	5818a32245	Bring in a forgotten speed improvement for the TM launcher that was developed during SNL Tbird testing last year. Remove the redundant and slow calls to TM to resolve hostnames. Instead, read the host info from the PBS file during the RAS, and then just use that info in the PLS (rather than getting it again). Adjust the RMAPS mapped_node object to propagate the required launch_id info now included in the ras_node object. This provides support for those few systems that don't use nodename to launch, but instead want some id (typically an index into the array of allocated nodes). This value gets set for each node in the RAS - the RMAPS just propagates it for easy launch. This commit was SVN r13581.	2007-02-09 15:06:45 +00:00
Tim Prins	e199bf9b64	Refs trac:801 Fix compiler warning This commit was SVN r13308. The following Trac tickets were found above: Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801	2007-01-25 16:12:05 +00:00
Tim Prins	4fd81b3407	Fixes trac:801 - Make it so the SLURM ras can handle different nodelist configurations - Some code cleanup and better/more informative error messages and error handling This commit was SVN r13271. The following Trac tickets were found above: Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801	2007-01-24 14:45:42 +00:00
Brian Barrett	e130f18cc2	Fix some compiler warnings that have slipped in lately... This commit was SVN r13037.	2007-01-08 17:20:09 +00:00
Brian Barrett	a34e67d743	Remove unneeded PARAM_INIT_FILE variable in configure.params files used by components that use configure.m4 for configuration or are always built. The macro has not been needed since moving to configure types other than configure.stub Fixes trac:590 This commit was SVN r13031. The following Trac tickets were found above: Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590	2007-01-08 03:44:22 +00:00
Ralph Castain	62d7826e01	Helps if we total up the correct field to get the total number of slots in the universe This commit was SVN r12789.	2006-12-07 03:17:12 +00:00
Ralph Castain	a1153fdc8f	Eliminate virtually all of the attribute_predefined data from the STG1 message. We now compute the total number of slots allocated to us and save that in the registry - the attribute_predefined then retrieves it via the STG1 message. The app_num is passed via the process_info structure, which gets the value from the ODLS in the environment. Obviously, people like bproc will have to get the app_num via another avenue...but that's a problem for another day. Several options are easily available. This commit was SVN r12788.	2006-12-07 03:11:20 +00:00
Ralph Castain	d4bd60c9fe	Restore the paffinity capability, along with all the required logic to ensure we "do the right thing" when the user gives us inaccurate information about the number of slots on a remote node. This commit was SVN r12780.	2006-12-06 15:59:34 +00:00
Brian Barrett	6f8b366acb	Rename liborte to libopen-rte and libopal to libopen-pal per telecon today and bug #632. Refs trac:632 This commit was SVN r12762. The following Trac tickets were found above: Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632	2006-12-05 18:27:24 +00:00
Tim Prins	08d5ca821f	Don't get the node architecture when using the LoadLeveler RAS. It is slow (about a second for ~300 nodes) and we don't even use the value. This commit was SVN r12758.	2006-12-05 13:47:53 +00:00
Ralph Castain	9bc25f0bec	Fix a potential bug in the registry where it didn't fully check a segment's name when searching for it. Will have to verify that this doesn't break other things. Bring the bproc system close to being back online.... This commit was SVN r12659.	2006-11-23 04:17:37 +00:00
Ralph Castain	8080034eb2	Clean up a compile issue for bproc This commit was SVN r12653.	2006-11-22 19:50:27 +00:00
Ralph Castain	428c1f14c3	Modify the bproc components to resolve the current allocation problem This commit was SVN r12652.	2006-11-22 19:10:58 +00:00
Ralph Castain	6fca1431f3	Back out some prior commits. These commits fixed bproc so it would run, but broke several other things (singleton comm_spawn and hostfile operations have been identified so far). Since bproc is the culprit here, let's leave bproc broken for now - I'll work on a fix for that environment that doesn't impact everything else. This commit was SVN r12648.	2006-11-22 13:30:21 +00:00
Ralph Castain	a30c65ca24	Fix the allocator to make bproc happy. We were burned again by the fact that the bproc state monitor creates entries on the node segment for all the nodes in the cluster when it is opened during orte_init. As a result, the bjs allocator was never being called, and the system merrily assumed that all nodes in the cluster had been allocated to it. To fix this, I removed a test that had been inserted into the allocation procedure that checked for a non-zero node segment. This was an old artifact - the RAS components already know that they are not to overwrite any existing node segment entries (at least, bproc does - I will check the others. For now, I just want to save the bproc fix on this machine). This commit was SVN r12640.	2006-11-21 19:52:55 +00:00
Tim Prins	2afb401e39	fix some compile warnings This commit was SVN r12636.	2006-11-21 00:33:10 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
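The job "family" relationships described in r12597 (parent, root, immediate children, all descendants) can be pictured as a simple tree keyed by jobid. The sketch below is an illustration in Python, not the ORTE name service API. The class `JobFamily` and all of its method names are hypothetical, and `include_descendants` only stands in for the ORTE_NS_INCLUDE_DESCENDANTS attribute mentioned in the commit.

```python
class JobFamily:
    """Tracks parent/child links between jobs created by dynamic spawns."""

    def __init__(self):
        self._parent = {}  # jobid -> parent jobid, or None for an original job

    def add_job(self, jobid, parent=None):
        self._parent[jobid] = parent

    def parent(self, jobid):
        return self._parent[jobid]

    def root(self, jobid):
        # Walk up the lineage until we reach a job with no parent.
        while self._parent[jobid] is not None:
            jobid = self._parent[jobid]
        return jobid

    def children(self, jobid):
        return sorted(j for j, p in self._parent.items() if p == jobid)

    def descendants(self, jobid):
        found = []
        for child in self.children(jobid):
            found.append(child)
            found.extend(self.descendants(child))
        return sorted(found)

    def jobs_to_terminate(self, jobid, include_descendants=False):
        # Models a terminate request whose extent is widened by an attribute.
        extra = self.descendants(jobid) if include_descendants else []
        return [jobid] + extra


family = JobFamily()
family.add_job(1)            # original job
family.add_job(2, parent=1)  # spawned by job 1
family.add_job(3, parent=2)  # spawned by job 2
print(family.root(3), family.jobs_to_terminate(1, include_descendants=True))
```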
Ralph Castain	ea77beca29	cleanup the TM modules in prep for T-bird tests. The TM RAS will now report time required to resolve hostnames This commit was SVN r12449.	2006-11-06 20:56:18 +00:00
Ralph Castain	884caeb2c7	Add timing tests for the TM ras This commit was SVN r12445.	2006-11-06 18:41:22 +00:00
Ralph Castain	d182ae7472	Clean up a few compiler warnings courtesy of Jeff This commit was SVN r12430.	2006-11-03 20:45:22 +00:00
Ralph Castain	30de73a712	Add a few attributes that are helpful for folks doing things like Eclipse. Also add yet another command-line option to orterun to support one of the new attributes. These include: 1. ORTE_RMAPS_DISPLAY_AT_LAUNCH: pretty-prints out the process map right before we launch so you can see where everyone is going. This is settable via the command line option "--display-map-at-launch" 2. ORTE_RMGR_STOP_AFTER_SETUP: just setup the job and then return from the spawn command. 3. ORTE_RMGR_STOP_AFTER_ALLOC: return from the rmgr.spawn call after allocating the job 4. ORTE_RMGR_STOP_AFTER_MAP: return from the rmgr.spawn call after mapping the job. This gives folks a chance to retrieve and graphically display the map, let the user edit it, and store the results. They can then call "launch" on their own and the system will use the revised map. Enjoy! My personal favorite is the first one - helps with debugging. This commit was SVN r12379.	2006-10-31 22:16:51 +00:00
George Bosilca	2aa3e51223	Nothing relevant. Only a set of castings to have a clean compile on Windows. The cl.exe compiler is pretty good at complaining about any kind of non explicit cast. This commit was SVN r12207.	2006-10-20 02:25:50 +00:00
Tim Prins	ade94b523b	Fixed a number of issues related to resource allocation: - Simplified the logic of the ras modules by moving the attribute handling into the base allocation function. This allows us to decide how to allocate based on the situation, and solves some of the allocation problems we were having with comm_spawn. - moved the proxy component into the base. This was done because we always want to call the proxy functions if we are not on a HNP regardless of the attributes passed. - Got rid of the hostfile component. What little logic was in it was moved into the base to deal with other circumstances. The hostfile information is currently being propagated into the registry by the RDS, so we just use what is already in the registry. - renamed some slurm function so that they have the proper prefix. Not strictly necessary as they were static, but it makes debugging much easier. - fixed a buglet in the round_robin rmaps where we would return an error when really no error occurred. I tried to make proper corrections to all the ras modules, but I cannot test all of them. This commit was SVN r12202.	2006-10-19 23:33:51 +00:00
Ralph Castain	f4a458532b	This doesn't totally resolve the comm_spawn problem, but it helps a little. I'll continue working on it and hope to resolve it completely shortly. The issue primarily centers on where to start mapping the child job's processes, and how to deal with oversubscription that might result. At the moment, I am trying to resolve the first issue first (hey, that even sounds right!). This change does a couple of things: 1. Since the USE_PARENT_ALLOC attribute is a directive regarding allocation of resources to a job, it more properly should be an attribute of the RAS. Change the name to reflect that and move the attribute define to the ras_types.h file. 2. Add the attributes list to the RMAPS map_job interface. This provides us with the desired flexibility to dynamically specify directives for mapping. The system will - in the absence of any attribute-based directive - default to the values provided in the MCA parameters (either from environment or command-line interface). This commit was SVN r12164.	2006-10-18 14:01:44 +00:00
Ralph Castain	0c0fe022ff	This is a first cut at fixing the problem of comm_spawn children being mapped onto the same nodes as their parents. I am not convinced the behavior implemented here is the long-term right one, but hopefully it will help alleviate the situation for now. In this implementation, we begin mapping on the first node that has at least one slot available as measured by the slots_inuse versus the soft limit. If none of the nodes meet that criterion, we just start at the beginning of the node list since we are oversubscribed anyway. Note that we ignore this logic if the user specifies a mapping - then it's just "user beware". The real root cause of the problem is that we don't adjust sched_yield as we add processes onto a node. Hence, the node becomes oversubscribed and performance goes into the toilet. What we REALLY need to do to solve the problem is: (a) modify the PLS components so they reuse the existing daemons, (b) create a way to tell a running process to adjust its sched_yield, and (c) modify the ODLS components to update the sched_yield on a process per the new method Until we do that, we will continue to have this problem - all this fix (and any subsequent one that focuses solely on the mapper) does is hopefully make it happen less often. This commit was SVN r12145.	2006-10-17 19:35:00 +00:00
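The starting-node rule described in r12145 can be sketched as follows: begin mapping at the first node whose slots_inuse is below its soft limit, and fall back to the start of the list when every node is already full. This is a Python illustration under that reading of the commit message. The `Node` type and the function name `starting_node_index` are hypothetical, and the case where the user supplies an explicit mapping is not modelled.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    slots_inuse: int
    soft_limit: int


def starting_node_index(nodes: List[Node]) -> int:
    """Index of the first node with a free slot, or 0 if all nodes are at their soft limit."""
    for index, node in enumerate(nodes):
        if node.slots_inuse < node.soft_limit:
            return index
    # Every node is full, so we are oversubscribed anyway: start at the beginning.
    return 0


nodes = [Node("n0", 2, 2), Node("n1", 2, 2), Node("n2", 0, 2)]
print(starting_node_index(nodes))  # index of n2, the first node with a free slot
```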
Tim Prins	720eb88cad	Make no-op function match new interface. This commit was SVN r12142.	2006-10-17 17:34:06 +00:00
Tim Prins	8b0170148e	Add some missing headers. This commit was SVN r12141.	2006-10-17 17:28:02 +00:00
Tim Prins	5d31332f97	Goodbye poe, long live LoadLeveler... This commit was SVN r12140.	2006-10-17 17:07:48 +00:00
Ralph Castain	13227e36ab	This commit looks a lot bigger than it is, so relax :-) Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off. To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place. I used those capabilities in two places: 1. Added an attribute list to the rmgr.spawn interface. 2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h). So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms. This commit was SVN r12138.	2006-10-17 16:06:17 +00:00
Brian Barrett	f5b8f1f2f0	Work around Automake not knowing how to properly configure libtool to build Objective C libraries Refs trac:483 This commit was SVN r12080. The following Trac tickets were found above: Ticket 483 --> https://svn.open-mpi.org/trac/ompi/ticket/483	2006-10-10 20:14:26 +00:00
George Bosilca	ad5810e33f	ORTE_DECLSPEC what needs to be ORTE_DECLSPEC. This commit was SVN r11997.	2006-10-05 05:22:22 +00:00
Ralph Castain	121f834776	Continue bringing comm_spawn back online. Ensure all RM frameworks post their HNP receives. Fix the rmgr proxy component. Still need some work on the proxy component, and on job termination for persistent daemon case. This commit was SVN r11928.	2006-10-02 00:46:31 +00:00
Brian Barrett	95ba51fbd4	* Clean up debugging output so that it's useful * Error message in an NSError object is localizedDescription, not localizedErrorReason. The latter is a description of how the error can occur, which is usually nothing in XGrid frameworks. * Clean up silly error in finding the Kerberos Service Principal when using Kerberos authentication * Print useful error message when a connection unexpectedly closes, as this is usually authentication related... This commit was SVN r11923.	2006-10-01 22:43:17 +00:00
Tim Prins	1b35e7adff	cleanup This commit was SVN r11863.	2006-09-28 13:28:48 +00:00
Brian Barrett	9733c8e3bd	Update XGrid RAS and PLS to the new infrastructure. Not yet super well tested, but starting to get there... This commit was SVN r11810.	2006-09-26 03:26:45 +00:00
Tim Prins	567676f3c1	- Formatting and minor cleanup - made it so we now set the architecture of each node we discover - remove debugging output This commit was SVN r11751.	2006-09-22 13:24:32 +00:00
Tim Prins	83a7f6e4de	Fix for bug #369. LoadLeveler only sets LOADL_PROCESSOR_LIST when there are 128 or fewer tasks allocated to a job. The POE RAS relied on this variable so I created a new RAS which uses the LoadLeveler API instead of relying on the environment variable. This still needs some testing, so for now we use the POE RAS whenever LOADL_PROCESSOR_LIST is set, otherwise we fall back on this component. Unfortunately, this will require an autogen... This commit was SVN r11732.	2006-09-21 00:08:49 +00:00
Tim Prins	c4db5654fa	Fix for bug #370 The POE ras did not correctly enter the number of slots per node. This fixes that. This commit was SVN r11716.	2006-09-19 16:27:15 +00:00
Jeff Squyres	3e239f4532	Add a missing .ompi_ignore This commit was SVN r11666.	2006-09-15 02:36:22 +00:00
Ralph Castain	37dfdb76eb	Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. This commit was SVN r11661.	2006-09-14 21:29:51 +00:00
Josh Hursey	908f31fe9f	Fix a code clarity issue in the POE PLS. Allow the POE RAS to be compiled for linux as well as AIX. The POE RAS is really a LoadLeveler RAS, and IU now has a cluster that uses LoadLeveler in a Linux environment (BigRed). This seems to be the only thing we need to do so far to run Open MPI on BigRed. Yay :) This commit was SVN r11600.	2006-09-09 05:13:15 +00:00
Ralph Castain	9e6e9b8619	Fix a couple of variable declarations This commit was SVN r11467.	2006-08-28 13:28:10 +00:00
George Bosilca	693c835137	No need to cast as the returned value is already in the expected type. This commit was SVN r11458.	2006-08-28 04:10:43 +00:00
Pak Lui	131f0eff04	fix the verbose value. This commit was SVN r11418.	2006-08-24 21:30:08 +00:00
Pak Lui	65a524dd0d	- need to provide option for showing the grid engine's JOB_ID in case the grid engine job needs to be killed - clean up the orted_path and debug message This commit was SVN r11413.	2006-08-24 20:27:19 +00:00
George Bosilca	f52c10d18e	And ORTE is ready for prime-time. All Windows tricks are in: - use the OPAL functions for PATH and environment variables - make all headers C++ friendly - no unnamed structures - no implicit cast. Plus a full implementation for the orte_wait functions. This commit was SVN r11347.	2006-08-23 03:32:36 +00:00
George Bosilca	6afa4c6c64	Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3 different macros, one for each project. Therefore, now we have OPAL_DECLSPEC, ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project. This commit was SVN r11270.	2006-08-20 15:54:04 +00:00
Ralph Castain	8c7f0ed9ae	Change the SOH to the new State Monitoring and Reporting (SMR) framework. New API's will be appearing in the new framework shortly - this just gets the name change into the system. Other changes: 1. Remove the old xcpu components as they are not functional. 2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one. This will require an autogen/configure, I'm afraid. This commit was SVN r11228.	2006-08-16 16:35:09 +00:00
Ralph Castain	5dfd54c778	With the branch to 1.2 made.... Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced). Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up). I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t). In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but... Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems. This commit was SVN r11204.	2006-08-15 19:54:10 +00:00
Pak Lui	8fab3d5b82	* Inadvertently removed a wrong variable during the last change. This commit was SVN r11157.	2006-08-11 16:00:39 +00:00
Ralph Castain	59d6f1e2eb	Remove ompi_ignores on gridengine components as this seems resolved - thanks Pak for quick response! Fixed a few very minor compiler complaints in the pls_gridengine_module.c file. ISO C is less forgiving about where variables get declared. This commit was SVN r11156.	2006-08-11 15:32:17 +00:00
Pak Lui	99a0521e44	* Fix the issue that Ralph observed in MacOS X with an invalid header file and other warnings. This commit was SVN r11155.	2006-08-11 15:04:51 +00:00
Ralph Castain	5fd6306c2f	Add ompi_ignores until the configuration can be fixed This commit was SVN r11154.	2006-08-11 14:11:41 +00:00
Pak Lui	08352878cc	* Added in new ras and pls components to support Sun N1 Grid Engine (N1GE) 6 and its open source version as the job launchers for ORTE. This commit was SVN r11153.	2006-08-10 21:46:52 +00:00
Ralph Castain	ddd575d126	Ensure that the localhost gets placed on the registry with the same name as found in the system_info structure. Otherwise, we wind up with confusion in the session directory names. This commit was SVN r11139.	2006-08-09 15:26:37 +00:00
George Bosilca	3daa063772	Make the format and the arguments match. This commit was SVN r10734.	2006-07-11 15:10:44 +00:00
George Bosilca	a9df5035f9	Remove unused variable. This commit was SVN r10712.	2006-07-11 00:30:51 +00:00
Ralph Castain	3d220cbd48	This patch fixes several issues relating to comm_spawn and N1GE. In particular, it does the following: 1. Modifies the RAS framework so it correctly stores and retrieves the actual slots in use, not just those that were allocated. Although the RAS node structure had storage for the number of slots in use, it turned out that the base function for storing and retrieving that information ignored what was in the field and simply set it equal to the number of slots allocated. This has now been fixed. 2. Modified the RMAPS framework so it updates the registry with the actual number of slots used by the mapping. Note that daemons are still NOT counted in this process as daemons are NOT mapped at this time. This will be fixed in 2.0, but will not be addressed in 1.x. 3. Added a new MCA parameter "rmaps_base_no_oversubscribe" that tells the system not to oversubscribe nodes even if the underlying environment permits it. The default is to oversubscribe if needed and the underlying environment permits it. I'm sure someone may argue "why would a user do that?", but it turns out that (looking ahead to dynamic resource reservations) sometimes users won't know how many nodes or slots they've been given in advance - this just allows them to say "hey, I'd rather not run if I didn't get enough". 4. Reorganizes the RMAPS framework to more easily support multiple components. A lot of the logic in the round_robin mapper was very valuable to any component - this has been moved to the base so others can take advantage of it. 5. Added a new test program "hello_nodename" - just does "hello_world" but also prints out the name of the node it is on. 6. Made the orte_ras_node_t object a full ORTE data type so it can more easily be copied, packed, etc. This proved helpful for the RMAPS code reorganization and might be of use elsewhere too. This commit was SVN r10697.	2006-07-10 14:10:21 +00:00
Ralph Castain	bc7690bcb0	Fix the bproc allocator. This is just a bandaid for 1.x that will be fixed more thoroughly in 2.0. Basically, the problem was that the allocator was grabbing everything on the cluster for which the user had access privilege. Thus, if a user had two sessions operable, each with its own allocation, mpirun in each session would grab both sets of nodes and use them. Not very polite. This commit was SVN r10683.	2006-07-06 18:31:14 +00:00
Jeff Squyres	538965aeb0	Final merge of stuff from /tmp/tm-stuff tree (merged through /tmp/tm-merge). Validated by RHC. Summary: - Add --nolocal (and -nolocal) options to orterun - Make some scalability improvements to the tm pls This commit was SVN r10651.	2006-07-04 20:12:35 +00:00
George Bosilca	5df94f812e	Aren't we supposed to release the value on all possible execution paths ? This commit was SVN r9757.	2006-04-27 17:31:01 +00:00
Jeff Squyres	018a4b98ff	- Ensure that "context" is initialized to NULL - Ensure that we don't free a NULL context - Add a few {}'s This commit was SVN r9055.	2006-02-16 04:09:29 +00:00
David Daniel	ff7a2c7967	Fixes for BJS (broken since merge) This commit was SVN r9043.	2006-02-15 01:14:50 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Ralph Castain	4b9f015c0b	Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list. This commit was SVN r8912.	2006-02-07 03:32:36 +00:00
Jeff Squyres	31336e4773	Add some missing headers / correct one installation directory This commit was SVN r8408.	2005-12-08 04:00:52 +00:00
Jeff Squyres	6fbd321442	Fix a bunch of install locations for header files This commit was SVN r8406.	2005-12-08 00:54:44 +00:00
Brian Barrett	20cea60b82	* fix "make distclean" error in PML * turns out (duh!) that there was a reason that the <projectdir>dir variable was set in the AM conditional. If not, stupid directories are created and not needed... duh. This commit was SVN r8205.	2005-11-20 07:41:09 +00:00
Brian Barrett	8faa1884f0	* The last of the build system optimizations. Combine the component and component/base Makefile.am files, reducing the time configure spends stamping out Makefiles at the end * Install base_impl.h file when devel-headers are being installed This commit was SVN r8200.	2005-11-20 01:03:01 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Tim Woodall	aa5b61e4f1	corrections for multiple app contexts This commit was SVN r7939.	2005-10-31 20:37:44 +00:00
Tim Woodall	60754acae8	- modified rmaps data structures to point directly to ras node - modified rsh to NOT query for each node's mapping, as all data is already available in the rmaps structures This commit was SVN r7894.	2005-10-27 17:04:10 +00:00
Brian Barrett	1302cb4072	The next in a long line of crazed build system changes from Brian. This was originally suggested by Ralf Wildenhues, to try to speed autogen, configure, and make (and possibly even make install). Use automake's include directive to drastically reduce the number of Makefile files (although the number of Makefile.am files is the same - most are just included in a top-level Makefile.am). Also use an Automake SUBDIRs feature to eliminate the dynamic-mca tree, which was no longer really needed. This makes adding a framework easier (since you don't have to remember the dynamic-mca tree) and makes building faster (as make doesn't have to recurse through the dynamic-mca tree) This commit was SVN r7777.	2005-10-17 00:21:10 +00:00
Thara Angskun	8b59de0f37	Import RAS for POE This commit was SVN r7748.	2005-10-13 14:08:17 +00:00
Brian Barrett	128389758f	* fix compile error in XGrid PLS that got introduced sometime in the not too distant past * work around apparently broken handling of max_slots somewhere along the line by just setting it to 0 Both changes should go to the trunk. This commit was SVN r7710.	2005-10-12 00:41:14 +00:00
Josh Hursey	af9ccdf04a	need to use get_first instead of get_begin since we don't want to execute this loop if "nodes" is an empty list. get_first, in this loop context, allows us to do just that, while get_begin doesn't. This fixes a --host problem that appeared on the Linux PPC64 build. This commit was SVN r7703.	2005-10-11 21:33:04 +00:00
Jeff Squyres	0629cdc2d7	Bring back the changes from /tmp/jjhursey-rmaps. Specific merge command: svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps . (where "." is a trunk checkout) The logs from this branch are much more descriptive than I will put here (including a really long description from last night). Here's the short version: - fixed some broken implementations in ras and rmaps - "orterun --host ..." now works and has clearly defined semantics (this was the impetus for the branch and all these fixes -- LANL had a requirement for --host to work for 1.0) - there is still a little bit of cleanup left to do post-1.0 (we got correct functionality for 1.0 -- we did not fix bad implementations that still "work") - rds/hostfile and ras/hostfile handshaking - singleton node segment assignments in stage1 - remove the default hostfile (no need for it anymore with the localhost ras component) - clean up pls components to avoid duplicate ras mapping queries - [possible] -bynode/-byslot being specific to a single app context This commit was SVN r7664.	2005-10-07 22:24:52 +00:00
Jeff Squyres	b79c46dbf6	Downgrade the default priority to 75, just to give leeway (same as the slurm pls). This commit was SVN r7624.	2005-10-04 19:18:52 +00:00
Jeff Squyres	fcef1774d5	Per advice from Ralf W., change the pkgdata declarations in Makefile.am's to be a slightly more correct (and, more importantly, less error-prone) construct. This commit was SVN r7554.	2005-09-30 13:32:39 +00:00
Josh Hursey	a23370c007	Converted some MCA parameters from the old version to the new. Have the ras_base_schedule_policy MCA parameter working once again. before it would only do slot based allocation, even if the MCA parameter was set properly. Currently you can specify to orterun a node allocation by either: -mca ras_base_schedule_policy node -bynode and slot allocation (which is the default) by: -mca ras_base_schedule_policy slot -byslot This commit was SVN r7513.	2005-09-27 02:54:15 +00:00
Tim Woodall	c38ebe2c6a	support -H,-host,--host option for rsh/ssh launch This commit was SVN r7484.	2005-09-22 16:09:23 +00:00
Tim Woodall	194150b81c	someone broke this... This commit was SVN r7478.	2005-09-22 13:47:37 +00:00
Andrew Friedley	555ae37255	Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well. This commit was SVN r7477.	2005-09-22 12:28:54 +00:00
Tim Woodall	84e0d89497	correction This commit was SVN r7447.	2005-09-20 19:20:39 +00:00
Tim Woodall	29d14281c8	use the specified host names (if provided) This commit was SVN r7442.	2005-09-20 13:33:11 +00:00
Tim Woodall	6c885acb91	corrections to handle host specifications This commit was SVN r7441.	2005-09-20 13:32:08 +00:00
Tim Woodall	75d9119cf3	correction This commit was SVN r7436.	2005-09-19 21:35:39 +00:00
Tim Woodall	e1ec160858	lookup available nodes based on mapping data (if available) This commit was SVN r7435.	2005-09-19 21:31:00 +00:00
Brian Barrett	2787d993a9	* Add checks for fork/execve/setpgid for slurm components so that they automagically don't build on platforms without such things * Fix for mistaken use of cache variable in assembly setup * one more cached test hits the books This commit was SVN r7404.	2005-09-16 04:51:09 +00:00
Jeff Squyres	a107ab3897	Add missing header file This commit was SVN r7294.	2005-09-11 10:14:29 +00:00
Rainer Keller	3c639efa38	- Silly cleanup This commit was SVN r7289.	2005-09-10 08:01:47 +00:00
Rainer Keller	5fed46e072	- Allow usernames to be specified in the hostfile. The following formats are parsed: user@IPv4 user@fqdn IPv4 or fqdn [username\|user-name\|user_name]=user - Try a better error-detection when parsing (recognize wrong IPs, fqdns...) This commit was SVN r7288.	2005-09-10 07:57:50 +00:00
George Bosilca	f13690f16e	The prototype of ompi_help has been changed. This commit was SVN r7218.	2005-09-07 17:15:00 +00:00
Brian Barrett	ed56e743b7	* update configure.ac to use the modern version of AC_INIT and AM_INIT_AUTOMAKE, instead of the deprecated version. * Work around dumbness in modern AC_INIT that requires the version number to be set at autoconf time (instead of at configure time, as it was before). Set the version number, minus the subversion r number, at autoconf time. Override the internal variables to include the r number (if needed) at configure time. Basically, the right thing should always happen. The only place it might not is the version reported as part of configure --help will not have an r number. * Since AM_INIT_AUTOMAKE takes a list of options, no need to specify them in all the Makefile.am files. * Adds support for subdir-objects, meaning that object files are put in the directory containing source files, even if the Makefile.am is in another directory. This should start making it feasible to reduce the number of Makefile.am files we have in the tree, which will greatly reduce the time to run autogen and configure. This commit was SVN r7211.	2005-09-07 05:54:53 +00:00
Ralph Castain	4ed7752681	Continue cleaning up memory leaks during launch This commit was SVN r7158.	2005-09-02 20:41:24 +00:00
Jeff Squyres	a7fbb0f95e	Put in comments about why these assignments exist This commit was SVN r7146.	2005-09-02 10:27:23 +00:00
Jeff Squyres	7e4f696501	Fix silly compiler warnings This commit was SVN r7145.	2005-09-02 10:26:41 +00:00
Jeff Squyres	3962c53e2e	- Add to AM_CPPFLAGS $(OPAL_LTDL_CPPFLAGS) where necessary in order to add a -I to find the included ltdl.h (vs. a system-installed ltdl.h) - Clean up kruft in a bunch of Makefile.am's to remove now-unnecessary AM_CPPFLAGS settings to get static-components.h for each framework - Move the component_repository API functions out of opal/mca/base/base.h and into opal/mca/base/mca_base_component_repository.h in order to decrease unnecessary dependencies (e.g., before this, almost everything in the tree depended on ltdl.h, which is unnecessary -- only a small number of files really need ltdl.h) This commit was SVN r7127.	2005-09-01 12:16:36 +00:00
Rainer Keller	27f1174d0e	- Only return the nodes actually allocated to the job. (necessary when orted handles several jobs simultaneously). This commit was SVN r7105.	2005-08-31 07:09:47 +00:00
Jeff Squyres	7d895a4f08	Add missing header file This commit was SVN r7071.	2005-08-28 11:50:43 +00:00
Jeff Squyres	b306adf349	The SLURM components are now open for business! This commit was SVN r7046.	2005-08-26 14:43:18 +00:00
Brian Barrett	17c1bb355e	* more memory leak fixes - mainly string params not being freed at end of time * Added code to free dps structures at shutdown This commit was SVN r7043.	2005-08-26 02:08:23 +00:00
Brian Barrett	3e8740e740	* mostly working SLURM component. Had to add a sds for the daemons so that we could vector launch the daemons and still have the nodenames fixed up in the end This commit was SVN r7041.	2005-08-25 22:29:23 +00:00
Jeff Squyres	d5909421a9	Register the priority param in open so that ompi_info can see it This commit was SVN r7034.	2005-08-25 16:37:24 +00:00
Jeff Squyres	1649c7e855	Find out from SLURM how many slots per node we have This commit was SVN r7031.	2005-08-25 15:51:58 +00:00
Jeff Squyres	d0e847d1ed	Allow oversubscription This commit was SVN r7027.	2005-08-25 11:02:49 +00:00
Jeff Squyres	a6dd3537f1	Minor fixes. This commit was SVN r7026.	2005-08-25 02:59:55 +00:00
Jeff Squyres	072a59cc02	Properly register the MCA param during the open call This commit was SVN r7014.	2005-08-24 20:50:26 +00:00
Jeff Squyres	28f716542e	First cut of the SLURM ras. Seems to be working! Now need to write SLURM pls... This commit was SVN r7008.	2005-08-24 19:15:11 +00:00
Jeff Squyres	018504480a	- Update svn:ignore - Update to new MCA param API - Update to new #include format This commit was SVN r7007.	2005-08-24 18:37:28 +00:00
Jeff Squyres	72d2abe72e	Remove some outdated comments This commit was SVN r7006.	2005-08-24 18:30:09 +00:00
Brian Barrett	e737bba753	* version of the tm pls that uses the proxy orteds, avoiding all the nasty multi-client issues the old version had. Also, ignore the NULL iof component, since we shouldn't use it when using the proxy orteds This commit was SVN r6939.	2005-08-19 16:49:59 +00:00
Brian Barrett	80f27b5d87	* fix some bit rot in tm pls/ras * remove src/ directory for tm pls/ras This commit was SVN r6937.	2005-08-19 14:46:11 +00:00
Tim Woodall	ab4aac2c14	stubs This commit was SVN r6930.	2005-08-18 19:46:42 +00:00
Jeff Squyres	cce0950df7	- change a bunch of OMPI_* constants or ORTE_* equivalents - change the framework opens to [mostly] use the new MCA param API - properly pass in framework debug output streams to the mca_base_component_open() function This commit was SVN r6888.	2005-08-15 18:25:35 +00:00
Josh Hursey	22c7f2b3e0	Quite a range of small changes. ns_replica.c - Removed the error logging since I use this function in orte_init_stage1 to check if we have created a cellid yet or not. ras_types.h & rase_base_node.h - This was an empty file. moved the orte_ras_node_t from base/ras_base_node.h to this file. - Changed the name of orte_ras_base_node_t to orte_ras_node_t to match the naming mechanisms in place. ras.h - Exposed 2 functions: - node_insert: This takes a list of orte_ras_base_node_t's and places them in the Node Segment of the GPR. This is to be used in orte_init_stage1 for singleton processes, and the hostfile parsing (see rds_hostfile.c). This just puts in the appropriate API interface to keep from calling the orte_ras_base_node_insert function directly. - node_query: This is used in hostfile parsing. This just puts in the appropriate API interface to keep from calling the orte_ras_base_node_query function directly. - Touched all of the implemented components to add reference to these new function pointers ras_base_select.c & ras_base_open.c - Add and set the global module reference rds.h - Exposed 1 function: - store_resource: This stores a list of rds_cell_desc_t's to the Resource Segment. This is used in conjunction with the orte_ras.node_insert function in both the orte_init_stage1 for singleton processes and rds_hostfile.c rds_base_select.c & rds_base_open.c - Add and set the global module reference rds_hostfile.c - Added functionality to create a new cellid for each hostfile, placing each entry in the hostfile into the same cellid. Currently this is commented out with the cellid hard coded to 0, with the intention of taking this out once ORTE is able to handle multiple cellid's - Instead of just adding hosts to the Node Segment via a direct call to the ras_base_node_insert() function. First add the hosts to the Resource Segment of the GPR using the orte_rds.store_resource() function then use the API version of orte_ras.node_insert() to store the hosts on the Node Segment. - Add 1 new function pointer to module as required by the API. rds_hostfile_component.c - Converted this to use the new MCA parameter registration orte_init_stage1.c - It is possible that a cellid was not created yet for the current environment. So I put in some logic to test if the cellid 0 existed. If it does then continue, otherwise create the cellid so we can properly interact with the GPR via the RDS. - For the singleton case we insert some 'dummy' data into the GPR. The RAS matches this logic, so I took out the duplicate GPR put logic, and replaced it with a call to the orte_ras.node_insert() function. - Further before calling orte_ras.node_insert() in the singleton case, we also call orte_rds.store_resource() to add the singleton node to the Resource Segment. Console: - Added a bunch of new functions. Still experimenting with many aspects of the implementation. This is a checkpoint, and has very limited functionality. - Should not be considered stable at the moment. This commit was SVN r6813.	2005-08-11 19:51:50 +00:00


271 commits