openmpi

Author	SHA1	Message	Date
Ralph Castain	50433bf833	Turn off the new fqdn behavior pending resolution of hostfile issue This commit was SVN r18064.	2008-04-01 20:52:22 +00:00
Ralph Castain	3a4c10efd6	Delete obsolete file, cleanup obsolete cruft in another file This commit was SVN r18060.	2008-04-01 18:36:23 +00:00
Ralph Castain	dc7f45dafd	Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure. Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code. This commit was SVN r17926.	2008-03-23 23:10:15 +00:00
Galen Shipman	80ac7c87cd	don't forget command file.. This commit was SVN r17878.	2008-03-19 16:24:29 +00:00
Galen Shipman	77c8532cc9	do things in a less hacky way.. This commit was SVN r17877.	2008-03-19 16:23:56 +00:00
Galen Shipman	0fb6cf0916	make output use verbose macro.. This commit was SVN r17778.	2008-03-07 03:06:17 +00:00
Shiqing Fan	eb1dfaf4d5	Select the windows CCP component at runtime by testing if we are on Windows cluster. This commit was SVN r17776.	2008-03-07 01:31:53 +00:00
Tim Prins	f61c2333c0	Remove unneeded field, and the two uses of it. This commit was SVN r17757.	2008-03-06 12:46:36 +00:00
Tim Prins	f9916811ae	Make it so we do not mangle the options the user passes to their executable. Fixes trac:1124. The change also: - cleans up and simplifies the command line processing code - adds an error output if more than one hostfile is passed for a single app context - gets rid of the superfluous orte_app_context_map_t type, and instead uses a simple argv of -host options. This commit was SVN r17750. The following Trac tickets were found above: Ticket 1124 --> https://svn.open-mpi.org/trac/ompi/ticket/1124	2008-03-05 22:12:27 +00:00
Ralph Castain	edb8e32a7a	Add default hostfile parameter plus --default-hostfile command line option. Fix error message when job setup failed This commit was SVN r17724.	2008-03-05 04:54:57 +00:00
Shiqing Fan	ebf9c0441d	Set the windows components invisible. This commit was SVN r17687.	2008-03-04 17:37:17 +00:00
Shiqing Fan	ae41b5418b	Update the RAS and PLM components for Windows. These changes affect only Windows, not other platforms. This commit was SVN r17686.	2008-03-04 17:13:01 +00:00
George Bosilca	9d421bea2a	Replace all occurrences of orte_pointer_array by opal_pointer_array. Remove the implementation of orte_pointer_array. This commit was SVN r17636.	2008-02-28 05:32:23 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Gleb Natapov	da3e69101d	Add missing include. This commit was SVN r17493.	2008-02-18 14:55:02 +00:00
Galen Shipman	18d1d3b408	Add ORTE ALPS support (Cray XT CNL) This commit was SVN r17482.	2008-02-17 19:29:06 +00:00
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion and discussion about linkers, have all OPAL, ORTE, and OMPI components '''not''' link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Ralph Castain	3dbd4d9be7	Squeeeeeeze the launch message. This is the message sent to the daemons that provides all the data required for launching their local procs. In reorganizing the ODLS framework, I discovered that we were sending a significant amount of unnecessary and repeated data. This commit resolves this by:
1. taking advantage of the fact that we no longer create the launch message via a GPR trigger. In earlier times, we had the GPR create the launch message based on a subscription. In that mode of operation, we could not guarantee the order in which the data was stored in the message - hence, we had no choice but to parse the message in a loop that checked each value against a list of possible "keys" until the corresponding value was found. Now, however, we construct the message "by hand", so we know precisely what data is in each location in the message. Thus, we no longer need to send the character string "keys" for each data value any more. This represents a rather large savings in the message size - to give you an example, we typically would use a 30-char "key" for a 2-byte data value. As you can see, the overhead can become very large.
2. sending node-specific data only once. Again, because we used to construct the message via subscriptions that were done on a per-proc basis, the data for each node (e.g., the daemon's name, whether or not the node was oversubscribed) would be included in the data for each proc. Thus, the node-specific data was repeated for every proc. Now that we construct the message "by hand", there is no reason to do this any more. Instead, we can insert the data for a specific node only once, and then provide the per-proc data for that node. We therefore not only save all that extra data in the message, but we also only need to parse the per-node data once. The savings become significant at scale.
Here is a comparison between the revised trunk and the trunk prior to this commit (all data was taken on odin, using openib, 64 nodes, unity message routing, tested with application consisting of mpi_init/mpi_barrier/mpi_finalize, all execution times given in seconds, all launch message sizes in bytes):
Per-node scaling, taken at 1ppn:
#nodes    original trunk     revised trunk
          time    size       time    size
1         0.10    819        0.09    564
2         0.14    1070       0.14    677
3         0.15    1321       0.14    790
4         0.15    1572       0.15    903
8         0.17    2576       0.20    1355
16        0.25    4584       0.21    2259
32        0.28    8600       0.27    4067
64        0.50    16632      0.39    7683
Per-proc scaling, taken at 64 nodes:
ppn       original trunk     revised trunk
          time    size       time    size
1         0.50    16669      0.40    7720
2         0.55    32733      0.54    11048
3         0.87    48797      0.81    14376
4         1.0     64861      0.85    17704
Condensing those numbers, it appears we gained:
per-node message size: 251 bytes/node -> 113 bytes/node
per-proc message size: 251 bytes/proc -> 52 bytes/proc
per-job message size: 568 bytes/job -> 399 bytes/job (job-specific data such as jobid, override oversubscribe flag, total #procs in job, total slots allocated)
The fact that the two pre-commit trunk numbers are the same confirms the fact that each proc was containing the node data as well. It isn't quite the 10x message reduction I had hoped to get, but it is significant and gives much better scaling. Note that the timing info was, as usual, pretty chaotic - the numbers cited here were typical across several runs taken after the initial one to avoid NFS file positioning influences. Also note that this commit removes the orte_process_info.vpid_start field and the handful of places that passed that useless value. By definition, all jobs start at vpid=0, so all we were doing is passing "0" around. In fact, many places simply hardwired it to "0" anyway rather than deal with it. This commit was SVN r16428.	2007-10-11 15:57:26 +00:00
Ralph Castain	53af94fd87	Modify the configure system so that gridengine support is only built in specific conditions: 1. --with-sge, always builds 2. --without-sge, never builds 3. if neither is specified, build if and only if either SGE_ROOT is set or "qrsh" is found in the path This commit was SVN r16422.	2007-10-10 21:39:16 +00:00
Brian Barrett	3a0067249c	The previous hack to deal with Libtool not speaking Objective C stopped working with Automake 1.10. This is a new hack, which should be much more flexible. The ras doesn't contain any Objective C, so remove the hack entirely from that Makefile.am. This commit was SVN r16269.	2007-09-30 03:40:25 +00:00
Jeff Squyres	f9b9beba77	Allow the LSF components to be shipped in the nightly tarball and open it up to others. This commit was SVN r16143.	2007-09-17 22:42:33 +00:00
Josh Hursey	729c63cf9d	Fix invalid MCA 'base' names so they appear in ompi_info. A subset of this patch needs to be applied to v1.2 Refs trac:928 This commit was SVN r15918. The following Trac tickets were found above: Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928	2007-08-18 03:05:45 +00:00
Jeff Squyres	75192de1fc	LSF support is now working. W00t! May be subject to a further tweak or two. * checking lsb_init() is not sufficient to know whether you're in an LSF job or not; you also need to check for environment variable markers * remove lots of debugging output * no need for the sds lsf to call lsb_init() * remove some slurm-like dead code and a copy-n-paste error in the sds lsf This commit was SVN r15644.	2007-07-26 18:49:29 +00:00
Ralph Castain	f219cc1e6e	A few changes to the lsf components - mostly cleanup, no major logic changes This commit was SVN r15563.	2007-07-23 18:38:36 +00:00
Jeff Squyres	2baa866026	Compiles to the new API, but doesn't quite work yet... This commit was SVN r15537.	2007-07-20 19:49:27 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and r15520 so that I can get the RML / OOB changes to cleanly apply. This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Ralph Castain	d109e9a6f4	Roll in the Voltaire core/socket/etc process mapping implementation. Only change I made was to cleanup some of the diagnostic output in the odls_default component so it uses the -mca odls_base_verbose parameter. You will not see any impact from this change unless you use the syntax described in ticket #1023. I've tried as many of the RAS components as possible and saw no problem - there may be issues with other RAS components that would not compile on any of my systems. Anything that appears should be trivial to fix. This commit was SVN r15427.	2007-07-14 15:14:07 +00:00
Jeff Squyres	24a28494a6	Fix what looks like a cut-n-paste error. This will not cause everyone to run autogen because this component is .ompi_ignore'd for everyone except jsquyres and rhc. This commit was SVN r15401.	2007-07-13 14:47:03 +00:00
Jeff Squyres	b20248709a	Next round of LSF commits. Getting farther, but it still doesn't fully work yet (everything is still .ompi_ignore'ed for everyone). This commit was SVN r15398.	2007-07-13 11:57:17 +00:00
Jeff Squyres	4439734a8a	It compiles (too)! Getting a little further... This commit was SVN r15383.	2007-07-12 14:46:41 +00:00
Ralph Castain	39013e2a18	Clean up a couple of minor typos. Bring the new bproc-related RAS components online. This commit was SVN r15328.	2007-07-10 14:11:26 +00:00
Ralph Castain	a1bf04f39e	First cut at revamping bproc support to separate it out from LANL's configuration. First cut at adding support for LSF Lots of ompi_ignores so only Jeff and I will see this stuff This commit was SVN r15321.	2007-07-10 12:43:05 +00:00
Ralph Castain	684aa1bc9f	Since universe size now is an orte thing, we may as well give it some direct support. Create rmgr set/get functions so it becomes more obvious where this value is being defined and how to retrieve it. Modify the bproc pls to pass it to the app procs when launched. Modify one of the test programs to verify it has been correctly set. This commit was SVN r15266.	2007-07-02 16:45:40 +00:00
George Bosilca	715f6012cf	The DSS pack function can use the const attribute for the src field as it is never modified by the pack functions directly. Enforce it all over the code base. This commit was SVN r15026.	2007-06-12 22:47:14 +00:00
Jeff Squyres	51f286d737	Just like r14289 on the ORTE trunk: Per discussions with Brian and Ralph, make a slight correction in where components are installed. Use $pkglibdir, not $libdir/openmpi, so that when compiled in the orte trunk, components are installed to the right directory (because the component search patch is checking $pkglibdir). This commit was SVN r14345. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r14289	2007-04-12 11:19:42 +00:00
Pak Lui	e9e8dc2765	* comment out unused code This commit was SVN r14297.	2007-04-10 22:38:34 +00:00
Tim Prins	2ffc02870d	Reduce the memory usage of the GPR: - Make it so that all the GPR pointer arrays are allocated initially at 16 elements instead of 512. This saves (on a 64 bit machine) approximately 4*(# procs + # nodes) KB. - Fix up the segment prealloc function so that preallocating an existent segment is not an error, and make the areas where we do large inserts use it. Fix the orte_pointer_array to efficiently implement setting its size. Before we just realloced the array one block at a time until the desired size was reached. Now we resize it all in one realloc. This commit was SVN r14264.	2007-04-09 00:40:15 +00:00
Tim Prins	df4c468bb4	fix some more minor memory leaks This commit was SVN r14260.	2007-04-07 18:41:16 +00:00
Tim Prins	9cb455272b	Fix a pile of memory leaks in ORTE. Fix a major memory leak in the SLURM RAS, and cleanup a bit of code there. This commit was SVN r14164.	2007-03-29 00:50:56 +00:00
Jeff Squyres	2105f444ec	Add missing header file This commit was SVN r14129.	2007-03-23 00:47:30 +00:00
Sven Stork	6111ca1152	- Let's try to detect the default nodefile directory because it can be different for different sites. If we cannot detect the default then we fall back to the hard coded path. This commit was SVN r14121.	2007-03-22 15:26:16 +00:00
Pak Lui	803655b555	* incorporated some of Jeff's comment regarding this fix. This commit was SVN r14070.	2007-03-19 21:59:48 +00:00
Pak Lui	da4d41e0e7	* fixed the missing fclose and eliminated the call to get_slot_count since it is not needed. This commit was SVN r14066.	2007-03-19 17:47:30 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Ralph Castain	5818a32245	Bring in a forgotten speed improvement for the TM launcher that was developed during SNL Tbird testing last year. Remove the redundant and slow calls to TM to resolve hostnames. Instead, read the host info from the PBS file during the RAS, and then just use that info in the PLS (rather than getting it again). Adjust the RMAPS mapped_node object to propagate the required launch_id info now included in the ras_node object. This provides support for those few systems that don't use nodename to launch, but instead want some id (typically an index into the array of allocated nodes). This value gets set for each node in the RAS - the RMAPS just propagates it for easy launch. This commit was SVN r13581.	2007-02-09 15:06:45 +00:00
Tim Prins	e199bf9b64	Refs trac:801 Fix compiler warning This commit was SVN r13308. The following Trac tickets were found above: Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801	2007-01-25 16:12:05 +00:00
Tim Prins	4fd81b3407	Fixes trac:801 - Make it so the SLURM ras can handle different nodelist configurations - Some code cleanup and better/more informative error messages and error handling This commit was SVN r13271. The following Trac tickets were found above: Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801	2007-01-24 14:45:42 +00:00
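
The "condensed" per-node and per-proc figures quoted in commit 3dbd4d9be7 (SVN r16428) can be cross-checked from the launch-message sizes listed in that commit message. The following is only an illustrative sketch: the sizes are copied from the commit's tables, while the dictionary names and the `increments` helper are hypothetical and not part of Open MPI.

```python
# Launch-message sizes in bytes, copied from the tables in commit 3dbd4d9be7.
# Keys are node counts, measured at 1 process per node.
per_node_original = {1: 819, 2: 1070, 3: 1321, 4: 1572,
                     8: 2576, 16: 4584, 32: 8600, 64: 16632}
per_node_revised = {1: 564, 2: 677, 3: 790, 4: 903,
                    8: 1355, 16: 2259, 32: 4067, 64: 7683}

# Keys are processes per node (ppn), all measured on 64 nodes.
per_proc_original = {1: 16669, 2: 32733, 3: 48797, 4: 64861}
per_proc_revised = {1: 7720, 2: 11048, 3: 14376, 4: 17704}

NODES = 64  # node count used for the per-proc measurements


def increments(sizes, scale=1):
    """Return the distinct size increases per unit between consecutive rows.

    `scale` converts a step in the key (e.g. one more proc per node)
    into the number of units actually added (e.g. 64 more procs).
    """
    keys = sorted(sizes)
    return {
        (sizes[b] - sizes[a]) / ((b - a) * scale)
        for a, b in zip(keys, keys[1:])
    }


# A single value in each set means the growth is exactly linear.
print("bytes/node, original:", increments(per_node_original))
print("bytes/node, revised: ", increments(per_node_revised))
print("bytes/proc, original:", increments(per_proc_original, scale=NODES))
print("bytes/proc, revised: ", increments(per_proc_revised, scale=NODES))
```

Each set contains exactly one value (251, 113, 251 and 52 bytes respectively), which matches the per-node and per-proc numbers the commit message reports and shows that the message size grows linearly in both cases.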


184 commits