openmpi

Author	SHA1	Message	Date
Jeff Squyres	213b5d5c6e	Per long threads on the mailing list and much confusion and discussion about linkers, have all OPAL, ORTE, and OMPI components not link against the OPAL, ORTE, or OMPI libraries. See http://www.open-mpi.org/community/lists/users/2007/10/4220.php for details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a better-formatted version of the same info). This commit was SVN r16968.	2007-12-15 13:32:02 +00:00
Ralph Castain	b6196e8a39	When we can detect that a daemon has failed, then we would like to terminate the system without having it lock up. The "hang" is currently caused by the system attempting to send messages to the daemons (specifically, ordering them to kill their local procs and then terminate). Unfortunately, without some idea of which daemon has died, the system hangs while attempting to send a message to someone who is no longer alive. This commit introduces the necessary logic to avoid that conflict. If a PLS component can identify that a daemon has failed, then we will set a flag indicating that fact. The xcast system will subsequently check that flag and, if it is set, will send all messages direct to the recipient. In the case of "kill local procs" and "terminate", the messages will go directly to each orted, thus bypassing any orted that has failed. In addition, the xcast system will -not- wait for the messages to complete, but will return immediately (i.e., operate in non-blocking mode). Orterun will wait (via an event timer) for a period of time based on the number of daemons in the system to allow the messages to attempt to be delivered - at the end of that time, orterun will simply exit, alerting the user to the problem and -strongly- recommending they run orte-clean. I could only test this on slurm for the case where all daemons unexpectedly died - srun apparently only executes its waitpid callback when all launched functions terminate. I have asked that Jeff integrate this capability into the OOB as he is working on it so that we execute it whenever a socket to an orted is unexpectedly closed. Meantime, the functionality will rarely get called, but at least the logic is available for anyone whose environment can support it. This commit was SVN r16451.	2007-10-15 18:00:30 +00:00
Jeff Squyres	e2df42eea3	Move the <sys/wait.h> below "orte_config.h" This commit was SVN r16424.	2007-10-11 11:31:09 +00:00
Ralph Castain	53af94fd87	Modify the configure system so that gridengine support is only built in specific conditions: 1. --with-sge, always builds 2. --without-sge, never builds 3. if neither is specified, build if and only if either SGE_ROOT is set or "qrsh" is found in the path This commit was SVN r16422.	2007-10-10 21:39:16 +00:00
Ethan Mallove	d0b61db65c	Add in a missing #include for Solaris builds. This commit was SVN r16416.	2007-10-10 12:49:15 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. 
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. 
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k * 50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Brian Barrett	3a0067249c	The previous hack to deal with Libtool not speaking Objective C stopped working with Automake 1.10. This is a new hack, which should be much more flexible. The ras doesn't contain any Objective C, so remove the hack entirely from that Makefile.am. This commit was SVN r16269.	2007-09-30 03:40:25 +00:00
Jeff Squyres	f9b9beba77	Allow the LSF components to be shipped in the nightly tarball and open it up to others. This commit was SVN r16143.	2007-09-17 22:42:33 +00:00
George Bosilca	6897926dce	Not used anymore. This commit was SVN r16129.	2007-09-14 21:20:19 +00:00
Ralph Castain	45986ad2aa	Add support to signal application procs for LSF This commit was SVN r16120.	2007-09-13 18:09:14 +00:00
Ralph Castain	9fa254c017	Provide a better error message when a daemon unexpectedly dies under SLURM so we differentiate between fail to start and aborting while the app is running. This commit was SVN r16115.	2007-09-12 20:53:50 +00:00
Jeff Squyres	5628084fec	Fix Coverity CID 463: remove unused variable / dead code. This commit was SVN r15999.	2007-08-29 01:30:15 +00:00
Andrew Friedley	2eedcd2539	Fixes trac:1047 Tie stdin to /dev/null to prevent stdin from being closed and thus making stdin not work in slurm allocations. This commit was SVN r15892. The following Trac tickets were found above: Ticket 1047 --> https://svn.open-mpi.org/trac/ompi/ticket/1047	2007-08-16 20:49:27 +00:00
Brian Barrett	330003361b	* Free memory from asprintf * need to compare ERANGE to errno This commit was SVN r15860.	2007-08-14 21:12:00 +00:00
Brian Barrett	881dd0654e	* Provide a hook so that a PLS can tell the orted it's starting that it needs to override the default umask. By default, this is not used since most environments do what the user would expect without any help. * Have TM use the newly added umask hook, so that processes inherit the user's umask from mpirun rather than the pbs_mom's umask, which the user has no control over. This commit was SVN r15858.	2007-08-14 18:44:52 +00:00
Shiqing Fan	eea712f9ab	- Export those components in the correct way. This commit was SVN r15804.	2007-08-08 16:20:17 +00:00
Jeff Squyres	188d529beb	* We do need the LSF task ID as part of our vpid * Accidentally had the PLS LSF using the env SDS; switch it back to the LSF SDS This commit was SVN r15650.	2007-07-26 20:22:36 +00:00
Jeff Squyres	75192de1fc	LSF support is now working. W00t! May be subject to a further tweak or two. * checking lsb_init() is not sufficient to know whether you're in an LSF job or not; you also need to check for environment variable markers * remove lots of debugging output * no need for the sds lsf to call lsb_init() * remove some slurm-like dead code and a copy-n-paste error in the sds lsf This commit was SVN r15644.	2007-07-26 18:49:29 +00:00
George Bosilca	c961cb5749	The Windows support is now back in business. This commit was SVN r15599.	2007-07-25 03:55:34 +00:00
Ralph Castain	f219cc1e6e	A few changes to the lsf components - mostly cleanup, no major logic changes This commit was SVN r15563.	2007-07-23 18:38:36 +00:00
Ralph Castain	ef141d1fbc	Ensure daemons know contact info for all other daemons. Update binomial xcast to work in revised design. Add debug output to orted so the daemon lets us know it launched (if --debug-daemons set) early on in case it fails during orte_init This commit was SVN r15555.	2007-07-23 15:00:39 +00:00
Jeff Squyres	78d214fec8	Oops -- didn't mean to commit the test program... This commit was SVN r15538.	2007-07-20 20:15:51 +00:00
Jeff Squyres	2baa866026	Compiles to the new API, but doesn't quite work yet... This commit was SVN r15537.	2007-07-20 19:49:27 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and r15520 so that I can get the RML / OOB changes to cleanly apply This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Tim Prins	e41f86dfe6	add a small amount of debugging output This commit was SVN r15483.	2007-07-18 15:20:55 +00:00
Jeff Squyres	b20248709a	Next round of LSF commits. Getting farther, but it still doesn't fully work yet (everything is still .ompi_ignore'ed for everyone). This commit was SVN r15398.	2007-07-13 11:57:17 +00:00
George Bosilca	52eebd706f	Update the xgrid PLS to fit the current interface of the PLS. This commit was SVN r15396.	2007-07-13 06:18:16 +00:00
Ralph Castain	bd65f8ba88	Bring in an updated launch system for the orteds. This commit restores the ability to execute singletons and singleton comm_spawn, both in single node and multi-node environments. Short description: major changes include - 1. singletons now fork/exec a local daemon to manage their operations. 2. the orte daemon code now resides in libopen-rte 3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided. I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine. This commit was SVN r15390.	2007-07-12 19:53:18 +00:00
Jeff Squyres	aa2c64d66d	It compiles! That's a start... :-) This commit was SVN r15382.	2007-07-12 14:41:09 +00:00
Jeff Squyres	e51bb19fab	Fix some include files This commit was SVN r15381.	2007-07-12 14:22:47 +00:00
Ralph Castain	a1bf04f39e	First cut at revamping bproc support to separate it out from LANL's configuration. First cut at adding support for LSF Lots of ompi_ignores so only Jeff and I will see this stuff This commit was SVN r15321.	2007-07-10 12:43:05 +00:00
Ralph Castain	684aa1bc9f	Since universe size now is an orte thing, we may as well give it some direct support. Create rmgr set/get functions so it becomes more obvious where this value is being defined and how to retrieve it. Modify the bproc pls to pass it to the app procs when launched. Modify one of the test programs to verify it has been correctly set. This commit was SVN r15266.	2007-07-02 16:45:40 +00:00
Josh Hursey	f88aa6c273	This commit cleans up the AMCA parameter implementation a bit. * Remove the 'opal_mca_base_param_use_amca_sets' global variable * Harness the fact that you can (read should) call the cmd_line functions before initializing opal_init_util(). This pushes the MCA/GMCA/AMCA command line options into the environment before OPAL inits and starts to use these values. By putting the cmd_line parse before opal_init_util in orterun and orted we only parse the MCA parameter files once, and correctly (alleviating the need to 'recache' the files on init.) Small bits of cleanup. This commit was SVN r15219.	2007-06-27 01:03:31 +00:00
Sven Stork	0edcf1d47e	- export required symbol This commit was SVN r15190.	2007-06-25 14:27:04 +00:00
Jeff Squyres	bd56dc7e5d	Fixes trac:1060 Per suggestion, if we don't find a valid shell via getpwuid(), also check the $SHELL environment variable. Also perform a few minor cleanups along the way. This commit was SVN r15156. The following Trac tickets were found above: Ticket 1060 --> https://svn.open-mpi.org/trac/ompi/ticket/1060	2007-06-21 11:40:42 +00:00
George Bosilca	99e701062a	The Windows job scheduler PLS. Initial commit as I have to move to another Windows cluster. Right now it's not in a usable state. This commit was SVN r15113.	2007-06-17 04:54:07 +00:00
Ralph Castain	fde15ac97d	Bring the TM launcher online This commit was SVN r15076.	2007-06-14 12:33:34 +00:00
George Bosilca	8dfa06a617	Only output when the user requests it. This commit was SVN r15067.	2007-06-14 04:33:18 +00:00
Pak Lui	de0f1eef89	No major changes here. Just updates to remove unused code and comments. This commit was SVN r15051.	2007-06-13 17:23:03 +00:00
Pak Lui	03a93a38c5	Added an option for daemonizing orted. The existing behavior to --no-daemonize for gridengine is not changed. This commit was SVN r15050.	2007-06-13 17:11:37 +00:00
George Bosilca	18c2bb0ed6	Don't forget to set the name argument before spawning the daemon. This commit was SVN r15047.	2007-06-13 15:45:34 +00:00
Pak Lui	8e7daea11f	Bring more changes in line with r15007. This commit was SVN r15044. The following SVN revision numbers were found above: r15007 --> open-mpi/ompi@85df3bd92f	2007-06-13 15:30:18 +00:00
Ralph Castain	425fed95ff	Bring the SGE component online This commit was SVN r15043.	2007-06-13 15:02:47 +00:00
George Bosilca	9d342ccb61	Shorter warning message. This commit was SVN r15031.	2007-06-12 23:22:09 +00:00
George Bosilca	715f6012cf	The DSS pack function can use the const attribute for the src field as it is never modified by the pack functions directly. Enforce it all over the code base. This commit was SVN r15026.	2007-06-12 22:47:14 +00:00
George Bosilca	432185d617	Forgot to remove the MCA parameter corresponding to the 2 unused fields in the RSH PLS component. This commit was SVN r15023.	2007-06-12 22:41:38 +00:00
George Bosilca	49e7bf3069	Be a little bit more clear when we fail to identify the shell. This commit was SVN r15022.	2007-06-12 22:40:44 +00:00
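
The gridengine build rule described in commit 53af94fd87 (SVN r16422) above can be sketched as a small decision function. This is only an illustration of the three stated conditions, not the actual configure macro: the function name and its parameters are hypothetical, and the real logic lives in the project's configure scripts.

```python
from typing import Optional


def should_build_gridengine(with_sge: Optional[bool],
                            sge_root_set: bool,
                            qrsh_in_path: bool) -> bool:
    """Hypothetical helper mirroring the three rules in the r16422 message.

    with_sge is True for --with-sge, False for --without-sge,
    and None when neither option was given.
    """
    if with_sge is not None:
        # Rules 1 and 2: an explicit option always wins.
        return with_sge
    # Rule 3: build if and only if SGE_ROOT is set or "qrsh" is found.
    return sge_root_set or qrsh_in_path
```

For instance, under these assumptions `should_build_gridengine(None, False, True)` evaluates to `True`, because `qrsh` was found even though `SGE_ROOT` is unset.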

415 Commits