openmpi

Author	SHA1	Message	Date
Ralph Castain	53af94fd87	Modify the configure system so that gridengine support is only built in specific conditions: 1. --with-sge, always builds 2. --without-sge, never builds 3. if neither is specified, build if and only if either SGE_ROOT is set or "qrsh" is found in the path This commit was SVN r16422.	2007-10-10 21:39:16 +00:00
Josh Hursey	6e5341c659	Forgot to move a header in the code movement. This commit was SVN r16420.	2007-10-10 15:39:40 +00:00
Ralph Castain	82a8e2d10d	Reorganize the odls framework to place common functionality in the base, thus making maintenance easier. We still need this to be a framework as some environments (e.g., bproc) require significantly different functionality. However, there is quite a bit of commonality across the components, so this ensures that fixes in one get propagated across the others. This patch also fixes a minor bug discovered along the way: we had "lost" the passing of the oversubscribed condition flag from the mapper to the orteds. Thus, we were not setting sched_yield correctly when in oversubscribed conditions (except when a hostfile was specified - different logic there because we treat the number of slots allocated on the node as "uncertain") I did not modify the process component in this patch - I will send a proposed patch to the maintainers of that component so they can review it first. This commit was SVN r16418.	2007-10-10 15:02:10 +00:00
Josh Hursey	7f833a9cb2	silence a warning that is triggered on restart This commit was SVN r16417.	2007-10-10 14:25:49 +00:00
Ethan Mallove	d0b61db65c	Add in a missing #include for Solaris builds. This commit was SVN r16416.	2007-10-10 12:49:15 +00:00
Josh Hursey	aa8391f888	Local and global coordinators should be the only ones involved in the movement of checkpoint files. This reduces the overhead on the application. This commit was SVN r16412.	2007-10-09 19:52:47 +00:00
Galen Shipman	fda1306807	revert my stupidity.. This commit was SVN r16410.	2007-10-09 19:01:20 +00:00
Josh Hursey	8fe2ef5647	a missing include This commit was SVN r16402.	2007-10-09 14:32:36 +00:00
Josh Hursey	7437f37e96	This commit contains the following: * Fix some missing includes in a few places. * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR. * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version. * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled. * Fix the placement of a checkpoint request check in MPI_Init * Pull the OPAL notification mechanism into the SnapC framework. * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing, the Local coordinator in the orted does this directly. * The Local and Application coordinator talk together bypassing the OPAL notification mechanism. * Optimized the Local <-> App Coordinator communication. * Improved the structure used to track vpid_snapshots in the local coord. * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint. This commit was SVN r16389.	2007-10-08 20:53:02 +00:00
Galen Shipman	1c1b9d5480	make cray happy This commit was SVN r16377.	2007-10-08 14:31:59 +00:00
Jeff Squyres	fff1057597	We do ''not'' want the orted to yield the processor more than necessary (it already uses blocking semantics while waiting for events on fd's, so it's not taking more cycles from MPI applications than necessary), or this can/will cause lengthy delays in orted processing. The problem we most recently saw was with routed OOB messages getting significantly delayed before being delivered to the target MPI processes. This was problematic for BTLs that use OOB messaging for wireup. There is a lengthy comment in orted_main.c that describes this in more detail. Setting the orted to not voluntarily call yield() prevents this bad behavior. The orted only runs in small bursts anyway (and blocks the rest of the time), so this is not harmful to application performance. This commit was SVN r16365.	2007-10-06 09:24:51 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point.
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. 
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Brian Barrett	3a0067249c	The previous hack to deal with Libtool not speaking Objective C stopped working with Automake 1.10. This is a new hack, which should be much more flexible. The ras doesn't contain any Objective C, so remove the hack entirely from that Makefile.am. This commit was SVN r16269.	2007-09-30 03:40:25 +00:00
Rolf vandeVaart	a87267ef92	Fix a build error on Solaris. MAXHOSTNAMELEN is defined in netdb.h. This commit was SVN r16268.	2007-09-28 20:15:28 +00:00
Josh Hursey	665a1e280b	Copyright updates that should have gone into r16252. (Someday I'll learn to do this before committing) This commit was SVN r16260. The following SVN revision numbers were found above: r16252 --> open-mpi/ompi@e10f476c87	2007-09-27 14:37:04 +00:00
Josh Hursey	e10f476c87	Bring over the jjh-filem branch which contains a non-blocking FileM interface and implementation. This has shown drastic performance benefit when transferring many files at roughly the same time. I tested this for many different filem operations and everything was working fine. Let me know if you have any problems with this functionality. Some Notes: - opal-checkpoint now has a 'quiet' flag to keep it from being too verbose. - FileM RSH component is fully non-blocking. - FileM RSH component has incoming connection throttling since by default ssh only allows 10 concurrent scp connections to any single host. This default can be adjusted via an MCA parameter. {{{-mca filem_rsh_max_incomming 10}}} - There is an MCA parameter for max outgoing connections, but it is currently not implemented. If someone needs it then it should not be hard to implement. {{{-mca filem_rsh_max_outgoing 10}}} - Changed the FileM request structure so that it is a bit more explicit and flexible. - Moved the 'preload-binary' and 'preload-files' functionality into odls/base allowing for code reuse in the 'process' and 'default' ODLS components. - Fixed a bug in the process name resolution which broke the 'preload-*' functionality due to GPR table structure changes. - The FileM RSH component might be able to see even more speedup from using a thread pool to operate on the work_pool structures, but that is for future work. - Added a 'opal-show-help' file to ODLS Base This commit was SVN r16252.	2007-09-27 13:13:29 +00:00
Josh Hursey	b5fc722c35	Add a flag to 'pretend' to do filem in snapc. This is useful when doing performance characterization, and should not be used by anyone doing anything else since it will not produce a globally consistent checkpoint in this mode. This commit was SVN r16192.	2007-09-24 16:19:45 +00:00
Jeff Squyres	f9b9beba77	Allow the LSF components to be shipped in the nightly tarball and open it up to others. This commit was SVN r16143.	2007-09-17 22:42:33 +00:00
Shiqing Fan	d4a7fb1378	- A small fix of format. This commit was SVN r16138.	2007-09-17 12:10:04 +00:00
George Bosilca	d32a54d74e	There is no values[1] ... How did the compilers get away with this !!! This commit was SVN r16132.	2007-09-14 21:33:25 +00:00
George Bosilca	6897926dce	Not used anymore. This commit was SVN r16129.	2007-09-14 21:20:19 +00:00
George Bosilca	4e66376e66	Fix memory leak (Coverity 702). This commit was SVN r16122.	2007-09-13 20:11:38 +00:00
Ralph Castain	45986ad2aa	Add support to signal application procs for LSF This commit was SVN r16120.	2007-09-13 18:09:14 +00:00
Ralph Castain	9fa254c017	Provide a better error message when a daemon unexpectedly dies under SLURM so we differentiate between fail to start and aborting while the app is running. This commit was SVN r16115.	2007-09-12 20:53:50 +00:00
Rolf vandeVaart	a289ac114a	1. Remove some #ifdef 0 code. 2. Remove some unnecessary code that was causing a SEGV. There may be some more work to be done, but at least orte-clean is functional again. This commit was SVN r16111.	2007-09-12 19:50:58 +00:00
Josh Hursey	b4c68c0925	Turn back on the absolute path protection for the moment. It is masking a bug that I'm tracking down in the SNAPC FULL - FILEM interactions. Also make sure to clean out the filem structure before asking for another checkpoint file when not storing the files in place. This commit was SVN r16109.	2007-09-12 18:19:39 +00:00
George Bosilca	e5d316dba6	Coverity: fix issues with using a string once it gets freed. The problem is that mca_base_register_string doesn't set the result to NULL if an error occurs. This commit was SVN r16108.	2007-09-12 18:16:53 +00:00
Ralph Castain	f80ea093a2	Ensure that the orteds do not directly respond to USR1/2 signals. Those signals are trapped by mpirun and propagated from there - at most, the orteds are involved in the propagation process, but should never do anything on their own. This commit was SVN r16098.	2007-09-12 14:32:31 +00:00
Shiqing Fan	548a4fe943	- Use IOVBASE_TYPE instead of char to avoid warnings on some systems. This commit was SVN r16092.	2007-09-11 16:24:23 +00:00
Shiqing Fan	c1065d8262	- Some more type casts. This commit was SVN r16087.	2007-09-11 11:28:43 +00:00
Shiqing Fan	dcee7e4229	- Should not use ORTE_DECLSPEC with initialization. This commit was SVN r16086.	2007-09-11 10:13:53 +00:00
Ralph Castain	45767b038c	Ensure that no-daemonize is correctly set This commit was SVN r16079.	2007-09-10 14:50:54 +00:00
Brian Barrett	cfe737d1f9	Fix some mistaken error checks -- errors will be less than zero, not greater than zero This commit was SVN r16008.	2007-08-29 18:52:51 +00:00
Jeff Squyres	5628084fec	Fix Coverity CID 463: remove unused variable / dead code. This commit was SVN r15999.	2007-08-29 01:30:15 +00:00
Jeff Squyres	8f10c285ef	Fix Coverity CID 466: remove unused variable / dead code. This commit was SVN r15998.	2007-08-29 01:25:03 +00:00
Brian Barrett	dcf678dbab	Fix heterogeneous issue with non-blocking RML receive, where the sender field could be in the wrong endianness This commit was SVN r15989.	2007-08-28 20:54:52 +00:00
Josh Hursey	5a029a47bd	forgot to separate the arguments This commit was SVN r15940.	2007-08-21 19:43:41 +00:00
Josh Hursey	db79f2392e	Make sure to enable C/R support for the HNP when restarting. This commit was SVN r15931.	2007-08-19 20:43:33 +00:00
Josh Hursey	729c63cf9d	Fix invalid MCA 'base' names so they appear in ompi_info. A subset of this patch needs to be applied to v1.2 Refs trac:928 This commit was SVN r15918. The following Trac tickets were found above: Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928	2007-08-18 03:05:45 +00:00
Brian Barrett	8294f6de03	The portals_utcp component doesn't actually need the Portals libraries and only pokes at environment variables. So don't link in the libraries, as that causes a whole other set of problems. This commit was SVN r15899.	2007-08-17 03:48:39 +00:00
Andrew Friedley	2eedcd2539	Fixes trac:1047 Tie stdin to /dev/null to prevent stdin from being closed and thus making stdin not work in slurm allocations. This commit was SVN r15892. The following Trac tickets were found above: Ticket 1047 --> https://svn.open-mpi.org/trac/ompi/ticket/1047	2007-08-16 20:49:27 +00:00
Tim Prins	5a795128af	Change it so that different components in orte use unique rml tags This commit was SVN r15881.	2007-08-16 14:02:35 +00:00
Brian Barrett	fe0d1f30d5	need errno.h This commit was SVN r15862.	2007-08-15 02:15:33 +00:00
Brian Barrett	330003361b	* Free memory from asprintf * need to compare ERANGE to errno This commit was SVN r15860.	2007-08-14 21:12:00 +00:00
Brian Barrett	881dd0654e	* Provide a hook so that a PLS can tell the orted it's starting that it needs to override the default umask. By default, this is not used since most environments do what the user would expect without any help. * Have TM use the newly added umask hook, so that processes inherit the user's umask from mpirun rather than the pbs_mom's umask, which the user has no control over. This commit was SVN r15858.	2007-08-14 18:44:52 +00:00
Shiqing Fan	eea712f9ab	- Export those components in correct way. This commit was SVN r15804.	2007-08-08 16:20:17 +00:00
Brian Barrett	59524a9009	Fix issue where we set state to SHUTDOWN rather than CONNECTING when we had to switch socket types. This commit was SVN r15784.	2007-08-06 22:55:41 +00:00
Ralph Castain	eb3a97f428	Don't overwrite the local rank key This commit was SVN r15776.	2007-08-06 16:56:23 +00:00
Shiqing Fan	d10570786c	- A small fix, add missed flag parameters. This commit was SVN r15774.	2007-08-06 16:15:38 +00:00
George Bosilca	d658a477af	Update the help file to match the real name of the required argument. This commit was SVN r15762.	2007-08-04 00:35:55 +00:00


1413 commits