openmpi

Author	SHA1	Message	Date
Ralph Castain	87f34860fe	Protect array against crossing boundaries cmr=v1.7.5:reviewer=jsquyres This commit was SVN r30316.	2014-01-17 21:36:20 +00:00
Mike Dubman	874c4e2558	PMI2: add missing file from prev commit Refs trac:4119 This commit was SVN r30301. The following Trac tickets were found above: Ticket 4119 --> https://svn.open-mpi.org/trac/ompi/ticket/4119	2014-01-16 13:17:08 +00:00
Mike Dubman	98234b5a69	SLURM/PMI2: Fix parsing of PMI2 process mapping fixed by AlexM, reviewed by miked cmr=v1.7.4:reviewer=rhc This commit was SVN r30300.	2014-01-16 12:05:29 +00:00
Ralph Castain	58479399c3	As per RFC and telecon, deprecate cmd line options and their corresponding MCA params for old-style mapping and binding directives cmr=v1.7.5:reviewer=jsquyres:subject=deprecate old-style mapping and binding directives This commit was SVN r30298.	2014-01-15 14:48:39 +00:00
Ralph Castain	590a87c730	You can't pass static buffer definitions to rml.send as it will attempt to release them upon completion - you need to send dynamically allocated buffers This commit was SVN r30261.	2014-01-11 19:38:11 +00:00
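The ownership rule in the r30261 entry can be sketched as follows. This is a minimal illustration, not the ORTE RML API: `demo_send_nb`, `demo_send_copy`, and the callback type are hypothetical names, and completion is simulated synchronously. The point is that a send routine which releases its buffer on completion must be handed heap memory, so static data has to be copied into a freshly allocated buffer first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical completion callback: invoked before the buffer is released. */
typedef void (*demo_cbfunc_t)(void *buf, size_t len, void *cbdata);

/* Hypothetical non-blocking send. It takes ownership of buf and frees it once
 * the send completes, so buf must come from malloc(); passing a static or
 * stack buffer here is undefined behavior. Completion is simulated at once. */
static int demo_send_nb(void *buf, size_t len, demo_cbfunc_t cb, void *cbdata)
{
    assert(buf != NULL);
    /* ... the bytes would be queued for transmission here ... */
    if (cb != NULL) {
        cb(buf, len, cbdata);
    }
    free(buf);
    return 0;
}

/* Safe pattern: copy static or caller-owned data into a fresh heap buffer. */
static int demo_send_copy(const void *data, size_t len, demo_cbfunc_t cb, void *cbdata)
{
    void *buf = malloc(len > 0 ? len : 1);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, data, len);
    return demo_send_nb(buf, len, cb, cbdata);
}

/* Example callback: counts completed sends through cbdata. */
static void demo_count_completion(void *buf, size_t len, void *cbdata)
{
    (void) buf;
    (void) len;
    ++*(int *) cbdata;
}
```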
Ralph Castain	286ff6d552	For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun. NOTE: launch performance will be absolutely awful if you do this with BTLs that aren't configured to modex_recv on first message! Even with "modex on demand", we still have to do a barrier in place of the modex - we simply don't move any data around, which does reduce the time impact. The barrier is required to ensure that the other proc has in fact registered all its BTL info and therefore is prepared to hand over a complete data package. Otherwise, you may not get the info you need. In addition, the shared memory BTL can fail to properly rendezvous as it expects the barrier to be in place. This behavior will only take effect under the following conditions: 1. launched via mpirun 2. #procs is greater than ompi_hostname_cutoff, which defaults to UINT32_MAX 3. mca param rte_orte_direct_modex is set to 1. At the moment, we are having problems getting this param to register properly, so only the first two conditions are in effect. Still, the bottom line is you have to want this behavior to get it. The planned next evolution of this will be to make the direct modex be non-blocking - this will require two fixes: 1. if the remote proc doesn't have the required info, then let it delay its response until it does. This means we need a way for the MPI layer to tell the RTE "I am done entering modex data". 2. adjust the SM rendezvous logic to loop until the required file has been created Creating a placeholder to bring this over to 1.7.5 when ready. 
cmr=v1.7.5:reviewer=hjelmn:subject=Enable direct modex at scale This commit was SVN r30259.	2014-01-11 17:36:06 +00:00
Ralph Castain	fb9e427320	One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do not bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound. Refs trac:4077 This commit was SVN r30200. The following Trac tickets were found above: Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077	2014-01-09 22:39:34 +00:00
Ralph Castain	24e990e747	Fix comm_spawn for oversubscribed systems by correctly computing the number of available slots cmr=v1.7.4:reviewer=jsquyres:subject=Fix comm_spawn for oversubscribed systems This commit was SVN r30197.	2014-01-09 20:33:48 +00:00
Ralph Castain	9fcb46d85a	Correctly detect and handle oversubscription for comm_spawn cmr=v1.7.4:reviewer=jsquyres:subject=Correctly detect and handle oversubscription for comm_spawn This commit was SVN r30186.	2014-01-09 18:27:51 +00:00
Ralph Castain	6e5fedeb04	Oops - add verbose output to inform that cannot default bind due to no cores detected Refs trac:4074 This commit was SVN r30185. The following Trac tickets were found above: Ticket 4074 --> https://svn.open-mpi.org/trac/ompi/ticket/4074	2014-01-09 18:17:14 +00:00
Ralph Castain	4cdc291df1	Ensure slurm properly dies on abnormal termination cmr=v1.7.4:reviewer=jsquyres:subject=Ensure slurm properly dies on abnormal termination This commit was SVN r30182.	2014-01-09 16:52:02 +00:00
Jeff Squyres	87e476ebd8	Clean up many references to "rank": usually change to "process" and/or specifically delineate that we're referring to the process' rank in MPI_COMM_WORLD. Refs trac:4068 This commit was SVN r30181. The following Trac tickets were found above: Ticket 4068 --> https://svn.open-mpi.org/trac/ompi/ticket/4068	2014-01-09 16:37:49 +00:00
Ralph Castain	7e4748a0f1	Handle the case of nodes that do not report cores, and thus our default binding policy will fail even though binding is supported by defaulting to not binding on those nodes. Thanks to Paul Hargrove for reporting the problem on NetBSD. cmr=v1.7.4:reviewer=jsquyres:subject=Handle the case of nodes that do not report cores This commit was SVN r30180.	2014-01-09 16:27:58 +00:00
Ralph Castain	f179f2086b	Do a better job of reporting bindings - if someone gives a spec that binds us to all processors, then we are effectively unbound and should report it clearly instead of outputting a long line of B's. cmr=v1.7.4:reviewer=jsquyres:subject=Do a better job of reporting bindings This commit was SVN r30179.	2014-01-09 16:16:16 +00:00
Ralph Castain	2a0e4b5e62	Update the orterun help messages and man page to reflect new map/rank/bind options and defaults. Thanks to Paul Hargrove for reporting it. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30173.	2014-01-09 04:44:28 +00:00
Ralph Castain	bf453a2575	Reference the correct variable...sigh Refs trac:4059 This commit was SVN r30163. The following Trac tickets were found above: Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059	2014-01-08 22:36:39 +00:00
Ralph Castain	80497d73cf	Need to mark the daemon as alive so that exit commands are properly routed during abnormal terminations. Also, remove stale references to the "selected oob component" as we no longer require only one component be selected cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30162.	2014-01-08 22:35:48 +00:00
Ralph Castain	d5647394d8	Initialize variable so dash-host option gets correctly parsed cmr=v1.7.4:reviewer=rolfv This commit was SVN r30159.	2014-01-08 15:17:16 +00:00
Ralph Castain	e724d0d12d	Ensure comm_spawn'd jobs get treated the same wrt setting default mapping directives Refs trac:4059 This commit was SVN r30158. The following Trac tickets were found above: Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059	2014-01-08 15:16:22 +00:00
Ralph Castain	fb650aed0c	Fix how we transfer mapping directives to the job, ensuring that directives that can be given outside of a mapping policy (e.g., oversubscribe and no-use-local) are retained. cmr=v1.7.4:reviewer=jsquyres:subject=Fix how we transfer mapping directives to the job This commit was SVN r30155.	2014-01-08 04:25:43 +00:00
Ralph Castain	bc75250951	Cleanup the sensor framework close - existing code was using incorrect object type. Don't start sensors if sample rate is zero. Don't add zero-byte data from resusage as it means nothing was measured. cmr=v1.7.4:reviewer=hjelmn This commit was SVN r30150.	2014-01-08 02:38:56 +00:00
Jeff Squyres	13b29cff2c	This commit complements/completes r30140. r30140 made all the configury/Makefile.am changes; this commit renames the internal installdirs.h framework struct field names to match the configury macro names: * pkgdatdir -> ompidatadir * pkglibdir -> ompilibdir * pkgincludedir -> ompiincludedir This commit was SVN r30145. The following SVN revision numbers were found above: r30140 --> open-mpi/ompi@8b778903d8	2014-01-07 23:36:33 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Mike Dubman	40aadab85f	re-enable map-by dist after last refactoring in rmaps, map-by dist:hca was disabled. reverting it back found/fixed by Elena, reviewed by miked cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r30118.	2014-01-04 20:44:41 +00:00
Ralph Castain	9a855ff58e	Update sensor component for new OOB calls cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30117.	2014-01-03 22:37:15 +00:00
Ralph Castain	3f2b3c53ea	Ensure that rankfile-provided allocations are correctly handled Fixes trac:4043 cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that rankfile-provided allocations are correctly handled This commit was SVN r30106. The following Trac tickets were found above: Ticket 4043 --> https://svn.open-mpi.org/trac/ompi/ticket/4043	2014-01-02 16:07:16 +00:00
Ralph Castain	d5a5caa7e0	Restore the bycore mpirun option for backward compatibility Refs trac:4044 cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30103. The following Trac tickets were found above: Ticket 4044 --> https://svn.open-mpi.org/trac/ompi/ticket/4044	2014-01-02 04:16:43 +00:00
Ralph Castain	a8a91b374e	Update component-level selection comments to match latest revisions cmr=v1.7.4:reviewer=rhc This commit was SVN r30087.	2013-12-25 19:12:43 +00:00
Ralph Castain	d049731911	Add pubsub pmi component to list of components to avoid when indirect launch used Refs trac:4032 This commit was SVN r30083. The following Trac tickets were found above: Ticket 4032 --> https://svn.open-mpi.org/trac/ompi/ticket/4032	2013-12-25 16:25:37 +00:00
Ralph Castain	85f2429819	Ensure the ipv6 lists get initialized and finalized cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30081.	2013-12-24 17:24:39 +00:00
Ralph Castain	2e08219cac	Silence the valgrind report from the OOB Refs trac:4033 This commit was SVN r30080. The following Trac tickets were found above: Ticket 4033 --> https://svn.open-mpi.org/trac/ompi/ticket/4033	2013-12-24 17:06:45 +00:00
Ralph Castain	81df8d09ca	Avoid use of PMI components when launched via mpirun as this is just unnecessary overhead that can cause confusion. cmr=v1.7.4:reviewer=miked:subject=Avoid use of PMI components when launched via mpirun This commit was SVN r30078.	2013-12-24 16:32:31 +00:00
Ralph Castain	01ee5f380b	Remove debug - problem has been identified Refs trac:4026 This commit was SVN r30075. The following Trac tickets were found above: Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026	2013-12-24 15:22:18 +00:00
Jeff Squyres	ce02002a5e	Free minor memory leak / squash valgrind still-reachable warning. cmr=v1.7.5:reviewer=rhc This commit was SVN r30071.	2013-12-24 11:04:38 +00:00
Ralph Castain	38f46641ce	Ensure the recv handler has been initialized Refs trac:4026 This commit was SVN r30068. The following Trac tickets were found above: Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026	2013-12-24 06:09:45 +00:00
Ralph Castain	bb80625a8a	Add missing var initialization cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r30063.	2013-12-24 00:02:22 +00:00
Ralph Castain	65228d3571	Don't use "size_t" for the nbytes field in the header - use uint32_t to ensure that ntohl/htonl correctly match it Refs trac:4026 This commit was SVN r30062. The following Trac tickets were found above: Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026	2013-12-23 21:39:49 +00:00
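The reasoning in the r30062 entry: htonl() and ntohl() operate on uint32_t, while size_t is 8 bytes on most 64-bit platforms, so a size_t field both truncates during conversion and changes the header's wire size between builds. A minimal sketch with a hypothetical header (the field names are illustrative, not the real OOB TCP header):

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header: fixed-width fields only. */
typedef struct {
    uint32_t tag;
    uint32_t nbytes;   /* uint32_t, not size_t, so htonl/ntohl match it */
} demo_hdr_t;

static_assert(sizeof(demo_hdr_t) == 8, "header must be 8 bytes on every platform");

/* Serialize in network byte order; returns the number of bytes written. */
static size_t demo_hdr_pack(const demo_hdr_t *in, unsigned char *buf)
{
    demo_hdr_t tmp;
    tmp.tag = htonl(in->tag);
    tmp.nbytes = htonl(in->nbytes);
    memcpy(buf, &tmp, sizeof tmp);
    return sizeof tmp;
}

/* Deserialize and convert back to host byte order. */
static void demo_hdr_unpack(const unsigned char *buf, demo_hdr_t *out)
{
    memcpy(out, buf, sizeof *out);
    out->tag = ntohl(out->tag);
    out->nbytes = ntohl(out->nbytes);
}
```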
Ralph Castain	7d8c0459a4	Attempt to debug hang that is hitting some environments. Posting to 1.7.4 as a placeholder for the eventual solution cmr=v1.7.4:reviewer=rhc This commit was SVN r30060.	2013-12-23 19:57:05 +00:00
Nathan Hjelm	3be4536d9b	Cleanup various leaks in ompi_info reported by valgrind. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30058.	2013-12-23 17:47:43 +00:00
George Bosilca	24879f9def	Code cleanup while chasing valgrind complaints. This commit was SVN r30048.	2013-12-21 23:28:14 +00:00
George Bosilca	38cbaeaa82	Try to impose a little bit of consistency on how we parse lists of modules by enforcing the use of OPAL list accessors. This commit was SVN r30045.	2013-12-21 23:23:33 +00:00
Ralph Castain	264150872b	Add a bunch of debug output to the OOB connection completion code so we can track down a handshake problem. Available in optimized builds as well as debug ones by setting -mca oob_base_verbose 10 No review will be required as this is just debug code for those helping us debug the 1.7.4 release candidates cmr-=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r30043.	2013-12-21 16:09:26 +00:00
Ralph Castain	9c768df8b8	Resolve an unexpected behavior in hostfile allocations. Now that we filter allocations to determine what will be used for mapping, let the initial global pool be the union of nodes from all sources (default hostfile, hostfiles, and dash-hosts). Each app will filter down to only those specified for it using its own hostfile and dash-host options. cmr=v1.7.4:reviewer=jsquyres:subject=Resolve an unexpected behavior in hostfile allocations This commit was SVN r30040.	2013-12-21 01:38:27 +00:00
Adrian Reber	53a70fe87f	Trying to get the C/R code to compile again. (send__nb) This patch changes all send/send_buffer occurrences in the C/R code to send_nb/send_buffer_nb. The new code compiles but does not work. Changes from V1: #ifdef out the code (so it is preserved for later re-design) * marked the broken C/R code with ENABLE_FT_FIXED Changes from V2: * just replace the blocking calls with the non-blocking calls * all #ifdef's introduced in V1 are gone * send_* returns error code or ORTE_SUCCESS (not the number of bytes) This commit was SVN r30036.	2013-12-20 21:58:28 +00:00
Adrian Reber	a3813d37c7	Trying to get the C/R code to compile again. (recv__nb) This patch changes all recv/recv_buffer occurrences in the C/R code to recv_nb/recv_buffer_nb. The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED). The new code compiles but does not work. Changes from V1: #ifdef out the code (so it is preserved for later re-design) * marked the broken C/R code with ENABLE_FT_FIXED Changes from V2: * only #ifdef out the code where the behaviour is changed (used to be blocking; now non-blocking) This commit was SVN r30035.	2013-12-20 21:05:40 +00:00
Ralph Castain	31248c0985	Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match. Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node. Refs trac:4003 This commit was SVN r30033. The following Trac tickets were found above: Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003	2013-12-20 20:42:39 +00:00
Ralph Castain	71b52fe861	Ensure that comm_spawn'd procs get user-specified forwarded envars Thanks to Tim Miller for reporting the regression from the 1.6 series cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that comm_spawn'd procs get user-specified forwarded envars This commit was SVN r30012.	2013-12-20 14:47:35 +00:00
Ralph Castain	d47d2569f3	We stripped the process info packing routine to minimize message size when sending the launch message, but tools still require all the info. So modify the tool-hnp handshake to explicitly add the missing info Refs trac:3992 This commit was SVN r29989. The following Trac tickets were found above: Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992	2013-12-19 20:42:20 +00:00
Ralph Castain	55cd65b149	Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along. Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff. cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings This commit was SVN r29978.	2013-12-19 16:31:45 +00:00
Ralph Castain	9b32dacb6c	Ensure we don't abort if a tool cannot send a message - the orte/util/comm library used by tools to query mpirun knows how to handle this situation. Refs trac:3992 This commit was SVN r29975. The following Trac tickets were found above: Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992	2013-12-19 07:10:36 +00:00
Ralph Castain	6239e64f36	Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working. Refs trac:3992 This commit was SVN r29974. The following Trac tickets were found above: Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992	2013-12-19 03:28:05 +00:00
Ralph Castain	bf5e314f76	Tools require their own errmgr and state components so they can handle any errors that occur in, for example, communication. Refs trac:3992 This commit was SVN r29972. The following Trac tickets were found above: Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992	2013-12-19 01:49:33 +00:00
Ralph Castain	3aaca16faa	Silence warnings that are no longer valid Refs trac:3992 This commit was SVN r29970. The following Trac tickets were found above: Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992	2013-12-19 00:40:36 +00:00
Ralph Castain	c5956e7b8c	Convert debug output to opal_output_verbose Thanks to Tetsuya Mishima for reporting it cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29969.	2013-12-19 00:36:15 +00:00
Ralph Castain	39957df08e	Fixes trac:3963. Fix the tool ess procedure so it opens and selects the OOB framework, and have the OOB TCP module update the route to new connections (the routed modules know what to do). Thanks to Dave Love and Ashley Pittman for pointing out the problem. cmr=v1.7.4:reviewer=jsquyres:subject=Fix tool communications with mpirun This commit was SVN r29959. The following Trac tickets were found above: Ticket 3963 --> https://svn.open-mpi.org/trac/ompi/ticket/3963	2013-12-18 23:13:46 +00:00
Ralph Castain	77553f72be	Per this email thread: http://www.open-mpi.org/community/lists/devel/2013/12/13412.php fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch This commit was SVN r29955.	2013-12-18 17:57:37 +00:00
Ralph Castain	ab4636c47b	Per email on devel list, change the default rank-by to slot unless map-by <obj> is specified, in which case use rank-by <obj> Refs trac:3977 This commit was SVN r29945. The following Trac tickets were found above: Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977	2013-12-18 00:48:50 +00:00
Ralph Castain	53cd00fe16	By setting a default mapping/ranking/binding policy that wasn't "none", we introduced a problem for users of the Mac and any other machine where sockets aren't defined and/or binding is not supported. Fix that by checking to see if the user specified the failing policy - if not, then fall back to the old map/rank by slot and no binding. Refs trac:3977 This commit was SVN r29933. The following Trac tickets were found above: Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977	2013-12-17 14:50:10 +00:00
Adrian Reber	b42aad44a3	Trying to get the C/R code to compile again. This patch includes various fixes all over the C/R code which are hard to group like the other patches. Changes from V1: * explain why mca_base_component_distill_checkpoint_ready no longer works * compare return result of opal functions with OPAL_* values Changes from V2: * use orte_rml_oob_ft_event() instead of referencing through the modules * properly protect variable (thanks to --enable-picky) This commit was SVN r29922.	2013-12-16 15:35:28 +00:00
Ralph Castain	8b6d117541	Per the OMPI devel conference that changed our default behaviors: * default to bind-to core * map-by slot if np=2 * map-by socket (balance across sockets on each node) if np > 2 * map-by <obj> will imply rank-by <obj> by default (leave default binding as above) Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values This commit was SVN r29919.	2013-12-15 17:25:54 +00:00
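A minimal sketch of the defaults listed in the r29919 entry, combined with the fallback from the r29933 entry (map by slot and do not bind when the node cannot honor a default the user never asked for). The enums and function are hypothetical; ORTE encodes these policies differently, and reading "np=2" as "np <= 2" is an assumption. The map-by <obj> implies rank-by <obj> rule is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy enums for illustration only. */
typedef enum { DEMO_MAP_BY_SLOT, DEMO_MAP_BY_SOCKET } demo_map_t;
typedef enum { DEMO_BIND_NONE, DEMO_BIND_TO_CORE } demo_bind_t;

typedef struct {
    demo_map_t map;
    demo_bind_t bind;
} demo_policy_t;

/* Default policy when the user gave no mapping/binding directives.
 * np: number of procs; sockets_known / binding_ok describe the node. */
static demo_policy_t demo_default_policy(int np, bool sockets_known, bool binding_ok)
{
    demo_policy_t p;

    assert(np > 0);
    p.map = (np <= 2) ? DEMO_MAP_BY_SLOT : DEMO_MAP_BY_SOCKET;
    p.bind = DEMO_BIND_TO_CORE;

    /* Fallback: the defaults were not requested, so if the node cannot
     * support them, quietly revert to map-by slot with no binding. */
    if (!binding_ok || (p.map == DEMO_MAP_BY_SOCKET && !sockets_known)) {
        p.map = DEMO_MAP_BY_SLOT;
        p.bind = DEMO_BIND_NONE;
    }
    return p;
}
```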
Jeff Squyres	770bf77149	Fix some minor memory leaks in error code paths. Many thanks to Tom Fogal for the patch. cmr=v1.7.4:reviewer=rhc:subject=Fix minor memory leaks in error code paths This commit was SVN r29905.	2013-12-14 00:41:21 +00:00
Jeff Squyres	0ab48ad0d2	Fix some annoying flex warnings that have been there for years. Many thanks to Tom Fogal for the initial patch. cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings This commit was SVN r29904.	2013-12-14 00:36:12 +00:00
Jeff Squyres	2e7653e4c2	Add missing argv.h includes. Noticed these as part of #3694: external libevent's don't cause argv.h to automatically get included. Refs trac:3694 This commit was SVN r29897. The following Trac tickets were found above: Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694	2013-12-13 21:17:36 +00:00
Brian Barrett	121ca26c59	Per discussion at Developer's Meeting, remove Solaris threads support. Solaris will just fall back to pthreads, which should be no problem. This commit was SVN r29893.	2013-12-13 20:07:11 +00:00
Ralph Castain	0e81959aae	Cleanup mindist error messages - already patched in 1.7 This commit was SVN r29869.	2013-12-12 15:30:29 +00:00
Ralph Castain	1ff12362da	Cleanup merge conflict that was incorrectly committed This commit was SVN r29851.	2013-12-09 20:20:14 +00:00
Ralph Castain	83e59e6761	Once again, the Slurm folks have decided to redefine their envars, reversing what they had previously told us to do. So cleanup the Slurm allocation code, and also adjust to a change in srun behavior that now aborts a job if the ntasks-per-node doesn't get specified when ORTE calls it, but the user specified it when getting an allocation. Sigh. cmr=v1.7.4:reviewer=miked:subject=Update Slurm allocation and launch This commit was SVN r29849.	2013-12-09 17:58:46 +00:00
Mike Dubman	c208b858e7	improve error messages in mindist cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r29846.	2013-12-09 06:34:38 +00:00
Ralph Castain	f2c49c6c19	Fix the map-by object mapper to handle cpus-per-proc by accounting for the request when computing the number of procs to put on each object. This ensures that the binding routine doesn't automatically overload the cores. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29843.	2013-12-08 16:59:25 +00:00
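The accounting described in the r29843 entry amounts to dividing an object's processing units by cpus-per-proc before assigning procs to it. A hypothetical helper (not the actual rmaps code) that also caps the result at the number of procs still to be placed:

```c
#include <assert.h>

/* How many procs an object (socket, NUMA node, ...) with ncpus processing
 * units can hold when each proc needs cpus_per_proc of them. Integer
 * division drops partial fits, so the binder never overloads the cores. */
static unsigned demo_procs_for_object(unsigned ncpus, unsigned cpus_per_proc,
                                      unsigned procs_remaining)
{
    unsigned fit;

    assert(cpus_per_proc > 0);
    fit = ncpus / cpus_per_proc;
    return fit < procs_remaining ? fit : procs_remaining;
}
```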
Ralph Castain	9604f36c3b	Specify units for the job completion timeout This commit was SVN r29839.	2013-12-08 04:51:58 +00:00
Ralph Castain	62c9e5c64c	Really is better if we output a message indicating that the job was aborted due to hitting the execution time limit Refs trac:3960 This commit was SVN r29833. The following Trac tickets were found above: Ticket 3960 --> https://svn.open-mpi.org/trac/ompi/ticket/3960	2013-12-07 15:33:56 +00:00
Ralph Castain	d44e4a311f	Per request from Dave Goodell, add support for MPIEXEC_TIMEOUT - if set in the environment, terminate the job after the specified number of seconds has passed. Equivalent to MPICH functionality. cmr=v1.7.4:reviewer=dgoodell:subject=add support for MPIEXEC_TIMEOUT This commit was SVN r29831.	2013-12-07 01:58:32 +00:00
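A hedged sketch of reading MPIEXEC_TIMEOUT; this is not the orterun code. It treats an unset, empty, non-numeric, or non-positive value as "no limit", which is an assumption about edge cases the entry does not specify, and it leaves out arming the actual timer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a timeout in seconds. Returns 1 and stores the value if it is a
 * positive base-10 integer; otherwise returns 0 (treated as "no limit"). */
static int demo_parse_timeout(const char *val, long *secs_out)
{
    char *end = NULL;
    long secs;

    assert(secs_out != NULL);
    if (val == NULL || *val == '\0') {
        return 0;
    }
    errno = 0;
    secs = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || secs <= 0) {
        return 0;
    }
    *secs_out = secs;
    return 1;
}

/* Look the variable up in the environment, as the entry describes. */
static int demo_get_mpiexec_timeout(long *secs_out)
{
    return demo_parse_timeout(getenv("MPIEXEC_TIMEOUT"), secs_out);
}
```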
Jeff Squyres	ed9aba3896	This patch fixes error: void value not ignored as it ought to be in the C/R code by ignoring the return value of functions which no longer return a value (only void). Signed-off-by: Adrian Reber <adrian.reber@hs-esslingen.de> This commit was SVN r29816.	2013-12-06 14:40:10 +00:00
Ralph Castain	fb59b6b875	Silence compiler warning when --disable-orte-static-ports This commit was SVN r29783.	2013-12-03 01:53:31 +00:00
Ralph Castain	617a0edbb8	Fix hostfile parsing for the case where RMs count slots by listing the node multiple times. Thanks to Tetsuya Mishima for reporting the problem and providing a patch. cmf=v1.7.4:reviewer=rhc This commit was SVN r29748.	2013-11-24 16:17:52 +00:00
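The hostfile convention in the r29748 entry, where each repeated appearance of a node name adds one slot, can be sketched like this. The function and types are hypothetical and the input is assumed to be already split into one hostname per line; this is not the ORTE hostfile parser.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *name;
    int slots;
} demo_node_t;

/* Collapse repeated hostnames into nodes, counting one slot per occurrence.
 * Returns the number of distinct nodes, or -1 if max_nodes is exceeded. */
static int demo_count_slots(const char *const *lines, int nlines,
                            demo_node_t *nodes, int max_nodes)
{
    int nnodes = 0;

    assert(lines != NULL && nodes != NULL);
    for (int i = 0; i < nlines; ++i) {
        int j;
        for (j = 0; j < nnodes; ++j) {
            if (strcmp(nodes[j].name, lines[i]) == 0) {
                break;
            }
        }
        if (j == nnodes) {   /* first time we see this host */
            if (nnodes == max_nodes) {
                return -1;
            }
            nodes[nnodes].name = lines[i];
            nodes[nnodes].slots = 0;
            ++nnodes;
        }
        ++nodes[j].slots;
    }
    return nnodes;
}
```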
Ralph Castain	7c23a5ad65	Fix headers when building with ft enabled. Thanks to Adrian Reber for the patch! This commit was SVN r29743.	2013-11-23 22:58:32 +00:00
Ralph Castain	7480beb7f0	Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it. This isn't being used yet - just enabling Nathan to do what he needs. *** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *** This commit was SVN r29708.	2013-11-14 17:01:43 +00:00
Ralph Castain	0f420f3676	Add a little debug cmr:v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r29705.	2013-11-14 04:22:59 +00:00
Ralph Castain	561c1830f7	Cleanup the radix MCA params - the max connections has no relationship to the max fd's on a node. It solely is used to improve the performance of small jobs by avoiding unnecessary inter-daemon routing. Refs trac:3917 This commit was SVN r29704. The following Trac tickets were found above: Ticket 3917 --> https://svn.open-mpi.org/trac/ompi/ticket/3917	2013-11-14 04:19:06 +00:00
Ralph Castain	540d38bc12	Per patch from Jeff, treat the case where someone transfers an archived file of an unknown type. This can't actually happen as this is on the receive end, and the error would have been treated and rejected on the send side. Still, practice defensive programming. cmr:v1.7.4:reviewer=rhc:subject=Practice defensive programming This commit was SVN r29699.	2013-11-13 23:49:26 +00:00
Jeff Squyres	f4e647538c	mca_routed_radix_component.max_connections is unsigned; it can never be <0 cmr=v1.7.4:reviewer=hjelmn This commit was SVN r29694.	2013-11-13 19:37:24 +00:00
Jeff Squyres	038116e4b8	orte_iof_base.output_limit is an unsigned type; just initialize it to INT_MAX cmr=v1.7.4:reviewer=rhc This commit was SVN r29693.	2013-11-13 19:36:43 +00:00
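Both unsigned-parameter entries above (r29694 and r29693) turn on the same C rule: converting a negative value to an unsigned type wraps modulo 2^N, so a "< 0" check can never fire and a negative "unset" marker silently becomes a huge limit. The hypothetical helper below keeps an INT_MAX default and validates only the upper bound; the name and clamping policy are illustrative, not ORTE code.

```c
#include <limits.h>

/* Hypothetical default for an unsigned output limit. */
#define DEMO_OUTPUT_LIMIT_DEFAULT ((unsigned int) INT_MAX)

/* "limit < 0" is always false for an unsigned limit, so clamp from above.
 * Values wrapped from negative inputs land near UINT_MAX and are clamped
 * back to the default. */
static unsigned int demo_effective_limit(unsigned int limit)
{
    return limit > DEMO_OUTPUT_LIMIT_DEFAULT ? DEMO_OUTPUT_LIMIT_DEFAULT : limit;
}
```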
Mike Dubman	840e2cb4a2	mindist: cosmetic, use fallback to byslot if unable to read NUMA info, small fix. fixed by Elena, reviewed by Ralph/Mike cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r29679.	2013-11-13 09:26:40 +00:00
Ralph Castain	5b38259264	Ouch - remove an extraneous line. Thanks to Tetsuya Mishima for reporting it cmr=v1.7.4:reviewer=rhc:subject=Remove extraneous line from OOB This commit was SVN r29677.	2013-11-13 04:02:05 +00:00
Ralph Castain	f1e510154c	Revise the launch timeout detection so we don't mistakenly declare "failed to start". Recognize that timeout is at the per-job level, and define the timeout param as a total value instead of seconds/daemon as it otherwise can get to be an enormous (and useless) number. Resolves problems in loop_spawn where the timer was incorrectly firing and killing the overall job. cmr=v1.7.4:reviewer=hjelmn This commit was SVN r29661.	2013-11-11 23:50:40 +00:00
Ralph Castain	46f633883b	Correct the error check on rml.send cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29660.	2013-11-11 23:23:12 +00:00
Ralph Castain	e35ad23176	Correctly compute usage for dynamic spawns when binding is invoked. Ensure we correctly account for existing process usage on each node when computing bindings during dynamic spawns. cmr=v1.7.4:reviewer=hjelmn:subject=Correctly compute usage for dynamic spawns when binding is invoked This commit was SVN r29649.	2013-11-10 00:38:01 +00:00
Joshua Ladd	d594ffbfc7	Backing out Elena's patch - abstraction violation This commit was SVN r29645.	2013-11-08 13:12:07 +00:00
Joshua Ladd	da3e272fdd	Adds a check in the mindist mapper for whether or not the user asks for a specific device. This patch was submitted by Elena Elkina and reviewed by Josh Ladd and should be added to cmr=v1.7.4:reviewer=jladd This commit was SVN r29644.	2013-11-08 04:28:53 +00:00
Brian Barrett	6d7a1fbb82	Move opal_portable_platform.h to opal/include/opal, which is where it really should have been all along and fix one place that uses the file Update opal_portable_platform.h with changes to mpi_portable_platform.h made in r29608. Make mpi_portable_platform.h a symlink to opal_portable_platform.h, so that they won't get out of sync. I'd like to remove mpi_portable_platform.h, but we don't automatically add -I${includedir}/openmpi/ to make that sane from a header include point of view, so that's future work. This commit was SVN r29618. The following SVN revision numbers were found above: r29608 --> open-mpi/ompi@b71bd51cdd	2013-11-06 17:12:26 +00:00
Ralph Castain	fb0940a9d9	Add a couple of useful tests This commit was SVN r29539.	2013-10-28 13:24:16 +00:00
Ralph Castain	8c5c7d0db4	Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface. Refs trac:3696 This commit was SVN r29522. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-10-26 00:47:14 +00:00
Ralph Castain	604970a1a2	Initialize orte_coprocessors hash table to NULL. Delay coprocessor detection on HNP until after node topology final definition in case rmaps changes it. Minor spacing change. Refs trac:3847 This commit was SVN r29504. The following Trac tickets were found above: Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847	2013-10-24 00:08:47 +00:00
Ralph Castain	f5920e9312	Revert r29489. This function only executes in the HNP. In orte/mca/ess/hnp/ess_hnp_module.c, we already check for local coprocessors and add them to the hash table if found. Thus, r29489 simply overwrote what was already present. The data for each remote daemon is added later in the daemon callback function. Only the HNP retains info in the hash table. If it is desirable to have each daemon retain its own coprocessor info, then this must be done in orte/mca/ess/base/ess_base_std_orted.c. This commit was SVN r29497. The following SVN revision numbers were found above: r29489 --> open-mpi/ompi@2e2794fa15	2013-10-23 22:35:24 +00:00
Nathan Hjelm	2e2794fa15	Fix coprocessor detection by always adding the local daemon's co-processors to the hash table. Tested and working on a system with 2 Xeon Phi co-processors. cmr=v1.7.4:ticket=3847:reviewer=ompi-rm1.7 This commit was SVN r29489. The following Trac tickets were found above: Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847	2013-10-23 15:56:23 +00:00
Ralph Castain	7c86a843c8	Silence compiler warning This commit was SVN r29477.	2013-10-23 04:13:36 +00:00
Ralph Castain	960a255e7f	Do some cleanup of the --without-hwloc build - no need to work on coprocessors since we can't detect them anyway, cleanup some unused variables in the ppr mapper This commit was SVN r29476.	2013-10-23 01:45:21 +00:00
Ralph Castain	25a84c7f0a	Fix build --without-hwloc This commit was SVN r29453.	2013-10-19 23:12:33 +00:00
Ralph Castain	b12167abef	Per a good suggestion from Jeff, make the coprocessor mapping more scalable by using a hash table to cache the coprocessor list, and then do a single pass thru the nodes at the end to assign hostid's. Refs trac:3847 This commit was SVN r29439. The following Trac tickets were found above: Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847	2013-10-14 22:01:48 +00:00
Ralph Castain	24c811805f	************************************************************** This change contains a non-mandatory modification of the MPI-RTE interface. Anyone wishing to support coprocessors such as the Xeon Phi may wish to add the required definition and underlying support ************************************************************** Add locality support for coprocessors such as the Intel Xeon Phi. Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host. So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following: 1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board 2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions 3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future. 4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. 
Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time. 5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored. 6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set. cmr:v1.7.4:reviewer=hjelmn This commit was SVN r29435.	2013-10-14 16:52:58 +00:00
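The locality rule in the commit above can be illustrated with a minimal C sketch. The `SKETCH_PROC_ON_*` names are hypothetical stand-ins for the OPAL flags the commit mentions, and their bit values are invented for illustration. Only the rule itself comes from the commit: sharing a host also implies sharing the cluster and CU, and the "local node" test must check both the host bit and the board bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL bit values: the commit names the OPAL flags but not their values. */
typedef uint16_t sketch_locality_t;

#define SKETCH_PROC_ON_CLUSTER 0x0001u
#define SKETCH_PROC_ON_CU      0x0002u
#define SKETCH_PROC_ON_HOST    0x0004u /* same physical host, board may differ */
#define SKETCH_PROC_ON_BOARD   0x0008u /* same board (a coprocessor card is a different board) */

/* "Local node" now means sharing the host AND the board, so test both bits. */
#define SKETCH_HOST_AND_BOARD (SKETCH_PROC_ON_HOST | SKETCH_PROC_ON_BOARD)
#define SKETCH_PROC_ON_LOCAL_NODE(l) \
    (((l) & SKETCH_HOST_AND_BOARD) == SKETCH_HOST_AND_BOARD)

/* Illustrative locality computation for one peer (not the OPAL implementation). */
static sketch_locality_t sketch_compute_locality(bool same_host, bool same_board)
{
    /* Sharing a board without sharing the host would be inconsistent input. */
    assert(same_host || !same_board);

    sketch_locality_t l = 0;
    if (same_host) {
        /* Being on the same host implies the same cluster and CU as well. */
        l |= SKETCH_PROC_ON_HOST | SKETCH_PROC_ON_CLUSTER | SKETCH_PROC_ON_CU;
        if (same_board) {
            l |= SKETCH_PROC_ON_BOARD;
        }
    }
    return l;
}
```

For example, a process on a Xeon Phi card and a process on its host would get the host bit but not the board bit. `SKETCH_PROC_ON_LOCAL_NODE` is therefore false for that pair, while they are still recognized as sharing a host.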
Ralph Castain	9902748108	*** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *** Fix two problems that surfaced when using direct launch under SLURM: 1. locally store our own data because some BTLs want to retrieve it during add_procs rather than use what they have internally 2. cleanup MPI_Abort so it correctly passes the error status all the way down to the actual exit. When someone implemented the "abort_peers" API, they left out the error status. So we lost it at that point and always exited with a status of 1. This forces a change to the API to include the status. cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch This commit was SVN r29405.	2013-10-08 18:37:59 +00:00
Ralph Castain	9389592e05	Fix --without-hwloc build This commit was SVN r29399.	2013-10-08 15:02:47 +00:00
Ralph Castain	2bd2284b93	Add a useful test and update another This commit was SVN r29370.	2013-10-04 15:21:40 +00:00
Ralph Castain	697fb253fa	Minor modification to test code This commit was SVN r29357.	2013-10-04 03:11:31 +00:00
Ralph Castain	f4f2287958	Singletons currently start out by spawning an HNP - this is required solely in the cases where the singleton subsequently calls MPI_Comm_spawn or publishes port info without support from an external orte-server. In all other cases, the HNP is of no value and can actually be a detriment by creating additional overhead on the node. This is particularly concerning for async operations where processes may begin as singletons and then dynamically wireup to perform pt2pt communications. So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it. cmr:v1.7.4:reviewer=jsquyres This commit was SVN r29354.	2013-10-04 02:58:26 +00:00
Nathan Hjelm	11722457ce	Fix typo in grpcomm_pmi_module.c that was giving the wrong locality for direct launched jobs. Refs trac:3824 This commit was SVN r29348. The following Trac tickets were found above: Ticket 3824 --> https://svn.open-mpi.org/trac/ompi/ticket/3824	2013-10-03 14:38:45 +00:00
Ralph Castain	2121e9c01b	Fix an issue regarding use of PMI when running processes and tools that don't need or want to use it. We build PMI support based on configuration settings and library availability. However, tools such as mpirun don't need it, and definitely shouldn't be using it. Ditto for procs launched by mpirun. We used to have a way of dealing with this - we had the PMI component check to see if the process was the HNP or was launched by an HNP. Sadly, moving the OPAL db framework removed that ability as OPAL has no notion of HNPs or proc type. So add a boolean flag to the db_base_select API that allows us to restrict selection to "local" components. This gives the PMI component the ability to reject itself as required. We then need to pass that param into the ess_base_std_app call so it can pass it all down. This commit was SVN r29341.	2013-10-02 19:03:46 +00:00
Ralph Castain	5ec422dbc1	Correctly compute num local peers when launched via mpirun This commit was SVN r29327.	2013-10-02 01:46:09 +00:00
Ralph Castain	71a24d6e74	Add some debug This commit was SVN r29326.	2013-10-02 01:37:02 +00:00
Ralph Castain	c6b7d9d027	Fix variable declaration This commit was SVN r29324.	2013-10-02 01:08:51 +00:00
Ralph Castain	fcb381c2e2	Minor cleanup - should behave the same, but just cleanup the variable names to avoid confusion This commit was SVN r29323.	2013-10-02 00:10:36 +00:00
Ralph Castain	d565a76814	Do some cleanup of the way we handle modex data. Identify data that needs to be shared with peers in my job vs data that needs to be shared with non-peers - no point in sharing extra data. When we share data with some process(es) from another job, we cannot know in advance what info they have or lack, so we have to share everything just in case. This limits the optimization we can do for things like comm_spawn. Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create. cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data This commit was SVN r29274.	2013-09-27 00:37:49 +00:00
Ralph Castain	6522963b9c	Flag that a daemon has been launched when it reports back to the HNP so we avoid re-launching it on spawns against dynamic allocations cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29245.	2013-09-25 16:58:19 +00:00
Ralph Castain	23c8848157	Only connect the first time thru the Torque launch, remove stale code cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29227.	2013-09-22 23:53:57 +00:00
Ralph Castain	400c68ed0f	Fix a segfault when a topology file is given to use in place of the one detected by mpirun itself. In that situation, the rmaps framework replaces the opal_hwloc_topology structure - but since that occurs after mpirun has set the node->topology field, we lose that definition. So don't set the node->topology field until after the rmaps framework has been opened. Does not need to go to 1.7 branch as that ordering is different. -This line, and those below, will be ignored-- M orte/mca/ess/hnp/ess_hnp_module.c This commit was SVN r29225.	2013-09-21 19:47:41 +00:00
Jeff Squyres	758cd25fff	Move the MCA / MPI_T level of the LAMA component down to 5 (from 9). This commit was SVN r29214.	2013-09-20 15:23:27 +00:00
George Bosilca	273d66d0f2	The MPI_Intercomm_create test was broken, as the remote peer was always considered as being 1 (instead of count). This commit was SVN r29207.	2013-09-18 16:47:54 +00:00
Ralph Castain	865a7028f8	Per patch from George, with a few minor cleanups. Correctly address the complete exchange of required wireup information in Intercomm_create so all procs in the resulting communicator know how to talk to each other. Refs trac:29166 This commit was SVN r29200. The following Trac tickets were found above: Ticket 29166 --> https://svn.open-mpi.org/trac/ompi/ticket/29166	2013-09-18 02:01:30 +00:00
Ralph Castain	99611ac1d2	Revert r29166 in favor of a better solution from George This commit was SVN r29199. The following SVN revision numbers were found above: r29166 --> open-mpi/ompi@497c7e6abb	2013-09-18 01:41:26 +00:00
George Bosilca	9e6c3c0646	Save the error code. This commit was SVN r29196.	2013-09-17 23:50:11 +00:00
Ralph Castain	2680bff88e	The function orte_iof_base_setup_prefork attempts to create a pty for child stdout and falls back to plain pipe if openpty fails. Child uses the 'usepty' flag to decide whether to treat this descriptor as a pty or as a pipe. Set 'usepty' flag to 0 upon openpty failure to inform the child that it isn't dealing with a pty even though pty has been requested. Thanks to Michal Peclo for reporting it and providing a patch. cmr:v1.7.3:reviewer=jsquyres cmr:v1.6.6:reviewer=jsquyres This commit was SVN r29169.	2013-09-15 15:33:51 +00:00
Ralph Castain	b64c8dafd8	Cleanup some errors in pubsub - must set the active flag before posting the recv in case the message has already arrived Refs trac:3696 This commit was SVN r29167. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-09-15 15:26:32 +00:00
Ralph Castain	497c7e6abb	Fixes trac:2904 The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers. For example, consider the case where: 1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate 2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B 3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail. This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs. cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29166. The following Trac tickets were found above: Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904	2013-09-15 15:00:40 +00:00
Ralph Castain	eb132f923b	Check for bozo error of negative np for an app as this will cause ORTE to spin forever. cmr:v1.7.3:reviewer=jsquyres:subject=Check for negative np cmr:v1.6.6:reviewer=jsquyres:subject=Check for negative np This commit was SVN r29157.	2013-09-11 19:21:22 +00:00
Ralph Castain	2a116ecdfc	Fix a race condition created when two processes attempt to send to each other at the same time. This causes both processes to start connection procedures, resulting in a conflict that can cause messages to be lost. Add detection of this condition, and have both processes cancel their connect operations. The process with the higher rank will reconnect, while the lower rank process will simply wait for the connection to be created. Refs trac:3696 This commit was SVN r29139. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-09-06 05:15:25 +00:00
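The tie-break described in the commit above can be sketched as a pure decision function. This is a minimal C illustration that assumes each process is identified by a single integer rank; the real OOB code may compare richer process identifiers. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What a process does after both sides have cancelled a simultaneous connect. */
typedef enum {
    SKETCH_RECONNECT,   /* initiate a fresh connection to the peer */
    SKETCH_WAIT_ACCEPT  /* do nothing; wait for the peer's connection to arrive */
} sketch_collision_action_t;

/* Hypothetical helper: the higher rank reconnects, the lower rank waits. */
static sketch_collision_action_t
sketch_resolve_collision(uint32_t my_rank, uint32_t peer_rank)
{
    assert(my_rank != peer_rank); /* a process cannot collide with itself */
    return (my_rank > peer_rank) ? SKETCH_RECONNECT : SKETCH_WAIT_ACCEPT;
}
```

Because the comparison is antisymmetric, the two sides of a collision always choose opposite actions. Exactly one of them therefore re-initiates the connection.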
Ralph Castain	e8697de521	Deal with PGI compilers on the Mac by initializing a global variable. cmr:v1.6.6:reviewer=jsquyres cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29129.	2013-09-05 21:40:50 +00:00
Ralph Castain	13ae51a91b	Protect against possible race conditions and threads by ensuring that rml send always occurs inside an event. cmr:v1.7.4:reviewer=jsquyres:subject=Protect against race conditions in rml send This commit was SVN r29128.	2013-09-05 01:16:32 +00:00
Ralph Castain	d32dfc96be	Use the rankfile to obtain list of nodes for VM launch if/when rankfile is given. cmr:v1.7.3:reviewer=jsquyres:subject=Obtain VM nodes from rankfile This commit was SVN r29119.	2013-09-04 16:37:30 +00:00
Ralph Castain	d9f0505952	Fix the lama verbose outputs so they don't segfault if someone asks for verbose output, but isn't using lama cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29108.	2013-09-03 17:55:35 +00:00
Ralph Castain	2bfa99e945	If a rankfile is given and the number of procs not specified in the mpirun cmd line, then set the number of procs to the number of ranks in the rankfile cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29104.	2013-09-02 15:04:40 +00:00
Ralph Castain	43d1cd92ac	Ensure we activate the "daemons launched" state when only the HNP is left or else we will hang. cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29094.	2013-08-29 22:50:51 +00:00
Dave Goodell	d17f104e7a	oob: squash some valgrind warnings These warnings were harmless, but they appeared even for simple programs like single-process runs of `ring_c`. This commit was SVN r29093.	2013-08-29 21:08:44 +00:00
Ralph Castain	12d4f45b5e	Silence warning: oob_tcp_connection.c: In function 'mca_oob_tcp_peer_accept': oob_tcp_connection.c:725:9: warning: variable 'cmpval' set but not used [-Wunused-but-set-variable] Refs trac:3696 This commit was SVN r29091. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-08-29 20:56:05 +00:00
Ralph Castain	7a7cfdd519	A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens. cmr:v1.7.3:reviewer=jladd This commit was SVN r29089.	2013-08-29 20:01:06 +00:00
Ralph Castain	c71e760e6c	The modex code was unfortunately written solely for PMI1 when updated to minimize calls to PMI_get - add the required PMI2 code This commit was SVN r29084.	2013-08-28 23:52:32 +00:00
Joshua Ladd	1802aabf1a	Add support for autodetecting a MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the key word auto which will automatically select the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this and, at this point, the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to cmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping This commit was SVN r29079.	2013-08-28 16:23:33 +00:00
Ralph Castain	7125143253	Replace missing opal_db open/select that was apparently lost on a prior merge. Thanks to Nathan for pointing it out This commit was SVN r29072.	2013-08-27 19:42:31 +00:00
George Bosilca	65a362909d	Can't see how it works ... Thanks Thomas and Arm for the patch. This commit was SVN r29066.	2013-08-27 16:52:24 +00:00
Ralph Castain	c9a25465da	Don't need the number of nodes any more for PMI Refs trac:3729 This commit was SVN r29064. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-23 18:36:51 +00:00
Ralph Castain	6d24b34940	Extend the dpm framework API to support persistent accept/connect operations: * paccept - establish a persistent listening port for async connect requests * pconnect - async connect to remote process that has posted a paccept port. Provides a timeout mechanism, and allows the underlying implementation to retry until timeout * pclose - shuts down a prior paccept posting Includes example programs paccept.c and pconnect.c in orte/test/mpi. New MPI extension interfaces coming... This commit was SVN r29063.	2013-08-23 18:02:50 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
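Two mechanisms from the RFC above are concrete enough to sketch in C:

- the `OMPI_WAIT_FOR_COMPLETION` pattern, which cycles the progress engine until a flag becomes false;
- the reachability test, which checks whether a peer's address falls within a NIC's address/mask range.

This is a minimal illustration, not the ORTE code. `sketch_progress()` is a hypothetical stand-in for `opal_progress()` that simply completes a fake operation after three calls. Addresses are assumed to be IPv4 values in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for opal_progress(): each call runs one pass of the
 * "progress engine"; here the awaited operation completes on the third pass. */
static volatile bool sketch_active = true;
static int sketch_pending_events = 3;

static void sketch_progress(void)
{
    if (sketch_pending_events > 0 && --sketch_pending_events == 0) {
        sketch_active = false; /* the awaited RTE operation has completed */
    }
}

/* Same shape as described for OMPI_WAIT_FOR_COMPLETION: spin the progress
 * engine until the provided flag becomes false. */
#define SKETCH_WAIT_FOR_COMPLETION(flg) \
    do {                                \
        while ((flg)) {                 \
            sketch_progress();          \
        }                               \
    } while (0)

/* Reachability as described in the RFC: a peer is reachable through a NIC if
 * its address lies inside the NIC's address/mask range. */
static bool sketch_reachable(uint32_t nic_addr, unsigned prefix_len, uint32_t peer_addr)
{
    assert(prefix_len <= 32);
    /* Avoid the undefined 32-bit shift when the prefix length is 0. */
    uint32_t mask = (prefix_len == 0) ? 0u : 0xFFFFFFFFu << (32 - prefix_len);
    return (nic_addr & mask) == (peer_addr & mask);
}
```

As the RFC's limitations note, a pure subnet comparison cannot account for peers reachable only via gateways.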
Ralph Castain	63d10d2d0d	Fix typo Refs trac:3729 This commit was SVN r29057. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-22 16:05:58 +00:00
Ralph Castain	16c5b30a1f	Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch. So shift the cutoff param to the MPI layer, and have it solely determine whether or not we call modex_recv on the hostname. If comm_world is of size greater than the cutoff, then we don't automatically retrieve the hostname when we build the ompi_proc_t for a process - instead, we fill the hostname entry on first call to modex_recv for that process. The param is now "ompi_hostname_cutoff=N", where N=number of procs for cutoff. Refs trac:3729 This commit was SVN r29056. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-22 03:40:26 +00:00
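A minimal C sketch of the cutoff logic described in the commit above. It assumes a simplified process record and uses a hypothetical `sketch_lookup_hostname()` in place of the real modex lookup. If the job size is at or below the cutoff, the hostname is fetched when the record is built. Otherwise it stays NULL until the first `modex_recv`-style call for that process.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the per-process record (not ompi_proc_t itself). */
typedef struct {
    unsigned vpid;
    const char *hostname; /* NULL until retrieved */
} sketch_proc_t;

static const char *const sketch_hosts[] = { "node0", "node0", "node1", "node1" };
static unsigned sketch_fetch_count = 0; /* counts hostname lookups */

/* Hypothetical lookup standing in for fetching the hostname via the modex. */
static const char *sketch_lookup_hostname(unsigned vpid)
{
    sketch_fetch_count++;
    return sketch_hosts[vpid % 4];
}

/* Build a record: fetch eagerly only if the job does not exceed the cutoff. */
static void sketch_proc_init(sketch_proc_t *p, unsigned vpid,
                             unsigned world_size, unsigned cutoff)
{
    p->vpid = vpid;
    p->hostname = (world_size > cutoff) ? NULL : sketch_lookup_hostname(vpid);
}

/* Stand-in for modex_recv: any retrieval also fills a missing hostname. */
static void sketch_modex_recv(sketch_proc_t *p)
{
    assert(p != NULL);
    if (NULL == p->hostname) {
        p->hostname = sketch_lookup_hostname(p->vpid);
    }
    /* ...the requested modex data would be returned here... */
}
```

With a large cutoff (the commit's default for the earlier orte_hostname_cutoff was INT_MAX), every record gets its hostname up front. Small cutoffs defer the cost to first use.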
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Ralph Castain	9aebd7e281	Ensure we register the nidmap verbosity in mpirun, and add some debug This commit was SVN r29042.	2013-08-18 23:40:32 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. 
Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
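The caching change described in the commit above can be sketched in C. On the first request, everything a remote process published is decoded; later lookups are served locally. The key names, the fixed-size cache and `sketch_remote_fetch_all()` are all hypothetical. The counter stands in for the number of expensive `PMI_KVS_get`-style round trips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PROCS 8
#define SKETCH_NKEYS 3

/* Hypothetical keys a process might publish during the modex. */
static const char *const sketch_keys[SKETCH_NKEYS] = { "btl.tcp", "btl.sm", "rml.uri" };

typedef struct {
    bool cached;                /* has this proc's data been fetched and decoded? */
    int values[SKETCH_NKEYS];   /* decoded values, one per key */
} sketch_cache_entry_t;

static sketch_cache_entry_t sketch_cache[SKETCH_MAX_PROCS];
static unsigned sketch_remote_gets = 0; /* counts expensive remote round trips */

/* Hypothetical stand-in: fetch and decode ALL of one proc's published data. */
static void sketch_remote_fetch_all(unsigned proc, int out[SKETCH_NKEYS])
{
    sketch_remote_gets++;
    for (size_t k = 0; k < SKETCH_NKEYS; k++) {
        out[k] = (int)(proc * 100 + k);
    }
}

/* Look up one key; only the first lookup for a given proc goes remote. */
static bool sketch_get(unsigned proc, const char *key, int *value)
{
    assert(proc < SKETCH_MAX_PROCS);
    sketch_cache_entry_t *e = &sketch_cache[proc];
    if (!e->cached) {
        sketch_remote_fetch_all(proc, e->values);
        e->cached = true;
    }
    for (size_t k = 0; k < SKETCH_NKEYS; k++) {
        if (0 == strcmp(sketch_keys[k], key)) {
            *value = e->values[k];
            return true;
        }
    }
    return false; /* key was not published by that proc */
}
```

The number of remote round trips then scales with the number of peers actually contacted, not with peers times keys.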
Ralph Castain	b2d86e1857	Silence uninitialized var warning This commit was SVN r29034.	2013-08-16 21:35:51 +00:00
Ralph Castain	b34bff8792	Cleanup warning This commit was SVN r29032.	2013-08-16 21:14:35 +00:00
Ralph Castain	bebe852057	Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite This commit was SVN r29028.	2013-08-14 04:21:17 +00:00
Ralph Castain	72b5e867ab	Correct shutdown ordering - rml must go last This commit was SVN r29027.	2013-08-14 04:20:17 +00:00


4212 commits