openmpi

Author	SHA1	Message	Date
Nathan Hjelm	f5495ace48	coll/ml: update the coll_ml_enable_fragmentation variable to support the option to autodetect whether fragmentation should be enabled cmr=v1.7.3:ticket=trac:3717 This commit was SVN r29065. The following Trac tickets were found above: Ticket 3717 --> https://svn.open-mpi.org/trac/ompi/ticket/3717	2013-08-27 16:36:54 +00:00
Ralph Castain	c9a25465da	Don't need the number of nodes any more for PMI Refs trac:3729 This commit was SVN r29064. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-23 18:36:51 +00:00
Ralph Castain	6d24b34940	Extend the dpm framework API to support persistent accept/connect operations: * paccept - establish a persistent listening port for async connect requests * pconnect - async connect to remote process that has posted a paccept port. Provides a timeout mechanism, and allows the underlying implementation to retry until timeout * pclose - shuts down a prior paccept posting Includes example programs paccept.c and pconnect.c in orte/test/mpi. New MPI extension interfaces coming... This commit was SVN r29063.	2013-08-23 18:02:50 +00:00
Rolf vandeVaart	96457df9bc	Fix compile errors created from changeset 29058. This commit was SVN r29061.	2013-08-22 18:25:23 +00:00
Jeff Squyres	63ac60864b	Refs trac:3730 Turns out that AC_CHECK_DECLS is one of the "new style" Autoconf macros that #defines the output to be 0 or 1 (vs. #define'ing or #undef'ing it). So don't check for "#if defined(..."; just check for "#if ...". This commit was SVN r29059. The following Trac tickets were found above: Ticket 3730 --> https://svn.open-mpi.org/trac/ompi/ticket/3730	2013-08-22 17:44:20 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Ralph Castain	63d10d2d0d	Fix typo Refs trac:3729 This commit was SVN r29057. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-22 16:05:58 +00:00
Ralph Castain	16c5b30a1f	Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch. So shift the cutoff param to the MPI layer, and have it solely determine whether or not we call modex_recv on the hostname. If comm_world is of size greater than the cutoff, then we don't automatically retrieve the hostname when we build the ompi_proc_t for a process - instead, we fill the hostname entry on first call to modex_recv for that process. The param is now "ompi_hostname_cutoff=N", where N=number of procs for cutoff. Refs trac:3729 This commit was SVN r29056. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-22 03:40:26 +00:00
Rolf vandeVaart	504fa2cda9	Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically. Fixes ticket 3531. This commit was SVN r29055.	2013-08-21 21:00:09 +00:00
Rolf vandeVaart	96fdb060ea	Fix compile errors and warnings from changeset 29052. This commit was SVN r29054.	2013-08-21 19:01:54 +00:00
Steve Wise	67fe3f23ed	Use the HAVE_DECL_IBV_LINK_LAYER_ETHERNET macro. Commit r27211 added ifdef checks for #define HAVE_IBV_LINK_LAYER_ETHERNET, which is incorrect. The correct #define is HAVE_DECL_IBV_LINK_LAYER_ETHERNET. This broke OMPI over iWARP. This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29053. The following SVN revision numbers were found above: r27211 --> open-mpi/ompi@b27862e5c7 The following Trac tickets were found above: Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726	2013-08-20 20:00:46 +00:00
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Ralph Castain	f49f879b2d	Set ignore This commit was SVN r29051.	2013-08-20 18:29:27 +00:00
Jeff Squyres	31283aaffd	Revert r29049 because it is incorrectly overriding the results of an AC config macro. This commit was SVN r29050. The following SVN revision numbers were found above: r29049 --> open-mpi/ompi@b82f89e78b	2013-08-20 01:21:41 +00:00
Steve Wise	b82f89e78b	Define HAVE_IBV_LINK_LAYER_ETHERNET if it is supported in libibverbs. Commit r27211 missed a config file change which broke ompi over iwarp transports. This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29049. The following SVN revision numbers were found above: r27211 --> open-mpi/ompi@b27862e5c7 The following Trac tickets were found above: Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726	2013-08-19 22:27:51 +00:00
Jeff Squyres	b30ad28276	Remove some unused variables and an unused goto label. This commit was SVN r29044.	2013-08-19 16:18:35 +00:00
Ralph Castain	e0cfcf376f	Okay, fix it so it works both --disable-mpi-profile and --enable-mpi-profile. I'm not sure why mpit's library has to be treated differently, but it seems that it needs some special care to work in both scenarios Refs trac:3725 This commit was SVN r29043. The following Trac tickets were found above: Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725	2013-08-19 14:48:23 +00:00
Ralph Castain	9aebd7e281	Ensure we register the nidmap verbosity in mpirun, and add some debug This commit was SVN r29042.	2013-08-18 23:40:32 +00:00
Ralph Castain	b730c9540e	Fix --disable-mpi-profile option so it can build cmr:v1.7.3:reviewer=hjelmn This commit was SVN r29041.	2013-08-18 18:22:34 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. 
Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Ralph Castain	991e59a58a	Update MCA param in platform file This commit was SVN r29039.	2013-08-16 22:18:22 +00:00
Ralph Castain	11a3743b21	Cleanup uninitialized var warnings This commit was SVN r29038.	2013-08-16 21:49:17 +00:00
Ralph Castain	90cfd139cf	Cleanup error - need an "and" instead of an "or" This commit was SVN r29037.	2013-08-16 21:41:59 +00:00
Ralph Castain	f8a72feb25	Silence uninitialized var warning This commit was SVN r29036.	2013-08-16 21:39:28 +00:00
Ralph Castain	c5f395d36a	Silence uninitialized var warnings This commit was SVN r29035.	2013-08-16 21:37:35 +00:00
Ralph Castain	b2d86e1857	Silence uninitialized var warning This commit was SVN r29034.	2013-08-16 21:35:51 +00:00
Ralph Castain	c74c54e18d	Cleanup uninitialized warnings This commit was SVN r29033.	2013-08-16 21:23:09 +00:00
Ralph Castain	b34bff8792	Cleanup warning This commit was SVN r29032.	2013-08-16 21:14:35 +00:00
Ralph Castain	7947cec8fa	Cleanup warning This commit was SVN r29031.	2013-08-16 21:13:40 +00:00
Ralph Castain	33beab5918	Avoid segfault due to uninitialized variable This commit was SVN r29030.	2013-08-16 21:10:38 +00:00
Ralph Castain	7d2e3028d6	Add unique info_key to documentation This commit was SVN r29029.	2013-08-14 04:24:17 +00:00
Ralph Castain	bebe852057	Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite This commit was SVN r29028.	2013-08-14 04:21:17 +00:00
Ralph Castain	72b5e867ab	Correct shutdown ordering - rml must go last This commit was SVN r29027.	2013-08-14 04:20:17 +00:00
Ralph Castain	8a4c5f4957	Attempt to plug a few memory leaks by ensuring we finalize all things opened during init. However, we are still leaking memory like a sieve in param registration and hwloc. This commit was SVN r29026.	2013-08-14 02:03:00 +00:00
Ralph Castain	318467c04f	If we only have global scope, then don't fall back to looking at local scope if the lookup target wasn't found else we will hang This commit was SVN r29025.	2013-08-13 04:45:33 +00:00
Nathan Hjelm	6c75699068	coll/ml: fix typo in assert that could cause an abort in debug builds. cmr=v1.7.3:reviewer=manjugv This commit was SVN r29024.	2013-08-12 14:31:44 +00:00
Ralph Castain	2c286bccca	Fix typo - thanks to Michael Schlottke for pointing it out cmr:v1.7.3:reviewer=brbarret This commit was SVN r29015.	2013-08-11 18:16:21 +00:00
Jeff Squyres	c09ec204ad	Change usNIC BTL to always use small fragments when there is a non-contiguous converter. We can't "convert on the fly" because the # of bytes requested may not divide evenly into the convertor data type. This commit was SVN r29014.	2013-08-11 17:04:13 +00:00
Nathan Hjelm	b2e773ece3	Fix debugger support for direct-launched jobs. The orte rte component checks the orte_standalone_operation to decide if it should wait for a message from the hnp or wait on the debugger. This variable needed to be set to true in ess/pmi to enable the correct path when direct launching. cmr=v1.7.3:reviewer=rhc cmr=v1.6.6:reviewer=rhc This commit was SVN r29013.	2013-08-09 22:39:41 +00:00
Nathan Hjelm	524e9b148b	MCA/base: add a function to unload a component without closing it for components that have been registered but not opened This commit was SVN r29012.	2013-08-09 20:16:08 +00:00
Nathan Hjelm	841ed962f6	fix MCA variable and component system leaks cmr=v1.7.3:reviewer=rhc This commit was SVN r29011.	2013-08-09 19:50:28 +00:00
Nathan Hjelm	47320713bb	coll/ml: do not register variables in open and fix a bug in the coll/ml parser cmr=v1.7.3:reviewer=pasha This commit was SVN r29010.	2013-08-09 17:55:30 +00:00
Rolf vandeVaart	cd72024a3c	Refactor some of the initialization code. This commit was SVN r29009.	2013-08-09 14:54:17 +00:00
Edgar Gabriel	f7391eca23	Lazy open does not work for the addproc sharedfp component since it starts by spawning a process using MPI_Comm_spawn. For this, the first operation has to be collective, which we cannot guarantee outside of the MPI_File_open operation. This commit was SVN r29008.	2013-08-06 20:48:20 +00:00
Edgar Gabriel	e348f5567f	add unignore for me. This commit was SVN r29007.	2013-08-06 20:47:08 +00:00
Jeff Squyres	d5e6b50d83	Add bullet about MPI_Get_address in the "mpi" module This commit was SVN r29006.	2013-08-06 15:23:36 +00:00
Jeff Squyres	ed130dcef0	Add missing Fortran mpi module TKR implementation for MPI_Get_address This commit was SVN r29005.	2013-08-06 15:08:00 +00:00
George Bosilca	837b3363fe	Silence few warnings. This commit was SVN r29004.	2013-08-06 09:38:30 +00:00
George Bosilca	710d3836d5	Use a recv convertor for the pack external case. This commit was SVN r29003.	2013-08-06 09:09:42 +00:00
George Bosilca	30b910b54d	More info in the debug mode. This commit was SVN r29002.	2013-08-06 09:08:43 +00:00


18568 Commits