openmpi

Author	SHA1	Message	Date
Nathan Hjelm	50b4b92758	hostname may not NULL-terminate the string if the buffer is too small. Thanks to Kevin M. Hildebrand for catching this. cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29412.	2013-10-09 15:49:18 +00:00
Ralph Castain	9902748108	*** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *** Fix two problems that surfaced when using direct launch under SLURM: 1. locally store our own data because some BTLs want to retrieve it during add_procs rather than use what they have internally 2. cleanup MPI_Abort so it correctly passes the error status all the way down to the actual exit. When someone implemented the "abort_peers" API, they left out the error status. So we lost it at that point and always exited with a status of 1. This forces a change to the API to include the status. cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch This commit was SVN r29405.	2013-10-08 18:37:59 +00:00
Ralph Castain	6951976bc4	Update struct member name - this is why we put such things in the trunk before moving them to a branch, especially when coming from outside :-) Refs trac:3830 This commit was SVN r29390. The following Trac tickets were found above: Ticket 3830 --> https://svn.open-mpi.org/trac/ompi/ticket/3830	2013-10-07 15:43:43 +00:00
Ralph Castain	13cd112fb4	Avoid use of interface in struct because cygwin compilers apparently object (go figure) This commit was SVN r29388.	2013-10-06 23:55:38 +00:00
Ralph Castain	2d2307b6eb	Modify libevent to support cygwin - patch will be pushed upstream This commit was SVN r29387.	2013-10-06 23:53:31 +00:00
Ralph Castain	2121e9c01b	Fix an issue regarding use of PMI when running processes and tools that don't need or want to use it. We build PMI support based on configuration settings and library availability. However, tools such as mpirun don't need it, and definitely shouldn't be using it. Ditto for procs launched by mpirun. We used to have a way of dealing with this - we had the PMI component check to see if the process was the HNP or was launched by an HNP. Sadly, moving the OPAL db framework removed that ability as OPAL has no notion of HNPs or proc type. So add a boolean flag to the db_base_select API that allows us to restrict selection to "local" components. This gives the PMI component the ability to reject itself as required. We then need to pass that param into the ess_base_std_app call so it can pass it all down. This commit was SVN r29341.	2013-10-02 19:03:46 +00:00
George Bosilca	43b4d76913	Fix a corner case for a non-contiguous send convertor where the convertor accepted being set to a position in the middle of a predefined datatype. Once set there, it was unable to provide the second part of the datatype. This fix forces the convertor to be aligned on predefined datatype boundaries for any non-contiguous send convertor. This commit was SVN r29285.	2013-09-28 16:46:21 +00:00
Ralph Castain	d565a76814	Do some cleanup of the way we handle modex data. Identify data that needs to be shared with peers in my job vs data that needs to be shared with non-peers - no point in sharing extra data. When we share data with some process(es) from another job, we cannot know in advance what info they have or lack, so we have to share everything just in case. This limits the optimization we can do for things like comm_spawn. Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create. cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data This commit was SVN r29274.	2013-09-27 00:37:49 +00:00
Ralph Castain	9aeba777fa	Ensure we don't enter into an infinite loop looking for the PML modex key if it isn't present. The PMI implementation will load ALL modex keys when the first key is queried, so the hash db component can safely return "not found" if a subsequent key isn't present. The PML modex_recv needs to assume everything is okay if the modex recv fails to return a value. cmr:v1.7.3:reviewer=jladd:subject=Prevent infinite loop when PML modex not found This commit was SVN r29243.	2013-09-25 16:04:00 +00:00
Ralph Castain	63da76ad5f	Silence warnings about pointer casting This commit was SVN r29226.	2013-09-22 19:21:29 +00:00
Nathan Hjelm	01839db11b	MCA/base: When encountering a duplicate file value, don't free the filename. Stale code. cmr=v1.7.3:reviewer=rhc This commit was SVN r29224.	2013-09-21 18:53:36 +00:00
Nathan Hjelm	bc31773523	Fix bug in db/pmi when a stored byte object has a NULL pointer. cmr=v1.7.3:reviewer=samuel This commit was SVN r29215.	2013-09-20 15:38:36 +00:00
Ralph Castain	7bc20866fd	C standard stipulates that we have to cast the function to another of the same type to avoid unexpected behavior. We aren't using the function in this case, but Nick correctly points out that we should follow the standard regardless. Refs trac:3755 This commit was SVN r29210. The following Trac tickets were found above: Ticket 3755 --> https://svn.open-mpi.org/trac/ompi/ticket/3755	2013-09-19 18:42:21 +00:00
Ralph Castain	7de493fc02	Silence a warning about an address that can never be NULL - libevent needs to deal with the situation where the user may have compiled the code on a system where this function is present, but executes it on one where it isn't. Thus, a compile-time test isn't adequate. Pushed upstream. cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29201.	2013-09-18 02:03:01 +00:00
George Bosilca	55273f1c98	Cleanup spaces, nothing else. This commit was SVN r29197.	2013-09-18 00:07:58 +00:00
Nathan Hjelm	7929fb9dea	Cleanup complex datatypes and update datatypes and operator code to use C99. This commit changes the underlying opal complex datatypes to match the C99 types: float _Complex, double _Complex, and long double _Complex. The fortran and C++ types now are aliases to these basic types instead of structure types. The operators in ompi/mca/op/base now work on only the C99 types and the fortran types use these operators if the fortran type matches a C complex type (this should almost always be the case). C99 is now in use in both the datatype and operator code, which should make the code both cleaner and much less fragile. This commit was SVN r29193.	2013-09-17 17:49:42 +00:00
Ralph Castain	2245ac0e7e	Don't error log the return from setup_pmi as it can indicate that the process wasn't launched via srun or its equivalent. cmr:v1.7.3:reviewer=jladd This commit was SVN r29180.	2013-09-17 02:26:46 +00:00
Ralph Castain	e01953b440	Per Brice, silence warning on old Linux kernels Refs trac:3744 This commit was SVN r29179. The following Trac tickets were found above: Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744	2013-09-16 15:43:33 +00:00
Ralph Castain	845e92bc5d	Remove the old version of hwloc. Update the new one to reflect the official release dates. Refs trac:3744 This commit was SVN r29154. The following Trac tickets were found above: Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744	2013-09-10 16:30:13 +00:00
Joshua Ladd	b3f88c4a1d	Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to \ncmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc This commit was SVN r29153.	2013-09-10 15:34:09 +00:00
Ralph Castain	46ed907003	Correctly handle list of cores specified in the rankfile - i.e., a rankfile entry such as: rank 0=foo slot=0:0-1;1:0,1 cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29152.	2013-09-08 02:04:29 +00:00
Alex Margolin	50a3c01a0f	fixed build without thread support This commit was SVN r29145.	2013-09-06 19:03:19 +00:00
Ralph Castain	0d7fb932f1	Remove build product file Refs trac:3744 This commit was SVN r29120. The following Trac tickets were found above: Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744	2013-09-04 16:38:22 +00:00
Ralph Castain	6011a4d29c	As per the telecon, update hwloc to v1.7.2 so we can add MIC support. Ignore hwloc1.5.2 component for now until this tests out - will remove it then. cmr:v1.7.4:reviewer=jsquyres This commit was SVN r29107.	2013-09-03 16:23:42 +00:00
Ralph Castain	7a7cfdd519	A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens. cmr:v1.7.3:reviewer=jladd This commit was SVN r29089.	2013-08-29 20:01:06 +00:00
Ralph Castain	3516348aad	We don't need to report errors in pmi_setup as it is possible that PMI is available, but that we weren't launched under it (e.g., we launched via mpirun). cmr:v1.7.3:reviewer=hjelmn:subject="Silence unnecessary PMI error msgs" This commit was SVN r29086.	2013-08-29 16:35:20 +00:00
Joshua Ladd	1802aabf1a	Add support for autodetecting a MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the key word auto which will automatically select the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this and, at this point, the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to \ncmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping This commit was SVN r29079.	2013-08-28 16:23:33 +00:00
Nathan Hjelm	77a41e1ca9	ompi_info: mark the variables from disabled components as disabled in the output of ompi_info. A variable is disabled if its component will never be selected due to a component selection parameter (e.g. -mca btl self). The old behavior of ompi_info was to not print these parameters at all. Now we print the parameters. After some discussion with George it was decided that there needed to be some way to see what parameters will not be used. This was the compromise. This commit also fixes a bug and a typo in the pvar system. The enum_count value in mca_base_pvar_dump was being used without being set. The full_name in mca_base_pvar_t was not being used. cmr=v1.7.3:ticket=trac:3734 This commit was SVN r29078. The following Trac tickets were found above: Ticket 3734 --> https://svn.open-mpi.org/trac/ompi/ticket/3734	2013-08-28 16:03:23 +00:00
Nathan Hjelm	3744c5e0be	Also check for /dev/mic/scif when deciding whether to enable the Linux memory hooks. The MIC has a /dev/scif device and the host has /dev/mic/scif. I do not know if this device exists when no MIC is connected. cmr=v1.7.4:ticket=trac:3733:reviewer=jsquyres This commit was SVN r29071. The following Trac tickets were found above: Ticket 3733 --> https://svn.open-mpi.org/trac/ompi/ticket/3733	2013-08-27 19:40:02 +00:00
Nathan Hjelm	c699ee7812	Update the ompi_info man page with information about variable levels and improve the behavior of ompi_info. This commit changes the default behavior of ompi_info --all when a level is not specified. Instead of assuming level 1 in this case we now assume level 9. This change is due to feedback from the community after the introduction of the --level option. I also added a new option: --selected-only. This option will limit the displayed variables to components that can be selected (ie. if there is a selection parameter set-- btl self,sm) cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29070.	2013-08-27 19:11:37 +00:00
Nathan Hjelm	6e1656279e	Enable the use of the Linux memory hooks on Intel MIC. cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29069.	2013-08-27 18:25:18 +00:00
Nathan Hjelm	2da64eb719	Fix compilation of the MPI tools information interface when profiling is enabled and fix a bug in the handling of watermark performance variables. cmr=v1.7.3:ticket=trac:3725:reviewer=jsquyres This commit was SVN r29068. The following Trac tickets were found above: Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725	2013-08-27 18:19:18 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. 
Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Ralph Castain	11a3743b21	Cleanup uninitialized var warnings This commit was SVN r29038.	2013-08-16 21:49:17 +00:00
Ralph Castain	7947cec8fa	Cleanup warning This commit was SVN r29031.	2013-08-16 21:13:40 +00:00
Ralph Castain	8a4c5f4957	Attempt to plug a few memory leaks by ensuring we finalize all things opened during init. However, we are still leaking memory like a sieve in param registration and hwloc. This commit was SVN r29026.	2013-08-14 02:03:00 +00:00
Ralph Castain	2c286bccca	Fix typo - thanks to Michael Schlottke for pointing it out cmr:v1.7.3:reviewer=brbarret This commit was SVN r29015.	2013-08-11 18:16:21 +00:00
Nathan Hjelm	524e9b148b	MCA/base: add a function to unload a component without closing it for components that have been registered but not opened This commit was SVN r29012.	2013-08-09 20:16:08 +00:00
Nathan Hjelm	841ed962f6	fix MCA variable and component system leaks cmr=v1.7.3:reviewer=rhc This commit was SVN r29011.	2013-08-09 19:50:28 +00:00
George Bosilca	30b910b54d	More info in the debug mode. This commit was SVN r29002.	2013-08-06 09:08:43 +00:00
Nathan Hjelm	be1bd4661c	db/pmi: speed up modex by caching pmi data internally This commit was SVN r29001.	2013-08-05 22:31:50 +00:00
Nathan Hjelm	88cadc552d	Make opal/db/pmi use as few PMI keys as possible. This commit reintroduces key compression into the pmi db. This feature compresses the keys stored into the component into a small number of PMI keys by serializing the data and base64 encoding the result. This will avoid issues with Cray PMI which restricts us to ~ 3 PMI keys per rank. This commit was SVN r28993.	2013-08-03 01:06:59 +00:00
Ralph Castain	3c8aa7c296	Don't just hardcode the max length of the PMI name as it could be wrong. PMI2 installations seem to be retaining at least some of the PMI functions, so use the one to get the max name length. This commit was SVN r28962.	2013-07-30 14:13:15 +00:00
Nathan Hjelm	99adeb7f6e	Fix support for complex datatypes when fortran is not available but _Complex is This commit was SVN r28951.	2013-07-25 19:08:21 +00:00
Nathan Hjelm	ebbb32120a	MCA/base: variable system updates - Use an enumerator to handle bool values. - Fix a leak in the variable enumerator. - Fix a leak in an orte parameter. This commit was SVN r28949.	2013-07-25 15:42:01 +00:00
Ralph Castain	41f97931e9	Need to include module-level CPPFLAGS so it can build This commit was SVN r28947.	2013-07-24 23:07:43 +00:00
Nathan Hjelm	c4c69b4ddf	MPI-3: add support for large counts using derived datatypes Add support for MPI_Count type and MPI_COUNT datatype and add the required MPI-3 functions MPI_Get_elements_x, MPI_Status_set_elements_x, MPI_Type_get_extent_x, MPI_Type_get_true_extent_x, and MPI_Type_size_x. This commit adds only the C bindings. Fortran bindings will be added in another commit. For now the MPI_Count type is defined to have the same size as MPI_Offset. The type is required to be at least as large as MPI_Offset and MPI_Aint. The type was initially intended to be a ssize_t (if it was the same size as a long long) but there were issues compiling romio with that definition (despite the inclusion of stddef.h). I updated the datatype engine to use size_t instead of uint32_t to support large datatypes. This will require some review to make sure that 1) the changes are beneficial, 2) nothing was broken by the change (I doubt anything was), and 3) there are no performance regressions due to this change. Increase the maximum number of predefined datatypes to support MPI_Count Put common get_elements code to ompi/datatype/ompi_datatype_get_elements.c Update MPI_Get_count to reflect changes in MPI-3 (return MPI_UNDEFINED when the count is too large for an int) This commit was SVN r28932.	2013-07-23 15:35:14 +00:00
Ralph Castain	6c1a140e99	Per request from Nathan, add a "commit" API to the opal db framework. This allows him to aggregate keys to work around the Cray's severe PMI limitations This commit was SVN r28917.	2013-07-22 22:57:16 +00:00
Jeff Squyres	49b5342130	After talking with Nathan, update some comments/documentation about the new MCA var and pvar systems. This commit was SVN r28913.	2013-07-22 20:34:42 +00:00
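
Note on commit 50b4b92758 in the list above: the message says the host name lookup may leave the string unterminated when the buffer is too small. POSIX leaves it unspecified whether `gethostname()` NUL-terminates a truncated name, so callers usually terminate the buffer themselves. The following is a minimal sketch of that defensive pattern, not the code from the commit; the helper name `safe_gethostname` is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose gethostname() under strict modes */
#endif
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not taken from the Open MPI source).
 * Fetches the host name and guarantees that buf holds a valid C string
 * afterwards, even if the name was truncated or the call failed.
 * Returns the result of gethostname(), or -1 for an unusable buffer. */
static int safe_gethostname(char *buf, size_t len)
{
    int rc;

    if (buf == NULL || len == 0) {
        return -1;
    }

    /* Start from an all-zero buffer so a failed call still leaves "". */
    memset(buf, 0, len);

    rc = gethostname(buf, len);

    /* Truncation may leave the buffer unterminated: force a terminator. */
    buf[len - 1] = '\0';

    return rc;
}
```

A caller would pass a fixed-size array, for example `char host[256]; safe_gethostname(host, sizeof host);`, and can then treat `host` as a string regardless of the return value.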

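Note on commit a200e4f865 in the list above: the message describes a new OMPI_WAIT_FOR_COMPLETION macro that "simply cycles across opal_progress until the provided flag becomes false". The sketch below illustrates that spin-on-progress pattern only. It is not the definition from ompi/mca/rte/rte.h; `progress_stub`, `WAIT_FOR_COMPLETION_SKETCH`, `pending_events`, `request_active`, and `wait_demo` are hypothetical names, and the stub merely stands in for opal_progress().

```c
#include <stdbool.h>

/* Hypothetical state: three events must be processed before the
 * request completes. */
static int pending_events = 3;
static volatile bool request_active = true;

/* Hypothetical stand-in for opal_progress(): handles one pending event
 * per call and clears the flag when the last one is done. */
static void progress_stub(void)
{
    if (pending_events > 0 && --pending_events == 0) {
        request_active = false;
    }
}

/* Sketch of the pattern: keep driving the progress engine until the
 * caller's flag becomes false. */
#define WAIT_FOR_COMPLETION_SKETCH(flag) \
    do {                                 \
        while ((flag)) {                 \
            progress_stub();             \
        }                                \
    } while (0)

/* Resets the state, waits for completion, and returns the number of
 * events still pending afterwards. */
static int wait_demo(void)
{
    pending_events = 3;
    request_active = true;
    WAIT_FOR_COMPLETION_SKETCH(request_active);
    return pending_events;
}
```

The `do { ... } while (0)` wrapper lets the macro be used like a single statement, and the flag is declared `volatile` here so the loop re-reads it on every iteration.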

2320 commits