openmpi

Author	SHA1	Message	Date
Mike Dubman	14304c299d	add globalexit API support. it is not fully functional yet, but initial version is good enough. developed by Igor, reviewed by miked This commit was SVN r29430.	2013-10-12 19:15:36 +00:00
Mike Dubman	2141e9e6b4	tools: Add oshmem_info utility Reworked ompi_info tool to be close with orte_info implementation. ompi_info_register_types(), ompi_info_close_components() and ompi_info_show_ompi_version() are moved to runtime/ompi_info_support.c. Added runtime/oshmem_info_support layer that exports following api to be used into oshmem_info tool as oshmem_info_register_types() oshmem_info_register_framework_params() oshmem_info_close_components() oshmem_info_show_oshmem_version() These functions call ompi_info_support related interfaces as long as Oshmem supports Open MPI/SHMEM combination. Now orte_info/ompi_info/oshmem_info have identical implementation approach. Possible improvement: OSHMEM processing of --config option is the same as OMPI`s (code is duplicated). Probably list of info_support interfaces can be extended by xxx_info_do_config(). developed by Igor, reviewed by miked This commit was SVN r29429.	2013-10-12 19:03:32 +00:00
Jeff Squyres	dc822de80f	Fix typo/misspelling in variable name. This commit was SVN r29410.	2013-10-09 14:04:25 +00:00
Jeff Squyres	3fb4401dee	Remove an unused configure option, and comment that another seemingly-unused configure option is used by downstream projects. This commit was SVN r29367.	2013-10-04 14:16:09 +00:00
Mike Dubman	08efe5a338	Adopting shmem configure logic to trunk build system conventions fixed by Dinar, reviewed by miked cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r29328.	2013-10-02 06:59:09 +00:00
Jeff Squyres	7941c81caa	The TargetConditionals.h check is specific to Java -- move it to ompi_setup_java.m4. This commit was SVN r29261.	2013-09-26 21:34:00 +00:00
Rolf vandeVaart	d67e3077f5	Add a check for the CUDA 6.0 version of the cuda.h header file. This commit was SVN r29250.	2013-09-26 12:46:06 +00:00
Jeff Squyres	df7654e8cf	1. Per my previous email (http://www.open-mpi.org/community/lists/devel/2013/09/12889.php), I renamed all "f77" and "f90" directory/file names to "fortran" (including removing shmemf77 / shmemf90 wrapper compilers and replacing them with "shmemfort"). 2. Fixed several Fortran coding errors. 3. Removed lots of old/stale comments that were clearly the result of copying from the OMPI layer and then not cleaning up afterwards (i.e., the comments were wholly inaccurate in the oshmem layer). 4. Removed both redundant and harmful code from oshmem_config.h.in. 5. Temporarily slave building the oshmem Fortran bindings to --enable-mpi-fortran. This doesn't seem like a good long-term solution, but at least you can now build all Fortran bindings (MPI + oshmem) or not. *** SEE MY NOTE IN config/oshmem_configure_options.m4 FOR WORK THAT STILL NEEDS TO BE DONE! This commit was SVN r29165.	2013-09-15 09:32:07 +00:00
Joshua Ladd	b3f88c4a1d	Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to cmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc This commit was SVN r29153.	2013-09-10 15:34:09 +00:00
Brian Barrett	16a1166884	Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a configure-time dynamic allocation of flags. The net result for platforms which only support BTL-based communication is a reduction of 8*nprocs bytes per process. Platforms which support both MTLs and BTLs will not see a space reduction, but will now be able to safely run both the MTL and BTL side-by-side, which will prove useful. This commit was SVN r29100.	2013-08-30 16:54:55 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Jeff Squyres	31283aaffd	Revert r29049 because it is incorrectly overriding the results of an AC config macro. This commit was SVN r29050. The following SVN revision numbers were found above: r29049 --> open-mpi/ompi@b82f89e78b	2013-08-20 01:21:41 +00:00
Steve Wise	b82f89e78b	Define HAVE_IBV_LINK_LAYER_ETHERNET if it is supported in libibverbs. Commit r27211 missed a config file change which broke ompi over iwarp transports. This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29049. The following SVN revision numbers were found above: r27211 --> open-mpi/ompi@b27862e5c7 The following Trac tickets were found above: Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726	2013-08-19 22:27:51 +00:00
Ralph Castain	e0cfcf376f	Okay, fix it so it works both --disable-mpi-profile and --enable-mpi-profile. I'm not sure why mpit's library has to be treated differently, but it seems that it needs some special care to work in both scenarios Refs trac:3725 This commit was SVN r29043. The following Trac tickets were found above: Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725	2013-08-19 14:48:23 +00:00
Ralph Castain	e6199da2e7	Fixes trac:3486 - prevent opal_check_pmi from bleeding CPPFLAGS This commit was SVN r28940. The following Trac tickets were found above: Ticket 3486 --> https://svn.open-mpi.org/trac/ompi/ticket/3486	2013-07-24 03:53:23 +00:00
Nathan Hjelm	c6e586a81d	MPI-3: fortran support for large counts using derived datatypes Jeff: - Make sure not to go over 72 characters. Love Fortran! - Ensure to include 'mpif-config.h' in Type_size_x. This commit was SVN r28933.	2013-07-23 15:36:03 +00:00
Rolf vandeVaart	49663fb802	Move CUDA-aware configury to its own file and other minor changes due to review. This commit was SVN r28832.	2013-07-17 22:12:29 +00:00
Nathan Hjelm	e6e9f2c6fd	Add profiling function definitions for MPI_T and add a missing type into mpi.h This commit was SVN r28803.	2013-07-16 16:03:33 +00:00
Ralph Castain	5f520e241b	Ensure we get both -lpmi and -lpmi2 when the libs are separate This commit was SVN r28795.	2013-07-16 14:57:18 +00:00
Ralph Castain	10ca1c1b04	Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature". This commit was SVN r28789.	2013-07-14 18:57:20 +00:00
Joshua Ladd	16beaa3878	This fixes the nasty configure.m4 hack that was added long ago and not removed. My fault for not catching earlier. I've also removed the '.ompi_ignore' in coll/hcoll. Throwing this to Nathan for review. Upon successful review, this should be added to cmr:v1.7:reviewer=hjelmn This commit was SVN r28753.	2013-07-11 09:55:46 +00:00
Jeff Squyres	04da831f04	The gm BTL was removed in June of 2012; there's no need for ompi_check_gm.m4 any more. This commit was SVN r28744.	2013-07-09 20:57:12 +00:00
Brian Barrett	ecbbf888d3	* Update Portals 4 MTL's multi-md code to be a bit cleaner (no if statements in the path) and not create MDs due to boundary crossing * Add the same logic to the Coll component This commit was SVN r28733.	2013-07-08 21:27:37 +00:00
Joshua Ladd	e2b53dcf10	Adding the ompi_check_libhcoll.m4 file This commit was SVN r28695.	2013-07-01 22:45:36 +00:00
Ralph Castain	7331dd9534	Apparently, the alps configury has not been checked since we added the RTE abstraction code. Fix it now. This commit was SVN r28673.	2013-06-26 07:03:54 +00:00
Ralph Castain	e8340b6339	There is no convention out there as to how OEMs handle PMI2 functions. Some put them in their own -lpmi2 library, and some don't. Some have split the PMI2 definitions into a pmi2.h and keep the PMI-1 definitions in a separate pmi.h, and some don't. Try to handle cases more generally so at least Slurm and Cray can co-exist in peace. This commit was SVN r28672.	2013-06-26 00:43:26 +00:00
Ralph Castain	fa943dc6ff	Cleanup a few things in the revised PMI configury - we know slurm has both pmi and pmi2 libs, so just auto-detect the presence of them if the user directed us to build with pmi support. Also cleanup some changed names in the alps code This commit was SVN r28670.	2013-06-24 02:41:40 +00:00
Joshua Ladd	0b5c1f2ea8	Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fallback to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd This commit was SVN r28666.	2013-06-21 15:28:14 +00:00
Rolf vandeVaart	7771857991	Adjust how cuda.h is found. It can be found in the with-cuda dir now. This commit was SVN r28555.	2013-05-22 22:04:46 +00:00
Jeff Squyres	4d9da92e60	Fixes trac:376: by default the wrapper compilers will enable rpath support in generated executables on systems that support it. Use --disable-wrapper-rpath to disable this behavior. See text in README about --disable-wrapper-rpath for more details. This commit was SVN r28479. The following Trac tickets were found above: Ticket 376 --> https://svn.open-mpi.org/trac/ompi/ticket/376	2013-05-11 00:49:17 +00:00
Ralph Castain	353c77e659	Re-enable udcm for test purposes - appears to be working, but needs broader exposure to MTT This commit was SVN r28468.	2013-05-09 16:10:29 +00:00
Ralph Castain	12969cec81	Update orte_progress_threads configure option - no longer need to test for --enable-event-threads This commit was SVN r28449.	2013-05-05 14:48:35 +00:00
Nathan Hjelm	809db8f6a9	Check for .git directory when checking for developer build This commit was SVN r28149.	2013-03-06 17:47:27 +00:00
Rolf vandeVaart	ebe63118ac	Remove dependency on libcuda.so when building in CUDA-aware support. Dynamically load it if needed. This commit was SVN r28140.	2013-03-01 13:21:52 +00:00
Ralph Castain	a4b6fb241f	Remove all remaining vestiges of the Windows integration This commit was SVN r28137.	2013-02-28 17:31:47 +00:00
Ralph Castain	cf9796accd	Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes This commit was SVN r28134.	2013-02-28 01:35:55 +00:00
Ralph Castain	bd9265c560	Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data. A few changes were required to support this move: 1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution). 2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time. 3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. 
This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you separately specify the "store" and "fetch" ordering. 4. the ORTE grpcomm and ess/pmi components, and the nidmap code, were updated to work with the new db framework and to specify internal/global storage options. No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework. This commit was SVN r28112.	2013-02-26 17:50:04 +00:00
Jeff Squyres	7136dbb5b3	Fixes trac:3523. Discussed on the OMPI weekly teleconf today: add a configure test to ensure that the Fortran compiler supports BIND(C) with LOGICAL parameters (per http://lists.mpi-forum.org/mpi-comments/2013/02/0076.php). This may become moot shortly -- Pathscale tells me that they intend to upgrade their compiler to support BIND(C) with default LOGICAL in the very near term (this week?). But we still want this configure test so that Open MPI won't even try to build the F08 bindings with older versions of the Pathscale compilers (or any compiler that doesn't support BIND(C) and LOGICAL parameters). This commit was SVN r28110. The following Trac tickets were found above: Ticket 3523 --> https://svn.open-mpi.org/trac/ompi/ticket/3523	2013-02-26 17:11:11 +00:00
Brian Barrett	504a6d036f	* Rather than use the extra_includes directive, add the extra includes (which is really just -I${includedir}/openmpi/ for devel headers) to CPPFLAGS, since all the other necessary -Is for devel headers (like libevent and hwloc) are added to CPPFLAGS. * Clean up ${includedir} and ${libdir} for script wrapper compilers * Update script wrapper compilers to work like the C wrapper compilers w.r.t static and dynamic linking * Remove the ORTE script wrapper compilers since they didn't support the ${includedir} stuff and Ralph said they weren't used anymore. This commit was SVN r28052.	2013-02-13 00:33:05 +00:00
Joshua Ladd	70ad711337	Backing out the Open SHMEM project This commit was SVN r28050.	2013-02-12 17:45:27 +00:00
Mike Dubman	ff384daab4	Added new project: oshmem. This commit was SVN r28048.	2013-02-12 15:33:21 +00:00
Nathan Hjelm	ab57e2ef38	update lustre configuration to use OMPI_CHECK_PACKAGE and only check for liblustreapi (not liblustre) This commit was SVN r28036.	2013-02-06 17:22:58 +00:00
Brian Barrett	d80218996f	Rather than setting up the direct call stuff in ompi_mca (which requires modifying ompi_mca for every interface that is direct called), do it in the framework's .m4 file. This commit was SVN r28031.	2013-02-04 23:26:42 +00:00
Nathan Hjelm	c28879137d	set $1_LIBS correctly on all calls to ORTE_CHECK_ALPS not just the first call This commit was SVN r28006.	2013-01-31 23:42:28 +00:00
Ralph Castain	576a5ab6f0	Actually, those calls need to be removed as we superseded them with the new opal_setup_ft.m4 This commit was SVN r28000.	2013-01-31 20:46:35 +00:00
Jeff Squyres	da5d093fdc	Can't just use "0" in an AM_CONDITIONAL -- you have to use a valid executable that can be run and returns a false exit status (which, amusingly, in this case is 1, not 0). This commit was SVN r27998.	2013-01-31 20:30:30 +00:00
Nathan Hjelm	1b7c2b0ed1	lustreapi.h includes lustre/lustre_user.h so the CPP should include -I$with_lustre/include This commit was SVN r27996.	2013-01-31 19:19:54 +00:00
Rolf vandeVaart	729caaf0cd	Remove any dependency on libcuda.so in opal layer. All changes are within OMPI_CUDA_SUPPORT code. This commit was SVN r27986.	2013-01-30 23:07:32 +00:00
Brian Barrett	cf0420aaa7	Need the WRAPPER_EXTRA_* defines for orte_info as well. This commit was SVN r27972.	2013-01-29 23:01:51 +00:00
Ralph Castain	4fdc6f1127	Revise the approach towards the FT options. Include them in a new opal_setup_ft.m4 file. Capture the ft-thread option there as well since it had slipped thru the cracks. Add a detailed comment to configure.ac that describes how to make those options visible, if desired. This commit was SVN r27969.	2013-01-29 18:30:41 +00:00


851 Commits