1
1

973 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Rolf vandeVaart
d556b60b21 Chnage some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Mike Dubman
7b7b82ef35 check for mxm API version >= 1.5
Refs: #3947

This commit was SVN r29808.
2013-12-05 12:25:52 +00:00
Oscar Vega-Gisbert
78a65a30e6 Remove pinning check from ompi_setup_java.m4
This commit was SVN r29803.
2013-12-04 23:04:52 +00:00
Mike Dubman
753d630809 libshmem to liboshmem rename voices
Refs: 3763

This commit was SVN r29785.
2013-12-03 13:16:12 +00:00
Mike Dubman
5c7557f11a reverting r29771, duplicates Brian`s fix
This commit was SVN r29772.

The following SVN revision numbers were found above:
  r29771 --> open-mpi/ompi@bc25091b61
2013-11-29 14:28:02 +00:00
Mike Dubman
bd358cb180 Apply Jeff`s patch from ticket: 3145
Refs: 3763

This commit was SVN r29751.
2013-11-25 11:02:42 +00:00
Mike Dubman
826fa635a3 add support for --without-oshmem in configure
This commit was SVN r29733.
2013-11-22 13:51:46 +00:00
Brian Barrett
aa517b09a8 Add option to disable building the OSHMEM interface. This isn't perfect, but should work for the v1.7 timeframe
This commit was SVN r29727.
2013-11-21 17:13:28 +00:00
Jeff Squyres
05f4f62db1 Update -Wno-long-double test to support clang.
This prevents bazillions of warnings from clang tha -Wno-long-double
isn't supported.

The warning that clang issues when it see -Wno-long-double is does not
use any of the words we were looking for, so I added "unknown" to the
list of words to look for.  I also re-indented the two m4 tests so
that they're a bit more readable.

cmr=v1.7.4:reviewer=brbarret:subject=Update -Wno-long-double test to support clang

This commit was SVN r29681.
2013-11-13 15:08:01 +00:00
Mike Dubman
b10bb525f1 fix oshmem_CFLAGS to meet OMPI requirements
Refs: 3763

This commit was SVN r29662.
2013-11-12 07:25:14 +00:00
Mike Dubman
ab796052b4 Applying Jeff`s comments about proper SHMEM fortran organization of files.
Refs: 3870

This commit was SVN r29651.
2013-11-11 14:26:25 +00:00
Brian Barrett
9780043456 re-apply r29608 and fix the broken configure test that broke worse with the
patch.  See ticket #3885, comment 10 for an explination of why calling
_STRINGIFY on something that's not a numerical constant is always a bad idea.

This commit was SVN r29613.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-05 22:41:10 +00:00
Jeff Squyres
337b2f7639 Output the directory where the java files were found.
Useful for debugging / understanding exactly where OMPI will be using
Java header files / libraries from.

This commit was SVN r29596.
2013-11-05 03:31:07 +00:00
Nathan Hjelm
b922cd1583 Add support for OSX builtin atomics.
OSX atomic support is disabled by default. Enable with --enable-osx-builtin-atomics.

Fixes trac:2120

This commit was SVN r29568.

The following Trac tickets were found above:
  Ticket 2120 --> https://svn.open-mpi.org/trac/ompi/ticket/2120
2013-10-30 17:48:15 +00:00
Ralph Castain
74c6b12d67 Don't force "treat warnings as errors" in the OpenSHMEM branch as this prevents building of tarballs on machines with compilers that generate warnings when the system is built with "tarball" settings. Instead, make this a configuration option so that OSHMEM developers can set it when they work on the code, but others don't have to use such restrictions.
Refs trac:3763

This commit was SVN r29538.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-10-27 15:35:14 +00:00
Ralph Castain
588e7ce974 Cleanup the Java setup m4's - orte doesn't require java, and we do all compiler checks in opal
This commit was SVN r29530.
2013-10-26 19:43:32 +00:00
Joshua Ladd
2f22329ea4 This commit (hopefully) fully addresses the enabling of oshmem fortran bindings given its dependency on the building of fortran bindings in OMPI. This commit closes trac:3851.
This commit was SVN r29521.

The following Trac tickets were found above:
  Ticket 3851 --> https://svn.open-mpi.org/trac/ompi/ticket/3851
2013-10-25 15:42:10 +00:00
Mike Dubman
5cc1f3803b shmem-fortran fix, naming conventions for shmem option, examples
This commit was SVN r29518.
2013-10-25 05:25:41 +00:00
Dave Goodell
620f40b6c7 fix compile error introduced by r29488
Apologies for the breakage, I did my test build in the wrong window...

No reviewer.

cmr=v1.7.4:ticket=3865

This commit was SVN r29492.

The following SVN revision numbers were found above:
  r29488 --> open-mpi/ompi@25dd719d4d

The following Trac tickets were found above:
  Ticket 3865 --> https://svn.open-mpi.org/trac/ompi/ticket/3865
2013-10-23 16:36:52 +00:00
Dave Goodell
25dd719d4d opal: support __attribute__((__noinline__))
First cut does not attempt any "cross-check".  As we discover compilers
which complain about __noinline__, we will add specific cross checks to
handle those cases.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29488.
2013-10-23 15:52:05 +00:00
Mike Dubman
2d76a9be45 add --enable-oshmem-fortran opt to configure
This commit was SVN r29448.
2013-10-17 05:42:43 +00:00
Mike Dubman
14304c299d add globalexit API support.
it is not fully functional yet, but initial version is good enough.
developed by Igor, reviewed by miked

This commit was SVN r29430.
2013-10-12 19:15:36 +00:00
Mike Dubman
2141e9e6b4 tools: Add oshmem_info utility
Reworked ompi_info tool to be close with orte_info implementation.
ompi_info_register_types(), ompi_info_close_components() and
ompi_info_show_ompi_version() are moved to runtime/ompi_info_support.c.

Added runtime/oshmem_info_support layer that exports following api to be
used into oshmem_info tool as
oshmem_info_register_types()
oshmem_info_register_framework_params()
oshmem_info_close_components()
oshmem_info_show_oshmem_version()
These functions call ompi_info_support related interfaces as long as
Oshmem supports Open MPI/SHMEM combination.

Now orte_info/ompi_info/oshmem_info have identical implementation approach.

Possible improvement:
OSHMEM processing of --config option is the same as OMPI`s (code is duplicated).
Probably list of info_support interfaces can be extended by xxx_info_do_config().
developed by Igor, reviewed by miked

This commit was SVN r29429.
2013-10-12 19:03:32 +00:00
Jeff Squyres
dc822de80f Fix typo/misspelling in variable name.
This commit was SVN r29410.
2013-10-09 14:04:25 +00:00
Jeff Squyres
3fb4401dee Remove an unused configure option, and comment that another
seemingly-unused configure option is used by downstream projects.

This commit was SVN r29367.
2013-10-04 14:16:09 +00:00
Mike Dubman
08efe5a338 Adopting shmem configure logic to trunk build system conventions
fixed by Dinar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29328.
2013-10-02 06:59:09 +00:00
Jeff Squyres
7941c81caa The TargetConditionals.h check is specific to Java -- move it to
ompi_setup_java.m4.

This commit was SVN r29261.
2013-09-26 21:34:00 +00:00
Rolf vandeVaart
d67e3077f5 Add a check for the CUDA 6.0 version of the cuda.h header file.
This commit was SVN r29250.
2013-09-26 12:46:06 +00:00
Jeff Squyres
df7654e8cf 1. Per my previous email
(http://www.open-mpi.org/community/lists/devel/2013/09/12889.php), I
renamed all "f77" and "f90" directory/file names to "fortran"
(including removing shmemf77 / shmemf90 wrapper compilers and
replacing them with "shmemfort").

2. Fixed several Fortran coding errors.

3. Removed lots of old/stale comments that were clearly the result of
copying from the OMPI layer and then not cleaning up afterwards (i.e.,
the comments were wholly inaccurate in the oshmem layer).

4. Removed both redundant and harmful code from oshmem_config.h.in.

5. Temporarily slave building the oshmem Fortran bindings to
--enable-mpi-fortran.  This doesn't seem like a good long-term
solution, but at least you can now build all Fortran bindings (MPI +
oshmem) or not. *** SEE MY NOTE IN config/oshmem_configure_options.m4
FOR WORK THAT STILL NEEDS TO BE DONE!

This commit was SVN r29165.
2013-09-15 09:32:07 +00:00
Joshua Ladd
b3f88c4a1d Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to \ncmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc
This commit was SVN r29153.
2013-09-10 15:34:09 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00
Jeff Squyres
31283aaffd Revert r29049 because it is incorrectly overriding the results of an
AC config macro.

This commit was SVN r29050.

The following SVN revision numbers were found above:
  r29049 --> open-mpi/ompi@b82f89e78b
2013-08-20 01:21:41 +00:00
Steve Wise
b82f89e78b Define HAVE_IBV_LINK_LAYER_ETHERNET if it is supported in libibverbs.
Commit r27211 missed a config file change which broke ompi over
iwarp transports.  

This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29049.

The following SVN revision numbers were found above:
  r27211 --> open-mpi/ompi@b27862e5c7

The following Trac tickets were found above:
  Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726
2013-08-19 22:27:51 +00:00
Ralph Castain
e0cfcf376f Okay, fix it so it works both --disable-mpi-profile and --enable-mpi-profile. I'm not sure why mpit's library has to be treated differently, but it seems that it needs some special care to work in both scenarios
Refs trac:3725

This commit was SVN r29043.

The following Trac tickets were found above:
  Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725
2013-08-19 14:48:23 +00:00
Ralph Castain
e6199da2e7 Fixes trac:3486 - prevent opal_check_pmi from bleeding CPPFLAGS
This commit was SVN r28940.

The following Trac tickets were found above:
  Ticket 3486 --> https://svn.open-mpi.org/trac/ompi/ticket/3486
2013-07-24 03:53:23 +00:00
Nathan Hjelm
c6e586a81d MPI-3: fortran support for large counts using derived datatypes
Jeff:
 - Make sure not to go over 72 characters.  Love Fortran!
 - Ensure to include 'mpif-config.h' in Type_size_x.

This commit was SVN r28933.
2013-07-23 15:36:03 +00:00
Rolf vandeVaart
49663fb802 Move CUDA-aware configurary to its own file and other minor changes due to review.
This commit was SVN r28832.
2013-07-17 22:12:29 +00:00
Nathan Hjelm
e6e9f2c6fd Add profiling function definitions for MPI_T and add a missing type into mpi.h
This commit was SVN r28803.
2013-07-16 16:03:33 +00:00
Ralph Castain
5f520e241b Ensure we get both -lpmi and -lpmi2 when the libs are separate
This commit was SVN r28795.
2013-07-16 14:57:18 +00:00
Ralph Castain
10ca1c1b04 Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature".
This commit was SVN r28789.
2013-07-14 18:57:20 +00:00
Joshua Ladd
16beaa3878 This fixes the nasty configure.m4 hack that was added long ago and not removed. My fault for not catching earlier. I've also removed the '.ompi_ignore' in coll/hcoll. Throwing this to Nathan for review. Upon successful review, this should be added to cmr:v1.7:reviewer=hjelmn
This commit was SVN r28753.
2013-07-11 09:55:46 +00:00
Jeff Squyres
04da831f04 The gm BTL was removed in June of 2012; there's no need for
ompi_check_gm.m4 any more.

This commit was SVN r28744.
2013-07-09 20:57:12 +00:00
Brian Barrett
ecbbf888d3 * Update Portals 4 MTL's multi-md code to be a bit cleaner (no if statements
in the path) and not create MDs due to boundary crossing
* Add the same logic to the Coll component

This commit was SVN r28733.
2013-07-08 21:27:37 +00:00
Joshua Ladd
e2b53dcf10 Adding the ompi_check_libhcoll.m4 file
This commit was SVN r28695.
2013-07-01 22:45:36 +00:00
Ralph Castain
7331dd9534 Apparently, the alps configury has not been checked since we added the RTE abstraction code. Fix it now.
This commit was SVN r28673.
2013-06-26 07:03:54 +00:00
Ralph Castain
e8340b6339 There is no convention out there as to how OEMs handle PMI2 functions. Some put them in their own -lpmi2 library, and some don't. Some have split the PMI2 definitions into a pmi2.h and keep the PMI-1 definitions in a separate pmi.h, and some don't.
Try to handle cases more generally so at least Slurm and Cray can co-exist in peace.

This commit was SVN r28672.
2013-06-26 00:43:26 +00:00
Ralph Castain
fa943dc6ff Cleanup a few things in the revised PMI configury - we know slurm has both pmi and pmi2 libs, so just auto-detect the presence of them if the user directed us to build with pmi support.
Also cleanup some changed names in the alps code

This commit was SVN r28670.
2013-06-24 02:41:40 +00:00
Joshua Ladd
0b5c1f2ea8 Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fallback to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00