This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).
Refs trac:3694
This commit was SVN r29830.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
This prevents bazillions of warnings from clang tha -Wno-long-double
isn't supported.
The warning that clang issues when it see -Wno-long-double is does not
use any of the words we were looking for, so I added "unknown" to the
list of words to look for. I also re-indented the two m4 tests so
that they're a bit more readable.
cmr=v1.7.4:reviewer=brbarret:subject=Update -Wno-long-double test to support clang
This commit was SVN r29681.
patch. See ticket #3885, comment 10 for an explination of why calling
_STRINGIFY on something that's not a numerical constant is always a bad idea.
This commit was SVN r29613.
The following SVN revision numbers were found above:
r29608 --> open-mpi/ompi@b71bd51cdd
OSX atomic support is disabled by default. Enable with --enable-osx-builtin-atomics.
Fixes trac:2120
This commit was SVN r29568.
The following Trac tickets were found above:
Ticket 2120 --> https://svn.open-mpi.org/trac/ompi/ticket/2120
Apologies for the breakage, I did my test build in the wrong window...
No reviewer.
cmr=v1.7.4:ticket=3865
This commit was SVN r29492.
The following SVN revision numbers were found above:
r29488 --> open-mpi/ompi@25dd719d4d
The following Trac tickets were found above:
Ticket 3865 --> https://svn.open-mpi.org/trac/ompi/ticket/3865
First cut does not attempt any "cross-check". As we discover compilers
which complain about __noinline__, we will add specific cross checks to
handle those cases.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29488.
Reworked ompi_info tool to be close with orte_info implementation.
ompi_info_register_types(), ompi_info_close_components() and
ompi_info_show_ompi_version() are moved to runtime/ompi_info_support.c.
Added runtime/oshmem_info_support layer that exports following api to be
used into oshmem_info tool as
oshmem_info_register_types()
oshmem_info_register_framework_params()
oshmem_info_close_components()
oshmem_info_show_oshmem_version()
These functions call ompi_info_support related interfaces as long as
Oshmem supports Open MPI/SHMEM combination.
Now orte_info/ompi_info/oshmem_info have identical implementation approach.
Possible improvement:
OSHMEM processing of --config option is the same as OMPI`s (code is duplicated).
Probably list of info_support interfaces can be extended by xxx_info_do_config().
developed by Igor, reviewed by miked
This commit was SVN r29429.
(http://www.open-mpi.org/community/lists/devel/2013/09/12889.php), I
renamed all "f77" and "f90" directory/file names to "fortran"
(including removing shmemf77 / shmemf90 wrapper compilers and
replacing them with "shmemfort").
2. Fixed several Fortran coding errors.
3. Removed lots of old/stale comments that were clearly the result of
copying from the OMPI layer and then not cleaning up afterwards (i.e.,
the comments were wholly inaccurate in the oshmem layer).
4. Removed both redundant and harmful code from oshmem_config.h.in.
5. Temporarily slave building the oshmem Fortran bindings to
--enable-mpi-fortran. This doesn't seem like a good long-term
solution, but at least you can now build all Fortran bindings (MPI +
oshmem) or not. *** SEE MY NOTE IN config/oshmem_configure_options.m4
FOR WORK THAT STILL NEEDS TO BE DONE!
This commit was SVN r29165.
configure-time dynamic allocation of flags. The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process. Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.
This commit was SVN r29100.
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***
Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.
***************************************************************************************
I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.
The code is in https://bitbucket.org/rhc/ompi-oob2
WHAT: Rewrite of ORTE OOB
WHY: Support asynchronous progress and a host of other features
WHEN: Wed, August 21
SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:
* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)
* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients
* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort
* only one transport (i.e., component) can be "active"
The revised OOB resolves these problems:
* async progress is used for all application processes, with the progress thread blocking in the event library
* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")
* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.
* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.
* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object
* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions
* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel
* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport
* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active
* all blocking send/recv APIs have been removed. Everything operates asynchronously.
KNOWN LIMITATIONS:
* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline
* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker
* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways
* obviously, not every error path has been tested nor necessarily covered
* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways
* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC
This commit was SVN r29058.
Commit r27211 missed a config file change which broke ompi over
iwarp transports.
This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres
This commit was SVN r29049.
The following SVN revision numbers were found above:
r27211 --> open-mpi/ompi@b27862e5c7
The following Trac tickets were found above:
Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726
in generated executables on systems that support it. Use
--disable-wrapper-rpath to disable this behavior. See text in
README about --disable-wrapper-rpath for more details.
This commit was SVN r28479.
The following Trac tickets were found above:
Ticket 376 --> https://svn.open-mpi.org/trac/ompi/ticket/376
A few changes were required to support this move:
1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).
2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.
3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you seperately specify the "store" and "fetch" ordering.
4. the ORTE grpcomm and ess/pmi components, and the nidmap code, were updated to work with the new db framework and to specify internal/global storage options.
No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.
This commit was SVN r28112.
configure test to ensure that the Fortran compiler supports BIND(C)
with LOGICAL parameters (per
http://lists.mpi-forum.org/mpi-comments/2013/02/0076.php).
This may become moot shortly -- Pathscale tells me that they intend
upgrade their compiler to support BIND(C) with default LOGICAL in the
very near term (this week?). But we still want this configure test so
that Open MPI won't even try to build the F08 bindings with older
versions of the Pathscale compilers (or any compiler that doesn't
support BIND(C) and LOGICAL parameters).
This commit was SVN r28110.
The following Trac tickets were found above:
Ticket 3523 --> https://svn.open-mpi.org/trac/ompi/ticket/3523
* Clean up ${includedir} and ${libdir} for script wrapper compilers
* Update script wrapper compilers to work like the C wrapper compilers w.r.t static and dynamic linking
* Remove the ORTE script wrapper compilers since they didn't support the ${includedir} stuff and Ralph said they weren't used anymore.
This commit was SVN r28052.
this patch has already gone into the v1.6 branch.
Leif would like to revamp the ARM support in a different way, and will
submit a patch to do so in the future.
This commit was SVN r27961.
Leif would like to revamp the ARM support in a different way, and will
submit a patch to do so in the future.
This commit was SVN r27960.
The following SVN revision numbers were found above:
r27882 --> open-mpi/ompi@8649b5eece
flags, and mca flags are kept seperate until the very end. The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro. MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.
This commit was SVN r27950.
will be wrong (because it's called via a shell subroutine rather than
an m4 macro... doh!), but it's still helpful for searching around in
config.log.
This commit was SVN r27912.
case.
Fixing this issue allows the following case to start working again:
./configure --disable-dlopen --with-hwloc=/path/to/some/external/hwloc ...
Without this fix, the wrapper compilers would include -lhwloc, but
would not include -L/path/..., so -lhwloc would not be found.
This commit was SVN r27894.
meeting, and RFCed in mid-December (#3424): we no longer build the MPI
C++ bindings by default.
The C++ bindings are still ''there'' -- starting with 1.9, we'll just
be providing a little encouragement to no longer use them.
There are no definite plans to ''remove'' the C++ bindings yet. At
the earliest, we would remove them in the next feature series after
1.9.
This commit was SVN r27755.
which is the same size as a Fortran or C integer. This resulted in configure
coming up with Fortran's MAX_INT as -2^31, which obviously isn't a positive
number. Since we found the MAX_INT using the same broken loop in a couple
places and doing it right is complicated, added a new macro that is much
more careful about sign roll-over.
During the Fortran rework between v1.6 and v1.7, the variable which
indicates whether or not Fortran is being compiled changed, so on platforms
without Fortran compilers, we were trying to determine the max value for
Fortran INTEGERS where we previously didn't. I believe this is why
bug #3374 appeared as a regression.
Finally, since the OMPI code doesn't cope with OMPI_FORTRAN_HANDLE_MAX
being negative (which was the root cause of the segfault in $3374),
add a check at the end of the OMPI_FORTRAN_GET_HANDLE_MAX macro to
ensure that OMPI_FORTRAN_HANDLE_MAX is always non-negative.
This commit was SVN r27714.
config/ directory. We split them apart a while ago in the hopes that
it would simplify things, but it didn't really (e.g., because there
were still some ompi/opal .m4 files in the top-level config/
directory, resulting in developer confusion where any given m4 macro
was defined).
So this commit consolidates them back into the top-level directory for
simplicity.
There's still (at least) two changes that would be nice to make:
1. Split any generated .m4 file (e.g., autogen-generated .m4 files)
into a separate directory somewhere so that a top-level -Iconfig/
will only get our explicitly defined macros, not the autogen stuff
(e.g., with libevent2019 needing to get the visibility macro, but
NOT all the autogen-generated inclusion of component configure.m4
files).
1. Change configure to be of the form:
{{{
# ...a small amount of preamble/setup...
OPAL_SETUP
m4_ifdef([project_orte], [ORTE_SETUP])
m4_ifdef([project_ompi], [OMPI_SETUP])
# ...a small amount of finishing stuff...
}}}
I doubt we'll ever get anything as clean as that, but that would be
the goal to shoot for.
This commit was SVN r27704.
accidentally adding component names to the OMPI_MPIEXT_ALL_SUBDIRS
variable twice; once with mpiext/ and once without. And then
prefixing all of them with mpiext/ again at the end. That wasn't
really necessary :-) -- we only need to add it once (without mpiext/)
and then prefix at the end (for consistency with the
OMPI_MPI_EXT_*_SUBDIRS handling).
This commit was SVN r26981.
did so!). Previously, I forgot to include the case of not specifying
--with-verbs -- oops! So now the OPAL_CHECK_VERBS_DIR macro will:
1. $opal_want_verbs will be set to: "yes" if --with-verbs or
--with-verbs=DIR was specified "no" if --without-verbs was
specified) "optional" if neither --with-verbs* nor --without-verbs
was specified
'''NOTE:''' --with-openib* are deprecated synonyms for --with-verbs*.
1. $opal_verbs_dir and $opal_verbs_libdir with either both be set or
both be empty.
This commit was SVN r26654.