- If one wants to use this solution, remember to unload the project 'orte-restart', which currently does not work on Windows.
This commit was SVN r15680.
This is because internally 'self' uses dlopen to look at the application
running to determine if it can/should be used or not.
This commit was SVN r15673.
in a callback from the event library and post an RML receive, we'll
deadlock because the event library wouldn't be entered until the
event library was not already entered. Now just protect data structures
(which we were basically already doing) instead of code, like good
threading people ;).
This commit was SVN r15585.
* General TCP cleanup for OPAL / ORTE
* Simplifying the OOB by moving much of the logic into the RML
* Allowing the OOB RML component to do routing of messages
* Adding a component framework for handling routing tables
* Moving the xcast functionality from the OOB base to its own framework
Includes merge from tmp/bwb-oob-rml-merge revisions:
r15506, r15507, r15508, r15510, r15511, r15512, r15513
This commit was SVN r15528.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15506
r15507
r15508
r15510
r15511
r15512
r15513
asprintf and friends. This is not a failsafe; there are many cases
where this check will not be used. But at least it's something...
This commit was SVN r15500.
opal_net_get_hostname() rather than malloc, because no one was freeing
the buffer and the common use case was for printfs, where calling
free is a pain.
This commit was SVN r15494.
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
1. Pasha's new implementation of asynchronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be set up to use PP or SRQ receive buffers, with fine-grained
control over receive buffer size, the number of receive buffers to
post, and when to replenish the receive queue (low water mark). For
SRQ QPs, the number of outstanding sends can also be specified. The
following is an example of the syntax to describe QPs to the OpenIB
BTL using the new MCA parameter btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed prior to the 1.3 release.
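The format described above can be illustrated with a small parser.
This is only a sketch and not the code in the OpenIB BTL: the type
sketch_qp_t and the function sketch_parse_receive_queues() are
hypothetical names, and treating equal buffer sizes as acceptable
"ascending" order is an assumption.

```c
#include <stdio.h>

/* Hypothetical record for one QP description; not the BTL's real type. */
typedef struct {
    char type;          /* 'P' = per-peer, 'S' = shared receive queue */
    long buf_size;      /* receive buffer size in bytes */
    long num_bufs;      /* number of receive buffers to allocate */
    long low_watermark; /* repost receive buffers at this level */
    long max_sends;     /* outstanding sends allowed; SRQ only, else 0 */
} sketch_qp_t;

/* Parse e.g. "P,128,16,4;S,1024,256,128,32" into out[].
 * Returns the number of QPs parsed, or -1 on malformed input, too
 * many QPs, or buffer sizes that decrease from one QP to the next. */
static int sketch_parse_receive_queues(const char *spec,
                                       sketch_qp_t *out, int max_qps)
{
    const char *p = spec;
    int n = 0;

    while ('\0' != *p) {
        sketch_qp_t qp = {0};
        int used = 0;

        /* Fields 1-4 are common to both QP types. */
        if (4 != sscanf(p, "%c,%ld,%ld,%ld%n", &qp.type, &qp.buf_size,
                        &qp.num_bufs, &qp.low_watermark, &used)) {
            return -1;
        }
        p += used;

        if ('S' == qp.type) {
            /* Field 5 (outstanding sends) exists only for SRQ QPs. */
            if (1 != sscanf(p, ",%ld%n", &qp.max_sends, &used)) {
                return -1;
            }
            p += used;
        } else if ('P' != qp.type) {
            return -1;
        }

        /* Each description must end at ';' or at the end of the string. */
        if (';' == *p) {
            p++;
        } else if ('\0' != *p) {
            return -1;
        }

        if (n >= max_qps) {
            return -1;
        }
        /* Enforce the ascending receive buffer size requirement. */
        if (n > 0 && qp.buf_size < out[n - 1].buf_size) {
            return -1;
        }
        out[n++] = qp;
    }
    return n;
}
```

Passing the four-QP example string from above to this function yields
one per-peer entry followed by three SRQ entries; listing a larger
buffer size before a smaller one is rejected.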
This commit was SVN r15474.
Move the matching logic out of the dynamic path into a
separate function. Add the corresponding check to the static
component path.
This commit was SVN r15458.
There are several interesting things:
1. less NFS traffic [as we potentially access fewer files]
2. faster loading time [if the user tunes their execution environment]
3. (1) + (2) -> faster startup time [at least for everything that does not depend on the network]
4. MX bug will go away if the pml is specified.
5. No useless BTLs will be opened, which will solve a few other issues.
This commit was SVN r15402.
VxWorks. Still some issues remaining, I'm sure.
Refs trac:1010
This commit was SVN r15320.
The following Trac tickets were found above:
Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
* Make orted.1 man page be non-descriptive because it's really an
internal command.
* Re-work the opal_wrapper man page logic a bit so that we can have a
real opal_wrapper.1 installed that says "don't look here -- look at
mpicc (etc.)"
This commit was SVN r15264.
* Remove the 'opal_mca_base_param_use_amca_sets' global variable
* Harness the fact that you can (read: should) call the cmd_line functions
before initializing opal_init_util(). This pushes the MCA/GMCA/AMCA
command line options into the environment before OPAL inits and starts
to use these values. By putting the cmd_line parse before opal_init_util
in orterun and orted we only parse the *MCA parameter files once, and
correctly (alleviating the need to 'recache' the files on init.)
* Small bits of cleanup.
This commit was SVN r15219.
param says we should. Also, check for != 0, rather than == 1, as there
are way too many double locks, but they'll get warned when we do the
double lock. No need to warn again, in a meaningless way.
Originally part of r15167, reverted with r15172.
This commit was SVN r15173.
The following SVN revision numbers were found above:
r15167 --> open-mpi/ompi@faa401dc47
r15172 --> open-mpi/ompi@5f16251808
OBJ_NEW
* Need to signal when the passive unlock has left an expose epoch for
the win_free case
* Clean up some debugging output
* fix missing variable initialization
This commit was SVN r15167.
flex (which, incidentally, emit ''more'' warnings than earlier
versions). Grumble.
This commit was SVN r15166.
The following SVN revision numbers were found above:
r15158 --> open-mpi/ompi@57d09c10f7
Ensure that the AM_CONDITIONALs are ''always'' run, even if we
--enable-mca-no-build the paffinity/linux component.
This commit was SVN r15095.
The following Trac tickets were found above:
Ticket 1057 --> https://svn.open-mpi.org/trac/ompi/ticket/1057
single threaded builds. In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked). There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews megabytes of errors on a
simple test. So we have some work to do on our path towards
thread support.
Also removed the macro versions of the non-conditional thread locks,
as the only places they were used, the author of the code intended
to use the conditional thread locks. So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks. Simple, right? :).
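A rough sketch of the split between conditional macros and
non-conditional functions follows. Every name in it is hypothetical,
the mutex is a toy that only records whether it is held (so no
threading library is needed), and the double-lock check is always on
here, whereas the text above says it is off by default.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy mutex: only tracks whether it is held. Hypothetical type. */
typedef struct { bool held; } sketch_mutex_t;

static bool sketch_using_threads = false; /* true when threads are in use */
static int sketch_double_locks = 0;       /* double locks detected so far */

/* Lowercase functions: non-conditional, always operate on the mutex. */
static void sketch_mutex_lock(sketch_mutex_t *m)   { m->held = true; }
static void sketch_mutex_unlock(sketch_mutex_t *m) { m->held = false; }

/* Single-threaded debug path: no real locking, just bookkeeping that
 * reports a mutex being "locked" twice without an unlock in between. */
static void sketch_debug_lock(sketch_mutex_t *m, const char *file, int line)
{
    if (m->held) {
        sketch_double_locks++;
        fprintf(stderr, "%s:%d: mutex double locked\n", file, line);
    }
    m->held = true;
}

/* Uppercase macros: conditional. They take the real lock only when
 * threads are in use, and otherwise run the debug bookkeeping. Because
 * the mutex expression is always evaluated, a misspelled variable name
 * still fails to compile in a single-threaded build. */
#define SKETCH_THREAD_LOCK(m)                                 \
    do {                                                      \
        if (sketch_using_threads) {                           \
            sketch_mutex_lock(m);                             \
        } else {                                              \
            sketch_debug_lock((m), __FILE__, __LINE__);       \
        }                                                     \
    } while (0)

#define SKETCH_THREAD_UNLOCK(m)                               \
    do {                                                      \
        if (sketch_using_threads) {                           \
            sketch_mutex_unlock(m);                           \
        } else {                                              \
            (m)->held = false; /* bookkeeping only */         \
        }                                                     \
    } while (0)
```

Calling SKETCH_THREAD_LOCK() twice on the same mutex without an unlock
increments sketch_double_locks and prints one warning to stderr.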
This commit was SVN r15011.
re-enabling compilation of this component.
However, it still won't compile because this component provides a
module finalize function which apparently somehow got dropped from the
paffinity base. Support for the paffinity module finalize function
needs to be re-added.
This commit was SVN r14915.
* Enable VPATH builds to work (slight tweak of r14895 -- mainly
because I already had it done when George committed :-) )
* Enable "make dist" to work properly for PLPA included mode
* Update plpa.h.in
* Update svn:ignore
Took relevant changes back to the main PLPA SVN as well.
This commit was SVN r14896.
The following SVN revision numbers were found above:
r14895 --> open-mpi/ompi@bb7b04e875
Changes paffinity interface to use a cpu mask for available/preferred cpus
rather than the current coarse grained paffinity that lets the OS choose
which processor.
Macros for setting and clearing masks are provided.
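As an illustration of what such mask macros can look like, here is a
minimal sketch. The names (SKETCH_CPU_SET and friends), the mask type,
and the 1024-CPU limit are assumptions made for this example and are
not taken from the paffinity headers.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical limits and type for the sketch. */
#define SKETCH_MAX_CPUS      1024
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
    ((SKETCH_MAX_CPUS + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* One bit per CPU, packed into an array of unsigned long. */
typedef struct {
    unsigned long bits[SKETCH_WORDS];
} sketch_cpu_mask_t;

/* Clear every CPU in the mask. */
#define SKETCH_CPU_ZERO(mask) \
    memset((mask).bits, 0, sizeof((mask).bits))

/* Mark one CPU as available/preferred. */
#define SKETCH_CPU_SET(cpu, mask) \
    ((mask).bits[(cpu) / SKETCH_BITS_PER_WORD] |= \
        1UL << ((cpu) % SKETCH_BITS_PER_WORD))

/* Remove one CPU from the mask. */
#define SKETCH_CPU_CLR(cpu, mask) \
    ((mask).bits[(cpu) / SKETCH_BITS_PER_WORD] &= \
        ~(1UL << ((cpu) % SKETCH_BITS_PER_WORD)))

/* Non-zero if the CPU is present in the mask. */
#define SKETCH_CPU_ISSET(cpu, mask) \
    (0 != ((mask).bits[(cpu) / SKETCH_BITS_PER_WORD] & \
        (1UL << ((cpu) % SKETCH_BITS_PER_WORD))))
```

A caller would zero a mask, set the CPUs it wants, and hand the mask
to the affinity call instead of letting the OS pick a processor.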
Solaris and Windows changes have not been made. The Solaris subdirectory has
some suggested changes - however the relevant man pages for the Solaris 10
APIs have some ambiguity regarding the order in which one creates and sets a
processor set. As we did not have access to a Solaris 10 machine we could not
test to see the correct way to do the work under Solaris.
This commit was SVN r14887.
symbols in them and environ is defined only in the final application
(probably in crt1.o). Apple provides a function for getting at the
environment, so use that instead if it's available.
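A minimal sketch of that approach, assuming the Apple function in
question is _NSGetEnviron() from <crt_externs.h>. The sketch_environ
macro and sketch_env_count() are hypothetical names, and the real code
presumably keys off a configure check rather than __APPLE__.

```c
#include <stddef.h>

#if defined(__APPLE__)
/* On OS X, ask the runtime for the environment instead of referencing
 * the environ symbol directly from a shared library. */
#  include <crt_externs.h>
#  define sketch_environ (*_NSGetEnviron())
#else
/* Elsewhere, the usual POSIX global is available. */
extern char **environ;
#  define sketch_environ environ
#endif

/* Count the entries in the current environment. */
static int sketch_env_count(void)
{
    int n = 0;
    char **e;

    for (e = sketch_environ; NULL != e && NULL != *e; e++) {
        n++;
    }
    return n;
}
```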
This commit was SVN r14857.
OPAL and ORTE. Since we now do opal_progress_init(), we do it
there. Fixes a performance issue introduced in r14773.
* While trying to find the above, noticed that we did the reference
counting for the init in init_util and for finalize in fini. That
isn't right, so make them both in the non-util versions.
This commit was SVN r14830.
The following SVN revision numbers were found above:
r14773 --> open-mpi/ompi@1e678c3f55
This commit moves the initialization/finalization of opal_event and opal_progress
to opal_init/finalize. These were previously init/final in ORTE which is an
abstraction violation. After talking about it we concluded that there are no
ordering issues that require these to be init/final in ORTE instead of OPAL.
I ran the IBM test suite against this commit and it didn't turn up any new
failures so I think it is good to go.
Let us know if this causes problems.
This commit was SVN r14773.
* Move ipv6comat.h code into opal_config_bottom.h and change into some
more intelligent testing of structures
* Change opal's if interface to use sockaddr instead of sockaddr_storage,
as the RFCs suggest we do
* Move the networking code in opal that isn't directly related to if
detection into net.h
* Add a quick function to get the port out of either a sockaddr_in
or sockaddr_in6, saving a bunch of code in the oob.
* Update TCP oob and btl with new interface
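The port helper mentioned above might look roughly like the following.
This is a sketch under assumptions: sketch_get_port() is a hypothetical
name and signature, not the function added by this commit.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return the port of an IPv4 or IPv6 socket address in host byte
 * order, or -1 if the address family is not supported. */
static int sketch_get_port(const struct sockaddr *addr)
{
    switch (addr->sa_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) addr)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) addr)->sin6_port);
    default:
        return -1;
    }
}
```

Callers can then pass any socket address through a struct sockaddr
pointer without a per-family #if at each call site.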
This commit was SVN r14679.
* Require Autoconf 2.60 or higher and remove some cruft
required for AC 2.59 or the AC 2.59 / AC 2.60 mix
* Remove a bunch of now unnecessary AC_SUBST calls
* Use the libtool-provided variables for the -I and
library to use when compiling against ltdl
Fixes trac:1000
This commit was SVN r14652.
The following Trac tickets were found above:
Ticket 1000 --> https://svn.open-mpi.org/trac/ompi/ticket/1000
via the visibility feature that is provided by some compilers.
This feature is disabled by default. To enable it, configure with
--enable-visibility; you also need a compiler with visibility
support. Please refer to the wiki for more information.
https://svn.open-mpi.org/trac/ompi/wiki/Visibility
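For readers unfamiliar with the compiler feature, here is a minimal
sketch of how GCC-style visibility attributes are typically wrapped.
The SKETCH_EXPORT and SKETCH_HIDDEN macros and both functions are
hypothetical; they are not the macros this commit introduces.

```c
/* Use the attribute only where the compiler is known to support it. */
#if defined(__GNUC__) && __GNUC__ >= 4
#  define SKETCH_EXPORT __attribute__((visibility("default")))
#  define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#  define SKETCH_EXPORT
#  define SKETCH_HIDDEN
#endif

/* Not part of the shared library's exported symbol table. */
SKETCH_HIDDEN int sketch_internal_helper(int x)
{
    return x * 2;
}

/* Exported: callable by code that links against the library. */
SKETCH_EXPORT int sketch_public_api(int x)
{
    return sketch_internal_helper(x) + 1;
}
```

Within one program both functions behave normally; the attribute only
changes which symbols a shared library exposes to its users.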
This commit was SVN r14582.
because the Sun Studio compiler did not recognize __const.
This commit fixes trac:1011.
This commit was SVN r14558.
The following Trac tickets were found above:
Ticket 1011 --> https://svn.open-mpi.org/trac/ompi/ticket/1011
- make opal_sockaddr2str() take a sockaddr_storage instead of a sockaddr_in6
so that it works for IPv4 and IPv6 addresses, and remove a whole bunch
  of #ifs in the OOB code.
- Fix a compiler warning in the TCP BTL due to run-time determined
  array size by making it a dynamically allocated array.
- Fix the unpacking code of IPv4 addresses when using IPv6 support, so
that the address is in the correct location (instead of in an IPv6
structure, use an IPv4 structure). Refs trac:1005.
This commit was SVN r14514.
The following Trac tickets were found above:
Ticket 1005 --> https://svn.open-mpi.org/trac/ompi/ticket/1005
This bug(?) became apparent after the installdirs commit because these tools
were not finding the proper libraries since the paths were wonky.
It all looks good now. :)
This commit was SVN r14461.
Protect the free and strdup values for replacing keyval pairs just as we do
below in the files for new keyval pairs.
In basic testing this seems to make everything work as it should again.
This commit was SVN r14460.
The following Trac tickets were found above:
Ticket 1002 --> https://svn.open-mpi.org/trac/ompi/ticket/1002
finally brings in functionality that is already on the 1.2 branch, and
was developed and tested in the v1.2ofed branch (and other places).
Short version of new features:
* Support for ibv_fork_init()
* Automatically fill in the openib BTL bandwidth value by
querying the HCA port
* Installdirs functionality
* Fixes to always use -I in the Fortran wrapper compilers (#924)
* Gleb's mpool updates
* Remove some cruft in btl/openib/configure.m4, therefore
fixing the harmless warnings noted in #665
* Bunches of updates to the Linux RPM spec file
I.e., effectively the same thing that r14411 brought to the v1.2
branch.
Also effectively brought in r14432 and r14433 (some fixes on top of
the original r14411 commit to v1.2). Still need to bring in the moral
equivalent of r14445 after this commit (fixes to installdirs).
This commit was SVN r14449.
The following SVN revision numbers were found above:
r14411 --> open-mpi/ompi@83b31314ae
r14432 --> open-mpi/ompi@a48f160595
r14433 --> open-mpi/ompi@68f346d2bc
r14445 --> open-mpi/ompi@13d366b827
that makes sense or not, it is allowed and it is done). Remove the compiler
hint that the argument will never be NULL. Fixes a segfault in oob init
code when opal_argv_split() was called with a NULL first argument.
This commit was SVN r14440.
VxWorks, for example). Checked with George before committing this to
ensure that nothing broke on Windows -- he said it was ok.
This commit was SVN r14388.
Fix for memory corruption in the restarted process stack. This stemmed from
the brute force method we were previously using. This commit fixes this by
using a lighter weight solution focused in the r2 BML instead of above the PML.
This is a more efficient and flexible solution, and it solves the original
problem.
In the process I pulled out the ft_event function in the tcp BTL and r2 BML
into a set of *_ft.[c|h] files just to keep any updates to these code paths
as isolated as possible to make merging easier on everyone.
This commit was SVN r14371.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855
The following Trac tickets were found above:
Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977
supported, but "q" was for long long. This isn't ANSI
C and causes a warning when using PRI?64 macros. We
don't support versions prior to OS X 10.3, so we don't
need such backward compatibility. Instead, redefine
the macros to be "ll", which is ANSI C and doesn't
cause a compiler warning.
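For reference, this is how the PRI?64 macros are used in portable
code. The helper sketch_format_i64() is a hypothetical name for this
example only.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit integer without hard-coding "ll" or "q" at the call
 * site. PRId64 expands to the correct length modifier and conversion
 * for int64_t on the platform. Returns what snprintf() returns. */
static int sketch_format_i64(char *buf, size_t len, int64_t value)
{
    return snprintf(buf, len, "value=%" PRId64, value);
}
```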
Fixes trac:868
This commit was SVN r14358.
The following Trac tickets were found above:
Ticket 868 --> https://svn.open-mpi.org/trac/ompi/ticket/868
Per discussions with Brian and Ralph, make a slight correction in
where components are installed. Use $pkglibdir, not $libdir/openmpi,
so that when compiled in the orte trunk, components are installed to
the right directory (because the component search path is checking
$pkglibdir).
This commit was SVN r14345.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r14289
__opal_attribute_nonnull__, __opal_attribute_warn_unused_result__,
__opal_attribute_malloc__, __opal_attribute_sentinel__ and
__opal_attribute_format__
This commit was SVN r14078.
- Add signal handler BLCR register (helps with debugging)
- ifdef out the cr_request_file section for checkpointing self.
There is a bug with the 0.4.2 version of BLCR such that this
does not handle moving checkpoint files around.
I'm following up with the BLCR folks on this one (and checking
the newest release).
This commit was SVN r14069.
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
builds, so disable it there
* On 10.4.8 (and possibly others), siginfo is NULL in the signal
callback on 64 bit Intel builds, so account for that in the signal
callback.
This commit was SVN r14045.
This saves some memory for the constructors and destructors arrays of a
class by counting the constructors and destructors while we are counting
the cls_depth. And the reversal of the constructor array can now be done
without an extra loop.
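The idea can be sketched as follows. This is not the opal_class code:
the struct layout, sketch_class_init(), and the three-class example
hierarchy are all hypothetical and exist only to show the single
counting walk and the back-to-front fill.

```c
#include <stdlib.h>

typedef void (*sketch_fn_t)(void *obj);

/* Hypothetical class descriptor. */
typedef struct sketch_class {
    struct sketch_class *parent;
    sketch_fn_t construct;   /* may be NULL */
    sketch_fn_t destruct;    /* may be NULL */
    int depth;               /* number of classes in the chain */
    sketch_fn_t *ctors;      /* NULL-terminated, root class first */
    sketch_fn_t *dtors;      /* NULL-terminated, most-derived first */
} sketch_class_t;

/* Build the constructor and destructor arrays. Returns 0 or -1. */
static int sketch_class_init(sketch_class_t *cls)
{
    int n_ctor = 0, n_dtor = 0;
    sketch_class_t *c;
    sketch_fn_t *ctor, *dtor;

    /* One walk computes the depth and both counts together. */
    cls->depth = 0;
    for (c = cls; NULL != c; c = c->parent) {
        if (NULL != c->construct) n_ctor++;
        if (NULL != c->destruct) n_dtor++;
        cls->depth++;
    }

    /* One allocation sized to the real counts plus two terminators,
     * rather than one slot per level for each array. */
    cls->ctors = malloc((n_ctor + n_dtor + 2) * sizeof(sketch_fn_t));
    if (NULL == cls->ctors) return -1;
    cls->dtors = cls->ctors + n_ctor + 1;

    /* Fill constructors from the back so they end up root-first;
     * no separate reversal loop is needed. */
    ctor = cls->ctors + n_ctor;
    dtor = cls->dtors;
    *ctor = NULL;
    for (c = cls; NULL != c; c = c->parent) {
        if (NULL != c->construct) *--ctor = c->construct;
        if (NULL != c->destruct) *dtor++ = c->destruct;
    }
    *dtor = NULL;
    return 0;
}

/* Example hierarchy: base has both hooks, mid has none, leaf has
 * only a constructor. */
static void sketch_base_ctor(void *obj) { (void) obj; }
static void sketch_base_dtor(void *obj) { (void) obj; }
static void sketch_leaf_ctor(void *obj) { (void) obj; }
static sketch_class_t sketch_base =
    { NULL, sketch_base_ctor, sketch_base_dtor, 0, NULL, NULL };
static sketch_class_t sketch_mid =
    { &sketch_base, NULL, NULL, 0, NULL, NULL };
static sketch_class_t sketch_leaf =
    { &sketch_mid, sketch_leaf_ctor, NULL, 0, NULL, NULL };
```

For sketch_leaf this allocates five slots (two constructors, one
destructor, two terminators) where a depth-based scheme would reserve
a slot per level in each array.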
This commit was SVN r13939.
allocated from mpool memory (which is registered memory for RDMA transports)
This is not a problem for small jobs, but for a large number of ranks the
amount of wasted memory is significant.
This commit was SVN r13921.
- mca_base_param_file_prefix
(Default: NULL)
This is the full name of the "-am" mpirun option. Used to specify a ':'
separated list of AMCA parameter set files.
- mca_base_param_file_path
(Default: $SYSCONFDIR/amca-param-sets/:$CWD)
The path to search for AMCA files with relative paths. A warning will be
printed if the AMCA file cannot be found.
* Added a new function "mca_base_param_recache_files" that re-reads the file
configurations. This is used internally to help bootstrap the MCA system.
* Added a new orterun/mpirun command line option '-am' that aliases for the
mca_base_param_file_prefix MCA parameter
* Exposed the opal_path_access function as it is generally useful in other
places in the code.
* New function "opal_cmd_line_make_opt_mca" which will allow you to append a
new command line option with MCA parameter identifiers to set at the same
time. Previously this could only be done at command line declaration time.
* Added a new directory under the $pkgdatadir named "amca-param-sets" where all
the 'shipped with' Open MPI AMCA parameter sets are placed. This is the first
place to search for AMCA sets with relative paths.
* An example.conf AMCA parameter set file is located in
contrib/amca-param-sets/.
* Jeff Squyres contributed an OpenIB AMCA set for benchmarking.
Note: You will need to autogen with this commit as it adds a configure param.
Sorry :(
This commit was SVN r13867.
the dispatch function opal_atomic_cmpset_acq_xx was used to call
opal_atomic_cmpset_acq_32 would cause the compiler / linker to fix up a
jump address to be itself, leading to an infinite loop.
We're still looking into exactly what caused this, but during the
investigation into the hang, we determined that the compilers (both pathcc
and gcc) weren't always inlining both the call to opal_atomic_cmpset_acq_xx
and opal_atomic_cmpset_acq_32, meaning there was a function call in
opal_atomic_lock. The atomic lock will always be 32 bit, so there's no
need for the dispatch function, so might as well remove the dispatch
function that may or may not be inlined.
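The shape of the change can be sketched with C11 atomics. This is an
illustration only: the real opal_atomic_* functions are implemented
differently, and every sketch_* name and signature below is
hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Width-specific compare-and-set: returns non-zero on success. */
static inline int sketch_cmpset_32(_Atomic int32_t *addr,
                                   int32_t oldval, int32_t newval)
{
    return atomic_compare_exchange_strong(addr, &oldval, newval);
}

static inline int sketch_cmpset_64(_Atomic int64_t *addr,
                                   int64_t oldval, int64_t newval)
{
    return atomic_compare_exchange_strong(addr, &oldval, newval);
}

/* Size-based dispatch: one more call layer that the compiler may or
 * may not inline. */
static inline int sketch_cmpset_xx(void *addr, int64_t oldval,
                                   int64_t newval, size_t length)
{
    switch (length) {
    case 4:
        return sketch_cmpset_32((_Atomic int32_t *) addr,
                                (int32_t) oldval, (int32_t) newval);
    case 8:
        return sketch_cmpset_64((_Atomic int64_t *) addr, oldval, newval);
    default:
        return 0;
    }
}

/* The lock word is always 32 bits, so the lock can call the 32-bit
 * routine directly and skip the dispatch entirely. */
static inline void sketch_atomic_lock(_Atomic int32_t *lock)
{
    while (!sketch_cmpset_32(lock, 0, 1)) {
        /* spin until the lock word changes from 0 to 1 */
    }
}
```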
A bug fix that leads to potentially better performance. Gotta love the
few times that happens...
This commit was SVN r13651.
to flush all writes pending (ie, the data being protected) out of the memory
manager before we write the spinlock unlock. Only need a wmb instead of
full mb, which is at least slightly less intrusive. Also, after much
thought, no need for a memory barrier in init.
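In portable C11 terms the unlock path looks roughly like the sketch
below. C11 has no direct equivalent of a write-only barrier (wmb), so a
release fence is used as the nearest analogue; it is somewhat stronger
than a pure wmb. The sketch_lock_t type and functions are hypothetical
and are not the opal_atomic implementation.

```c
#include <stdatomic.h>

/* 0 = unlocked, 1 = locked. Hypothetical type. */
typedef struct { atomic_int value; } sketch_lock_t;

static void sketch_lock_init(sketch_lock_t *lock)
{
    /* No barrier on init, matching the conclusion above. */
    atomic_init(&lock->value, 0);
}

static void sketch_lock(sketch_lock_t *lock)
{
    int expected = 0;

    while (!atomic_compare_exchange_weak_explicit(&lock->value,
                                                  &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0; /* a failed exchange overwrites expected */
    }
}

static void sketch_unlock(sketch_lock_t *lock)
{
    /* Make the writes to the protected data visible before the store
     * that releases the lock. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&lock->value, 0, memory_order_relaxed);
}
```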
This commit was SVN r13649.
- Set ompi-specific autoconf cache-variables
- Implement one function to check for availability of an
attribute with the possibility for a cross-check.
- Do cross-checks for
__attribute__(format)
__attribute__(nonnull)
__attribute__(sentinel)
__attribute__(warn_unused_result)
- Grep the compiler's warnings for keywords regarding ignored
attributes.
- Include also the no_instrument_function
This commit was SVN r13556.
Add new function opal_get_num_processors() that will return the number
of processors on the local host. Does the Right Thing in POSIX
environments (including a special case for OS X), and will shortly do
the Right Thing for Windows (this commit includes a change to
configure, so I wanted to get that in before the US workday -- the
Windows code can come shortly because it won't involve configury
changes).
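One common way to obtain this number on POSIX-like systems is
sysconf(). The sketch below assumes _SC_NPROCESSORS_ONLN is available
(it is a widespread extension, not part of the POSIX base), and
sketch_get_num_processors() is a hypothetical stand-in, not the body of
opal_get_num_processors().

```c
#include <unistd.h>

/* Return the number of processors currently online, or -1 if it
 * cannot be determined on this platform. */
static long sketch_get_num_processors(void)
{
#if defined(_SC_NPROCESSORS_ONLN)
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : -1;
#else
    return -1;
#endif
}
```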
This commit was SVN r13506.
The following Trac tickets were found above:
Ticket 853 --> https://svn.open-mpi.org/trac/ompi/ticket/853