This commit does a bunch of things:
* Address all remaining code review items from CMR #2023:
* Defer mmap setup to be lazy; only set it up the first time we
invoke a collective. In this way, we don't penalize apps that
make lots of communicators but don't invoke collectives on them
(per #2027).
* Remove the extra assignments of mca_coll_sm_one (fixing a
convertor count setup that was the real problem).
* Remove another extra/unnecessary assignment.
* Increase libevent polling frequency when using the RML to
bootstrap mmap'ed memory.
* Fix a minor procs-related memory leak in btl_sm.
* Commit a datatype fix that George and I discovered along the way to
fixing the coll sm.
* Improve error messages when mmap fails, potentially trying to
de-alloc any allocated memory when that happens.
* Fix a previously-unnoticed confusion between extent and true_extent
in coll sm reduce.
This commit was SVN r22049.
The following Trac tickets were found above:
Ticket 2023 --> https://svn.open-mpi.org/trac/ompi/ticket/2023
This commit looks larger than it really is since it includes a fair amount of code cleanup.
The SIGSTOP/SIGCONT+checkpointing work uses some of the functionality in r20391. Basic use case below (note that the checkpoint generated is useable as usual if the stopped application is terminated).
{{{
shell 1) mpirun -np 2 -am ft-enable-cr my-app
... running ...
shell 2) ompi-checkpoint --stop -v MPIRUN_PID
[localhost:001300] [ 0.00 / 0.20] Requested - ...
[localhost:001300] [ 0.00 / 0.20] Pending - ...
[localhost:001300] [ 0.01 / 0.21] Running - ...
[localhost:001300] [ 1.01 / 1.22] Stopped - ompi_global_snapshot_1234.ckpt
Snapshot Ref.: 0 ompi_global_snapshot_1234.ckpt
shell 2) killall -CONT mpirun
... Application Continues execution in shell 1 ...
}}}
Other items in this commit are mostly cleanup that has been sitting off-trunk for too long:
* Add a new {{{opal_crs_base_ckpt_options_t}}} type that encapsulates the various options that could be passed to the CRS. Currently only TERM and STOP, but this makes adding others ''much'' easier.
* Eliminate ORTE_SNAPC_CKPT_STATE_PENDING_TERM, since it served a redundant purpose with the new options type.
* Lay some basic ground work for some future features.
This commit was SVN r21995.
The following SVN revision numbers were found above:
r20391 --> open-mpi/ompi@0704b98668
masks between the mask argument and a local PLPA mask. There were three
problems:
1) The "get" function computed the number of bits as sizeof(mask),
which is the size of the pointer to the mask rather than the mask
itself. So, only 4 bits were copied with m32 and 8 bits with m64.
There are actually 1024 bits.
2) The "get" and "set" functions both copied a number of bits computed
from the sizeof() mask, but sizeof() reports the number of bytes.
We have to multiply by 8 to get the number of bits.
3) These two functions check to make sure tha the mask argument is not
bigger than the PLPA mask. But, the set function copies a number
of bits in the PLPA mask, which is conceivably greater than the
number of bits in the mask argument. So, accesses to the mask
argument may overrun that argument.
Problems 1 and 2 meant that one would encounter errors when the number of
cores exceeded 4 (with -m32) or 8 (with -m64). Problem 3 probably caused
no errors.
This commit was SVN r21993.
don't set the ref count to 1, it has been already set by the call to OBJ_NEW
when the type was allocated. This fixes ticket #2014.
This commit was SVN r21976.
The new options work by adding an ":if-avail" qualifier to the "bind-to-socket" and "bind-to-core" MCA params. If the system does not support this capability, the job will launch anyway. Without the qualifier, the job will abort with an error message indicating that the required functionality is not supported on this system.
This commit was SVN r21975.
As noted in http://www.open-mpi.org/community/lists/devel/2009/08/6741.php,
we do not correctly free a dupped predefined datatype.
The fix is a bit more involving. See ticket for details.
Tested with ibm tests and mpi_test_suite (though there's two "old" failures
zero5.c and zero6.c)
Thanks to Lisandro Dalcin for bringing this up.
This commit was SVN r21929.
The following Trac tickets were found above:
Ticket 2014 --> https://svn.open-mpi.org/trac/ompi/ticket/2014
For some historical reasons, we didn't use select() for the Windows events. Now it could be merged back to have a better performance on Windows.
This commit was SVN r21899.
Due to a typo (probably cut/paste error) the CFLAGS/CPPFLAGS/LDFLAGS/LIBS arguments were not propogated to the BLCR component.
This changes a few {{{crs_blcr_check_}}} to {{{crs_blcr_save}}}, which they should have been all along.
This needs to be applied to v1.3 as well :/
This commit was SVN r21860.
#if defined (c_plusplus)
defined (__cplusplus)
followed by
extern "C" {
and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.
Notable exceptions are:
- opal/include/opal_config_bottom.h:
This is our generated code, that itself defines BEGIN_C_DECL and
END_C_DECL
- ompi/mpi/cxx/mpicxx.h:
Here we do not include opal_config_bottom.h:
- Belongs to external code:
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h
- opal/include/opal/prefetch.h:
Has C++ specific macros that are protected:
- Had #if ... } #endif _and_ END_C_DECLS (aka end up with 2x
END_C_DECLS)
ompi/mca/btl/openib/btl_openib.h
- opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
- opal/win32/ompi_process.h: had extern "C"\n {...
opal/win32/ompi_process.h: dito
- ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
ompi/mpi/f90/test/align_c.c: dito
- ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
- ompi/mpi/f90/xml/common-C.xsl: Amend
Tested on linux using --with-openib and --with-mx
The following do not contain either opal_config.h, orte_config.h or
ompi_config.h
(but possibly other header files, that include one of the above):
ompi/mca/bml/r2/bml_r2_ft.h
ompi/mca/btl/gm/btl_gm_endpoint.h
ompi/mca/btl/gm/btl_gm_proc.h
ompi/mca/btl/mx/btl_mx_endpoint.h
ompi/mca/btl/ofud/btl_ofud_endpoint.h
ompi/mca/btl/ofud/btl_ofud_frag.h
ompi/mca/btl/ofud/btl_ofud_proc.h
ompi/mca/btl/openib/btl_openib_mca.h
ompi/mca/btl/portals/btl_portals_endpoint.h
ompi/mca/btl/portals/btl_portals_frag.h
ompi/mca/btl/sctp/btl_sctp_endpoint.h
ompi/mca/btl/sctp/btl_sctp_proc.h
ompi/mca/btl/tcp/btl_tcp_endpoint.h
ompi/mca/btl/tcp/btl_tcp_ft.h
ompi/mca/btl/tcp/btl_tcp_proc.h
ompi/mca/btl/template/btl_template_endpoint.h
ompi/mca/btl/template/btl_template_proc.h
ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
ompi/mca/btl/udapl/btl_udapl_endpoint.h
ompi/mca/btl/udapl/btl_udapl_mca.h
ompi/mca/btl/udapl/btl_udapl_proc.h
ompi/mca/mtl/mx/mtl_mx_endpoint.h
ompi/mca/mtl/mx/mtl_mx.h
ompi/mca/mtl/psm/mtl_psm_endpoint.h
ompi/mca/mtl/psm/mtl_psm.h
ompi/mca/pml/cm/pml_cm_component.h
ompi/mca/pml/csum/pml_csum_comm.h
ompi/mca/pml/dr/pml_dr_comm.h
ompi/mca/pml/dr/pml_dr_component.h
ompi/mca/pml/dr/pml_dr_endpoint.h
ompi/mca/pml/dr/pml_dr_recvfrag.h
ompi/mca/pml/example/pml_example.h
ompi/mca/pml/ob1/pml_ob1_comm.h
ompi/mca/pml/ob1/pml_ob1_component.h
ompi/mca/pml/ob1/pml_ob1_endpoint.h
ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
ompi/mca/pml/ob1/pml_ob1_recvfrag.h
ompi/mca/pml/v/pml_v_output.h
opal/include/opal/prefetch.h
opal/mca/timer/aix/timer_aix.h
opal/util/qsort.h
test/support/components.h
This commit was SVN r21855.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855
- add a cmake module for searching libltdl libraries and headers
- a configure option to enable DSO build, default OFF.
- update a few source files for including correct header
and loading correct mca libraries path/suffix.
This commit was SVN r21804.
support for _Complex is disabled until we figure out the correct
black magic. So instead of using this nice C99 feature, we use the
a strcture with a double type, the same approach that worked pretty
well for the last couple of years.
Switching from one mode to the other is done using the
OPAL_USE_[FLOAT|DOUBLE|LONG_DOUBLE]__COMPLEX macros defined in
opal_datatype_internal.h at line 442.
This commit was SVN r21800.
Adds several new mpirun options:
* -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding.
* -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned.
* -bind-to-core - currently the default behavior (maintained from prior default)
* -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile
Similar features/options are provided at the board level for multi-board nodes.
Documentation to follow...
This commit was SVN r21791.
Refs trac:1987
The patch for v1.3 attached to Ticket #1987 already includes this change.
I did not have a chance to commit this last night, so sorry for the delay.
This commit was SVN r21777.
The following Trac tickets were found above:
Ticket 1987 --> https://svn.open-mpi.org/trac/ompi/ticket/1987
Complex should either be a struct of float on the ompi-layer
or as C99 float _complex in opal.
unconstify to not break windows / the mpi* wrappers.
This commit was SVN r21770.
The following SVN revision numbers were found above:
r21763 --> open-mpi/ompi@668f001351
* Check for {{{dlfcn.h}}} in the self component's configure.m4 (also clean up the .m4 a bit.
* Adjust the priority of the BLCR component so that the self component has a higher priority (if the application went to the trouble of writing the routines, why not use them.) The 'self' component checks for the appropriate functions during query, so it know if it -can- be used during component selection.
* Adjust some copyrights that I missed before
* Fix a warning when casing the result of dlsym() into a function pointer. There is a bit of pointer magic to make this happen (thanks to the following website, and RedHat EL 4 man pages for illustrating it:
http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Passing to Jeff for a final review of the patch before moving to v1.3.
This commit was SVN r21768.
The following SVN revision numbers were found above:
r21766 --> open-mpi/ompi@91e52d062b
Due to the visibility patch to libltdl in r21731, this module can no longer access or use the libltdl interfaces directly. Instead just use the dlopen/dlsym/dlclose functions directly. This is a portability implication here, but for the moment it does not seem to bite us.
Also in this patch, cleanup some of the 'self' specific code paths.
* opal-restart need not special case the 'self' component since it can now interact with it as if it were a normal component.
* Cleanup the initialization of the cmd line arguments in opal-restart.
* Make sure to mark opal-restart as a 'tool', but do so by setting the global variable directly instead of setting the environment variable, which could be inherited by the application.
* Most of the functions in the 'self' component should not be used by a command line tool (exception being 'restart'), so make sure that if we accidently call them then errors are returned.
* Increase the priority of the 'none' component to be above that of 'self' when being selected in a command line tool. This allows for both mpirun and opal-restart to work correctly with the 'self' module.
This commit was SVN r21766.
The following SVN revision numbers were found above:
r21731 --> open-mpi/ompi@0278b86456
C++ compiler in configure. If we have a C++ compiler, then the MPI
C++ bindings are built by default. If we don't have a C++ compiler,
then the MPI C++ bindings are not built by default.
--enable-mpi-cxx will now force an error if there is no C++ compiler
available. --disable-mpi-cxx (or the lack of a C++ compiler) will now
disable many of the C++ compiler checks in configure.
Note that there are a few items to clean up regarding the difference
between C's _Bool type and C++'s bool type. Right now, we assume that
they are the same. But they aren't, and they shouldn't be treated as
such. This cleanup will be forced in MPI-2.2 with the introduction of
the MPI_C_BOOL MPI datatype.
This commit was SVN r21755.
note in VERSION file.
NOTE: the versions will ''always'' be 0:0:0 on the SVN trunk and
developer branches. They will only have meaningful values (starting
with 0:0:0 in 1.3.4) on release branches. Only RM's will modify these
values immediately preceeding a release.
This commit was SVN r21729.