We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't). If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!". Kaboom.
The only problem is that it didn't give you any indication of where
this value was being set. Quite maddening, from a user perspective.
So we changed the ompi_info handles this case. If any framework open
function return OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions return OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received and error, and then dump out the parameters
that it has received so far in the framework that had a problem.
At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).
We updated ompi_info to check for O???_ERR_BAD_PARAM from each from
the framework opens. Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior. And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes. Why? I think we used them in exactly one
place in the code base (mca_base_components_open.c). So we deleted
those and just used the normal OPAL_* codes instead.
While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).
This commit was SVN r27306.
The following Trac tickets were found above:
Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.
Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.
This commit was SVN r26242.
supposed to. I.e., half-baked/not complete stuff.
This commit backs out all of r25545. Sorry folks!
This commit was SVN r25546.
The following SVN revision numbers were found above:
r25545 --> open-mpi/ompi@7f9ae11faf
to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS
X, we need to use the following compiler (linker) flag:
-Wl,-commons,use_dylibs
So if we're compiling on OS X, test to see if that flag works with the
compiler. If so, add it to the wrapper FFLAGS and FCFLAGS (note that
per a future update, we'll only have one Fortran compiler anyway).
Fixes trac:1982.
This commit was SVN r25545.
The following Trac tickets were found above:
Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982
http://www.open-mpi.org/community/lists/devel/2011/10/9878.php
I am making a final decision to decide the behavior of what happens
when an MCA parameter is re-registered and changes types. In
developer builds (i.e., OPAL_ENABLE_DEBUG==1), a show_help message
will be displayed. In all builds, an error status will be returned.
Specifically, the logic looks like this:
{{{
if (detect_re-registration_with_type_change) {
#if OPAL_ENABLE_DEBUG
opal_show_help(...);
#endif
return OPAL_ERR_VALUE_OUT_OF_BOUNDS;
}
}}}
If someone would like to change this behavior, they are welcome to do
so. :-) I am committing this so that ''some'' action occurs (rather
than talking about the issue and then nothing happens).
This commit was SVN r25432.
handled properly when MCA parameters are re-registered and their types
change. Specifically, this case was broken:
1. Register an int MCA param with a non-zero default value
1. Re-register the same MCA param as a string with a NULL default value
The 2nd step would cause a segv because the first int default value
wasn't being reset properly. Here's sample code that shows the issue:
{{{
{
int ibogus;
char *sbogus;
opal_init(&argc, &argv);
mca_base_param_reg_int_name("type", "name", "help", false, false, 3, &ibogus);
printf("Ibogus: %d\n", ibogus);
mca_base_param_reg_string_name("type", "name", "help", false, false, NULL, &sbogus);
printf("Sbogus: %s\n", (NULL == sbogus) ? "NULL" : sbogus);
exit(0);
}
}}}
This commit fixes the problem from the sample code above as well as
the a similar issue for file-set MCA params and override values. It
also resets default values for MCA params initially registered as a
string but then re-registered as an int.
This commit was SVN r25392.
specify btl_tcp_if_include because btl_tcp_if_exclude is defaulted to
the loopback devices.
This commit does a few things:
* Introduce a new OPAL MCA base function:
mca_base_param_check_exclusive_string(). It checks to see that the
''user'' does not set two MCA parameters that are mutually
exclusive by checking the source of those MCS param values.
* Use the above function in many BTLs (and the OOB TCP) to ensure
that <foo>_if_include and <foo>_if_exclude are not both specified
''by the user''.
* Re-arrange many of these BTLs to move their MCA registration code
into a separate component_register() function (vs. the
component_open() function).
This code has been nominally reviewed and checked by Ralph, George,
Terry, and Shiqing.
This commit was SVN r25043.
The following SVN revision numbers were found above:
r24976 --> open-mpi/ompi@8f4ac54336
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.
Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.
This commit was SVN r23764.
The fix is to just check if the return value is positive or not, since all the SOS encoded errors are *always* negative.
The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index as a pointer.
This commit was SVN r23173.
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
back the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
This commit was SVN r23162.
:-)
Okay, cleanup the prior commit so that the default component search path shows in ompi_info, and remains available in component_find.
This commit was SVN r22278.
"my_perfect_path":SYSTEM_DEFAULT:USER_DEFAULT
and OPAL will substitute its internally derived values for the defaults (instead of forcing the user to figure them out).
This commit was SVN r22272.
#if defined (c_plusplus)
defined (__cplusplus)
followed by
extern "C" {
and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.
Notable exceptions are:
- opal/include/opal_config_bottom.h:
This is our generated code, that itself defines BEGIN_C_DECL and
END_C_DECL
- ompi/mpi/cxx/mpicxx.h:
Here we do not include opal_config_bottom.h:
- Belongs to external code:
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h
- opal/include/opal/prefetch.h:
Has C++ specific macros that are protected:
- Had #if ... } #endif _and_ END_C_DECLS (aka end up with 2x
END_C_DECLS)
ompi/mca/btl/openib/btl_openib.h
- opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
- opal/win32/ompi_process.h: had extern "C"\n {...
opal/win32/ompi_process.h: dito
- ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
ompi/mpi/f90/test/align_c.c: dito
- ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
- ompi/mpi/f90/xml/common-C.xsl: Amend
Tested on linux using --with-openib and --with-mx
The following do not contain either opal_config.h, orte_config.h or
ompi_config.h
(but possibly other header files, that include one of the above):
ompi/mca/bml/r2/bml_r2_ft.h
ompi/mca/btl/gm/btl_gm_endpoint.h
ompi/mca/btl/gm/btl_gm_proc.h
ompi/mca/btl/mx/btl_mx_endpoint.h
ompi/mca/btl/ofud/btl_ofud_endpoint.h
ompi/mca/btl/ofud/btl_ofud_frag.h
ompi/mca/btl/ofud/btl_ofud_proc.h
ompi/mca/btl/openib/btl_openib_mca.h
ompi/mca/btl/portals/btl_portals_endpoint.h
ompi/mca/btl/portals/btl_portals_frag.h
ompi/mca/btl/sctp/btl_sctp_endpoint.h
ompi/mca/btl/sctp/btl_sctp_proc.h
ompi/mca/btl/tcp/btl_tcp_endpoint.h
ompi/mca/btl/tcp/btl_tcp_ft.h
ompi/mca/btl/tcp/btl_tcp_proc.h
ompi/mca/btl/template/btl_template_endpoint.h
ompi/mca/btl/template/btl_template_proc.h
ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
ompi/mca/btl/udapl/btl_udapl_endpoint.h
ompi/mca/btl/udapl/btl_udapl_mca.h
ompi/mca/btl/udapl/btl_udapl_proc.h
ompi/mca/mtl/mx/mtl_mx_endpoint.h
ompi/mca/mtl/mx/mtl_mx.h
ompi/mca/mtl/psm/mtl_psm_endpoint.h
ompi/mca/mtl/psm/mtl_psm.h
ompi/mca/pml/cm/pml_cm_component.h
ompi/mca/pml/csum/pml_csum_comm.h
ompi/mca/pml/dr/pml_dr_comm.h
ompi/mca/pml/dr/pml_dr_component.h
ompi/mca/pml/dr/pml_dr_endpoint.h
ompi/mca/pml/dr/pml_dr_recvfrag.h
ompi/mca/pml/example/pml_example.h
ompi/mca/pml/ob1/pml_ob1_comm.h
ompi/mca/pml/ob1/pml_ob1_component.h
ompi/mca/pml/ob1/pml_ob1_endpoint.h
ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
ompi/mca/pml/ob1/pml_ob1_recvfrag.h
ompi/mca/pml/v/pml_v_output.h
opal/include/opal/prefetch.h
opal/mca/timer/aix/timer_aix.h
opal/util/qsort.h
test/support/components.h
This commit was SVN r21855.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855
- add a cmake module for searching libltdl libraries and headers
- a configure option to enable DSO build, default OFF.
- update a few source files for including correct header
and loading correct mca libraries path/suffix.
This commit was SVN r21804.
Libltdl erroneously returns an error string of "file not found" for
lots of reasons, even if the file really *is* there, but just failed
to dlopen() for some reason. So if lt_dlerror() returns "file not
found", do some simple hueristics and if we *do* find a file, print a
slightly better error message.
This commit was SVN r21214.
OMPI_* to OPAL_*. This allows opal layer to be used more independent
from the whole of ompi.
NOTE: 9 "svn mv" operations immediately follow this commit.
This commit was SVN r21180.
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches, that include headers, being "lost" due to
inclusion in one of the now deleted headers...
In total 817 files are touched.
In ompi/mpi/c/ header files are moved up into the actual c-file,
where necessary (these are the only additional #include),
otherwise it is only deletions of #include (apart from the above
additions required due to notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
Enable the debug library suffix, which is extremely necessary on Windows. If users want to debug their own programs in Visual Studio, but linking the programs to the release version libraries of Open MPI, i.e. mixing debug and release version DLLs, that will definitely cause some errors. What we have to do is providing both debug and release versions libraries, distinguished with suffix 'd', e.g. libmpid.dll for debug version.
This commit was SVN r20828.
- This patch solely _adds_ required headers and is rather localized
The next patch (after RFC) heavily removes headers (based on script)
- ompi/communicator/communicator.h: For sources that use
ompi_mpi_comm_world, don't require them to include "mpi.h"
- ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
#include "ompi/mca/topo/topo.h"
- ompi/errhandler/errhandler_predefined.h:
ompi/communicator/communicator.h depends on this header file!
To prevent recursion just have fwd declarations.
#include "ompi/types.h" for fwd declarations of the main structs.
- ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t
- ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
ompi_rb_tree_t, so have the proper classes
- ompi/mca/op/op.h:
Op is pretty self-contained: Nobody up to now has done
#include "opal/class/opal_object.h"
- ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
#include "opal/types.h" for ompi_ptr_t
- ompi/mca/pml/base/base.h:
We use opal_lists
- ompi/mca/pml/dr/pml_dr_vfrag.h:
#include "opal/types.h" for ompi_ptr_t
- ompi/mca/pml/ob1/pml_ob1_hdr.h:
#include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
- opal/dss/dss_unpack.c:
#include "opal/types.h"
- opal/mca/base/base.h:
#include "opal/util/cmd_line.h" for opal_cmd_line_t
- orte/mca/oob/tcp/oob_tcp.c:
#include "opal/types.h" for opal_socklen_t
- orte/mca/oob/tcp/oob_tcp.h:
#include "opal/threads/threads.h" for opal_thread_t
- orte/mca/oob/tcp/oob_tcp_msg.c:
#include "opal/types.h"
- orte/mca/oob/tcp/oob_tcp_peer.c:
#include "opal/types.h" for opal_socklen_t
- orte/mca/oob/tcp/oob_tcp_send.c:
#include "opal/types.h"
- orte/mca/plm/base/plm_base_proxy.c:
#include "orte/util/name_fns.h" for ORTE_NAME_PRINT
- orte/mca/rml/base/rml_base_receive.c:
#include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
- orte/mca/rml/oob/rml_oob_recv.c:
#include "opal/types.h" for ompi_iov_base_ptr_t
- orte/mca/rml/oob/rml_oob_send.c:
#include "opal/types.h" for ompi_iov_base_ptr_t
- orte/runtime/orte_data_server.c
#include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
- orte/runtime/orte_globals.h:
#include "orte/util/name_fns.h" for ORTE_NAME_PRINT
Tested on Linux/x86-64
This commit was SVN r20817.
''once'' and keep the names in an argv-style array. Each time we go
to open a framework, we just scan that array rather than re-reading
all the filenames from the filesystem.
This commit was SVN r20309.
The following Trac tickets were found above:
Ticket 1271 --> https://svn.open-mpi.org/trac/ompi/ticket/1271