library to multiple libraries that are implicitly sucked into the executable
as a dependency of libmpi. The initialize hook isn't visible to libc on some
linux distributions when it's in libopal and libopal isn't explicity linked
into the executable. The fix is to have a duplicate initialize hook in
libmpi as well as libopal. *sigh*.
This commit was SVN r28164.
A few changes were required to support this move:
1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).
2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.
3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you seperately specify the "store" and "fetch" ordering.
4. the ORTE grpcomm and ess/pmi components, and the nidmap code, were updated to work with the new db framework and to specify internal/global storage options.
No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.
This commit was SVN r28112.
* Clean up ${includedir} and ${libdir} for script wrapper compilers
* Update script wrapper compilers to work like the C wrapper compilers w.r.t static and dynamic linking
* Remove the ORTE script wrapper compilers since they didn't support the ${includedir} stuff and Ralph said they weren't used anymore.
This commit was SVN r28052.
processor_bind to see if we're bound to a single core.
If not, THEN check lgroup affinity. Already CMR'ed to
v1.6 (trac 3507) and fixed upstream in hwloc (r5295).
This commit was SVN r28040.
The following SVN revision numbers were found above:
r5295 --> open-mpi/ompi@6df8cb0f02
Leif would like to revamp the ARM support in a different way, and will
submit a patch to do so in the future.
This commit was SVN r27960.
The following SVN revision numbers were found above:
r27882 --> open-mpi/ompi@8649b5eece
flags, and mca flags are kept seperate until the very end. The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro. MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.
This commit was SVN r27950.
actually care if opal_pointer_array is limited to handle_max already passes
that in as the max_size during init, so don't need it there. The arch
constant was a bit more difficult, so pass that in during MPI init and
leave empty otherwise.
This is to help with the effort to allow building ompi against an external
opal or orte.
This commit was SVN r27817.
it that the others did: move the "I won!" code up into the POST_CONFIG
macro. Also, fix a long-standing typo when restoring the $CPPFLAGS (!).
This commit was SVN r27813.
STOP_AT_FIRST. And move the side-effect-inducing code in
hwloc142/configure.m4 up to POST_CONFIG.
Also change the priority of the external hwloc component to 90 so that
it is evaluated before the internal component (as a direct result of
changing to STOP_AT_FIRST).
This commit was SVN r27796.
The following SVN revision numbers were found above:
r27794 --> open-mpi/ompi@569a60c2de
event framework to STOP_AT_FIRST, and then moves a bunch of
side-effect-inducing code in the libevent2019 configure.m4 up to
POST_CONFIG.
== More detail ==
Change the event framework from STOP_AT_FIRST_PRIORITY to
STOP_AT_FIRST. This means that only one component can win (vs. all
STOP_AT_FIRST_PRIORITY, in which multiple components of the same
priority can all win).
You still need to ensure that there are no side-effects from the
winner, however, so check for winning during POST_CONFIG, and set
things like the base_include there.
This simplifies the configury quite a bit -- you don't have to assume
that mulitple components can win: zero or one components will win.
Also change the libevent 2019 priority to 50 so that some other
(developer-specific/local) component could win, if it wanted to.
This commit was SVN r27794.
party configure.in scripts to be configure.ac so that Automake stops
complaining about them.
This commit was SVN r27791.
The following SVN revision numbers were found above:
r27790 --> open-mpi/ompi@675a2f5c48
using the modex or RML to share sm initialization information, have node rank 0
create a file containing initialization information in a well-known place. Then
during add_procs, the rest of the node processes requiring sm BTL initialization
will just read from that file to complete their initialization.
This commit was SVN r27789.
Carns, change to use access(.., F_OK) instead of stat() to check for
the presence of files.
Also remove redundant check for FAKEROOTKEY, and update all comments
to match.
This commit was SVN r27785.
config/ directory. We split them apart a while ago in the hopes that
it would simplify things, but it didn't really (e.g., because there
were still some ompi/opal .m4 files in the top-level config/
directory, resulting in developer confusion where any given m4 macro
was defined).
So this commit consolidates them back into the top-level directory for
simplicity.
There's still (at least) two changes that would be nice to make:
1. Split any generated .m4 file (e.g., autogen-generated .m4 files)
into a separate directory somewhere so that a top-level -Iconfig/
will only get our explicitly defined macros, not the autogen stuff
(e.g., with libevent2019 needing to get the visibility macro, but
NOT all the autogen-generated inclusion of component configure.m4
files).
1. Change configure to be of the form:
{{{
# ...a small amount of preamble/setup...
OPAL_SETUP
m4_ifdef([project_orte], [ORTE_SETUP])
m4_ifdef([project_ompi], [OMPI_SETUP])
# ...a small amount of finishing stuff...
}}}
I doubt we'll ever get anything as clean as that, but that would be
the goal to shoot for.
This commit was SVN r27704.
additional functionality. Rationale (refs trac:3422):
* Normal MPI applications only ever use the MPI API. Hence, -lmpi is
sufficient (they'll never directly call ORTE or OPAL
functions). This is arguably the most common case.
* That being said, we do have some test programs (e.g., those in
orte/test/mpi) that call MPI functions but also call ORTE/OPAL
functions. I've also written the occasional MPI test program that
calls opal_output, for example (there even might be a few tests in
the IBM test suite that directly call ORTE/OPAL functions).
* Even though this is not a common case, these applications should
also compile/link with mpicc.
* So we should add a --openmpi:linkall option that will also link
in whatever is necessary to call ORTE/OPAL functions
* Yes, we could hard-code "-lopen-rte -lopen-pal" in Makefiles, but
we do reserve the right to change those library names and/or add
others someday, so it's better to abstract out the names and let
the wrapper supply whatever is necessary.
* ORTE programs, however, are different. They almost always call OPAL
functions (e.g., if they want to send a message, they must use the
OPAL DSS). As such, it seems like the ORTE programs should always
link in OPAL.
Therefore:
* Add undocumented --openmpi:linkall flag to the wrapper compilers.
See the comment in opal_wrapper.c for an explanation of what it
does. This flag is only intended for Open MPI developers -- not
end users. That's why it's undocumented.
* Update orte/test/mpi/Makefile.am to add --openmpi:linkall
* Make ortecc/ortec++'s wrapper data text files always explicitly
link in libopen-pal
This commit was SVN r27670.
The following SVN revision numbers were found above:
r27668 --> open-mpi/ompi@cf845897aa
The following Trac tickets were found above:
Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422
1. Restore libopen-pal.la, libopen-rte.la, and libmpi.la to be
separate entities (i.e., don't have libopen-rte.la include
libopen-pal.la, and don't have libmpi.la include libopen-pal.la).
Yay!
1. Consequently, make the wrapper compilers look for flags indicating
that the user wants to compile statically (currently: -static,
!--static, -Bstatic, and "-Wl," in front of all of those). If it
is, follow a 6-way matrix for determinining which libraries to
list on the underlying command line.
1. To support that, add the name of a token static and dynamic
library to look for in each of the wrapper compiler data files.
1. Fix a long-standing typo in the opalcc wrapper data file.
This commit was SVN r27662.
The old behavior of mca_base_param_deregister could cause the indices of other mca parameters to change. This could potentially cause problems if a mca user saves and later references an affected index.
This commit was SVN r27633.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.
This commit was SVN r27570.
pml/v:
- If vprotocol is not being used vprotocol_include_list is leaked. Assume vprotocol never takes ownership (see below) and always free the string.
coll/ml:
- (patch verified) calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
- Need to free mca_coll_ml_component.config_file_name on component close.
btl/openib:
- calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
vprotocol/base:
- There was no way for pml/v to determine if vprotocol took ownership of vprotocol_include_list. Fix by always never ownership (use strdup).
mca/base:
- param_lookup will result in storage->stringval to be a newly allocated string if the mca parameter has a string value. ensure this string is always freed.
cmr:v1.7
This commit was SVN r27569.
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.
This commit was SVN r27527.
The following SVN revision numbers were found above:
r27451 --> open-mpi/ompi@d59034e6ef
r27456 --> open-mpi/ompi@ecdbf34937
Not sure what happened here, but the resulting trunk wouldn't even configure. After spending time fixing that problem, I found it wouldn't compile due to multiple syntax errors that had been introduced in both the OPAL and OMPI layer. This raised questions as to the completeness of the work.
Given that the author is departing, I pinged Jeff about it and we agreed to revert this for now. Hopefully, it can either be fixed by the author prior to actual departure, or someone else can pick it up (now that it is in the history) and fix it.
This commit was SVN r27511.
The following SVN revision numbers were found above:
r27508 --> open-mpi/ompi@12c3c743de
r27509 --> open-mpi/ompi@79e4a8ca38
r27510 --> open-mpi/ompi@1ad5ff625a
opal_shmem_segment_create by testing whether or not the target mount has enough
space to accommodate the shared-memory backing store. Fixes trac:2827. Will work
with Shiqing to add Windows support (if required).
This commit was SVN r27433.
The following Trac tickets were found above:
Ticket 2827 --> https://svn.open-mpi.org/trac/ompi/ticket/2827
* Use the hwloc logical index, not the os_index. Fixes problems with
opal_hwloc_base_cset2str() output (e.g., --report-bindings output)
on machines where the os_index is not tightly packed in the range
![0, n-1]
This commit was SVN r27394.
ompi/mca/sbgp/basesmsocket
orte/mca/rmaps/lama
Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files.
This commit was SVN r27377.
Cannot start the data clearing at the root object level as the root object has a different struct attached to userdata.
This commit was SVN r27357.
The following Trac tickets were found above:
Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322
This now results in the procs being bound within their assigned location. It also causes us to use only the 0th HT on a core unless --use-hwthread-cpus has been specified (in which case, we use all the HTs in a core). Bind to core binds you to all HTs regardless - the --use-hwthread-cpus only impacts the oversubscribed determination and when binding to HT.
cmr:v1.7
This commit was SVN r27342.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't). If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!". Kaboom.
The only problem is that it didn't give you any indication of where
this value was being set. Quite maddening, from a user perspective.
So we changed the ompi_info handles this case. If any framework open
function return OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions return OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received and error, and then dump out the parameters
that it has received so far in the framework that had a problem.
At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).
We updated ompi_info to check for O???_ERR_BAD_PARAM from each from
the framework opens. Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior. And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes. Why? I think we used them in exactly one
place in the code base (mca_base_components_open.c). So we deleted
those and just used the normal OPAL_* codes instead.
While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).
This commit was SVN r27306.
The following Trac tickets were found above:
Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
following:
* Provides a fixed number of resource slots (i.e., "hotel rooms").
* Allows one thing to occupy a resource slot at a time (i.e., each
hotel room can have an occupant check in to that room).
* Resource slots can be vacated at any time (i.e., occupants can
voluntarily check out of their hotel room).
* Resource slots can be occupied for a specific maximum amount of
time. If that time expires, the occupant is forcibly evicted and
the upper layer is notified via (libevent) callback (i.e., the maid
will kick an occupant of out of their room when their reservation
is over).
This class can be to be used for things like retransmission schemes
for unreliable transports. For example, a message sent on an
unreliable transport can be checked in to a hotel room. If an ACK for
that message is received, the message can be checked out. But if the
ACK is never received, the message will eventually be evicted from its
room and the upper layer will be notified that the message failed to
check out in time (i.e., that an ACK for that message was not received
in time).
Code using this class is currently being developed off-trunk, but will
be coming to SVN soon.
This commit was SVN r27067.
"num_app_ctx" - the number of app_contexts in the job
"first_rank" - the MPI rank of the first process in each app_context
"np" - the number of procs in each app_context
Still need clarification on the MPI_Init portion of the ticket. Specifically, does the ticket call for returning an error is someone calls MPI_Init more than once in a program? We set a flag to tell us that we have been initialized, but currently never check it.
This commit was SVN r27005.
distinguish between unknown ''options'' (i.e., command line options
that are registered and have some meaning) and unknown ''tokens''
(i.e., strings that do not begin with "-"). Hence, if you did:
mpirun --fo my_mpi_program
(when perhaps you meant to type "--foo", mpirun would complain that no
such executable "--fo" existed. That is ''correct,'' but perhaps not
completely useful. It is more accurate for mpirun to report that
there is no such "--fo" option.
This change to cmd_line.c makes it so that we will ''always'' report
errors regarding tokens that begin with "-".
This commit was SVN r26953.
* NULL's out the hwloc_obj_t->userdata in
hwloc_base_util.c:free_object() and
hwloc_base_util.c:opal_hwloc_base_free_topology() after it has been
OBJ_RELEASE'd.
* Adds a userdata field to opal_hwloc_topo_data_t. This field will
be used in an upcoming rmaps component ("lama") to cache some
associated data during hardware tree traversals.
This commit was SVN r26938.
particularly with respect to threading flags.
Before this change, the following scenario would fail (e.g., on Linux
with pthreads):
{{{
$ ./configure --disable-shared --enable-static ...
$ make clean install
$ cd examples
$ make clean all
}}}
Linking the Fortran examples would fail with missing pthread symbols.
This commit was SVN r26927.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.
Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.
This commit was SVN r26678.
it (i.e., so that it doesn't screw up the ident version string).
This commit was SVN r26651.
The following Trac tickets were found above:
Ticket 2913 --> https://svn.open-mpi.org/trac/ompi/ticket/2913
* Add new configure command line options and deprecate some old ones:
* --with-verbs replaces --with-openib
* --with-verbs-libdir replaces --with-openib-libdir
* If you specify --with-openib[-libdir] without
--with-verbs[-libdir], you'll get a "these options have been
deprecated!" warning, but then they'll act just like
--with-verbs[--libdir].
'''Sidenote:''' Note that we are not renaming any components at this
time, nor are we renaming the top-level OMPI_CHECK_OPENIB m4 macro
(which is pretty strongly tied to the openib BTL and is bastaridzed
by the ofud BTL). Note that there will likely be more changes in
this area coming soon (next week?) when some long-standing changes
move to the SVN trunk: some openib BTL infrastructure will move to
ompi/mca/common, and its configury gets split up / refactored.
We extend our philosophy of other --with-<foo> configure options of
--with-verbs to ''all'' verbs-lovin components:
* If you specify --with-verbs, then all verbs-lovin' components must
configure successfully (or abort). This currently means: OOB ud,
BTL ofud, BTL openib.
* If you specify --with-verbs=DIR, then all verbs-lovin' component
must configure successfully (or abort), and will use DIR to find
verbs headers and libraries.
* If you specify --without-verbs, then all verbs-lovin' components
will be ignored.
This commit also fixes a problem where the --with-openib=DIR form
would not use DIR for ''all'' verbs-lovin' components (I think only
BTL openib and BTL ofud used that DIR). Now all of them do, as does
hwloc (because hwloc has some !OpenFabrics helper functions that
require ibv types from verbs.h).
There's a little new m4 infrastructure worth mentioning:
* If you create a new verbs-lovin' component (i.e., a component that
need verbs), your configure.m4 should
AC_REQUIRE([OPAL_CHECK_VERBS_DIR]).
* You can then use three global shell variables: $opal_want_verbs,
$opal_verbs_dir, $opal_verbs_libdir, which will be set as follows:
* opal_want_verbs will be "yes" and opal_verbs_dir and
opal_verbs_libdir will both be set to directory values, '''OR'''
* opal_want_verbs will be "no" and opal_verbs_dir and
opal_verbs_libdir will both be set empty
This commit was SVN r26640.
Attach). It is a header file that contains syscall defs for process_vm_readv
and process_vm_writev. It is only used on systems where glibc does not yet
have support for the new syscalls and where --with-cma has been passed
to configure. The syscall numbers are hardcoded but have been in a released
kernel and so will not change in the future
Once all linux distros have the new glibc this file can be removed.
This commit was SVN r26615.
The following SVN revision numbers were found above:
r26134 --> open-mpi/ompi@524de80eaa
* opal_hwloc_base_cset2str(): Make a human-readable string of a
hwloc_cpuset_t (e.g., socket 2[core 3[hwt 1]])
* opal_hwloc_base_cset2mapstr(): Make a map-like string of a
hwloc_cpuset_t (e.g., [B./..])
This commit was SVN r26532.