- This line, and those below, will be ignored--
M opal/mca/event/libevent2021/configure.m4
M opal/mca/hwloc/hwloc172/configure.m4
M configure.ac
M config/opal_setup_libltdl.m4
M config/opal_check_visibility.m4
M config/opal_setup_cc.m4
This commit was SVN r31637.
Several fixes:
- I was allowing an MPI_T_cvar_handle to be created for an invalid
variable. Fixed this by checking if the variable is valid in
mca_base_var_get.
- Use a better error code when the caller tries to create an unbound
pvar handle for a bound variable.
- Return the verbosity level in MPI_T_cvar_get_info.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31576.
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php
Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM
This commit was SVN r31557.
Make sure that an internal, long-lived hwloc fd is marked as
close-on-exec so that children don't inherit it. This patch is
committed upstream in the hwloc master and v1.9 branches as 7489287
and b654e19, respectively. The patch applied here is the exact same
logic, but the surrounding code changed slightly since the hwloc v1.7
series, so the patch doesn't apply cleanly.
Refs trac:4550
This commit was SVN r31511.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
Aside from comments, these help files are empty -- so just remove
them. Also remove a few inclusions of show_help.h, since it isn't
used in those source files.
This commit was SVN r31328.
The contents of this file were superceded by mca-help-var.txt.
Indeed, this file was already removed from Makefile.am; it looks like
it was accidentally left in SVN.
This commit was SVN r31327.
Also:
* fix formatting error at top (make the emacs format line begin
with a comment character)
* Remove blank lines between topics and replace them with empty
comment lines (so that the help messages are not displayed with a
trailing blank line)
This commit was SVN r31324.
Also, add missing ORTE_ERROR_LOG in the other case where this error
message is used (i.e., ORTE_ERROR_LOG was used in the one place, so
let's also use it in the other place).
This commit was SVN r31321.
This type of help message is now handled by the MCA var system itself
(via the MCA_BASE_VAR_FLAG_ENVIRONMENT_ONLY flag at var_register
time).
This commit was SVN r31318.
add -mca base_env_list "var1=val1 var2=val2 ..." mca parameter that can be used in mca param files
or with -am app.conf mpirun commandline to set rank env variables with mca mechanism
fixed by Elena, reviewed by Miked
cmr=v1.8.1:reviewer=ompi-rm1.8
This commit was SVN r31302.
Also, since I put some of the macros for these silent/verbose rules up
in the top-level Makefile.man-page-rules file, I renamed it to
Makefile.ompi-rules.
I've had this sitting around for a while; now seems like as good a
time as any to commit it.
This commit was SVN r31271.
on some linux distro (sles11sp2) csh fails to parse $LS_COLORS and borks with error:
Unknown colorls variable `mh'.
The workaround is to unset LS_COLORS before calling to csh script
reviewed by Jeff
cmr=v1.8:reviewer=ompi-rm1.8
This commit was SVN r31244.
an enumerator's mca_base_var_enum_sfv_fn_t can be NULL.
cmr=v1.7.5:ticket=trac:4398:reviewer=ompi-gk1.7
This commit was SVN r31085.
The following Trac tickets were found above:
Ticket 4398 --> https://svn.open-mpi.org/trac/ompi/ticket/4398
In the OPAL_ENABLE_FT_CR code path there used to be a variable
'mca_base_component_distill_checkpoint_ready' which got removed.
The FT code was not compiling and while trying to get it to compile
again the old variable was #ifdef'd out. This re-introduces the
variable with a new name 'opal_base_distill_checkpoint_ready'
and enables the code previously #ifdef'd out.
This removes the last hack introduced to get the FT code to compile
again.
This commit was SVN r30928.
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed
4. Found a couple of unnecessary includes
Refs trac:4298
This commit was SVN r30807.
The following Trac tickets were found above:
Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
This adds the code to actually checkpoint a process using CRIU
with the necessary variables to control the behaviour.
Right now only --np 1 is supported and --mca oob tcp.
Following parameters are supported:
* crs_criu_log: name of the log file
* crs_criu_log_level: verbosity level in the log file
* crs_criu_tcp_established: C/R established TCP connections
* crs_criu_shell_job: C/R shell jobs
* crs_criu_ext_unix_sk: allow external unix connections
* crs_criu_leave_running: leave tasks in running state after checkpoint
This commit was SVN r30772.
* Remove redundant/unnecessary uses of $2
* Change a bunch of logic from negative to positive
* Use OPAL_VAR_SCOPE_PUSH/POP to help reduce env var usage
* Only use "" in test statements with strings that require sanitization
* Removed redundant AC_MSG_WARN/ERROR. There's now only one check at
the bottom for whether the component is "good" or not. We'll
AC_MSG_WARN/ERROR in that one location.
Thanks to Jeff Squyres for this patch.
This commit was SVN r30739.
To be able to checkpoint/restart using criu (criu.org) a new
CRS component is added which is based on criu. This first commit
provides the minimal set of functions and configure script options
to enable --with-criu and link against libcriu.so.
No actual checkpoint/restart functionality is yet implemented.
This is only the framework which needs to be filled with the
actual functionality.
This commit was SVN r30666.
Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection
Fixes trac:4171
cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections
This commit was SVN r30551.
The following Trac tickets were found above:
Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171
Refs trac:4117. Please use this commit rather than the patch attached to
the ticket; the patch had a few mistakes in the tweaked wording.
This commit was SVN r30362.
The following SVN revision numbers were found above:
r30298 --> open-mpi/ompi@58479399c3
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
work around buggy NUMA node cpusets (i.e., buggy BIOSs).
Thanks to Jeff Becker for reporting the issue.
Submitted by Brice Goglin, reviewed by Jeff Squyres.
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30306.
The original code was passing a union by value, and doing odd things
on Solaris/SPARC (where "odd" rhymes with "SIGBUS"). Replace it with
an exploded switch/case block for all the enum values. Also use the
string literals so that we get compiler checking of the format string
vs. the type of the actual arguments.
cmr=v1.7.4:revier=hjelmn:subject=Fix MCA base var to not pass union by value
This commit was SVN r30276.
variable names for deprecated variables.
Closes trac:3270
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30275.
The following Trac tickets were found above:
Ticket 3270 --> https://svn.open-mpi.org/trac/ompi/ticket/3270
Change the logic in bind.c to only include <malloc.h> if we don't have posix_memalign.
In http://www.open-mpi.org/community/lists/devel/2014/01/13619.php,
Paul Hargrove found a compiler warning on OpenBSD where <malloc.h>
exists, but is not intended to be used (and doesn't error out, so
AC_CHECK_HEADERS says its ok).
Reviewed by Brice Goglin.
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30234.
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configry macro
names:
* pkgdatdir -> ompidatadir
* pkglibdir -> ompilibdir
* pkgincludedir -> ompiincludedir
This commit was SVN r30145.
The following SVN revision numbers were found above:
r30140 --> open-mpi/ompi@8b778903d8
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
Right now the C/R code fails because of a change introduced in
opal/mca/compress/base/compress_base_open.c and
pal/mca/crs/base/crs_base_open.c in 2013 with commit
git 734c724ff76d9bf814f3ab0396bcd9ee6fddcd1b
svn r28239
Update OPAL frameworks to use the MCA framework system.
This commit changed a lot but also the return value of functions from
OPAL_SUCCESS to OPAL_ERR_NOT_AVAILABLE/OPAL_ERR_NOT_AVAILABLE.
This commit lets opal_compress_base_register() and opal_crs_base_open()
always return OPAL_SUCCESS and removes unneeded #includes.
This commit was SVN r30130.
The following SVN revision numbers were found above:
r28239 --> open-mpi/ompi@365cf48db5
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.
Refs trac:4003
This commit was SVN r30033.
The following Trac tickets were found above:
Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
- Use ->boolval for booleans when creating a string.
- Solaris has some issue with the ?: used in one of find functions. Use an if instead.
- Change all instances of index -> vari to avoid issues with redefining index.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29997.
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.
cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings
This commit was SVN r29978.
compiled with ummunotify support (which is the check that r29720 just
recently added).
This commit was SVN r29961.
The following SVN revision numbers were found above:
r29720 --> open-mpi/ompi@ae8c826527
includes various fixes all over the C/R code which are
hard to group like the other patches.
Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values
Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)
This commit was SVN r29922.
* default to bind-to core
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above)
Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs
cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values
This commit was SVN r29919.
got linked together (work on one caused work in the other):
* Clean up a bunch of VAR_SCOPE issues in configure. This includes:
* Using VAR_SCOPE_PUSH and VAR_SCOPE_POP in more places
* Cleaning up the use of some shell variables (e.g., name them better)
* Add support for external libevent via
--with-libevent=<dir-to-libevent-install-tree>, as specifically
asked for by downstream packagers.
* Revamp how wrapper compiler RPATH (and RUNPATH) support is done.
The external libevent work exposed weakenesses in how the original
RPATH/RUNPATH work was done, so we had to re-do it to be a bit more
robust.
This work has not yet been tested on Solaris.
Refs trac:3694
This commit was SVN r29899.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).
Refs trac:3694
This commit was SVN r29830.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
* Ensure "cnt" is always initialized
* Ensure we dont' buffer overflow on strncat() -- need to ensure we
account for the terminating \0 character
* hwloc_get_type_depth() returns an int (not unsigned), and
HWLOC_TYPE_DEPTH_UNKNOWN if it's unknown (which is probably <0, but
still, might as well check what the official hwloc docs say to
check for)
cmr=v1.7.4:reviewer=rhc:subject=fix hwloc base compiler warnings
This commit was SVN r29686.
this change the next time we update libevent.
cmr=v1.7.4:ticket=3882
This commit was SVN r29597.
The following Trac tickets were found above:
Ticket 3882 --> https://svn.open-mpi.org/trac/ompi/ticket/3882
- Make a copy of enumerator data for default enumerators. This will allow
the caller to free their data once the enumerator has been created. This
is a change from just referencing the values array.
- Make mca_base_pvar_notify check if the pvar is valid before calling the
notify callback. This fixes a segmentation fault when destroying handles
after MPI_Finalize().
cmr=v1.7.4:ticket=trac:3861
This commit was SVN r29512.
The following Trac tickets were found above:
Ticket 3861 --> https://svn.open-mpi.org/trac/ompi/ticket/3861
Fixes:
- Segmentation fault when using watermark variables.
- Segmentation fault when using a handle bound to a no longer valid
performance variable.
- Incorrect return codes from MPI_T_pvar_* functions.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29481.
This change contains a non-mandatory modification
of the MPI-RTE interface. Anyone wishing to support
coprocessors such as the Xeon Phi may wish to add
the required definition and underlying support
****************************************************************
Add locality support for coprocessors such as the Intel Xeon Phi.
Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.
So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:
1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board
2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions
3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.
4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.
5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.
6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.
cmr:v1.7.4:reviewer=hjelmn
This commit was SVN r29435.
Fix two problems that surfaced when using direct launch under SLURM:
1. locally store our own data because some BTLs want to retrieve
it during add_procs rather than use what they have internally
2. cleanup MPI_Abort so it correctly passes the error status all
the way down to the actual exit. When someone implemented the
"abort_peers" API, they left out the error status. So we lost
it at that point and *always* exited with a status of 1. This
forces a change to the API to include the status.
cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch
This commit was SVN r29405.
However, tools such as mpirun don't need it, and definitely shouldn't be using it. Ditto for procs launched by mpirun.
We used to have a way of dealing with this - we had the PMI component check to see if the process was the HNP or was launched by an HNP. Sadly, moving the OPAL db framework removed
that ability as OPAL has no notion of HNPs or proc type.
So add a boolean flag to the db_base_select API that allows us to restrict selection to "local" components. This gives the PMI component the ability to reject itself as required. W
e then need to pass that param into the ess_base_std_app call so it can pass it all down.
This commit was SVN r29341.
Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create.
cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data
This commit was SVN r29274.
the output of ompi_info.
A variable is disabled if its component will never be selected due to
a component selection parameter (eg. -mca btl self). The old behavior
of ompi_info was to not print these parameters at all. Now we print the
parameters. After some discussion with George it was decided that there
needed to be some way to see what parameters will not be used. This was
the comprimise.
This commit also fixes a bug and a typo in the pvar sytem. The enum_count
value in mca_base_pvar_dump was being used without being set. The full_name
in mca_base_pvar_t was not being used.
cmr=v1.7.3:ticket=trac:3734
This commit was SVN r29078.
The following Trac tickets were found above:
Ticket 3734 --> https://svn.open-mpi.org/trac/ompi/ticket/3734
memory hooks.
The MIC has a /dev/scif device and the host has /dev/mic/scif. I do not
know if this device exists when no MIC is connected.
cmr=v1.7.4:ticket=trac:3733:reviewer=jsquyres
This commit was SVN r29071.
The following Trac tickets were found above:
Ticket 3733 --> https://svn.open-mpi.org/trac/ompi/ticket/3733
is enabled and fix a bug in the handling of watermark performance
variables.
cmr=v1.7.3:ticket=trac:3725:reviewer=jsquyres
This commit was SVN r29068.
The following Trac tickets were found above:
Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***
Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.
***************************************************************************************
I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.
The code is in https://bitbucket.org/rhc/ompi-oob2
WHAT: Rewrite of ORTE OOB
WHY: Support asynchronous progress and a host of other features
WHEN: Wed, August 21
SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:
* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)
* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients
* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort
* only one transport (i.e., component) can be "active"
The revised OOB resolves these problems:
* async progress is used for all application processes, with the progress thread blocking in the event library
* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")
* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.
* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.
* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object
* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions
* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel
* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport
* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active
* all blocking send/recv APIs have been removed. Everything operates asynchronously.
KNOWN LIMITATIONS:
* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline
* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker
* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways
* obviously, not every error path has been tested nor necessarily covered
* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways
* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC
This commit was SVN r29058.
This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times.
Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:
* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally
* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test
* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).
Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it
Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.
This commit was SVN r29040.
This commit reintroduces key compression into the pmi db. This feature
compresses the keys stored into the component into a small number of
PMI keys by serializing the data and base64 encoding the result. This
will avoid issues with Cray PMI which restricts us to ~ 3 PMI keys per
rank.
This commit was SVN r28993.
This commit adds an API for registering and querying performance
variables (mca_base_pvar) in the MCA base. The existing MCA variable
system API has been updated to reflect the new API: MCA variable
groups have performance variables, and new types have been added (double,
unsigned long long) to reflect what is required by the MPI_T
interface. Additionally, the MCA variable group code has been split
into its own set of files: mca_base_var_group.[ch].
Details of the new API can be found in doxygen comments in the header:
mca_base_pvar.h.
Other changes to the variable system:
- Use an opal_hash_table to speed up variable/group lookup.
- Clean up code associated with MCA variable types.
- Registered performance variables are printed by ompi_info -a. In the
future an option should be added to control this behavior.
Changes to OMPI:
- Added full support for the MPI_T performance variable interface.
This commit was SVN r28800.
Add an option to ompi_info (-l, --level) that takes a number in the
interval (1,9). Only MCA variables up to this level will be printed.
The default level is 1.
Print the level as part of both the parsable and readable output.
This commit was SVN r28750.
sandbox team has informed me that they are getting rid of SANDBOX_PID
in the future and that using SANDBOX_ON would be preferred.
This commit was SVN r28708.
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.
This commit was SVN r28682.
some relevant updates/new functionality in the opal/mca/hwloc and
orte/mca/rmaps bases. This work was mainly developed by Mellanox,
with a bunch of advice from Ralph Castain, and some minor advice from
Brice Goglin and Jeff Squyres.
Even though this is mainly Mellanox's work, Jeff is committing only
for logistical reasons (he holds the hg+svn combo tree, and can
therefore commit it directly back to SVN).
-----
Implemented distance-based mapping algorithm as a new "mindist"
component in the rmaps framework. It allows mapping processes by NUMA
due to PCI locality information as reported by the BIOS - from the
closest to device to furthest.
To use this algorithm, specify:
{{{mpirun --map-by dist:<device_name>}}}
where <device_name> can be mlx5_0, ib0, etc.
There are two modes provided:
1. bynode: load-balancing across nodes
1. byslot: go through slots sequentially (i.e., the first nodes are
more loaded)
These options are regulated by the optional ''span'' modifier; the
command line parameter looks like:
{{{mpirun --map-by dist:<device_name>,span}}}
So, for example, if there are 2 nodes, each with 8 cores, and we'd
like to run 10 processes, the mindist algorithm will place 8 processes
to the first node and 2 to the second by default. But if you want to
place 5 processes to each node, you can add a span modifier in your
command line to do that.
If there are two NUMA nodes on the node, each with 4 cores, and we run
6 processes, the mindist algorithm will try to find the NUMA closest
to the specified device, and if successful, it will place 4 processes
on that NUMA but leaving the remaining two to the next NUMA node.
You can also specify the number of cpus per MPI process. This option
is handled so that we map as many processes to the closest NUMA as we
can (number of available processors at the NUMA divided by number of
cpus per rank) and then go on with the next closest NUMA.
The default binding option for this mapping is bind-to-numa. It works
if you don't specify any binding policy. But if you specified binding
level that was "lower" than NUMA (i.e hwthread, core, socket) it would
bind to whatever level you specify.
This commit was SVN r28552.
in generated executables on systems that support it. Use
--disable-wrapper-rpath to disable this behavior. See text in
README about --disable-wrapper-rpath for more details.
This commit was SVN r28479.
The following Trac tickets were found above:
Ticket 376 --> https://svn.open-mpi.org/trac/ompi/ticket/376
added by hwloc's embedding so that it doesn't appear in
libhwloc_embedded.la (and therefore propogate all the way up to
libmpi.la).
Committed upstream in hwloc SVN r5588.
This commit was SVN r28457.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r5588
everything out before using it.
This is not in response to any known bug, but rather just a
pre-emptive, defensive move to help prevent bugs in code that forgets
to initialize a field.
This commit was SVN r28343.