needlessly registered in multiple different places, and none of them
had a good help string. There was also an inconsistent check for
setting both mpi_leave_pinned and mpi_leave_pinned_pipeline (i.e., it
was only in ob1). This commit moves the registration of these params
to one central place (ompi/runtime/ompi_mpi_params.c, with all other
mpi_* MCA params) and uses globals to propagate the values as
relevant. The error check was also moved to the central location to
ensure that we can consistency everywhere.
This commit was SVN r13226.
- Fix some fpritnf's in ompi_mpi_abort() that incorrectly assumed that
we were always being invoked from MPI_ABORT (ompi_mpi_abort() may be
invoked from a bunch of different places)
- Also try to opal_backtrace_print() if opal_bactrace_buffer() is not
supported.
- Print a message in MPI_ABORT if we're aborting.
This commit was SVN r12998.
* Add line about heterogeneous support to ompi_info output
* Print warning and abort if heterogeneous detected and
no heterogeneous support available.
Refs trac:587
This commit was SVN r12943.
The following Trac tickets were found above:
Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
CM PML. The req_mtl structure is cast into a mtl_*_request_structure for
each MTL, which is larger than the req_mtl itself. The cast will cause
the *_request to overwrite parts of the heavy requests if the req_mtl
isn't the *LAST* thing on each structure (hence the comment). This was
moved as an optimization at some point, which caused buffer sends to
fail...
Refs trac:669
This commit was SVN r12871.
The following Trac tickets were found above:
Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
- we have to be able to attach a string to an error class, not just to an
error code
- according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated
everytime you add a new code or a new class. Thus, you have to have single
list for both.
Thus, we got rid of the error_class structure. In the error-code structure, we
can distinguish whether we are dealing with an error code or an error class by
looking at the err->code element of the structure. In case its value is
MPI_UNDEFINED, the according entry is a class, else it is an error code. All
predefined error codes have the code and the class field set to the same
value.
The test MPI_Add_error_class1 passes now.
Fixes trac:418
This commit was SVN r12764.
The following Trac tickets were found above:
Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418
the same time, remove some of the MPI-related options from OPAL:
- provide mechanism to change at runtime whether sched_yield() should
be called when the progress engine is idle
- provide mechanism for changing the rate at which the event engine
is called when there are "no" users of the event engine (ie, when
using MPI but not TCP)
- fix some function names in the progress engine to better match
their intended use (and remove MPI naming scheme)
- remove progress_mpi_enable / progress_mpi_disable because
we can now use the functions to set the sched_yield and
tick rate interfaces
- rename opal_progress_events() to opal_progress_set_event_flag()
because the first really isn't descriptive of what the function
does and I always got confused by it
This commit was SVN r12645.
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
1. Fix the "hang" condition when an application isn't found. It turned out that the ODLS had some difficulty with the process actually not having been started - hence, it never called the waitpid callback. As a result, the "terminated" trigger didn't fire, and so mpirun didn't wake up. With this change, the HNP's errmgr forces the issue by causing the trigger to fire itself when an abort condition occurs.
2. Shift the recording of the pid and the nodename from mpi_init to the orted launcher. This allows programs such as Eclipse PTP to get the pids even for non-MPI applications. In the case of bproc, the pls handles this chore since we don't use orteds in that system.
This commit was SVN r12558.
1. Added reporting points around the xcasts in MPI_Init. Note that these times will include time spent waiting for a trigger to fire, which is why the times between stage gates did NOT include these times initially. The inter-stage-gate times still do NOT include the xcast time - the xcast time is reported separately.
2. Added the process vpid on the MPI_Init timing reports for clarity.
3. Added a report from the xcast function on the HNP that outputs the number of bytes in the message being sent to the processes.
This commit was SVN r12422.
If you want to look at our launch and MPI process startup times, you can do so with two MCA params:
OMPI_MCA_orte_timing: set it to anything non-zero and you will get the launch time for different steps in the job launch procedure. The degree of detail depends on the launch environment. rsh will provide you with the average, min, and max launch time for the daemons. SLURM block launches the daemon, so you only get the time to launch the daemons and the total time to launch the job. Ditto for bproc. TM looks more like rsh. Only those four environments are currently supported - anyone interested in extending this capability to other environs is welcome to do so. In all cases, you also get the time to setup the job for launch.
OMPI_MCA_ompi_timing: set it to anything non-zero and you will get the time for mpi_init to reach the compound registry command, the time to execute that command, the time to go from our stage1 barrier to the stage2 barrier, and the time to go from the stage2 barrier to the end of mpi_init. This will be output for each process, so you'll have to compile any statistics on your own. Note: if someone develops a nice parser to do so, it would be really appreciated if you could/would share!
This commit was SVN r12302.
I have added a new MCA param (hey, you can't have too many!) called OMPI_MCA_orte_timing. If set to anything other than zero, the system will report out critical timing loops. At the moment, this includes three measurements:
1. Time spent going through the RDS->RAS->RMAPS, setting up triggers, etc. prior to calling the actual PLS launch function. This is reported out as time to setup job.
2. Time spent in MPI_Init from start of that function (well, right after opal_init) to the place where we send all of our info the registry. Reported out as time from start to exec_compound_cmd
3. Time actually spent executing the compound cmd. Reported out as time to exec_compound_cmd.
A few additional timing points will be added shortly.
These may eventually be removed or (better) setup with a conditional compile flag.
This commit was SVN r11892.
should die, according to the MPI standard. It's possible that the
ORTE layer may kill additional processes, but that's beyond our
control and seems to be allowed by the standard (ie, it might also
end up killing all the procs in all the jobs covered by the
communicator).
* update the stack trace printing code to use the framework rather
than calling execinfo directly, so that we should be able to get
stack traces on all the platforms we support stack tracing on
(if the user wants stack traces on abort, of course)
This commit was SVN r11753.
Other changes:
1. Remove the old xcpu components as they are not functional.
2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.
This will require an autogen/configure, I'm afraid.
This commit was SVN r11228.
r10877:
add warm up connection option.. of course this only warms up the first
eager btl but this should be adequate for now..
r10881:
Consulted with Galen and did a few things:
- Fix the algorithm to actually make the connections that we want
- Rename the MCA param to mpi_preconnect_all
- Cleanup the code a bit:
- move the logic to a separate .c file
- check return codes properly
This commit was SVN r11114.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r10877
r10877
r10881
r10881
did pre-libevent update. The problem is that the behavior of
OPAL_EVLOOP_ONCE was changed by the OMPI team, which them broke things
during the update, so it had to be reverted to the old meaning of
loop until one event occurs. OPAL_EVLOOP_ONELOOP will go through the
event loop once (like EVLOOP_NONBLOCK) but will pause in the event
library for a bit (like EVLOOP_ONCE).
fixes trac:234
This commit was SVN r11081.
The following Trac tickets were found above:
Ticket 234 --> https://svn.open-mpi.org/trac/ompi/ticket/234
users mailing list:
http://www.open-mpi.org/community/lists/users/2006/07/1680.php
Warning: this log message is not for the weak. Read at your own
risk.
The problem was that we had several variables in Fortran common blocks
of various types, but their C counterparts were all of a type
equivalent to a fortran double complex. This didn't seem to matter
for the compilers that we tested, but we never tested static builds
(which is where this problem seems to occur, at least with the Intel
compiler: the linker compilains that the variable in the common block
in the user's .o file was of one size/alignment but the one in the C
library was a different size/alignment).
So this patch fixes the sizes/types of the Fortran common block
variables and their corresponding C instantiations to be of the same
sizes/types.
But wait, there's more.
We recently introduced a fix for the OSX linker where some C versions
of the fortran common block variables (e.g.,
_ompi_fortran_status_ignore) were not being found when linking
ompi_info (!). Further research shows that the code path for
ompi_info to require ompi_fortran_status_ignore is, unfortunately,
necessary (a quirk of how various components pull in different
portions of the code base -- nothing in ompi_info itself requires
fortran or MPI knowledge, of course).
Hence, the real problem was that there was no code path from ompi_info
to the portion of the code base where the C globals corresponding to
the Fortran common block variables were instantiated. This is because
the OSX linker does not automatically pull in .o files that only
contain unintialized global variables; the OSX linker typically only
pulls in a .o file from a library if it either has a function that is
used or have a global variable that is initialized (that's the short
version; lots of details and corner cases omitted). Hence, we changed
the global C variables corresponding to the fortran common blocks to
be initialized, thereby causing the OSX linker to pull them in
automatically -- problem solved. At the same time, we moved the
constants to another .c file with a function, just for good measure.
However, this didn't really solve the problem:
1. The function in the file with the C versions of the fortran common
block variables (ompi/mpi/f77/test_constants_f.c) did not have a
code path that was reachable from ompi_info, so the only reason
that the constants were found (on OSX) was because they were
initialized in the global scope (i.e., causing the OSX compiler to
pull in that .o file).
2. Initializing these variable in the global scope causes problems for
some linkers where -- once all the size/type problems mentioned
above were fixed -- the alignments of fortran common blocks and C
global variables do not match (even though the types of the Fortran
and C variables match -- wow!). Hence, initializing the C
variables would not necessarily match the alignment of what Fortran
expected, and the linker would issue a warning (i.e., the alignment
warnings referenced in the original post).
The solution is two-fold:
1. Move the Fortran variables from test_constants_f.c to
ompi/mpi/runtime/ompi_mpi_init.c where there are other global
constants that *are* initialized (that had nothing to do with
fortran, so the alignment issues described above are not a factor),
and therefore all linkers (including the OSX linker) will pull in
this .o file and find all the symbols that it needs.
2. Do not initialize the C variables corresponding to the Fortran
common blocks in the global scope. Indeed, never initialize them
at all (because we never need their *values* - we only check for
their *locations*). Since nothing is ever written to these
variables (particularly in the global scope), the linker does not
see any alignment differences during initialization, but does make
both the C and Fortran variables have the same addresses (this
method has been working in LAM/MPI for over a decade).
There were some comments here in the OMPI code base and in the LAM
code base that stated/implied that C variables corresponding to
Fortran common blocks had to have the same alignment as the Fortran
common blocks (i.e., 16). There were attempts in both code bases to
ensure that this was true. However, the attempts were wrong (in both
code bases), and I have now read enough Fortran compiler documentation
to convince myself that matching alignments is not required (indeed,
it's beyond our control). As long as C variables corresponding to
Fortran common blocks are not initialized in the global scope, the
linker will "figure it out" and adjust the alignment to whatever is
required (i.e., the greater of the alignments). Specifically (to
counter comments that no longer exist in the OMPI code base but still
exist in the LAM code base):
- there is no need to make attempts to specially align C variables
corresponding to Fortran common blocks
- the types and sizes of C variables corresponding to Fortran common
blocks should match, but do not need to be on any particular
alignment
Finally, as a side effect of this effort, I found a bunch of
inconsistencies with the intent of status/array_of_statuses
parameters. For all the functions that I modified they should be
"out" (not inout).
This commit was SVN r11057.
(which is currently the default, although we may argue over this later
:-) ), a new field in the ompi_proc_t named proc_hostname will have
the string hostname of that peer. If 0, this field will be NULL.
This allows for printing nicer error messages in environments where
peer hostnames are not otherwise easily obtainable, such as the mvapi
BTL (requested by Sandia, who has both a *huge* number of nodes and
6GB of RAM per node, so they don't care about the extra memory usage
;-) ).
This commit was SVN r9902.
- My original patch stands: MPI_FINALIZE directly invokes the
attribute callbacks on MPI_COMM_SELF
- We added some user-level checks to ensure that they don't call
MPI_FINALIZE twice (this isn't really required, but it will prevent
whacky segv's -- they'll at least get a nice error message)
- Removed the attribute callbacks on MPI_COMM_SELF from
ompi_mpi_comm_finalize (i.e., we just moved them from
ompi_mpi_comm_finalize to ompi_mpi_finalize -- we just moved this
process up earlier in the MPI_FINALIZE sequence of events)
- Because there were so many conversations about this, here's the
rationale:
- MPI-2:4.8 says that we have to MPI_COMM_FREE MPI_COMM_SELF so that
the attribute callbacks are invoked.
- After considerable discussion, we came to the conclusion that
FREE'ing COMM_SELF is not the issue -- calling the callbacks is
the issue.
- So it is sufficent for MPI_FINALIZE to directly invoke these
attribute callbacks
- The attribute callbacks are *not* invoked on other communicators
because said communicators are not MPI_COMM_FREE'ed
This commit was SVN r9628.
MPI_ABORT. From the ompi_info output:
MCA mpi: parameter "mpi_abort_delay" (current value: "0")
If nonzero, print out an identifying message when
MPI_ABORT is invoked (hostname, PID of the process
that called MPI_ABORT) and delay for that many seconds
before exiting (a negative delay value means to never
abort). This allows attaching of a debugger before
quitting the job.
MCA mpi: parameter "mpi_abort_print_stack" (current value: "0")
If nonzero, print out a stack trace when MPI_ABORT is
invoked
This commit was SVN r9487.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
complete, but stable enough that it will have no impact on general development,
so into the trunk it goes. Changes in this commit include:
- Remove the --with option for disabling MPI-2 onesided support. It
complicated code, and has no real reason for existing
- add a framework osc (OneSided Communication) for encapsulating
all the MPI-2 onesided functionality
- Modify the MPI interface functions for the MPI-2 onesided chapter
to properly call the underlying framework and do the required
error checking
- Created an osc component pt2pt, which is layered over the BML/BTL
for communication (although it also uses the PML for long message
transfers). Currently, all support functions, all communication
functions (Put, Get, Accumulate), and the Fence synchronization
function are implemented. The PWSC active synchronization
functions and Lock/Unlock passive synchronization functions are
still not implemented
This commit was SVN r8836.
Thanks to Anthony Chan for pointing this out.
Note that these will only work for the Fortran compiler that Open MPI
was configured with -- since these values, are, by definition,
single-value, they cannot support all 4 values that Open MPI may
generate for the different Fortran name-mangling schemes. A lengthy
comment in ompi_mpi_init.c explains this in more detail. Added to the
README to explain this situation, as well as the forthcoming
.TRUE. Fortran fixes.
This commit was SVN r8231.
- Separate out the registration of the MCA params into a standalong
function that is invoked by ompi_mpi_init() (so that ompi_info can
see these params)
- Rename the params to "mpi_ddt_*" instead of "datatype_*" so that
they fit into the common naming scheme
This commit was SVN r8196.
quering some of the collective components. Up to now, it just worked
somehow, but now with correct reference counting for ops in place, it
refused :-)
This commit was SVN r7866.
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install). Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am). Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed. This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)
This commit was SVN r7777.
- If we have an NS error, don't return an error -- this function's
purpose is to abort :-)
- s/abort()/exit(1)/ so that we don't drop massive corefiles
This commit was SVN r7524.
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
number to be set at autoconf time (instead of at configure time, as
it was before). Set the version number, minus the subversion r number,
at autoconf time. Override the internal variables to include the r
number (if needed) at configure time. Basically, the right thing
should always happen. The only place it might not is the version
reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
in the directory containing source files, even if the Makefile.am is
in another directory. This should start making it feasible to
reduce the number of Makefile.am files we have in the tree, which
will greatly reduce the time to run autogen and configure.
This commit was SVN r7211.
Here's the huge registry check-in you've all been waiting for with baited breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.
In addition, the commit contains the following:
1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe
2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well.
In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).
This commit was SVN r7118.
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
and util/os_create_dirpath.c (they only used path_sep, anyway --
easily changed to #defines)
This commit was SVN r7059.
- Change orte_base_infrastructre to orte_infrastructre to conform with
ompi_info's needs
- Move MCA Param registration in ORTE to a centralized function that is
called first in orte_init_stage1
- Set the infrastructre flag as an argument to orte_init
- Adjust initalization functions to properly pass down the infrastructre
flag.
This commit was SVN r7053.
API is still a bit unstable and may change.
- Add a primitive "first use" component that simply has each process
"touch" the pages that they want to use, thereby [hopefully] locking
them locally to a specific processor
- Add hooks in ompi_mpi_init to enable memory affinity when processor
affinity is used.
- Added hooks in ompi_mpi_finalize to shut down memory affinity when
it was initialized during ompi_mpi_init.
- Added right hooks in ompi_info to display maffinity components.
This commit was SVN r7044.
to opal_progress() to use the timers instead of a tick count for deciding
whether to call the event loop or not. Currently supported platforms are:
- solaris (x86 / sparc)
- Linux (x86 / x86_64 / IA64)
- Mac OS X (x86 / Power PC)
This commit was SVN r6922.
that this ORTE job is the only one on the nodes involved, and if
told what processors to assign the processes to, will bind MPI
processes to specific processors.
- Convert #include's to new style
- Convert some <tab>'s to spaces
This commit was SVN r6904.
Change all the places where they are used to fit the new name.
Remove the code to check the remote arch from the PML. We will have a GPR mechanism
in ompi_mpi_initialize to do that.
This commit was SVN r6750.
- new preferred API calls for registering MCA parameters are
mca_base_param_reg_{int|string} and
mca_base_param_reg_{int|string}_name.
- See opal/mca/base/mca_base_param.h for docs on new calls.
- Can now register and lookup a value at the same time.
- Can now mark a parameter "read only" at registration time
- Can now mark a parameter "internal" at registration time
- Can now associate a help message with the parameter at registration
time; displayed in the ompi_info output.
The old API calls are still available for backwards compatibility
(mca_base_param_register_{int|string}. They will eventually be
removed -- all developers are encouraged to use the new APIs from here
on out and replace any old calls with the new API.
Some params were also renamed -- the previous convention of using
"base_" as a prefix for any param that was not associated with a
component is henceforth deprecated. Instead, use one of the following
prefixes:
mca: for anything in the MCA base itself
opal: for anything in OPAL
orte: for anything in ORTE
mpi: for anything in OMPI
This commit was SVN r6698.
This required a little fiddling with a number of areas. Biggest problem was that it uncovered a potential for an infinite loop to be created in the registry. If a callback function modified the registry, the registry checked the triggers to see if anything had fired. Well, if the original callback was due to a trigger firing, that condition hadn't changed - so the trigger fired again....which caused the callback to be called, which modified the registry, which checked the triggers, etc. etc.
Triggers are now checked and then "flagged" as being "in process" so that the registry will NOT recheck that trigger until all callbacks have been processed. Tried doing this with subscriptions as well, but that caused a problem - when we release processes from a stagegate, they (at the moment) immediately place data on the registry that should cause a subscription to fire. Unfortunately, the system will just hang if that subscription doesn't get processed. So, I have left the subscription system alone - any callback function that modifies the registry in a fashion that will fire a subscription will indeed fire that subscription. We'll have to see if this causes problems - it shouldn't, but a careless user could lock things up if the callback generates a callback to itself.
Also fixed the code that placed a process' RML contact info on the registry to eliminate the leading '/' from the string.
This commit was SVN r6684.
is a lot more difficult than a PTL, and it can adapt it's behavior to the level of threading required
by the user. In this case the behavior is the priorit of the PML. Therefore this information is never
availale before the init function (of the PML) is called. So I try to keep nearly the same structure
as it was before, with one change. When a PML get initialized it does not necessarily means it has been
selected, so it does not means it has to create all it's internal structures (and select the PTL and
all this stuff). They can all be done later, when a PML knows that it definitively get selected
(when the enable function is called with the argument set to true). Thus, in the case of a PML close
one have to check if the PML has been selected or not before trying to clean up the internals.
I had to change the MPI_Init function to allow the PML to be enabled before we start adding procs inside.
This commit was SVN r6434.
* mpi_show_mca_params
If set to true, this turns on the dumping of all MCA parameters when MPI_INIT is called.
Only the 'rank 0' processes will print the parameters.
* mpi_show_mca_params_file
(This value is only used if the first argument is set to true) If this value is non-NULL
it specifies the file to put the dump into. This file can then be used as input to mpirun
for debugging purposes. If this value is not set (and mpi_show_mca_params is set) then
the parameters are dumped to stdout.
This commit was SVN r6401.