Modify the mapper to better bookmark where it stopped each time, and to resume from that point on the next call. This needs to be validated on a multi-node system.
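A minimal sketch of the bookmarking idea. It assumes a simple round-robin mapper over a fixed array of nodes. `mapper_t` and `map_procs()` are hypothetical names, not the RMAPS API. The point is that the bookmark lives in the mapper state, so the next call resumes where the previous one stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapper state: the bookmark survives between calls. */
typedef struct {
    size_t num_nodes;  /* nodes available in the allocation (must be > 0) */
    size_t next_node;  /* bookmark: where the previous mapping stopped */
} mapper_t;

/* Assign nprocs processes round-robin, starting at the bookmark.
 * out[i] receives the node index chosen for process i. */
static void map_procs(mapper_t *m, size_t nprocs, size_t *out)
{
    assert(m->num_nodes > 0);
    for (size_t i = 0; i < nprocs; ++i) {
        out[i] = m->next_node;
        m->next_node = (m->next_node + 1) % m->num_nodes;
    }
    /* next_node now names the first unused slot for the next job */
}
```

With 4 nodes, a first job of 2 procs lands on nodes 0 and 1. A later job (e.g., comm_spawned children) starts at node 2 instead of reusing the parent's nodes.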
Fix a major memory-corruption problem in the registry put/get functions, which were freeing the same memory more than once. I'm not sure how valgrind missed this one, though it only occurred in specific circumstances (such as comm_spawn).
This commit was SVN r12179.
This change does a couple of things:
1. Since the USE_PARENT_ALLOC attribute is a directive regarding the allocation of resources to a job, it more properly belongs to the RAS. Change the name to reflect that and move the attribute define to the ras_types.h file.
2. Add the attributes list to the RMAPS map_job interface. This provides us with the desired flexibility to dynamically specify directives for mapping. In the absence of any attribute-based directive, the system will default to the values provided in the MCA parameters (set either in the environment or on the command line).
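A minimal sketch of the lookup order described in item 2. It assumes attributes are simple key/value pairs. `attr_t`, `find_attr()`, and `resolve_directive()` are hypothetical names. The MCA default is passed in as a plain integer rather than read from the real MCA parameter system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute: a named integer directive. */
typedef struct {
    const char *key;
    int value;
} attr_t;

/* Return the attribute named key, or NULL if the list does not contain it. */
static const attr_t *find_attr(const attr_t *attrs, size_t n, const char *key)
{
    for (size_t i = 0; i < n; ++i) {
        if (0 == strcmp(attrs[i].key, key)) {
            return &attrs[i];
        }
    }
    return NULL;
}

/* An attribute-based directive wins; otherwise fall back to the MCA default. */
static int resolve_directive(const attr_t *attrs, size_t n,
                             const char *key, int mca_default)
{
    const attr_t *a = find_attr(attrs, n, key);
    return (NULL != a) ? a->value : mca_default;
}
```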
This commit was SVN r12164.
size and displacement of datatypes. After this patch, data sizes are expressed in bytes as size_t
and displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.
This commit was SVN r12146.
The UD BTL isn't gone - the latest version is in my afriedle-ud branch. This version on the trunk was very old, ompi_ignore'd, lacked performance, and probably contained bugs. The maintained version on my branch is working solidly and will eventually come back, but not for v1.2.
This commit was SVN r12144.
Fix the problem, observed by multiple people, in which comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off.
To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place.
I used those capabilities in two places:
1. Added an attribute list to the rmgr.spawn interface.
2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h).
So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms.
This commit was SVN r12138.
* Update comments in some MPI_FILE_* functions to reflect that the
MPI specs have different page numbers in the ps and pdf (woof!).
* Update comments to say "Retain" where we meant retain (not "return")
* Add a check in MPI_ERRHANDLER_FREE to raise an MPI exception if the
user attempts to free an intrinsic errhandler *and* the refcount is
1 (meaning that it would actually free the intrinsic). This
protects erroneous programs from segv'ing.
* Remove lengthy comment from comm_get_errhandler.c which is no
longer valid (because of the MPI-2 errata that says that users *do*
have to call MPI_ERRHANDLER_FREE).
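The MPI_ERRHANDLER_FREE check described above can be sketched as follows. This assumes a simplified errhandler object with an `intrinsic` flag and a reference count. `errhandler_t`, `errhandler_free()`, and the `SKETCH_*` codes are hypothetical. The real code raises an MPI exception where this sketch returns an error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    bool intrinsic;  /* predefined handler, e.g. MPI_ERRORS_ARE_FATAL */
    int refcount;
} errhandler_t;

enum { SKETCH_SUCCESS = 0, SKETCH_ERR_ARG = -1 };

/* Release one reference; refuse to drop the last reference to an intrinsic. */
static int errhandler_free(errhandler_t **eh)
{
    if (NULL == eh || NULL == *eh) {
        return SKETCH_ERR_ARG;
    }
    if ((*eh)->intrinsic && 1 == (*eh)->refcount) {
        /* Erroneous program: this free would destroy the intrinsic itself. */
        return SKETCH_ERR_ARG;
    }
    if (0 == --(*eh)->refcount) {
        free(*eh);  /* only user-created (heap) handlers reach zero */
    }
    *eh = NULL;  /* as with MPI_ERRHANDLER_FREE, the handle becomes null */
    return SKETCH_SUCCESS;
}
```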
This commit was SVN r12128.
The following SVN revision numbers were found above:
r12122 --> open-mpi/ompi@407b3cb788
The following Trac tickets were found above:
Ticket 502 --> https://svn.open-mpi.org/trac/ompi/ticket/502
MPI::SEEK_* because iostreams (well, ios_base, but I don't think that
should be included directly) can use SEEK_* as values in an enum, which
means that 'const int' is bad for them.
* Remove now-useless comments in the cxx example programs
* Include iostream after mpi.h so that our examples work with other MPI
implementations that don't try to be as friendly with the constants.
Refs trac:387
This commit was SVN r12125.
The following Trac tickets were found above:
Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
* Fix MPI-2 page number in comments for a specific reference in the
spec
* Allow getting/setting the errhandler on MPI_FILE_NULL
* Allow freeing of intrinsic errhandlers, per MPI-2 errata (if you GET
an errhandler on a communicator, you must be able to FREE it, even
if it's an intrinsic).
Thanks to Lisandro Dalcin for reporting these problems.
This commit was SVN r12122.
some issues with the C #defines SEEK_{SET, CUR, END}. The workaround
involves some hackery that should work in almost every common use case
for the C stdio constants (and all the legal uses of the MPI constants).
The one issue is that the C stdio constants are now const ints instead
of #defines, which means that #ifdef checks will fail for the constants.
Behavior can be disabled at either configure time or build time.
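The following is a minimal illustration of why `#ifdef` checks stop working once a constant is a `const int` rather than a `#define`. It is not the actual workaround. The `SKETCH_*` names are hypothetical stand-ins for the stdio constants.

```c
#include <assert.h>

#define SKETCH_SEEK_MACRO 0              /* a #define: visible to the preprocessor */
static const int SKETCH_SEEK_CONST = 0;  /* a const int: invisible to #ifdef */

static int macro_seen_by_ifdef(void)
{
#ifdef SKETCH_SEEK_MACRO
    return 1;
#else
    return 0;
#endif
}

static int const_seen_by_ifdef(void)
{
#ifdef SKETCH_SEEK_CONST
    return 1;
#else
    return 0;  /* the preprocessor knows nothing about C variables */
#endif
}
```

Both forms still work as ordinary values in calls such as `fseek()`. Only compile-time feature tests are affected.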
Refs trac:387
This commit was SVN r12121.
The following Trac tickets were found above:
Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
where a window was in both the passive and active side of a lock sequence.
Refs trac:488
This commit was SVN r12112.
The following Trac tickets were found above:
Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
Just follow inc_num and you will understand. Now _resize will grow the list to match
the required number of elements as described in the comment in the .h file.
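A minimal sketch of that growth rule. It assumes a list that can only grow in whole chunks of `inc_num` elements. `grow_list_t` and `list_resize()` are hypothetical stand-ins for the real list structure.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t num_elements;  /* elements currently allocated */
    size_t inc_num;       /* growth granularity */
} grow_list_t;

/* Grow by whole chunks of inc_num until at least `required` elements exist.
 * Returns the number of chunks added. */
static size_t list_resize(grow_list_t *l, size_t required)
{
    assert(l->inc_num > 0);
    if (l->num_elements >= required) {
        return 0;  /* already large enough; never shrink */
    }
    size_t missing = required - l->num_elements;
    size_t chunks = (missing + l->inc_num - 1) / l->inc_num;  /* round up */
    l->num_elements += chunks * l->inc_num;
    return chunks;
}
```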
This commit was SVN r12074.
constrained:
* Make sure we always have a number of eager fragments available
that scales with the number of processes communicating with
a given proc over shared memory
* Use FREE_LIST_GET instead of FREE_LIST_WAIT to return an
error to the PML when resource exhaustion occurs
* Don't dereference the frag during alloc unless we're sure
it's not NULL
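A minimal sketch of the allocation path after this change. It assumes a trivial fixed-size pool. `pool_get()`, `frag_alloc()`, and the `per_peer` sizing helper are hypothetical; the real code uses the `FREE_LIST_GET` macro. The shape is the same: a non-blocking get that can fail, and a NULL check before the fragment is touched.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t size; } frag_t;

typedef struct {
    frag_t *items;
    size_t avail;  /* fragments left in the pool */
} pool_t;

/* Hypothetical sizing rule: keep a few eager fragments per local peer. */
static size_t eager_frags_needed(size_t num_local_procs, size_t per_peer)
{
    return num_local_procs * per_peer;
}

/* Non-blocking get: returns NULL instead of waiting when the pool is empty. */
static frag_t *pool_get(pool_t *p)
{
    return (p->avail > 0) ? &p->items[--p->avail] : NULL;
}

/* Returns 0 on success, or -1 so that the caller (the PML) sees exhaustion. */
static int frag_alloc(pool_t *p, size_t size, frag_t **out)
{
    frag_t *frag = pool_get(p);
    if (NULL == frag) {
        return -1;  /* do not dereference: the pool is exhausted */
    }
    frag->size = size;
    *out = frag;
    return 0;
}
```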
Reviewed by: Galen
Refs trac:413
This commit was SVN r12053.
The following Trac tickets were found above:
Ticket 413 --> https://svn.open-mpi.org/trac/ompi/ticket/413
then use broadcast to wake them up. If there is only one, use signal
(which is supposed to be faster), and of course if no threads are
waiting, just continue.
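A minimal POSIX-threads sketch of that wake-up policy. `choose_wakeup()` and `wake_waiters()` are hypothetical names. The caller is assumed to track the number of waiters and to hold the mutex associated with the condition variable.

```c
#include <assert.h>
#include <pthread.h>

typedef enum { WAKE_NONE, WAKE_SIGNAL, WAKE_BROADCAST } wake_t;

/* Pick the cheapest wake-up that still releases every waiter. */
static wake_t choose_wakeup(int waiters)
{
    if (waiters <= 0) return WAKE_NONE;
    if (1 == waiters) return WAKE_SIGNAL;  /* supposedly cheaper */
    return WAKE_BROADCAST;
}

/* Caller holds the mutex that protects the waiter count. */
static wake_t wake_waiters(pthread_cond_t *cond, int waiters)
{
    wake_t how = choose_wakeup(waiters);
    if (WAKE_SIGNAL == how) {
        pthread_cond_signal(cond);
    } else if (WAKE_BROADCAST == how) {
        pthread_cond_broadcast(cond);
    }
    return how;  /* WAKE_NONE: nobody is waiting, just continue */
}
```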
This commit was SVN r12049.
Doing pointer math properly (e.g., incrementing by the right amount)
helps you not overflow buffers, cause random chaos, and contribute to
the heat death of the universe. Sigh.
This commit was SVN r12015.
The following Trac tickets were found above:
Ticket 236 --> https://svn.open-mpi.org/trac/ompi/ticket/236
recvbuf in MPI_GATHER).
* Minor style updates (constants on the left of == and !=)
* Fix a minor buglet that crept in with r11904: we had recvbuf where it
should have been recvcount. Thankfully, this would have only
affected erroneous programs. ;-)
This commit was SVN r11980.
The following SVN revision numbers were found above:
r11904 --> open-mpi/ompi@17539dc154
The following Trac tickets were found above:
Ticket 338 --> https://svn.open-mpi.org/trac/ompi/ticket/338
compute the hetero flag a little differently. The hetero flag is now
attached to a convertor if and only if we can use the optimized conversion
functions. This is a little broader than before (previously, the two
architectures had to be identical).
This commit was SVN r11962.
First, move the OPAL_THREAD_LOCK out to the same level as its corresponding UNLOCK. It was possible to hit the UNLOCK without ever acquiring the lock.
Since the OPAL_THREAD_ADD64() is now protected by this lock, we can just do the decrement non-atomically.
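A minimal sketch of the corrected pattern. It uses a plain pthread mutex in place of `OPAL_THREAD_LOCK`/`OPAL_THREAD_UNLOCK`. `counter_t` and `release_one()` are hypothetical. The plain decrement is only safe if every other access to the counter also holds the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    int64_t pending;
} counter_t;

/* Lock and unlock at the same nesting level, so every path that reaches
 * the unlock has acquired the lock first. */
static int64_t release_one(counter_t *c)
{
    pthread_mutex_lock(&c->lock);
    /* Already serialized by the lock: a plain decrement replaces the
     * atomic add. */
    int64_t left = --c->pending;
    pthread_mutex_unlock(&c->lock);
    return left;
}
```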
This commit was SVN r11958.
Don't try to acquire ompi_request_lock here; in all cases it is already held. This avoids a deadlock that occurs when threads are enabled, even if we're running a THREAD_SINGLE app.
Reviewed by Galen.
This commit was SVN r11957.
The following Trac tickets were found above:
Ticket 183 --> https://svn.open-mpi.org/trac/ompi/ticket/183
conversion functions are more complex and costly than a simple memcpy. Therefore,
we want to minimize the use of these functions.
We now check not only the HOMOGENEOUS flag on the datatype or convertor, but also the
bits indicating which types are in use. If a communication transfers a type that has the
same representation on both peers, we can use the optimized version of the conversion.
At the same time, we build a more accurate conversion table for each master convertor,
based on the minimal differences between the two architectures.
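A minimal sketch of the bit test described above. It assumes each datatype records a bitmask of the basic types it uses and each pair of architectures yields a bitmask of the basic types whose representation differs. The names and bit assignments are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments for a few basic types. */
enum {
    T_CHAR   = 1u << 0,
    T_INT32  = 1u << 1,
    T_DOUBLE = 1u << 2,
    T_LONG   = 1u << 3
};

/* types_used: basic types present in the datatype being sent.
 * arch_diff:  basic types whose representation differs between the peers.
 * If the two sets do not overlap, the costly conversion can be skipped. */
static int can_skip_conversion(uint32_t types_used, uint32_t arch_diff)
{
    return 0 == (types_used & arch_diff);
}
```

For example, if two peers differ only in the size of `long`, a datatype made of ints and doubles can skip conversion, while one containing longs cannot.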
This commit was SVN r11945.
remove the requirement for .la files in the wrapper scripts
Ticket: #374
extend compilers to support 32-bit and 64-bit in one version of the wrapper
Submitted by: Dan Lacher
Reviewed by: Rolf Vandevaart
This commit was SVN r11908.
with the use of MPI_IN_PLACE, and make some optimization checks more
correct. Thanks to Lisandro Dalcin for reporting the problems.
This commit was SVN r11904.
The following Trac tickets were found above:
Ticket 430 --> https://svn.open-mpi.org/trac/ompi/ticket/430
something went wrong. However, a positive number is a correct value (unlike
orte_rml.recv_buffer, which really returns ORTE_SUCCESS or an
error code).
Note: this part of the code is correct on the 1.1 and 1.2 branches, so there is
no need to move this patch to the release branches.
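The distinction above can be sketched as follows. This assumes a hypothetical receive call that returns a byte count on success and a negative error code on failure. `recv_bytes()` and `check_byte_count_call()` are hypothetical, and `recv_bytes()` simply echoes a simulated result.

```c
#include <assert.h>

enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };

/* Hypothetical receive: number of bytes received, or a negative error code. */
static int recv_bytes(int simulated_result) { return simulated_result; }

static int check_byte_count_call(int simulated_result)
{
    int rc = recv_bytes(simulated_result);
    /* Wrong for this kind of call: if (SKETCH_SUCCESS != rc) would treat
     * a successful 128-byte receive as a failure. */
    if (rc < 0) {
        return SKETCH_ERROR;
    }
    return SKETCH_SUCCESS;
}
```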
This commit was SVN r11897.
install-exec-hook is not only wrong, it can cause ordering issues such
as trying to put sym links to man pages in directories that do not yet
exist.
This commit was SVN r11893.
I have added a new MCA param (hey, you can't have too many!) called OMPI_MCA_orte_timing. If set to anything other than zero, the system will report out critical timing loops. At the moment, this includes three measurements:
1. Time spent going through the RDS->RAS->RMAPS, setting up triggers, etc. prior to calling the actual PLS launch function. This is reported out as time to setup job.
2. Time spent in MPI_Init from the start of that function (well, right after opal_init) to the place where we send all of our info to the registry. Reported out as time from start to exec_compound_cmd.
3. Time actually spent executing the compound cmd. Reported out as time to exec_compound_cmd.
A few additional timing points will be added shortly.
These may eventually be removed or (better) set up with a conditional compile flag.
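A minimal sketch of how such an opt-in timer can be wired up. It assumes the parameter is read directly from the `OMPI_MCA_orte_timing` environment variable; the real code goes through the MCA parameter system. `timing_enabled()` and `elapsed_usec()` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

/* Any value other than zero turns timing on; an unset variable means off. */
static int timing_enabled(void)
{
    const char *v = getenv("OMPI_MCA_orte_timing");
    return NULL != v && 0 != strtol(v, NULL, 10);
}

/* Microseconds between two gettimeofday() samples. */
static long elapsed_usec(const struct timeval *start, const struct timeval *stop)
{
    return (long)(stop->tv_sec - start->tv_sec) * 1000000L
           + (long)(stop->tv_usec - start->tv_usec);
}
```

A caller would bracket a phase such as job setup with two `gettimeofday()` calls and print `elapsed_usec()` only when `timing_enabled()` is true.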
This commit was SVN r11892.
I do something else" rule screws me up again. If we're in a FENCE, but
not in ACCESS | EXPOSE, put us in ACCESS | EXPOSE, as we now know we are
in a real Fence epoch. Yay, silly MPI standards.
Refs trac:441
This commit was SVN r11865.
The following Trac tickets were found above:
Ticket 441 --> https://svn.open-mpi.org/trac/ompi/ticket/441
Make the Fortran MPI_MAX_DATAREP_STRING follow the same convention as
the rest of the Fortran constants -- be one less than the C constant
of the same name.
This commit was SVN r11842.
The following Trac tickets were found above:
Ticket 389 --> https://svn.open-mpi.org/trac/ompi/ticket/389
groups. And zero is also an acceptable value according to the MPI spec.
Fixes trac:428
This commit was SVN r11841.
The following Trac tickets were found above:
Ticket 428 --> https://svn.open-mpi.org/trac/ompi/ticket/428
Fixes simple off-by-one error in the error check for
MPI_INFO_GET_NTHKEY.
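Valid key indices for MPI_INFO_GET_NTHKEY run from 0 to nkeys - 1, where nkeys comes from MPI_INFO_GET_NKEYS. A minimal sketch of the check follows. The helper name is hypothetical, and the commented-out wrong form is only an example of this class of bug.

```c
#include <assert.h>

/* Returns 1 if n is a valid key index for an info object with nkeys keys. */
static int nthkey_index_ok(int n, int nkeys)
{
    /* Off-by-one trap: testing "n > nkeys" would wrongly accept n == nkeys. */
    return 0 <= n && n < nkeys;
}
```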
This commit was SVN r11838.
The following Trac tickets were found above:
Ticket 429 --> https://svn.open-mpi.org/trac/ompi/ticket/429
on the number of known local procs, with a high and low watermark. Right
now, we default to a low watermark point of 64MB, a per-proc scaling
factor of 32MB, and a high watermark point of 512MB.
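A minimal sketch of the sizing rule. It assumes the per-proc factor is multiplied by the number of local procs and the result is clamped between the two watermarks, with MB taken as 1024 * 1024 bytes. The function name is hypothetical, and the exact formula in the code may differ.

```c
#include <assert.h>
#include <stddef.h>

#define MB ((size_t)1024 * 1024)

/* Scale with the number of local procs, then clamp to [low, high]. */
static size_t sm_pool_size(size_t nprocs, size_t low, size_t per_proc, size_t high)
{
    size_t size = nprocs * per_proc;
    if (size < low)  size = low;
    if (size > high) size = high;
    return size;
}
```

With the defaults above, 1 local proc gets the 64MB floor, 4 procs get 128MB, and 32 procs hit the 512MB ceiling.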
Refs trac:212
This commit was SVN r11824.
The following Trac tickets were found above:
Ticket 212 --> https://svn.open-mpi.org/trac/ompi/ticket/212
GIDs (there can be more than one) and not GIDs of the HCA on the network. Entry
zero always has to be initialized, so we use it, and warn the user if there is more
than one active port and the default subnet is configured on at least one of them.
This commit was SVN r11815.
the component is configured successfully. Otherwise, we can end up
trying to run make in the romio directory without any Makefiles. This
really only happens on the targets that recurse into DIST_SUBDIRS (i.e.,
dist, maintainer-clean, and distclean).
refs trac:411
This commit was SVN r11807.
The following Trac tickets were found above:
Ticket 411 --> https://svn.open-mpi.org/trac/ompi/ticket/411
Makefile.am's is from a very old Automake bug which has long-since
been fixed. Since we require very recent versions of AM, we don't
need these anymore.
This commit was SVN r11774.
should die, according to the MPI standard. It's possible that the
ORTE layer may kill additional processes, but that's beyond our
control and seems to be allowed by the standard (i.e., it might also
end up killing all the procs in all the jobs covered by the
communicator).
* update the stack trace printing code to use the framework rather
than calling execinfo directly, so that we should be able to get
stack traces on all the platforms we support stack tracing on
(if the user wants stack traces on abort, of course)
This commit was SVN r11753.
tell if the remote proc should be in an exposure epoch or not.
Refs trac:325
This commit was SVN r11746.
The following Trac tickets were found above:
Ticket 325 --> https://svn.open-mpi.org/trac/ompi/ticket/325
epoch's control data could overwrite the previous epoch's data because
we were reusing data structures between PW and SC. Instead, we now
have explicit post_msg and complete_msg counters for completion.
refs trac:354
* Only register the rdma osc callback once, as it turns out that some
btls (MX) do something more than update a table during the register
call, and each register call sucks up valuable fragments...
This commit was SVN r11745.
The following Trac tickets were found above:
Ticket 354 --> https://svn.open-mpi.org/trac/ompi/ticket/354
long ago) supposed to be used as a cache for accessing the PML procs. But in
all of the PMLs, the PML proc contains only one field, i.e., a pointer to the ompi_proc.
This pointer can easily be accessed through c_remote_group. Therefore, there is no
point in keeping the PML procs around. Slim-fast commit...
This commit was SVN r11730.
if we want to be able to reuse the request. If not, the request will never be freed,
even if the user calls MPI_Request_free.
This commit was SVN r11717.
Add --enable-orterun-prefix-by-default (and a synonym:
--enable-mpirun-prefix-by-default) to make orterun always behave as if
"--prefix $prefix" was given on the command line (where $prefix is the
value given to the --prefix option to configure). This prevents many
rsh/ssh users from needing to modify their shell startup files to set
the LD_LIBRARY_PATH for Open MPI (they will still need to set PATH or
otherwise find the OMPI executables to mpicc/mpirun/etc. their MPI
applications).
Also added --noprefix option to orterun to disable this behavior.
Finally, note that even if --enable-orterun-prefix-by-default is
specified, if the user specifies --prefix or /path/to/mpirun, these
options will override the default value of the prefix ($prefix).
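A minimal sketch of the precedence described above. `choose_prefix()` and its arguments are hypothetical and only model the decision, not orterun's real option parsing. Two points are assumptions not stated in the message: `--prefix` is checked before a prefix derived from `/path/to/mpirun`, and an explicit prefix still wins over `--noprefix`, which here only disables the configure-time default.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the prefix to use on remote nodes, or NULL for none.
 *   cli_prefix:     value of --prefix on the command line, or NULL
 *   argv0_prefix:   prefix derived from invoking /path/to/mpirun, or NULL
 *   noprefix:       nonzero if --noprefix was given
 *   default_prefix: configure-time $prefix when built with
 *                   --enable-orterun-prefix-by-default, else NULL */
static const char *choose_prefix(const char *cli_prefix, const char *argv0_prefix,
                                 int noprefix, const char *default_prefix)
{
    if (NULL != cli_prefix) {
        return cli_prefix;        /* explicit --prefix overrides the default */
    }
    if (NULL != argv0_prefix) {
        return argv0_prefix;      /* so does an absolute path to mpirun */
    }
    if (noprefix) {
        return NULL;              /* --noprefix disables prefix-by-default */
    }
    return default_prefix;
}
```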
This commit was SVN r11669.
The following Trac tickets were found above:
Ticket 377 --> https://svn.open-mpi.org/trac/ompi/ticket/377
This provides support for the Infinipath interconnect using the PSM API.
Of note:
This version has a "hackaround": we always return 1 or greater from
the MTL PSM progress function. This should be examined further.
This commit was SVN r11655.
- Real fix for PGI compilers and the missing asprintf declaration.
The problem was that C++ headers were included before "ompi_config.h".
As a result, "stdio.h" was processed without _GNU_SOURCE.
This commit was SVN r11649.
George: ompi_ddt_type_size() returns a signed int only because of the
MPI spec; it will never return a negative value. So casting the
return value out of it to a (uint32_t) is safe, and makes the
comparisons be between two unsigned values.
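A minimal sketch of why the cast is safe. `type_size()` is a hypothetical stand-in for `ompi_ddt_type_size()` that, like it, reports a size through a signed `int` that is never negative. Converting it to `uint32_t` makes both sides of the comparison unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in: a non-negative size returned through a signed int. */
static int type_size(void) { return 8; }

static int fits_in_segment(uint32_t segment_bytes)
{
    /* Safe only because type_size() never returns a negative value. */
    return (uint32_t)type_size() <= segment_bytes;
}
```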
This commit was SVN r11639.
The following SVN revision numbers were found above:
r11619 --> open-mpi/ompi@8667648a1b
TODOs: macroize it, as we do it 10 different ways; add MCA params to control handling (push up size, no change, switch off segmenting).
This commit was SVN r11619.
* Consolidate everything inside of the same AM_CONDITIONAL that is
used to suck in the glue convenience library in ompi/Makefile.am:
OMPI_WANT_F77_BINDINGS. This AM conditional is set to true if we
want (and can support) the F77 MPI API bindings at all (and it does
not say anything about whether we are compiling the top-level or
bottom-level f77 directory to get the bindings).
* Clarify all the comments surrounding the [confusing!] issue.
* The problem with r11563 was that it used the wrong AM_CONDITIONAL
to decide whether to build the separate F77 library or not; it
would do so only if the top-level library was being built (e.g., on
systems like OSX where weak symbols don't work the way we need them
to). This patch somewhat simplifies the situation by encapsulating
everything in one large conditional (OMPI_WANT_F77_BINDINGS, as
described above). Hence, libmpi_f77 will exist (and be installed)
if F77 support is enabled overall, regardless of whether you're on
a system with insufficient weak symbol support (e.g., OSX) or not
(e.g., Linux).
This commit was SVN r11618.
The following SVN revision numbers were found above:
r11563 --> open-mpi/ompi@c8f3ff71b1
Had the wrong type for one of the arguments of MPI_TYPE_GET_CONTENTS
(MPI_Fint should have been MPI_Aint).
This commit was SVN r11517.
The following Trac tickets were found above:
Ticket 330 --> https://svn.open-mpi.org/trac/ompi/ticket/330
* Print a warning message if a target is not in an exposure epoch
and an update is received. The app then continues as if that call had
never happened, rather than ending up in evil hangs.
refs trac:325
This commit was SVN r11514.
The following Trac tickets were found above:
Ticket 325 --> https://svn.open-mpi.org/trac/ompi/ticket/325
bunch of code changed indenting level and some code got moved out of
one function and made into its own subroutine.
- Gleb pointed out that I wasn't taking into account values from the
default section of the INI file (and not finding values in the INI
file is not an error).
- I incorrectly thought that 0x5ad was Mellanox's vendor ID. Turns
out that 0x5ad is Cisco's ID, while 0x2c9 is Mellanox.
Specifically, Cisco burns its own firmware into the HCA, which
replaces the vendor ID, although the part ID stays the same. So
it's Mellanox hardware with Cisco firmware. And apparently several
of us do that. :-) So I expanded the concept of the vendor_id in
the INI file to allow for lists of vendor IDs.
- Along with that, I updated the default INI file to list all the IB
vendors (that I am aware of -- certainly open to putting more data
in there from other vendors) who overwrite Mellanox's vendor_id with
their own for the part numbers that we have on file.
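A minimal sketch of matching a device against a list of vendor IDs. It assumes the INI value is a comma-separated list such as `0x2c9,0x5ad`; that exact format is an assumption. `vendor_id_matches()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return 1 if `id` appears in a comma-separated list of (hex or decimal) IDs. */
static int vendor_id_matches(const char *list, uint32_t id)
{
    const char *p = list;
    while (NULL != p && '\0' != *p) {
        char *end = NULL;
        unsigned long v = strtoul(p, &end, 0);  /* base 0 accepts a 0x prefix */
        if (end == p) {
            return 0;  /* malformed entry: stop rather than loop forever */
        }
        if ((uint32_t)v == id) {
            return 1;
        }
        p = (',' == *end) ? end + 1 : end;
    }
    return 0;
}
```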
This commit was SVN r11506.
to return the value in seconds, not in some other unit based on the resolution
of MPI_Wtick. I think this is the wrong solution: instead of forcing users who
need the result in seconds to do the conversion themselves, it forces us to
convert every time. Unfortunately, converting requires a floating-point
division, which is a costly operation. But MPI is a standard and we have to
follow it...
This commit was SVN r11481.
- everything statically built (dynamically opened).
- OPAL, ORTE and OMPI static libraries and all the components
as dynamic files (DLLs).
- everything as dynamic files (DLLs).
This commit was SVN r11461.
can be in both a Post and Start state. Also, the asserts were only
correct assuming that we were never in the post and start state at the
same time, which was obviously silly.
refs trac:303
This commit was SVN r11428.
The following Trac tickets were found above:
Ticket 303 --> https://svn.open-mpi.org/trac/ompi/ticket/303
strings. Here's one: no matter how much of the string you copy, the
destination string must be space-padded for the entire remaining area.
Specifically, even if you call MPI_INFO_GET and tell MPI to only copy
a max of N characters of the value into the result string, if the
Fortran string is M characters (where M > N), MPI must fill the
remaining (M-N) characters with spaces. So you're supposed to obey
the argument to MPI_INFO_GET... sorta.
Precedents:
* http://www.ibiblio.org/pub/languages/fortran/ch2-13.html
* LAM/MPI
* Sun CT MPI
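A minimal sketch of the space-padding rule. `c2f_string()` is a hypothetical helper. It copies at most `n` characters of a C string into a Fortran buffer of length `m` (the hidden, compiler-added length) and blank-fills everything after the copied characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most n chars of src into the Fortran string dst of length m,
 * then space-pad the rest of dst, even past position n. No NUL is written. */
static void c2f_string(const char *src, size_t n, char *dst, size_t m)
{
    size_t len = strlen(src);
    if (len > n) len = n;
    if (len > m) len = m;
    memcpy(dst, src, len);
    memset(dst + len, ' ', m - len);
}
```

For example, copying "hello" with N = 3 into an 8-character Fortran string yields "hel" followed by five spaces.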
This commit was SVN r11412.
I know it does not make much sense but one can play around with the
performance. Numbers are available at http://www.unixer.de/research/nbcoll/perf/.
This is the first step towards collv2. Next step includes the addition
of non-blocking functions to the MPI-Layer and the collv1 interface.
It implements all MPI-1 collective algorithms in a non-blocking manner.
However, the collv1 interface does not allow non-blocking collectives, so
all collectives are used in blocking mode by the ompi-glue layer.
I wanted to add LibNBC as a separate subdirectory, but I could not
convince the build system (and did not have the time). So the component looks
pretty messy. It would be great if somebody could explain to me how to move
all nbc*{c,h} and {hb,dict}*{c,h} files to a separate subdirectory.
It's .ompi_ignored because I have not tested it exhaustively yet.
This commit was SVN r11401.
and MPI_WIN_DISP_UNIT were off by one from their C counterparts.
This fixes trac:304.
This commit was SVN r11385.
The following Trac tickets were found above:
Ticket 304 --> https://svn.open-mpi.org/trac/ompi/ticket/304
convert between Fortran and C string representations properly. In
doing so, we properly adhere to the MPI spec stating that MPI_Info
keys and values must be whitespace-trimmed when coming in from
Fortran. Hence, this fixes bug #241.
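A minimal sketch of the trimming step for keys and values coming in from Fortran. `f2c_trimmed()` is a hypothetical helper. It takes a Fortran string plus its hidden length and writes a NUL-terminated C copy with leading and trailing blanks removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 on success, or -1 if out (of size outsz) is too small. */
static int f2c_trimmed(const char *fstr, size_t flen, char *out, size_t outsz)
{
    size_t start = 0, end = flen;
    while (start < end && ' ' == fstr[start]) ++start;    /* leading blanks */
    while (end > start && ' ' == fstr[end - 1]) --end;    /* trailing blanks */
    if (end - start + 1 > outsz) {
        return -1;
    }
    memcpy(out, fstr + start, end - start);
    out[end - start] = '\0';
    return 0;
}
```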
This commit was SVN r11356.
of 4 when we are finding the next MPI_STATUS in the array.
Refs trac:236
This commit was SVN r11332.
The following Trac tickets were found above:
Ticket 236 --> https://svn.open-mpi.org/trac/ompi/ticket/236
(but didn't use it), but MPI_TYPE_GET_NAME and MPI_WIN_GET_NAME did
not.
This commit changes all three functions to pass along the compiler-added
string length parameter and use it to clear out the remainder of the string with
spaces (i.e., the rest of the string that was not set with the name).
This is what was done in LAM/MPI, and apparently what was done in
Sun's MPI, because the test that Rolf attached now passes.
Fixes trac:274.
This commit was SVN r11301.
The following Trac tickets were found above:
Ticket 274 --> https://svn.open-mpi.org/trac/ompi/ticket/274