Had the wrong type for one of the arguments of MPI_TYPE_GET_CONTENTS
(MPI_Fint should have been MPI_Aint).
This commit was SVN r11517.
The following Trac tickets were found above:
Ticket 330 --> https://svn.open-mpi.org/trac/ompi/ticket/330
* Print a warning error message if a target is not in an exposure epoch
and an update is received. This results in the app continuing with
that call having never happened, rather than evil hangs.
refs trac:325
This commit was SVN r11514.
The following Trac tickets were found above:
Ticket 325 --> https://svn.open-mpi.org/trac/ompi/ticket/325
bunch of code changed indenting level and some code got moved out of
one function and made into its own subroutine.
- Gleb pointed out that I wasn't taking into account values from the
default section of the INI file (and not finding values in the INI
file is not an error).
- I incorrectly thought that 0x5ad was Mellanox's vendor ID. Turns
out that 0x5ad is Cisco's ID, while 0x2c9 is Mellanox.
Specifically, Cisco burns its own firmware into the HCA which
replaces the vendor ID, although the part ID stays the same. So
it's Mellanox hardware with Cisco firmware. And apparently several
of us do that. :-) So I expanded the concept of the vendor_id in
the INI file to allow for lists of vendor IDs.
- Along with that, I updated the default INI file to list all the IB
vendors (that I am aware of -- certainly open to putting more data
in there from other vendors) who overwrite Mellanox's vendor_id with
their own for the part numbers that we have on file.
This commit was SVN r11506.
to return the value on seconds not some other unit based on the resolution
of MPI_Wtick. Which I think it's the wrong solution, as instead of forcing
the user to do additional computations in order to convert when he needs
the result in seconds, force us to convert every time. Unfortunately,
converting requires a division with a double which is a costly
operation. But, MPI is a standard and we have to follow it ...
This commit was SVN r11481.
- everything statically built (dynamically opened).
- OPAL, ORTE and OMPI static libraries and all the components
as dynamic files(DLL).
- everything as dynamic files (DLL).
This commit was SVN r11461.
can be in both a Post and Start state. Also, the asserts were only
correct assuming that we were never in the post and start state at the
same time, which was obviously silly.
refs trac:303
This commit was SVN r11428.
The following Trac tickets were found above:
Ticket 303 --> https://svn.open-mpi.org/trac/ompi/ticket/303
strings. Here's one: no matter how much of the string you copy, the
destination string must be space-padded for the entire remaining area.
Specifically, even if you call MPI_INFO_GET and tell MPI to only copy
a max of N characters of the value into the result string, if the
Fortran string is M characters (where M > N), MPI must space-pad the
remaining (M-N) characters to be spaces. So you're supposed to obey
the argument to MPI_INFO_GET... sorta.
Precedents:
* http://www.ibiblio.org/pub/languages/fortran/ch2-13.html
* LAM/MPI
* Sun CT MPI
This commit was SVN r11412.
I know it does not make much sense but one can play around with the
performance. Numbers are available at http://www.unixer.de/research/nbcoll/perf/.
This is the first step towards collv2. Next step includes the addition
of non-blocking functions to the MPI-Layer and the collv1 interface.
It implements all MPI-1 collective algorithms in a non-blocking manner.
However, the collv1 interface does not allow non-blocking collectives so
that all collectives are used blocking by the ompi-glue layer.
I wanted to add LibNBC as a separate subdirectory, but I could not
convince the buildsystem (and had not the time). So the component looks
pretty messy. It would be great if somebody could explain me how to move
all nbc*{c,h}, and {hb,dict}*{c,h} to a seperate subdirectory.
It's .ompi_ignored because I did not test it exhaustively yet.
This commit was SVN r11401.
and MPI_WIN_DISP_UNIT were off by one from their C counterparts.
This fixes trac:304.
This commit was SVN r11385.
The following Trac tickets were found above:
Ticket 304 --> https://svn.open-mpi.org/trac/ompi/ticket/304
convert between fortran and C string representations properly. In
doing so, we properly adhere to the MPI spec stating that MPI_Info
keys and values must be whitespace-trimmed when coming in from
Fortran. Hence, this fixes bug #241.
This commit was SVN r11356.
of 4 when we are finding the next MPI_STATUS in the array.
Refs trac:236
This commit was SVN r11332.
The following Trac tickets were found above:
Ticket 236 --> https://svn.open-mpi.org/trac/ompi/ticket/236
(but didn't use it), but MPI_TYPE_GET_NAME and MPI_WIN_GET_NAME did
not.
This commit changes all three functions to pass the compile-added
string length parameter to clear out the remainder of the string with
spaces (i.e., the rest of the string that was not set with the name).
This is what was done in LAM/MPI, and apparently what was done in
Sun's MPI, because the test that Rolf attached now passes.
Fixes trac:274.
This commit was SVN r11301.
The following Trac tickets were found above:
Ticket 274 --> https://svn.open-mpi.org/trac/ompi/ticket/274
that the compiler might need to inform the compiler that a .f90 extension
means "this is Fortran 90 code". Fortran compilers are so weird.
refs trac:284
This commit was SVN r11280.
The following Trac tickets were found above:
Ticket 284 --> https://svn.open-mpi.org/trac/ompi/ticket/284
sbrk and the use of mmap(). So rather than checking just for mallopt(),
we should also be checking for those defines when determining if we can
disable giving memory back to the OS or not.
This commit was SVN r11279.
allocated directly by the OS in the paging file (the HUGE file
that cannot be defragmented with any tools). Unlike UNIX, they
do not have physical existence as files.
This commit was SVN r11273.
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.
This commit was SVN r11270.
releases on Linux and OS X) don't handle const_cast<> of 2-dimensional
arrays properly. If we're using one of the compilers that isn't friendly
to such casts, fall back to a standard C-style cast.
refs: #271
This commit was SVN r11263.
In order to provide backwards compatability the framework versions are bumped
and the handler registeration function is at the end of the btl struct.
Testing done on sm, openib, and gm..
This commit was SVN r11256.
through the c_pml_procs, as it might be an intercommunicator and therefore
c_my_rank might not be a valid index.
Fixes trac:266.
This commit was SVN r11238.
The following Trac tickets were found above:
Ticket 266 --> https://svn.open-mpi.org/trac/ompi/ticket/266
so use the req_buff field for keeping track of the bsend buffer and the
req_addr field for the user buffer, the way the comments suggested we
were doing it
This commit was SVN r11233.
the upperlayer assynchronously although there are some issues with this.. such
as there are multiple consumers of the btl's.. who get's the
This commit was SVN r11232.
Other changes:
1. Remove the old xcpu components as they are not functional.
2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.
This will require an autogen/configure, I'm afraid.
This commit was SVN r11228.
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).
Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).
I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).
In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...
Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.
This commit was SVN r11204.
allocation. This is necessary to detect if the user requests a specific
mpool for the allocationi. Searching the key values for a specific mpool
name does not work for the case that the user provides an info object
without mpool specific information (see Ticket #254).
- In the case that the user provides a info object without requesting a
specific mpool we use malloc to allocate buffer instead of returning
NULL (fix for Ticket #254 )
This commit was SVN r11188.
addition to my design and testing, it was conceptually approved by
Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat
lightly] tested by Galen. We may still have to shake out some bugs
during the next few months, but it seems to be working for all the
cases that I can throw at it.
Here's a summary of the changes from that branch:
* Move MCA parameter registration to a new file (btl_openib_mca.c):
* Properly check the retun status of registering MCA params
* Check for valid values of MCA parameters
* Make help strings better
* Otherwise, the only default value of an MCA param that was
changed was max_btls; it went from 4 to -1 (meaning: use all
available)
* Properly prototyped internal functions in _component.c
* Made a bunch of functions static that didn't need to be public
* Renamed to remove "mca_" prefix from static functions
* Call new MCA param registration function
* Call new INI file read/lookup/finalize functions
* Updated a bunch of macros to be "BTL_" instead of "ORTE_"
* Be a little more consistent with return values
* Handle -1 for the max_btls MCA param
* Fixed a free() that should have been an OBJ_RELEASE()
* Some re-indenting
* Added INI-file parsing
* New flex file: btl_openib_ini.l
* New default HCA params .ini file (probably to be expanded over
time by other HCA vendors)
* Added more show_help messages for parsing problems
* Read in INI files and cache the values for later lookup
* When component opens an HCA, lookup to see if any corresponding
values were found in the INI files (ID'ed by the HCA vendor_id
and vendor_part_id)
* Added btl_openib_verbose MCA param that shows what the INI-file
stuff does (e.g., shows which MTU your HCA ends up using)
* Added btl_openib_hca_param_files as a colon-delimited list of INI
files to check for values during startup (in order,
left-to-right, just like the MCA base directory param).
* MTU is currently the only value supported in this framework.
* It is not a fatal error if we don't find params for the HCA in
the INI file(s). Instead, just print a warning. New MCA param
btl_openib_warn_no_hca_params_found can be used to disable
printing the warning.
* Add MTU to peer negotiation when making a connection
* Exchange maximum MTU; select the lesser of the two
This commit was SVN r11182.
1. comm_spawn processes by default will inherit the "--prefix" from their parent job. Thus, the "--prefix" provided on the command line will be propagated automatically to any children.
2. application programs can override the default by providing their own "ompi_prefix" in the MPI_Info parameter passed to comm_spawn
This commit was SVN r11143.
r10877:
add warm up connection option.. of course this only warms up the first
eager btl but this should be adequate for now..
r10881:
Consulted with Galen and did a few things:
- Fix the algorithm to actually make the connections that we want
- Rename the MCA param to mpi_preconnect_all
- Cleanup the code a bit:
- move the logic to a separate .c file
- check return codes properly
This commit was SVN r11114.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r10877
r10877
r10881
r10881
full argument checking (allowing that MPI_PROC_NULL is legal, of course).
Only after the argument checking do we shortcut. Fixes trac:237, which
was caused by moving the MPI_PROC_NULL test in MPI_Bsend_init,
but not allowing for MPI_PROC_NULL when checking rank.
This commit was SVN r11108.
The following SVN revision numbers were found above:
r10972 --> open-mpi/ompi@31c66d92aa
The following Trac tickets were found above:
Ticket 237 --> https://svn.open-mpi.org/trac/ompi/ticket/237
on almost all platforms (except OS X... sigh...). This is the merge
of r10846 - 10894 from the tmp/f90-shared branch to the trunk.
This commit was SVN r11103.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r10846
implemented entirely on top of the PML. This allows us to have a
one-sided interface even when we are using the CM PML and MTLs for
point-to-point transport (and therefore not using the BML/BTLs)
* Old pt2pt component was renamed "rdma", as it will soon be having
real RDMA support added to it.
Work was done in a temporary branch. Commit is the result of the
merge command:
svn merge -r10862:11099 https://svn.open-mpi.org/svn/ompi/tmp/bwb-osc-pt2pt
This commit was SVN r11100.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r10862
r11099
did pre-libevent update. The problem is that the behavior of
OPAL_EVLOOP_ONCE was changed by the OMPI team, which them broke things
during the update, so it had to be reverted to the old meaning of
loop until one event occurs. OPAL_EVLOOP_ONELOOP will go through the
event loop once (like EVLOOP_NONBLOCK) but will pause in the event
library for a bit (like EVLOOP_ONCE).
fixes trac:234
This commit was SVN r11081.
The following Trac tickets were found above:
Ticket 234 --> https://svn.open-mpi.org/trac/ompi/ticket/234
users mailing list:
http://www.open-mpi.org/community/lists/users/2006/07/1680.php
Warning: this log message is not for the weak. Read at your own
risk.
The problem was that we had several variables in Fortran common blocks
of various types, but their C counterparts were all of a type
equivalent to a fortran double complex. This didn't seem to matter
for the compilers that we tested, but we never tested static builds
(which is where this problem seems to occur, at least with the Intel
compiler: the linker compilains that the variable in the common block
in the user's .o file was of one size/alignment but the one in the C
library was a different size/alignment).
So this patch fixes the sizes/types of the Fortran common block
variables and their corresponding C instantiations to be of the same
sizes/types.
But wait, there's more.
We recently introduced a fix for the OSX linker where some C versions
of the fortran common block variables (e.g.,
_ompi_fortran_status_ignore) were not being found when linking
ompi_info (!). Further research shows that the code path for
ompi_info to require ompi_fortran_status_ignore is, unfortunately,
necessary (a quirk of how various components pull in different
portions of the code base -- nothing in ompi_info itself requires
fortran or MPI knowledge, of course).
Hence, the real problem was that there was no code path from ompi_info
to the portion of the code base where the C globals corresponding to
the Fortran common block variables were instantiated. This is because
the OSX linker does not automatically pull in .o files that only
contain unintialized global variables; the OSX linker typically only
pulls in a .o file from a library if it either has a function that is
used or have a global variable that is initialized (that's the short
version; lots of details and corner cases omitted). Hence, we changed
the global C variables corresponding to the fortran common blocks to
be initialized, thereby causing the OSX linker to pull them in
automatically -- problem solved. At the same time, we moved the
constants to another .c file with a function, just for good measure.
However, this didn't really solve the problem:
1. The function in the file with the C versions of the fortran common
block variables (ompi/mpi/f77/test_constants_f.c) did not have a
code path that was reachable from ompi_info, so the only reason
that the constants were found (on OSX) was because they were
initialized in the global scope (i.e., causing the OSX compiler to
pull in that .o file).
2. Initializing these variable in the global scope causes problems for
some linkers where -- once all the size/type problems mentioned
above were fixed -- the alignments of fortran common blocks and C
global variables do not match (even though the types of the Fortran
and C variables match -- wow!). Hence, initializing the C
variables would not necessarily match the alignment of what Fortran
expected, and the linker would issue a warning (i.e., the alignment
warnings referenced in the original post).
The solution is two-fold:
1. Move the Fortran variables from test_constants_f.c to
ompi/mpi/runtime/ompi_mpi_init.c where there are other global
constants that *are* initialized (that had nothing to do with
fortran, so the alignment issues described above are not a factor),
and therefore all linkers (including the OSX linker) will pull in
this .o file and find all the symbols that it needs.
2. Do not initialize the C variables corresponding to the Fortran
common blocks in the global scope. Indeed, never initialize them
at all (because we never need their *values* - we only check for
their *locations*). Since nothing is ever written to these
variables (particularly in the global scope), the linker does not
see any alignment differences during initialization, but does make
both the C and Fortran variables have the same addresses (this
method has been working in LAM/MPI for over a decade).
There were some comments here in the OMPI code base and in the LAM
code base that stated/implied that C variables corresponding to
Fortran common blocks had to have the same alignment as the Fortran
common blocks (i.e., 16). There were attempts in both code bases to
ensure that this was true. However, the attempts were wrong (in both
code bases), and I have now read enough Fortran compiler documentation
to convince myself that matching alignments is not required (indeed,
it's beyond our control). As long as C variables corresponding to
Fortran common blocks are not initialized in the global scope, the
linker will "figure it out" and adjust the alignment to whatever is
required (i.e., the greater of the alignments). Specifically (to
counter comments that no longer exist in the OMPI code base but still
exist in the LAM code base):
- there is no need to make attempts to specially align C variables
corresponding to Fortran common blocks
- the types and sizes of C variables corresponding to Fortran common
blocks should match, but do not need to be on any particular
alignment
Finally, as a side effect of this effort, I found a bunch of
inconsistencies with the intent of status/array_of_statuses
parameters. For all the functions that I modified they should be
"out" (not inout).
This commit was SVN r11057.
of send/receives outstanding.
Use ibv_cq_resize if available after initial creation of completion queue if
cq_size is too small (based on number of peers).
This commit was SVN r11053.
make it consistent with the indenting in the rest of the file
(otherwise it was quite difficult to understand -- saw this while I
was reviewing 11039).
This commit was SVN r11042.
object if provided.
The associated value is a comma-separated list of hosts -- which must be
in the initial allocation -- and is used to populate the application
context map.
This commit was SVN r11039.
libevent-1.1a.
svn merge -r10917:11006 https://svn.open-mpi.org/svn/ompi/tmp/libevent-update
This commit was SVN r11022.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r10917
r11006
Reviewed by: Jeff Squyres
Fix for ticket #220. Missing a few C++ methods.
MPI::Datatype::Create_indexed_block
MPI::Datatype::Create_resize
MPI::Datatype::Get_true_extent
This commit was SVN r11010.
functions MPI_Test, MPI_Testany, MPI_Wait, MPI_Waitany
should not reset the status.MPI_ERROR as passed by user.
- This needed implementing the MPI_Waitsome and MPI_Testsome.
This commit was SVN r10980.
- bsend_init: use *request after error-checking
- Always reset the status->cancelled
- cancel, wait: need to check *request for MPI_REQUEST_NULL, not
NULL...
(actually ompi_request_wait handles MPI_PROC_NULL, so no need
to check&set of status_empty in wait.c)
This commit was SVN r10972.
- ensure to initialize the values that we use for fortran constants
(even tough their *values* don't matter -- only their *addresses* do,
but initializing them or not has implications for the OSX linker)
- move the fortran constants to a file with functions in it, because
the OSX linker sometimes does not import global variables from
object files that do not have functions (I'm not even going to
pretend to get all the subtle details about the OSX linker right
here -- it's just "better" to have global variables in object files
with functions that otherwise get pulled in during linker
resolution).
This commit was SVN r10908.
SPAWN[_MULTIPLE] from a singleton (and displays a pretty help message
explaining that you need to use mpirun). This can be removed when
fixes for ORTE come over that allow SPAWN[_MULTIPLE] from singletons.
This commit was SVN r10898.
shared memory segments
* make sure to properly unlink the collectives sm bootstrap area at
shutdown
* Add missing / in the path for the mpool shared memory segment
* make sure to release the common_mmap structure in the SM btl
after unlinking the file during shutdown
This commit was SVN r10886.
MPI_WAIT, MPI_TEST, MPI_WAITANY, and MPI_TESTANY. It isn't really
clear what the standard wants as the return code for these functions,
and this is what Sun MPI, LAM/MPI, and MPICH2 all do.
Fixes trac:172
This commit was SVN r10872.
The following Trac tickets were found above:
Ticket 172 --> https://svn.open-mpi.org/trac/ompi/ticket/172
final datatype not on the shape of the added datatype. The gaps exist if the
extent of the final datatype is not equal to its size.
This commit was SVN r10867.
than $(LN_S). This causes problems with with Windows and probably
elsewhere (re: #200). So use a slightly different trick to get the
right header selected for the MEMCPY and TIMER components.
* Using the same trick used to solve the AC_CONFIG_LINKS problem,
stop using a separate header file for direct calling in the
PML and MTL. This lets me remove some icky code in ompi_mca.m4
that was more fragile than I really liked.
This commit was SVN r10841.
We now set truncation error if we received more than we delivered for both
the OB1 and DR PMLs (the CM PML doesn't need such a fix, as the condition
is set at the MTL level)
This commit was SVN r10812.
The following Trac tickets were found above:
Ticket 172 --> https://svn.open-mpi.org/trac/ompi/ticket/172
all but buffered and persistent requests. Unfortunately we were note able to
reuse the pml_base_request_t as it was just too heavy for our needs. Lots of
code for 2/10 usec ;-)
This commit was SVN r10810.
Keeping the cache misses as low as possible is always a good approach.
The opal_list_t is widely used, it should be a highly optimized class.
The same functionality can be reached with one one sentinel instead
of 2 currently used.
I don't have anything against the STL version, but so far nothing can
compare with the Knuth algorithm. I replace the current implementation
with a modified version of the Knuth algorithm (the one described in
The Art of Computer Programming). As expected, the latency went down.
This commit was SVN r10776.
There was some old code regarding the convertor which does not have to be there
(the problem was corrected a while ago). In the PML we already know how the progress
function is defined, so call the BML progress instead, which will save one function
call.
The macro MCA_PML_OB1_COMPUTE_SEGMENT_LENGTH is already defined in the pml_ob1.h
so it should not be in the endpoint.h.
Remove a double definition of the mca_pml_ob1_progress function in the pml_ob1.h.
This commit was SVN r10775.
I've introduced a race condition - seeing occasional LOCAL_LENGTH errors on the receive side. I think I'm mixing up eager/max somehow - will look at it more on monday.
This commit was SVN r10690.
convertor doesn't handle it properly
continue peeking until we don't get anything else..
close the endpoint before closing the library..
add a blocking send that uses mx_test ..
This commit was SVN r10684.
is the one provided by the user. For the buffered send the real datatype used
for the communication is always MPI_BYTE and the count can be retrieved from
the req_bytes_packed field. This will decrease the size of the request by
one pointer and one size_t (8 bytes or 16 bytes depending on the architecture).
This commit was SVN r10680.
bsend_request_init, but not both. Otherwise, you don't free
some buffer space and end up leaking buffers and ending in
badness
* since you only call alloc() or init(), but not both, need to
restore reference counting in init()
This commit was SVN r10674.