Modify the configure logic and the PMI components to accommodate Cray's approach. Refactor the PMI error reporting code so it resides in only one place. Cray actually decided -not- to define the PMI-2 error codes, so we have to use the PMI-1 codes instead. More fun.
This commit was SVN r25348.
Use hwloc to obtain the cpuset for each process during mpi_init, and share that info in the modex. As it arrives, use a new opal_hwloc_base utility function to parse the value against the local proc's cpuset and determine where they overlap. Cache the value in the pmap object as it may be referenced multiple times.
Thus, the return value from orte_ess.proc_get_locality is a 16-bit bitmask that describes the resources being shared with you. This bitmask can be tested using the macros in opal/mca/paffinity/paffinity.h
Locality is available for all procs, whether launched via mpirun or directly with an external launcher such as slurm or aprun.
This commit was SVN r25331.
zeroes);
if so, use it for bit-operations like opal_cube_dim and opal_hibit.
Implement two versions of power-of-two.
In case of opal_next_poweroftwo, this reduces the average execution
time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlining,
measured rdtsc, with loop over 2^27 values).
Numbers for other functions are similar (but of course heavily depend
on the usage, e.g. opal_hibit() with a start of 4 does not save
much). The bsr instruction on AMD Opteron is also not as fast.
- Replace various places where the next power-of-two is computed.
Tested on Intel Nehalem Cluster with openib, compilers GNU-4.6.1 and
Intel-12.0.4 using mpi_testsuite -t "Collective" with 128 processes.
This commit was SVN r25270.
If a user specifically asks for rdmacm support in configure script and
librdmacm (usual and devel) libraries are not found, configure script
would abort.
If a user didn't specify anything, and rdmacm libraries are not found,
configure script will continue after issuing warning message:
"Please install librdmacm and librdmacm-devel or disable rdmacm support"
-- YK
This commit was SVN r25253.
* Only print returnable errors when verbose=1. Still print errors when
we're going to abort, since those obviously aren't returnable
This commit was SVN r25213.
* hdr_data now includes opcount and length for all messages, which is the match
bits for long and rndv messages
* Re-add probe implementation
This commit was SVN r25207.
- add 2 new device ids.
- default rq depth to 64, which proved good for large runs.
This commit should be added to cmr:v1.4:reviewer=jsquyres and
cmr:v1.5:reviewer=jsquyres
This commit was SVN r25145.
Global rdmacm_resolve_timeout defaults to 1000 (1000 ms), which is way
too small for even a 16 node x 12 core iwarp cluster in the presence
of drops. Bump up the default to 30000ms.
This commit fixes trac:2860 and should be added to cmr:v1.4:reviewer=jsquyres
and cmr:v1.5:reviewer=jsquyres
This commit was SVN r25144.
The following Trac tickets were found above:
Ticket 2860 --> https://svn.open-mpi.org/trac/ompi/ticket/2860
the command line, hwloc is just like any other external dependency
in OMPI: if we find it, we'll use it. If we don't find it, we'll
ignore it. See comments in opal/mca/hwloc/configure.m4 for an
explanation.
* Fix some copy-n-paste errors in opal/mca/hwloc/configure.m4
w.r.t. flags coming in from the winning component.
* Add another line in ompi_info's output about whether hwloc support
is included or not.
This commit was SVN r25134.
- slight change in the selection logic of the fs module, which makes
the ompio independent of the file system type (otherwise ompio
would also have required a configure script).
This commit was SVN r25118.
Don't juse include pre-processor macros between two strins ("s1" #if 0 ... "s2")...
Rather print out the epoch as 0 always...
This commit was SVN r25110.
- configure:
- patch Makefiles which define library targets that depend on other libraries to prevent the following Libtool warning:
"libtool: link: warning: `...//*.la' seems to be moved"
(Libtool getting confused by the "//" in the library paths,
so remove the trailing '/' from all *LIBDIR variables.)
- vtwrapper:
- added options '-vt:showme-<compile|link>' to the compiler wrapper to show the compiler/linker flags that would be supplied to the underlying compiler
This commit was SVN r25105.
To enable the epochs and the resilient orte code, use the configure flag:
--enable-resilient-orte
This will define both:
ORTE_ENABLE_EPOCH
ORTE_RESIL_ORTE
This commit was SVN r25093.
libs don't seem to propagate correctly under certain circumstances. This makes
hopefully the nightly tests pass.
also, remove the files that should not have been committed in the first place
:-)
This commit was SVN r25085.
- updated version number to 5.11.2
- fixed even more Coverity warnings
- vtunify:
- replaced std::vector relict by LargeVectorC (fixes segfault during gathering)
- vtwrapper:
- do also escape '\', ''', '(', and ')' in arguments
This commit was SVN r25071.
specify btl_tcp_if_include because btl_tcp_if_exclude is defaulted to
the loopback devices.
This commit does a few things:
* Introduce a new OPAL MCA base function:
mca_base_param_check_exclusive_string(). It checks to see that the
''user'' does not set two MCA parameters that are mutually
exclusive by checking the source of those MCS param values.
* Use the above function in many BTLs (and the OOB TCP) to ensure
that <foo>_if_include and <foo>_if_exclude are not both specified
''by the user''.
* Re-arrange many of these BTLs to move their MCA registration code
into a separate component_register() function (vs. the
component_open() function).
This code has been nominally reviewed and checked by Ralph, George,
Terry, and Shiqing.
This commit was SVN r25043.
The following SVN revision numbers were found above:
r24976 --> open-mpi/ompi@8f4ac54336
- always check the result of OTF_WStream_get*Buffer since it might be NULL in case OTF_File_open fails
Changes to VT:
- CUDA Tracing:
- fixed configure stack for filtered kernels
- fixed buffer size for CUPTI tracing
- replaced error message with warning to continue tracing, even if CUDA error occured (VTCUDAsynchronizeEvt)
- vtunify:
- enlarged minimum message size for transfering local definitions to rank 0
- use binary search for searching already created global definitions
- use binary search for searching already created global marker definitions
- use LargeVectorC instead of std::vector for pre-allocating elements
- vtwrapper:
- added options '-vt:CC' and '-vt:c++' which are synonyms for '-vt:cxx'
This commit was SVN r24997.
- Added dynamic SL support to xoob
- Fixed seg fault in finalization
- All the code has been moved to separate files: connect/btl_openib_connect_sl.{c,h}
- The new files compilation is conditionalized
This commit was SVN r24991.
comment out some unused parameter names. I didn't use
__opal_attribute_unused__ because comm_inln.h is (eventually) included
by <mpi.h>, and therefore we don't have all the OPAL config stuff
available. And it didn't seem worth it to add the optional
attribute_unused stuff to the top of mpi.h.
Thanks to Júlio Hoffimann for reporting the issue.
This commit was SVN r24989.
btl_tcp_if_include and btl_tcp_if_exclude are specified.
This commit was SVN r24976.
The following Trac tickets were found above:
Ticket 2838 --> https://svn.open-mpi.org/trac/ompi/ticket/2838
still unstable. Reverted errmgr modules back to the original errmgr (with the
updates since the resilient code was brought into the trunk).
This commit was SVN r24958.
- improved zlib compression
- otfprofile-mpi:
- fixed progress
Changes to VT:
- fixed C++ linker issue for manual instrumentation of multiple files
- fixed CUDA kernel launch configuration
- process and thread buffer size can be explicitly specified by the user via the environment variables VT_BUFFER_SIZE and VT_THREAD_BUFFER_SIZE
- fixed CUDA buffer management
- vtfilter:
- fixed progress
- vtwrapper:
- link CUPTI library, if available
- vtsetup:
- removed fixed path to *.dtd file in vtsetup-data.xml[.in] (fixes 'java.net.MalformedURLException')
This commit was SVN r24950.
''other'' direction, so to speak, compared to r24921).
This commit was SVN r24924.
The following SVN revision numbers were found above:
r24921 --> open-mpi/ompi@bd96d028de
they only add the context id to the tag selection of the underlying
messaging meachinsm.
We would like to enable an MTL to maintain its own context data
per-communicator. This way an MTL will be able to queue incoming eager
messages and rendezvous requests per-communicator basis.
The MTL will be allowed to override comm->c_pml_comm member,
since it's unused in pml_cm anyway.
This commit was SVN r24858.
was explicitly requested. If it was, but opensm-devel
package is not found, warn and abort.
Otherwise, doing the best effort: if opensm-devel found,
enable dynamic SL. If it's not found, disable dynamic
SL and build OMPI w/o it.
This commit was SVN r24852.
otfprofile-mpi:
- added progress display
- added verbose messages
- added functions to sychronize the error indicator to all worker ranks
(enforces that all ranks will be terminated by calling MPI_Abort if anyone fails)
- wrap def. comments after 80 characters
- use pdf[la]tex instead of latex/dvipdf to convert TeX output to PDF
- added configure checks for pdf[la]tex and PGFPLOTS v1.4
- fixed function invocation statistics generated from summarized information (--stat)
- fixed memory leak
Changes to VT:
MPI wrappers:
- fixed wrapper generation for MPI implementations which don't support the MPI-2 standard (e.g. MVAPICH, MPICH)
- corrected IN_PLACE denotation for MPI_Alltoall* and MPI_Scatter*
vtwrapper:
- corrected detection of IBM XL's OpenMP flag -qsmp=*:omp:*
vtunify:
- fixed faulty cleanup of temporary files which occurred if VT is configured without trace compression support
This commit was SVN r24851.
request completion callback
* Use the completion callback pointer to remove all need for opal_progress
calls in the one-sided layer
This commit was SVN r24848.
- Added enable/disable configuration parameter for dynamic SL
- All the dynamic SL code is conditionalized
- Removed libibmad dependency
- Using only one include - ib_types.h (part of opensm-devel package)
- Removed all the macro and data types definitions, using the
existing definitions from ib_types.h instead
- general cleaning here and there
The async mode is not implemented yet - stay tuned...
This commit was SVN r24830.
Everyone will be starting at MIN anyway (until we implement restart of course)
so there's no reason to set the epoch to INVALID and then immediately reset them
to MIN. This way there's less room to make mistakes later.
This commit was SVN r24829.
if-elseif statements to make it compile with OpenMPi again.
Fixes trac:2808
This commit was SVN r24768.
The following Trac tickets were found above:
Ticket 2808 --> https://svn.open-mpi.org/trac/ompi/ticket/2808
Rather than try to support a bunch of lightweight environments like I did
with the Portals3 code, always use the "modex" and hack the grpcomm for
the SHMEM implementation to return the right nid/pid for a remote
process by "magic".
This commit was SVN r24733.
is zero. The routine assumes that at least one process is available in the
group, which lead to a segfault when creating communicators with GROUP_EMPTY.
Fixes trac:2752
This commit was SVN r24595.
The following Trac tickets were found above:
Ticket 2752 --> https://svn.open-mpi.org/trac/ompi/ticket/2752
Instead of returning MPI_SUCCESS every time they are called regardless of the status of the call, they should return a value representative of the action. So similar to MPI_Wait/MPI_Test they will return MPI_SUCCESS if the action was successfull, or the value that matches status.MPI_ERROR for the operation if it is unsuccessful.
This was discussed on the [http://www.open-mpi.org/community/lists/devel/2011/03/9109.php ompi-devel list]
This commit was SVN r24551.
No need for any CMRs to 1.5... that was already done in CMR 2728.
This commit was SVN r24545.
The following SVN revision numbers were found above:
r22841 --> open-mpi/ompi@b400b84162
* rename ib_path_rec_service_level -> ib_path_record_service_level
* use mad.h and ib_types.h
* free all resources
* move ibv_post_recv to be just before ibv_post_send
* cleanup and beatify code
This commit was SVN r24507.
* If something goes wrong during ompi_mpi_init, don't erroneously
report that it is illegal to invoke MPI_INIT* before MPI_INIT
* Aggregate help messages when possible when something goes wring
during ompi_mpi_init
This commit was SVN r24492.
be applied. Correct the MPI validation process of the
MPI_Accumulate arguments.
Fix another potential problem not yet reported. If we convert the
MPI datatypes direclty into OPAL datatypes, we will restrict their
number to the locally different types. Which might not be identical
on the remote node, if we are in a heterogeneous environment. So,
for MPI One sided only deal with MPI level types, never simplify
them on OPAL types (at least on the args). The unfortunate
outcome is that we need to create the args for all datatypes.
This commit was SVN r24466.
Update the CMake script for checking mca subdirs.
Add windows support for __attribute__ packed structures.
Define usleep and posix_memalign with equivalent windows functions.
And a few minor fixes, type casts.
This commit was SVN r24429.
OMPI supports multiple different repository systems (SVN, hg, git).
But the VERSION file has listed "want_svn" and "svn_r" as fields, even
though the actual repo system and version may not be SVN.
So search/replace those fields (and derrivative values that come from
those fields) with "want_repo_rev" and "repo_rev", respectively.
This commit was SVN r24405.
- poll() can return POLLRDNORM even if not requested (Solaris bug)
- MIN macro not defined in btl_openib.c
and while we're at it, we clean up the MIN definition in ad_bgl_pset.h
- btl_openib_connect_rdmacm.c was calling rdma_destroy_id() twice
leading to undefined behavior (a hang on Solaris)
This commit was SVN r24356.
code from upper level into btl configure.m4. Changed
prefix from "OMPI" to "BTL" in preprocessor macro. Add
an mca param that shows it has been configured in.
This commit was SVN r24270.
either direct link to these basic predefined types, or a combination of them.
Anyway, the first items in the datatype list belong to OPAL, the second round
are MPI datatypes created by composing basic OPAL datatypes, and the last
batch are mapped datatype (direct correspondance between an OMPI datatype and
an OPAL one such as int -> int32_t).
Modify the op to fit this new scheme.
This commit was SVN r24247.
Apparently, gcc 4.4.x and 4.5.x complain about the ''possibility'' of
us calling free() on a non-heap variable. We know that this case can
never happen because the refcount will absolutely not go to zero
here. We think it may be gcc being a bit too aggressive on the
warnings.
However, since this happens with gcc 4.4.x and 4.5.x, and since gcc
4.5.x ship in RHEL6 and Fedora 14 (and others), someone '''will'''
complain about this in the future, so we might as well code around it
so that we don't have to keep explaining "despite the warning, it's
really ok."
The workaround is pretty simple: just OBJ_RELEASE the values from
ompi_mpi_comm_parent before it is re-assigned to the new
intercommunicator. Then the compiler's static code analysis can't
possibly tell that it's not a heap variable, and we're ok.
So yes, we are still calling OBJ_RELEASE on a non-heap variable. But
free() '''will never be called''' on it because of the refcount.
This commit was SVN r24214.
The following Trac tickets were found above:
Ticket 2669 --> https://svn.open-mpi.org/trac/ompi/ticket/2669
on some systems caused by the definition of malloc in
opal_config_bottom.h getting expanded in the system malloc.h when
OPAL_ENABLE_MEM_DEBUG is set to 1.
This commit was SVN r24210.
verbose statement that shows up when you --mca btl_base_verbose 100.
It clearly states that the openib BTL disqualifies itself when
MPI_THREAD_MULTIPLE is used.
This commit was SVN r24209.
have to have an identical ''value'' (it must be the same file handle,
but that means nothing about its actual value). But the datarep must
have an identical value. Additionaly, the etype does not need to be
identical, but the extent of all the etypes supplied must be
identical.
See MPI-2.2 p401-402 for further details.
This commit was SVN r24201.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}
{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}
This commit was SVN r24196.
New mca parameter is added (ib_path_rec_service_level) - positive value means that we should get the SL from the SA.
This is usable for torus topologies where different SL value is used for different endpoints.
A cache is kept of ib queue pairs used to communicate with the SA for a particular device and port and path record SL values retrieved from that SA.
The interaction with the cache assumes that there are no recursive calls to these routines. This must be solved either by code flow, by using higher level locks, or by adding a locking mechanism to these routines along with some method for avoiding deadlock.
This code use a UD queue pair to talk to the SA, and not need to chmod /dev/infiniband/umad* for use by normal users.
The request to the SA is a SubnAdmGet(), not a SubnAdmGetTable().
In the future we might add a support of a SubnAdmGetTable(), but it will require implementing RMPP (Reliable Multi-Packet Transaction Protocol) and I'm not sure we want to do that.
This patched is based on the work of David McMillen <davem@systemfabricworks.com>.
This commit was SVN r24195.
The patch includes the following:
* Add new mca parameter - btl_openib_max_hw_msg_size - Maximum size (in bytes) of a single fragment of a long message when using the RDMA protocols (must be > 0 and <= hw capabilities).
* If btl_openib_max_hw_msg_size is larger than the maximum hw limitation print error message.
* Change the default openib flags to include only PUT and not GET.
* Print error message if user choose manually GET flag in openib btl.
* In prepare_dst: limit the message size to be the minimum of both endpoint's hw_limitation and the user limitation (if requested).
This commit was SVN r24191.
Somehow they got fixed in the pt2pt implementation, but not the RDMA
implementation. Thanks to Guillaume Thouvenin for finding this issue.
This commit was SVN r24188.
If specified, a comma-delimited list of TCP interfaces. Interfaces
will be assigned, one to each MPI process, in a round-robin fashion
on each server. For example, if the list is "eth0,eth1" and four
MPI processes are run on a single server, then local ranks 0 and 2
will use eth0 and local ranks 1 and 3 will use eth1.
This feature is only useful for environments with virtual ethernet
interfaces on the same network. For example, if eth0 and eth1 are
virtual interfaces to the same NIC on the same subnet, and if the NIC
provides different hardware resources to eth0 and eth1 (not just
different kernel resources), some HOL blocking and congestion issues
can be eased in a modest fashion.
This commit was SVN r24181.