non-contiguous converter. We can't "convert on the fly" because the #
of bytes requested may not divide evenly into the convertor data type.
This commit was SVN r29014.
spawning a process using MPI_Comm_spawn. For this, the first operation has to
be collective which we can not guarantuee outside of the MPI_File_open
operation.
This commit was SVN r29008.
improvements:
* Fix minor memory leaks during component_init
* Ensure that an initialization loop does not underflow an unsigned int
* Improve mlock limit checking
* Fix set of BTL modules created during component_init when failing to
get QP resources or otherwise excluding some (but not all) usnic
verbs devices
* Fix/improve error messages to be consistent with other Cisco
documentation
* Randomize the initial sliding window sequence number so that we
silently drop incoming frames from previous jobs that still have
existant processes in the middle of dying (and are still
transmitting)
* Ensure we don't break out of add_procs too soon and create an
asymetrical view of what interfaces are available
This commit was SVN r28975.
mpi_leave_pinned_pipeline
This change should improve performance is the non-pinned case where
the same memory region is involved in multiple simultaneous transfers.
cmr=v1.7.3:reviewer=brbarret
This commit was SVN r28973.
Working on faster algorithms for tuned that will come at a later time.
cmr=v1.7.3:ticket=trac:2965
This commit was SVN r28952.
The following Trac tickets were found above:
Ticket 2965 --> https://svn.open-mpi.org/trac/ompi/ticket/2965
Use the new sysfs files to check that there are enough VFs, QPs, and
CQs for all the MPI processes on this server.
Move the checking code into its own subroutine to make it smaller and
easier to read/grok.
This commit was SVN r28937.
Add support for MPI_Count type and MPI_COUNT datatype and add the required
MPI-3 functions MPI_Get_elements_x, MPI_Status_set_elements_x,
MPI_Type_get_extent_x, MPI_Type_get_true_extent_x, and MPI_Type_size_x.
This commit adds only the C bindings. Fortran bindins will be added in
another commit. For now the MPI_Count type is define to have the same size
as MPI_Offset. The type is required to be at least as large as MPI_Offset
and MPI_Aint. The type was initially intended to be a ssize_t (if it was
the same size as a long long) but there were issues compiling romio with
that definition (despite the inclusion of stddef.h).
I updated the datatype engine to use size_t instead of uint32_t to support
large datatypes. This will require some review to make sure that 1) the
changes are beneficial, 2) nothing was broken by the change (I doubt
anything was), and 3) there are no performance regressions due to this
change.
Increase the maximum number of predifined datatypes to support MPI_Count
Put common get_elements code to ompi/datatype/ompi_datatype_get_elements.c
Update MPI_Get_count to reflect changes in MPI-3 (return MPI_UNDEFINED when the count is too large for an int)
This commit was SVN r28932.
Brian (rightfully) hit me on the head with the
don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now
nicely plays with the RTE framework.
This commit was SVN r28907.
Note that the PMI RTE still doesn't listen for asynchronous errors, so
the error handler still won't ever actually do anything :).
This commit was SVN r28886.
The following SVN revision numbers were found above:
r28852 --> open-mpi/ompi@e4e678e234
This BTL accesses the Cisco usNIC Linux device via the Linux verbs
API via Unreliable Datagram queue pairs. A few noteworthy points:
* This BTL does most of its own fragmentation; it tells the PML that
it has a very high max_send_size (much higher than the network
MTU).
* Since UD fragments are, by definition, unreliable, the usnic BTL
handles all of its own reliability via a sliding window approach
using the opal_hotel construct and many tricks stolen from the
corpus of knowledge surrounding efficient TCP.
* There is a fun PML latency-metric based optimization for NUMA
awareness of short messages.
* Note that this is ''not'' a generic UD verbs BTL; it is specific to
the Cisco usNIC device.
This commit was SVN r28879.
George and I were talking about ORTE's error handling the other day in regards to the right way to deal with errors in the updated OOB. Specifically, it seemed a bad idea for a library such as ORTE to be aborting the job on its own prerogative. If we lose a connection or cannot send a message, then we really should just report it upwards and let the application and/or upper layers decide what to do about it.
The current code base only allows a single error callback to exist, which seemed unduly limiting. So, based on the conversation, I've modified the errmgr interface to provide a mechanism for registering any number of error handlers (this replaces the current "set_fault_callback" API). When an error occurs, these handlers will be called in order until one responds that the error has been "resolved" - i.e., no further action is required - by returning OMPI_SUCCESS. The default MPI layer error handler is specified to go "last" and calls mpi_abort, so the current "abort" behavior is preserved unless other error handlers are registered.
In the register_callback function, I provide an "order" param so you can specify "this callback must come first" or "this callback must come last". Seemed to me that we will probably have different code areas registering callbacks, and one might require it go first (the default "abort" will always require it go last). So you can append and prepend, or go first. Note that only one registration can declare itself "first" or "last", and since the default "abort" callback automatically takes "last", that one isn't available. :-)
The errhandler callback function passes an opal_pointer_array of structs, each of which contains the name of the proc involved (which can be yourself for internal errors) and the error code. This is a change from the current fault callback which returned an opal_pointer_array of just process names. Rationale is that you might need to see the cause of the error to decide what action to take. I realize that isn't a requirement for remote procs, but remember that we will use the SAME interface to report RTE errors internal to the proc itself. In those cases, you really do need to see the error code. It is legal to pass a NULL for the pointer array (e.g., when reporting an internal failure without error code), so handlers must be prepared for that possibility. If people find that too burdensome, we can remove it.
Should we ever decide to create a separate callback path for internal errors vs remote process failures, or if we decide to do something different based on experience, then we can adjust this API.
This commit was SVN r28852.
- extend the framework API
- remove the dummy component, not require anymore
- add four components to perform the actual job.
This commit was SVN r28828.
soon:
- add a new abstraction layer to be used internally for some operations
- add a new mca parameter to control lazy intialization of shared file
pointer structures
This commit was SVN r28826.
This commit adds an API for registering and querying performance
variables (mca_base_pvar) in the MCA base. The existing MCA variable
system API has been updated to reflect the new API: MCA variable
groups have performance variables, and new types have been added (double,
unsigned long long) to reflect what is required by the MPI_T
interface. Additionally, the MCA variable group code has been split
into its own set of files: mca_base_var_group.[ch].
Details of the new API can be found in doxygen comments in the header:
mca_base_pvar.h.
Other changes to the variable system:
- Use an opal_hash_table to speed up variable/group lookup.
- Clean up code associated with MCA variable types.
- Registered performance variables are printed by ompi_info -a. In the
future an option should be added to control this behavior.
Changes to OMPI:
- Added full support for the MPI_T performance variable interface.
This commit was SVN r28800.
identified by Takahiro Kawashima. The packed length was reported as a
max bound and not provided on the unpacking side, so the unpacking
buffer could become out of sync with the content stored after the
packed representation.
The fix force the packing operation itself before reporting the length,
so we always report now the real number of bytes in the packed
representation.
cmr:v1.7.3:reviewer=jsquyres
This commit was SVN r28790.
This commit improved the small message latency and bandwidth when using
the vader btl. These improvements should make performance competative
with other MPI implementations.
This commit was SVN r28760.
many builds. I am temporarily .ompi_ignore'ing this component until
it can be fixed by its owner.
* It calls AC_MSG_ERROR, which configure.m4 scripts are ''never''
supposed to do. If you don't want to build, then call $2.
* All static and --disable-dlopen builds are broken; they fall afoul
of whatever test configure.m4 is doing and therefore error out of
configure entirely (vs. simply disabling the hcoll component).
* There appear to be multiple shell scripting errors in the
configure.m4. Here's the output of "./configure --disable-dlopen":
{{{
--- MCA component coll:hcoll (m4 configuration macro)
checking for MCA component coll:hcoll compile mode... static
checking --with-hcoll value... simple ok (unspecified)
./configure: line 421: test: basic: integer expression expected
configure: error: Can not use coll/hcoll and coll/ml (static build)
simultaneously. You have two options:
1. Use static build & disable ml with:
--enable-mpi-no-build=coll-ml
2. Use dso build for ML & disable ml at runtime: -mca
coll self
./configure: line 310: return: basic: numeric argument required
./configure: line 320: exit: basic: numeric argument required
}}}
Finally, all of these configure.m4 errors aside, I don't understand
why there is a ''compile-time'' exclusion between the hcoll and ml
components. Why isn't this a ''run-time'' decision? Having what
seems to be an unnecessary compile-time exclusion goes against the
general Open MPI philosophy.
Note: Open MPI 1.7 is also broken in all the same ways. I suggest
that the RM's .ompi_ignore hcoll over there, too.
Mellanox: please fix.
This commit was SVN r28748.
for the SM and TCP BTLs, as well as the mca_btl_base_param_register()
function (which registers MCA params for all BTLs).
The guidelines in
https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels were used to
pick these levels.
This commit was SVN r28746.
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.
The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.
This commit was SVN r28722.
This patch reshape the way we deal with topologies completely. Where
our topologies were mainly storage components (they were not capable
of creating the new communicator), the new version is built around a
[possibly] common representation (in mca/topo/topo.h), but the functions
to attach and retrieve the topological information are specific to each
component. As a result the ompi_create_cart and ompi_create_graph functions
become useless and have been removed.
In addition to adding the internal infrastructure to manage the topology
information, it updates the MPI interface, and the debuggers support and
provides all Fortran interfaces.
This commit was SVN r28687.
As per discussion in the June 2013 developer meeting these
flags will be used by the PML in the future to request
asynchronous progress on an operation. The naming was chosen
to reflect that a BTL supports this mode (MCA_BTL_FLAG_SIGNALED)
and that a descriptor should "signal" the remote side to wake
up and progress the message (MCA_BTL_DES_FLAG_SIGNAL).
Future commits will update OB1 to take advantage of this
feature when performing the RDMA get or RDMA rendezvous
protocols.
This commit was SVN r28612.
commit is the trunk version of what is needed for #3626.
Add the "ignore_device" field to the INI file. This allows us to
specifically list devices that should be ignored by the openib BTL
(such as the Intel Phi, at least as of May 2013 -- see #3626).
Also add the Intel Phi to the ini file, and set its ignore_device=1.
Finally, add the concept of counting intentionally ignored verbs
devices. Devices are ignored for one of two reasons:
* If the number of allowed ports on that device is 0 (i.e., if
if_include/if_exclude was set such that we're intentionally
ignoring this device).
* If the INI ignore_device field for this device is set to 1.
Once we have the count of devices that were intentionally ignored,
only show the "Hey, there's verbs devices that you're not using!"
show_help message if there are devices that were ''unintentionally''
ignored.
This commit was SVN r28589.
The following Trac tickets were found above:
Ticket 3626 --> https://svn.open-mpi.org/trac/ompi/ticket/3626