PSM2 supports GPU buffers and CUDA managed memory: it can directly
recognize GPU buffers and handle copies between HFIs and GPUs.
Therefore, OMPI does not need to handle GPU buffers for pt2pt cases.
In this patch, we allow the PSM2 MTL to specify when
it does not require CUDA convertor support. This allows us to skip the CUDA
convertor init phases and lets PSM2 handle the memory transfers,
which improves latency.
The patch enables blocking collectives and workloads with contiguous and
non-contiguous GPU memory.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
they are supposed to be unsigned, casting them to a signed
value for all atomic operations is as error-prone as handling
them as signed entities.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Resolves #3705
* Components should link against the project level library to better
support `dlopen` with `RTLD_LOCAL`.
* Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
with the appropriate project level library:
```
MCA components in ompi/
$(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
$(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
$(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
$(top_builddir)/oshmem/liboshmem.la
```
Note: The changes in this commit were automated by the
`libadd_mca_comp_update.py` script from the commit that precedes it.
Some components were not included in this change because
they are built statically only.
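
The kind of edit described above can be sketched as follows. This is a minimal illustration only, not the actual `libadd_mca_comp_update.py`: the function name `add_project_lib`, the `PROJECT_LIBS` table, and the regular expression are assumptions made for this example. It prepends the project level library to any `mca_FRAMEWORK_COMPONENT_la_LIBADD` assignment, so a trailing line continuation is left intact.

```python
import re

# Project level libraries as listed in this commit message.
# The dictionary itself is a hypothetical helper for this sketch.
PROJECT_LIBS = {
    "ompi": "$(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la",
    "orte": "$(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la",
    "opal": "$(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la",
    "oshmem": "$(top_builddir)/oshmem/liboshmem.la",
}

# Matches "mca_<framework>_<component>_la_LIBADD =" (or "+=") and captures
# the rest of the line so it can be preserved unchanged.
LIBADD_RE = re.compile(r"^(mca_\w+_la_LIBADD\s*\+?=)(.*)$")


def add_project_lib(makefile_text, project):
    """Return makefile_text with the project library added to LIBADD lines.

    Lines that already mention the library are left alone, so the
    function can be applied more than once without duplicating entries.
    """
    lib = PROJECT_LIBS[project]
    out = []
    for line in makefile_text.splitlines():
        m = LIBADD_RE.match(line)
        if m and lib not in line:
            # Insert right after the "=" so a trailing "\" stays last.
            line = m.group(1) + " " + lib + m.group(2)
        out.append(line)
    return "\n".join(out)
```

For a component under `opal/`, `add_project_lib(text, "opal")` would turn `mca_btl_tcp_la_LIBADD = $(btl_tcp_LIBS)` into a line that lists the `open-pal` library first, followed by the existing entries.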
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
- change the increment used to test various numbers of aggregators
to avoid using only powers of two
- convert some parameters in the cost function from integers
to floats to provide smoother and more consistent results
- set the FVIEW_IS_SET flag on the file *only* if the user
has set anything other than the default file view.
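
To illustrate the first two items, here is a small sketch under stated assumptions. The linear step of 3 and the cost term (work divided by the number of aggregators) are hypothetical and are not taken from the ompio cost function. The sketch only shows why a non-doubling increment reaches counts such as 7 and 10, and why integer arithmetic can make distinct aggregator counts look equally good.

```python
def candidate_counts(max_aggr, step=3):
    """Hypothetical linear sweep: 1, 1 + step, 1 + 2 * step, ...

    Unlike doubling (1, 2, 4, 8, ...), this also tests counts that
    are not powers of two.
    """
    return list(range(1, max_aggr + 1, step))


def cost_int(total_units, n_aggr):
    # Integer division truncates, so nearby counts can produce ties.
    return total_units // n_aggr


def cost_float(total_units, n_aggr):
    # Floating-point division keeps the difference between counts.
    return total_units / n_aggr


counts = candidate_counts(16)
# With 20 work units, 7 and 10 aggregators tie under integer division
# (both give 2) but are distinguishable with floats.
tie_with_ints = cost_int(20, 7) == cost_int(20, 10)
distinct_with_floats = cost_float(20, 7) > cost_float(20, 10)
```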
Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
adjust how the aggregator nodes are selected depending on whether processes
have been mapped by node or anything else.
Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
add a new aggregator selection algorithm based on the performance
model described in:
Shweta Jha, Edgar Gabriel,
'Performance Models for Communication in Collective I/O Operations'
Proceedings of the 17th IEEE/ACM Symposium
on Cluster, Cloud and Grid Computing, Workshop on Theoretical
Approaches to Performance Evaluation, Modeling and Simulation, 2017.
Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
This fix is related to issue #1877, and prevents the OMPI library from
interfering with user-level random values.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
simply based on some local state. This is the second
part of the patch proposed for open-mpi/ompi#1183.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Update to support passing of HWLOC shmem topology to client procs
Update use of distance API per @bgoglin
Have the openib component lookup its object in the distance matrix
Bring usnic up-to-date
Restore binding for hwloc2
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
adjust the location where the fcoll_base_file_select function is
called to ensure that all fs level parameters are correctly set.
io/ompio: minor fixes to initialization of the stripe_size and an if statement in the
simple_grouping option.
Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
adjust the fcoll selection table to achieve the following:
- two_phase should not advertise itself on lustre file systems
- two_phase should advertise itself on sequential file systems (stripe_size == 0)
- priority for dynamic, static and individual is reduced. This will lead to
two_phase being selected in scenarios where two or more components indicate
willingness to run.
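
The effect of the adjusted table can be sketched as a highest-priority-wins selection. This is an illustration only: `select_fcoll` and all priority values below are hypothetical and do not reflect the numbers used in the fcoll components. A component that does not advertise itself is modelled as `None`.

```python
def select_fcoll(priorities):
    """Pick the willing component with the highest priority.

    priorities maps a component name to its priority, or to None when
    the component does not advertise itself for this file system.
    """
    willing = {name: p for name, p in priorities.items() if p is not None}
    if not willing:
        return None
    return max(willing, key=willing.get)


# Sequential file system (stripe_size == 0): two_phase advertises itself
# and outranks the components whose priority was reduced.
sequential = {"two_phase": 20, "dynamic": 10, "static": 5, "individual": 1}

# Lustre: two_phase does not advertise itself, so another component wins.
lustre = {"two_phase": None, "dynamic": 10, "static": 5, "individual": 1}

chosen_sequential = select_fcoll(sequential)
chosen_lustre = select_fcoll(lustre)
```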
Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>