Force only procs that are participating in the ne Comm to decide what
CID is appropriate. This will have 2 advantages:
* Speedup Comm creation for small communicators: non-participating procs
will not interfere
* Reduce CID fragmentation: non-overlaping groups will be allowed to use
same CID.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
The usNIC BTL does not use more than 1 iov, so be sure to set it to 1
so that we don't allocate cq/rq/sq entries based on a default (i.e.,
>1) number of iovs per entry.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This PR renames the common library for OFI libfabric from
libfabric to ofi. There are a number of reasons this
is good to do:
1) its shorter and replaces 9 characters with three for
function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
the term "ofi" internally and also in their configure options
relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)
There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.
This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This commit recategorizes several mpirun arguments,
and moves the information for mpirun --help arguments
to the bottom of the general help message. I also
added the OPAL_CMD_LINE_OTYPE field to two commands
that were missed initially because they were not
in the same area as the others.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
since Open MPI now requires a C99, and ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.
Thanks George, Nathan and Paul for the help.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
since Open MPI now requires a C99, and ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* Complete rewrite of opal_pointer_array
Instead of a cache oblivious linear search use a bits array
to speed up the management of the free space. As a result we
slightly increase the memory used by the structure, but we get a
significant boost in performance.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Do not register datatypes in the f2c translation table.
The registration is now done up into the Fortran layer, by
forcing a call to MPI_Type_c2f.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
With this, libs (e.g., "-ldl") are not added to the wrapper LIBS
flags. This may work on some platforms, but on at least RHEL 7.3, it
does not (i.e., compiling MPI applications fails because it can't find
dlopen).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
* `ompi_info --show-failed` will include the failed components along
with information about why they failed.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* Add a path for failed component load information to be reported up.
* This allows ompi_info to display this information inline to make it
easier for folks to see if the component is present but failed for
some reason. Most likely a missing library, but could be a libnl
conflict.
* Add MCA parameter to enable this feature:
- `mca_base_component_track_load_errors` takes a boolean
- Default: `false`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
So that we can stop asking common questions like "What version of Open
MPI are you using?", etc.
[skip ci]
bot:notest
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The fact that application proc called Abort (read failed) doesn't
mean that ORTE subsystem has failed - vice versa it does it's work
to gracefuly exit the whole application.
orted exiting with non-zero status creates a problem for at least
plm/slurm environments where orteds are launched via `srun` with
"--kill-on-bad-exit" flag. If one of orteds has exited with non-
zero status slurm will immediately kill all other orteds. As the
result we see a lot of leftover in the `/tmp` directory.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>