Gilles Gouaillardet
a66909b8b4
Merge pull request #3488 from ggouaillardet/topic/romio314_ad_nfs
...
romio314: ad_nfs fixes for large files from upstream mpich
2017-05-09 16:58:02 +09:00
Gilles Gouaillardet
26f44da429
coll/base: fix mca_coll_base_alltoallv_intra_basic_inplace()
...
correctly handle the case when an MPI task has no data to send/recv
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 15:19:14 +09:00
Gilles Gouaillardet
eaf050cfe1
romio314: adio/ad_nfs: fix buffer overflows in ADIOI_NFS_{Read,Write}Strided
...
Refs: pmodels/mpich#2338
Refs: pmodels/mpich#2617
Signed-off-by: Rob Latham <robl@mcs.anl.gov>
(back-ported from upstream commit pmodels/mpich@642db57648 )
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:11:12 +09:00
Gilles Gouaillardet
02af10ce6e
romio314: update NFS read/write routines for large xfers
...
When we updated UFS and others we left NFS alone. HDF group would like
a fix, so here we go.
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
(back-ported from upstream commit pmodels/mpich@684df9f4c9 )
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-09 11:07:47 +09:00
Jeff Squyres
7185567d50
Merge pull request #3455 from jsquyres/pr/fix-lustre-configure
...
Lustre configure fixes
2017-05-08 16:49:23 -04:00
Ralph Castain
2f11d371cd
Merge pull request #3448 from rhc54/topic/omp
...
Implement the changes required to support cross-library coordination.…
2017-05-08 11:08:36 -07:00
Ralph Castain
0afcb1a448
Update to support server self-notifications
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
Ralph Castain
ef0e0171c9
Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c).
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
Ralph Castain
42d31454a5
Merge pull request #3469 from rhc54/topic/nidmap
...
Do not pass topologies during tree spawn of daemons as there is no wa…
2017-05-08 06:22:50 -07:00
KAWASHIMA Takahiro
9841ad3035
Merge pull request #3472 from open-mpi/revert-3410-pr/group-remote-peers
...
Revert "group: Fix `ompi_group_have_remote_peers`"
2017-05-08 18:47:30 +09:00
KAWASHIMA Takahiro
913adce59b
Revert "group: Fix ompi_group_have_remote_peers"
2017-05-08 18:42:18 +09:00
Gilles Gouaillardet
e101f2b3f9
orte/util: fix vpids parsing in orte_util_nidmap_parse()
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-08 16:46:13 +09:00
Ralph Castain
180809f2ef
Do not pass topologies during tree spawn of daemons as there is no way the HNP can know the backend topologies at that point. Any needed topologies will be sent along with the launch_apps command
...
Do not pass param file MCA params if the user has requested that no param files be read - required when trying to avoid launch time penalties from large numbers of processes reading default param files. The daemon picks them up and passes them along anyway, so it isn't clear what value we gain from having them all read the defaults
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-07 21:14:43 -07:00
Ralph Castain
ee4ce13e16
Merge pull request #3467 from rhc54/topic/slurm
...
Enable full operations under SLURM on Cray systems
2017-05-07 06:38:27 -07:00
Ralph Castain
a143800bce
Enable full operations under SLURM on Cray systems by co-locating a daemon with mpirun when mpirun is executing on a compute node in that environment. This allows local application procs to inherit their security credential from the daemon as it will have been launched via SLURM
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-06 19:08:50 -07:00
Jeff Squyres
88948f752f
Merge pull request #3454 from nmorey/devel/master-s390-support
...
master: opal: add support for s390 and s390x architectures
2017-05-06 06:49:41 -04:00
Artem Polyakov
858d8cdff7
Merge pull request #3375 from artpol84/comm_create/master
...
ompi/comm: Improve MPI_Comm_create algorithm
2017-05-05 20:41:16 -07:00
Ralph Castain
4dc27fe7fc
Merge pull request #3460 from rhc54/topic/pmix-static
...
Fix pmix configury so that libpmix is still emitted when --with-devel-headers is given, even under static builds
2017-05-05 12:11:17 -07:00
Ralph Castain
3bca715780
Fix pmix configury so that libpmix is still emitted when --with-devel-headers is given, even under static builds
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-05 11:15:32 -07:00
Ralph Castain
6d65e50da3
Merge pull request #3459 from rhc54/topic/oobdefaults
...
By default, use the system default snd/recv buffer sizes
2017-05-05 11:11:23 -07:00
Ralph Castain
3a434d75d6
By default, use the system default snd/recv buffer sizes
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-05 09:58:05 -07:00
Nicolas Morey-Chaisemartin
b4d9d5ee0f
opal: add support for s390 and s390x architectures
...
Signed-off-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin@suse.com>
2017-05-05 17:23:42 +02:00
Jeff Squyres
c11975947b
ompi_check_lustre.m4: abort if Lustre requested and not found
...
Follow the OMPI bias: if a human requests feature X and configure
can't deliver feature X, abort and let the human figure it out.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-05 05:42:26 -07:00
Jeff Squyres
eb89712b3e
ompi_check_lustre.m4: trivial updates
...
Minor style updates; nothing of real consequence.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-05 05:39:58 -07:00
Jeff Squyres
8604273a7e
ompi_check_lustre.m4: ensure --with-lustre isn't harmful
...
Make sure the default Autoconf "yes" value for $with_lustre when the
user specifies --with-lustre on the command line (without a value)
does not propagate down into the directory logic.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-05 05:29:40 -07:00
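The handling described in this commit, together with the abort-if-requested policy from the related `ompi_check_lustre.m4` commit, can be sketched outside of m4. The Python below is only an illustration under stated assumptions: the function names and the `enable`/`skip`/`abort` labels are hypothetical and do not mirror the actual macro.

```python
def search_dir_from_with_value(with_value):
    """Return a directory to search, or None to fall back to default paths."""
    # Autoconf sets the variable to "yes" for a bare --with-lustre and to
    # "no" for --without-lustre; neither of them is a directory name.
    if with_value in (None, "", "yes", "no"):
        return None
    return with_value


def lustre_decision(with_value, found):
    """Classify the configure outcome as 'enable', 'skip', or 'abort'."""
    if with_value == "no":
        return "skip"            # explicitly disabled by the user
    if found:
        return "enable"
    # Not found: abort only if a human explicitly asked for the feature.
    requested = with_value not in (None, "")
    return "abort" if requested else "skip"
```

With this sketch, a bare `--with-lustre` never turns `yes` into a search path, and a failed explicit request aborts instead of being silently ignored.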
Jeff Squyres
c81bc50198
fs/lustre: remove redundant/dead code
...
We check for liblustreapi.h in OMPI_CHECK_LUSTRE, so this code was
commented out here. Might as well fully delete it, since it's
redundant and dead.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-05 05:28:33 -07:00
Jeff Squyres
eb03679d7f
Merge pull request #3444 from jsquyres/pr/fix-pmix-static-devel-header-builds
...
pmix/configure.m4: always use embedded mode
2017-05-04 14:25:28 -04:00
Jeff Squyres
af336ac0e8
pmix/configure.m4: always use embedded mode
...
Looks like embedded mode was mistakenly disabled when
--with-devel-headers was specified.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-05-04 10:01:41 -07:00
Nathan Hjelm
4676575343
Merge pull request #3410 from kawashima-fj/pr/group-remote-peers
...
group: Fix `ompi_group_have_remote_peers`
2017-05-04 09:20:35 -06:00
Ralph Castain
a737d0f963
Merge pull request #3430 from bosilca/topic/tcp_hostname
...
Use the OPAL function to get the hostname.
2017-05-03 06:42:02 -07:00
Brian Barrett
3b991498be
btl tcp: Don't set socket buffer size by default
...
Set the default send and receive socket buffer size to 0,
which means Open MPI will not try to set a buffer size during
startup.
The default behavior since near day one of the TCP BTL has been
to set the send and receive socket buffer sizes to 128 KiB, a
number that works great on 1 GbE but not so great on 10 GbE
fabrics of any real size. Modern TCP stacks, particularly on
Linux, have gotten much smarter about buffer sizes and are much
less efficient if a buffer size is set (even if set to something
large).
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2017-04-28 14:14:49 -07:00
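The "0 means leave it alone" convention described above can be sketched as follows. This is a minimal Python illustration, not the TCP BTL code: `apply_buffer_sizes` and its parameters are hypothetical names, and only the standard `SO_SNDBUF`/`SO_RCVBUF` socket options are assumed.

```python
import socket


def apply_buffer_sizes(sock, sndbuf=0, rcvbuf=0):
    """Set socket buffer sizes only when a positive size was requested.

    A size of 0 leaves the option untouched, so the operating system's
    own defaults (and any autotuning it performs) stay in effect.
    Returns a dict of the options that were actually set.
    """
    applied = {}
    if sndbuf > 0:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        applied["SO_SNDBUF"] = sndbuf
    if rcvbuf > 0:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        applied["SO_RCVBUF"] = rcvbuf
    return applied
```

Calling `apply_buffer_sizes(sock)` with the defaults issues no `setsockopt` call at all, which is the behavior the new default aims for.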
George Bosilca
2d8943d920
Use the OPAL function to get the hostname.
2017-04-28 02:48:15 -04:00
Gilles Gouaillardet
c793dc8881
Merge pull request #3424 from ggouaillardet/topic/compress_hwloc_topo
...
compress the XML topology sent out-of-band
2017-04-28 11:05:58 +09:00
Nathan Hjelm
1707022f12
Merge pull request #3426 from hjelmn/ugni_fix
...
btl/ugni: remove erroneous mca_btl_ugni_frag_return call
2017-04-27 13:02:52 -06:00
Nathan Hjelm
387467c358
btl/ugni: remove erroneous mca_btl_ugni_frag_return call
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-04-27 09:14:51 -06:00
Gilles Gouaillardet
57b4144e57
orte: use compression for ORTE_DAEMON_REPORT_TOPOLOGY_CMD answer
...
Refs open-mpi/ompi#3414
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 17:21:59 +09:00
Gilles Gouaillardet
49cd40b2df
compress the topology sent by the first orted
...
Refs open-mpi/ompi#3414
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 16:20:11 +09:00
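The idea of compressing the XML topology before sending it can be sketched as below. This is an assumption-laden illustration: it uses Python's `zlib`, and the function names and the `(compressed, raw_len, payload)` framing are hypothetical, not the ORTE wire format.

```python
import zlib


def pack_topology(xml_text):
    """Return (compressed, raw_len, payload) for an XML topology string."""
    raw = xml_text.encode("utf-8")
    packed = zlib.compress(raw)
    # Fall back to the raw bytes when compression does not help.
    if len(packed) < len(raw):
        return True, len(raw), packed
    return False, len(raw), raw


def unpack_topology(compressed, raw_len, payload):
    """Inverse of pack_topology; checks the advertised uncompressed size."""
    raw = zlib.decompress(payload) if compressed else payload
    if len(raw) != raw_len:
        raise ValueError("topology size mismatch after decompression")
    return raw.decode("utf-8")
```

Topology XML is highly repetitive, so a payload of this kind usually shrinks considerably; the flag lets the receiver handle the rare case where it does not.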
KAWASHIMA Takahiro
28281190eb
Merge pull request #3402 from kawashima-fj/pr/java
...
mpi/java: Add missing Java binding methods
2017-04-27 15:45:49 +09:00
Gilles Gouaillardet
c38ef3d46f
oob/tcp: fix short writev handling in send_msg()
...
Fixes open-mpi/ompi#3414
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 10:24:38 +09:00
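A short `writev` happens when the kernel accepts fewer bytes than the buffers hold, so the caller has to advance past what was written and retry with the remainder. The sketch below shows that bookkeeping in Python on a Unix system; `writev_all` and its injectable `writev` parameter are hypothetical and this is not the `send_msg()` code. It assumes a blocking descriptor, whereas event-driven code would instead wait for the descriptor to become writable again.

```python
import os


def writev_all(fd, buffers, writev=os.writev):
    """Write every byte of `buffers` to `fd`, tolerating short writes."""
    # Work on memoryviews so trimming a partially written buffer is cheap.
    pending = [memoryview(b) for b in buffers if len(b) > 0]
    total = 0
    while pending:
        written = writev(fd, pending)
        if written <= 0:
            raise OSError("writev made no progress")
        total += written
        # Drop buffers that were fully written and trim the first one
        # that was only partially written.
        while written > 0:
            head = pending[0]
            if written >= len(head):
                written -= len(head)
                pending.pop(0)
            else:
                pending[0] = head[written:]
                written = 0
    return total
```

The key point is that the remaining byte count is carried across buffer boundaries, so a write that stops in the middle of one buffer does not resend or skip data.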
Yossi
f56847542e
Merge pull request #3347 from alinask/topic/ucx-sync-send
...
PML UCX: handle a synchronous send.
2017-04-26 18:02:09 +03:00
Alina Sklarevich
49913c692a
PML UCX: unite the code for all the sending modes.
...
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-26 13:17:06 +03:00
KAWASHIMA Takahiro
f036bac4c2
group: Fix ompi_group_have_remote_peers
...
`ompi_group_t::grp_proc_pointers[i]` may have sentinel values even
for processes which reside in the local node because the array for
`MPI_COMM_WORLD` is set up before `ompi_proc_complete_init`, which
allocates `ompi_proc_t` objects for processes that reside in the local
node, is called in `MPI_INIT`. So using `ompi_proc_is_sentinel`
against `ompi_group_t::grp_proc_pointers[i]` in order to determine
whether the process resides in a remote node is not appropriate.
This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, where sm OSC uses
`ompi_group_have_remote_peers`.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-25 11:00:52 +09:00
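The reasoning in this commit message can be illustrated with a toy model. Everything here is hypothetical: `ProcEntry` is not an Open MPI structure, and the `on_local_node` field stands in for whatever locality information the real code would have to look up for a sentinel entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcEntry:
    rank: int
    is_sentinel: bool    # True when no full proc object is allocated yet
    on_local_node: bool  # ground truth, used here only for comparison


def has_remote_peers_by_sentinel(group):
    """Flawed check: treats every sentinel entry as a remote process."""
    return any(p.is_sentinel for p in group)


def has_remote_peers_by_locality(group):
    """Check based on where each process actually resides."""
    return any(not p.on_local_node for p in group)


# All three processes are on the local node, but rank 2 is still a
# sentinel because the group was built before proc objects were allocated.
group = [
    ProcEntry(0, is_sentinel=False, on_local_node=True),
    ProcEntry(1, is_sentinel=False, on_local_node=True),
    ProcEntry(2, is_sentinel=True, on_local_node=True),
]
```

In this model the sentinel-based check reports remote peers for a purely local group, which is the kind of false positive the message associates with the `MPI_ERR_RMA_SHARED` error.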
Jeff Squyres
7ea05954bf
Merge pull request #3399 from jsquyres/pr/add-aint-add-diff
...
mpif-externals.h: add missing MPI_AINT_ADD/MPI_AINT_DIFF
2017-04-24 15:47:43 -04:00
KAWASHIMA Takahiro
3699ce1f75
mpi/java: Set the given error handler to Win
...
Probably setting `MPI_ERRORS_RETURN` is unintentional. Probably...
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-24 16:55:13 +09:00
KAWASHIMA Takahiro
8558185c85
mpi/java: Add missing Java binding methods
...
This commit adds the following methods:
| Language-indep. notation | Java binding |
| ------------------------ | ----------------------- |
| MPI_WIN_GET_ERRHANDLER | mpi.Win.getErrhandler |
| MPI_FILE_SET_ERRHANDLER | mpi.File.setErrhandler |
| MPI_FILE_GET_ERRHANDLER | mpi.File.getErrhandler |
| MPI_COMM_CALL_ERRHANDLER | mpi.Comm.callErrhandler |
| MPI_FILE_CALL_ERRHANDLER | mpi.File.callErrhandler |
| MPI_FILE_IREAD_AT_ALL | mpi.File.iReadAtAll |
| MPI_FILE_IWRITE_AT_ALL | mpi.File.iWriteAtAll |
| MPI_FILE_IREAD_ALL | mpi.File.iReadAll |
| MPI_FILE_IWRITE_ALL | mpi.File.iWriteAll |
| MPI_FILE_GET_ATOMICITY | mpi.File.getAtomicity |
`MPI_FILE_I{READ,WRITE}(_AT)_ALL` routines were added in MPI-3.1.
I don't know why other methods were missing.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2017-04-24 16:55:03 +09:00
Ralph Castain
2d75962726
Merge pull request #3400 from rhc54/topic/defaults
...
Set the default modex parameters back to full blocking modex while w…
2017-04-22 17:20:56 -07:00
Ralph Castain
8b1f01dfe6
Set the default modex parameters back to full blocking modex while we continue to test and debug the slow modex - it seems to be having issues on the Cray
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-22 15:19:46 -07:00
Howard Pritchard
f2a27cc991
Merge pull request #3396 from hppritcha/topic/swat_compiler_warning
...
btl/sm: swat a compiler warning
2017-04-22 14:31:21 -06:00
Jeff Squyres
d32eff6ea2
mpif-externals.h: add missing MPI_AINT_ADD/MPI_AINT_DIFF
...
MPI_AINT_ADD and MPI_AINT_DIFF are functions and must be declared as
externals with the proper return type. This is already done properly
in the mpi and mpi_f08 modules; the declarations for these functions
were missing only from mpif.h (i.e., mpif-externals.h).
Thanks to Aboorva Devarajan (@AboorvaDevarajan) for the bug report.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-04-22 08:57:54 -07:00
Gilles Gouaillardet
ebe6125750
mpi/c: MPI_PROC_NULL is not a valid rank in MPI_Win_{lock,unlock}
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-22 11:13:13 +09:00
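The distinction this commit draws can be sketched as two separate rank checks. This is an illustration only: the function names are hypothetical, the numeric value chosen for `MPI_PROC_NULL` is a placeholder (the real constant is implementation-defined), and the sketch assumes that RMA communication calls accept `MPI_PROC_NULL` as a target while lock/unlock do not.

```python
MPI_PROC_NULL = -2  # placeholder value for illustration only


def is_valid_rma_target(rank, group_size):
    """Ranks accepted by communication calls: in range, or MPI_PROC_NULL."""
    return rank == MPI_PROC_NULL or 0 <= rank < group_size


def is_valid_lock_target(rank, group_size):
    """Ranks accepted by lock/unlock: a real rank in the window's group."""
    return 0 <= rank < group_size
```

Reusing the first check for lock/unlock would let `MPI_PROC_NULL` through, so the stricter second check is the one that matches the commit subject.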