Set the default send and receive socket buffer size to 0,
which means Open MPI will not try to set a buffer size during
startup.
The default behavior since near day one of the TCP BTL has been
to set the send and receive socket buffer sizes to 128 KiB, a
value that works well on 1 GbE but not so well on 10 GbE fabrics
of any real size. Modern TCP stacks, particularly on Linux, have
gotten much smarter about buffer sizing and become much less
efficient if an explicit buffer size is set (even a large one),
since setting one disables the kernel's autotuning.
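For illustration, a minimal sketch of the behavior described above
(not the actual btl_tcp source; the variable names are placeholders,
and it assumes <sys/socket.h> and an open socket descriptor sd):

    /* Only override the socket buffer sizes when a non-zero value was
     * requested; a value of 0 leaves the kernel's autotuning alone. */
    if (tcp_sndbuf > 0) {
        int sz = tcp_sndbuf;
        setsockopt(sd, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz));
    }
    if (tcp_rcvbuf > 0) {
        int sz = tcp_rcvbuf;
        setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz));
    }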
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
`ompi_group_t::grp_proc_pointers[i]` may hold sentinel values even
for processes that reside on the local node, because the array for
`MPI_COMM_WORLD` is set up before `ompi_proc_complete_init` (which
allocates `ompi_proc_t` objects for processes on the local node) is
called during `MPI_INIT`. So using `ompi_proc_is_sentinel` on
`ompi_group_t::grp_proc_pointers[i]` to determine whether a process
resides on a remote node is not appropriate.
This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, because the sm OSC component
uses `ompi_group_have_remote_peers`.
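To make the failure mode concrete, here is a simplified sketch (the
function name is hypothetical, not the actual Open MPI code) of the
check that goes wrong; before `ompi_proc_complete_init` has run, the
loop below can report a remote peer for a group that is entirely
local:

    /* BROKEN: a sentinel entry only means the ompi_proc_t has not been
     * allocated yet, not that the peer lives on a remote node. */
    static bool group_seems_to_have_remote_peers(ompi_group_t *group)
    {
        for (int i = 0; i < group->grp_proc_count; ++i) {
            if (ompi_proc_is_sentinel(group->grp_proc_pointers[i])) {
                return true;   /* may just be an uninitialized local peer */
            }
        }
        return false;
    }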
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
MPI_AINT_ADD and MPI_AINT_DIFF are functions and must be declared as
externals with the proper return type. This is already done properly
in the mpi and mpi_f08 modules; the declarations were missing only
from mpif.h (i.e., mpif-externals.h).
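For reference, the corresponding C prototypes, which make the
required return type explicit (the mpif.h declarations express the
same thing in Fortran form):

    MPI_Aint MPI_Aint_add(MPI_Aint base, MPI_Aint disp);
    MPI_Aint MPI_Aint_diff(MPI_Aint addr1, MPI_Aint addr2);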
Thanks to Aboorva Devarajan (@AboorvaDevarajan) for the bug report.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The direct modex operation is slow, especially at scale for even modestly connected applications. Likewise, blocking in MPI_Init while we wait for a full modex to complete takes too long. However, as George pointed out, there is a middle ground here. We could kick off the modex operation in the background, and then trap any modex_recv's until the modex completes and the data is delivered. For most non-benchmark apps, this may prove to be the best of the available options, as they are likely to perform other (non-communicating) setup operations after MPI_Init, and so there is a reasonable chance that the modex will actually be done before the first modex_recv gets called.
Once we get instant-on-enabled hardware, this won't be necessary. Clearly, zero time will always outperform the time spent doing a modex. However, this provides a decent compromise in the interim.
This PR changes the default settings of a few relevant params to make "background modex" the default behavior:
* pmix_base_async_modex -> defaults to true
* pmix_base_collect_data -> continues to default to true (no change)
* async_mpi_init -> defaults to true. Note that the prior code attempted to base the default setting of this value on the setting of pmix_base_async_modex. Unfortunately, the pmix value isn't set prior to setting async_mpi_init, and so that attempt failed to accomplish anything.
The logic in MPI_Init is (see the sketch after this list):
* if async_modex AND collect_data are set, AND we have a non-blocking fence available, then we execute the background modex operation
* if async_modex is set, but collect_data is false, then we simply skip the modex entirely - no fence is performed
* if async_modex is not set, then we block until the fence completes (regardless of collecting data or not)
* if we do NOT have a non-blocking fence (e.g., we are not using PMIx), then we always perform the full blocking modex operation.
* if we do perform the background modex, and the user requested the barrier be performed at the end of MPI_Init, then we check to see if the modex has completed when we reach that point. If it has, then we execute the barrier. However, if the modex has NOT completed, then we block until the modex does complete and skip the extra barrier. So we never perform two barriers in that case.
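A rough sketch of that decision tree in code; every name below is an illustrative placeholder, not an actual ompi_mpi_init() symbol:

    if (!have_nonblocking_fence) {
        /* e.g., not using PMIx: always do the full blocking modex */
        full_blocking_modex();
    } else if (!async_modex) {
        /* block until the fence completes, with or without data collection */
        blocking_fence(collect_data);
    } else if (collect_data) {
        /* kick off the background modex; modex_recv traps until it completes */
        start_background_modex();
    } else {
        /* async modex without data collection: skip the fence entirely */
    }

    /* Later, if the user asked for a barrier at the end of MPI_Init: */
    if (background_modex_started && user_requested_barrier) {
        if (modex_is_complete()) {
            barrier();            /* modex already done: run the barrier */
        } else {
            wait_for_modex();     /* block on the modex and skip the barrier */
        }
    }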
HTH
Ralph
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Force only the procs that are participating in the new comm to decide
what CID is appropriate (a conceptual sketch follows the list). This
has two advantages:
* Speed up comm creation for small communicators: non-participating
procs will not interfere
* Reduce CID fragmentation: non-overlapping groups will be allowed to
use the same CID.
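A conceptual sketch of the agreement loop, with hypothetical helper
names (not the actual ompi_comm_nextcid() code); the point is that
every collective step runs only over the new communicator's group, so
disjoint groups can settle on the same CID independently:

    static int agree_on_cid(void)
    {
        int proposed = lowest_free_cid(0);       /* lowest CID free locally */
        for (;;) {
            int agreed, ok, all_ok;
            allreduce_max(&proposed, &agreed);   /* highest proposal among participants */
            ok = cid_is_free_locally(agreed);
            allreduce_min(&ok, &all_ok);         /* free at every participant? */
            if (all_ok) {
                return agreed;
            }
            proposed = lowest_free_cid(agreed + 1);
        }
    }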
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
The usNIC BTL does not use more than 1 iov, so be sure to set it to 1;
that way we don't allocate cq/rq/sq entries based on a default (i.e.,
>1) number of iovs per entry.
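As a hedged illustration of how such a cap is commonly expressed
through libfabric hints (not necessarily the exact usNIC BTL code;
assumes <rdma/fabric.h>):

    struct fi_info *hints = fi_allocinfo();
    /* One iov per send/receive entry, so the provider can size its
     * cq/rq/sq entries accordingly. */
    hints->tx_attr->iov_limit = 1;
    hints->rx_attr->iov_limit = 1;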
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This PR renames the common library for OFI libfabric from
libfabric to ofi. There are a number of reasons this
is good to do:
1) It's shorter, replacing nine characters with three in
function names for what may eventually be a fairly extensive interface.
2) OFI is the term used for the MTL and RML components that use
the OFI libfabric interface.
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
the term "ofi" internally and also in their configure options
relevant to OFI libfabric (e.g., MPICH/CH4, Intel MPI, Sandia SHMEM).
There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it: we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.
This PR also adds synonyms for the --with-libfabric(-libdir)
configury options: --with-ofi and --with-ofi-libdir.
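For example (the install prefix is illustrative only):

    ./configure --with-ofi=/opt/libfabric --with-ofi-libdir=/opt/libfabric/lib ...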
Signed-off-by: Howard Pritchard <howardp@lanl.gov>