ompi_communicator_t, ompi_win_t, and ompi_file_t now all have a super class of type opal_infosubscriber_t instead of a base/super type of opal_object_t (in the previous code, comm used c_base but file used super). It may be a bit bold to say that being a subscriber of MPI_Info is the foundational piece that ties these three things together, but if you object, I would prefer to turn infosubscriber into a more general name that encompasses other common features rather than create a different super class. The key is that we want to be able to pass comm, win, and file objects as if they were opal_infosubscriber_t, so that one routine can handle all three types of objects.
MPI_INFO_NULL is still an ompi_predefined_info_t type, since an MPI_Info is part of ompi while the internal details of the underlying information concept are part of opal.
An ompi_info_t type still exists for exposure to the user, but it is simply a wrapper for the opal object.
Routines such as ompi_info_dup have all been moved to the opal directory as opal_info_dup and related functions.
Fortran to C translation tables are only used for MPI_Info that is exposed to the application and are therefore part of ompi_info_t, not opal_info_t.
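As a rough sketch of this layering (the field names here are illustrative, not the exact OMPI definitions):
```
/* opal layer: the generic key/value store, usable without MPI */
typedef struct {
    void *entries;       /* placeholder for the key/value list */
} opal_info_t;

/* ompi layer: thin wrapper adding the MPI-visible pieces */
typedef struct {
    opal_info_t super;   /* the wrapped opal object */
    int i_f_to_c_index;  /* Fortran-to-C translation table index: ompi only */
} ompi_info_t;
```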
The data structure changes are primarily in the following files:
communicator/communicator.h
ompi/info/info.h
ompi/win/win.h
ompi/file/file.h
The following new files were created:
opal/util/info.h
opal/util/info.c
opal/util/info_subscriber.h
opal/util/info_subscriber.c
The infosubscriber concept is that communicators, files, and windows can have subscribers that subscribe to any changes in the info associated with the comm/file/window. When xxx_set_info is called, the new info is presented to each subscriber, which can modify the info in any way it wants. The new value is presented to the next subscriber, and so on, until all subscribers have had a chance to modify the value. The order of subscribers can therefore make a difference, but we expect that generally only one subscriber cares about or modifies any given key/value pair. The final info is then stored and returned by a call to xxx_get_info.
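A minimal sketch of the callback chain, using invented names rather than the actual opal/util/info_subscriber.h API:
```
#include <stdio.h>

/* Assumed callback shape: a subscriber sees the proposed value for a
 * key and returns the (possibly modified) value to apply. */
typedef const char *(*key_interest_cb_t)(const char *key, const char *value);

/* Example subscriber: an OSC component that cannot honor "no_locks". */
const char *osc_no_locks_cb(const char *key, const char *value)
{
    (void)key;
    (void)value;
    /* pretend this component requires locks, so it overrides the request */
    return "false";
}

int main(void)
{
    key_interest_cb_t subscribers[] = { osc_no_locks_cb };
    const char *value = "true";  /* value proposed via MPI_Win_set_info */

    /* each subscriber in turn may rewrite the value; the final result
     * is what a later MPI_Win_get_info would return */
    for (size_t i = 0; i < sizeof(subscribers) / sizeof(*subscribers); i++) {
        value = subscribers[i]("no_locks", value);
    }
    printf("stored value: %s\n", value);
    return 0;
}
```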
The new model can be seen in the following files:
ompi/mpi/c/comm_get_info.c
ompi/mpi/c/comm_set_info.c
ompi/mpi/c/file_get_info.c
ompi/mpi/c/file_set_info.c
ompi/mpi/c/win_get_info.c
ompi/mpi/c/win_set_info.c
The current subscribers were changed as follows:
mca/io/ompio/io_ompio_file_open.c
mca/io/ompio/io_ompio_module.c
mca/osc/rdma/osc_rdma_component.c (This one actually subscribes to "no_locks")
mca/osc/sm/osc_sm_component.c (This one actually subscribes to "blocking_fence" and "alloc_shared_contig")
Signed-off-by: Mark Allen <markalle@us.ibm.com>
Conflicts:
AUTHORS
ompi/communicator/comm.c
ompi/debuggers/ompi_mpihandles_dll.c
ompi/file/file.c
ompi/file/file.h
ompi/info/info.c
ompi/mca/io/ompio/io_ompio.h
ompi/mca/io/ompio/io_ompio_file_open.c
ompi/mca/io/ompio/io_ompio_file_set_view.c
ompi/mca/osc/pt2pt/osc_pt2pt.h
ompi/mca/sharedfp/addproc/sharedfp_addproc.h
ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c
ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c
ompi/mpi/c/lookup_name.c
ompi/mpi/c/publish_name.c
ompi/mpi/c/unpublish_name.c
opal/mca/mpool/base/mpool_base_alloc.c
opal/util/Makefile.am
OMPI send and receive messages use size_t for the length, while the PSM and PSM2
psm(2)_mq_send/receive routines use uint32_t. Type size_t is 64 bits on 64-bit
architectures. Therefore, this patch adds a sanity check on the length of the
message and fails gracefully.
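The guard amounts to something like the following sketch (the helper name and error code are illustrative):
```
#include <stdint.h>
#include <stddef.h>

#define OMPI_ERR_VALUE_OUT_OF_BOUNDS (-1)  /* illustrative error code */

/* Reject lengths that cannot be represented in the uint32_t count
 * expected by psm(2)_mq_send/receive, instead of silently truncating. */
int check_psm_msg_len(size_t length)
{
    if (length > UINT32_MAX) {
        return OMPI_ERR_VALUE_OUT_OF_BOUNDS;  /* fail gracefully */
    }
    return 0;
}
```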
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
* Don't overflow the internal datatype count.
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Optimize the datatype creation.
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment or when
one calls get_elements. This saves space and makes convertor creation
a little faster in some cases.
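A minimal sketch of the lazy-creation idea (the names and the predefined-type count are illustrative, not the actual opal datatype code):
```
#include <stdlib.h>

#define N_PREDEFINED 25  /* illustrative number of predefined types */

typedef struct {
    size_t *ptypes;  /* counts of predefined types; NULL until needed */
} dt_desc_t;

/* Create the array only on first use (heterogeneous environment or a
 * get_elements call), so the homogeneous fast path never pays for it. */
size_t *dt_ptypes(dt_desc_t *dt)
{
    if (NULL == dt->ptypes) {
        dt->ptypes = calloc(N_PREDEFINED, sizeof(size_t));
    }
    return dt->ptypes;
}
```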
Rearrange the fields in the datatype description structs.
The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Fix the boundary computation.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* test/datatype: add test for short unpack on heterogeneous cluster
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Trying to reduce the cost of creating a convertor.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Respect the unpack boundaries.
As Gilles suggested on #2535, the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes #2535.
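A simplified sketch of the bounded unpack (not the real opal_unpack_general_function):
```
#include <stddef.h>
#include <string.h>

/* Unpack at most as many elements as the packed buffer actually
 * provides, even if the caller requested more. */
size_t unpack_bounded(void *dst, const void *packed, size_t packed_bytes,
                      size_t elem_size, size_t requested_count)
{
    size_t avail = packed_bytes / elem_size;  /* complete elements supplied */
    size_t count = requested_count < avail ? requested_count : avail;
    memcpy(dst, packed, count * elem_size);
    return count;                             /* elements actually unpacked */
}
```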
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
When we updated UFS and the others, we left NFS alone. The HDF Group would
like a fix, so here we go.
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
(back-ported from upstream commit pmodels/mpich@684df9f4c9)
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
`ompi_group_t::grp_proc_pointers[i]` may have sentinel values even
for processes which reside in the local node, because the array for
`MPI_COMM_WORLD` is set up before `ompi_proc_complete_init` (which
allocates `ompi_proc_t` objects for processes that reside in the
local node) is called in `MPI_INIT`. So using `ompi_proc_is_sentinel`
against `ompi_group_t::grp_proc_pointers[i]` to determine whether a
process resides in a remote node is not appropriate.
This bug sometimes causes an `MPI_ERR_RMA_SHARED` error when
`MPI_WIN_ALLOCATE_SHARED` is called, because the sm OSC component
uses `ompi_group_have_remote_peers`.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
We check for liblustreapi.h in OMPI_CHECK_LUSTRE, so this code was
commented out here. Might as well fully delete it, since it's
redundant and dead.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Fortran constants `MPI_ARGV_NULL` and `MPI_ARGVS_NULL` are defined
in MPI-3.1 p.680 as below.
> `MPI_ARGVS_NULL`
> 2-dim. array of `CHARACTER*(*)`
> `MPI_ARGV_NULL`
> array of `CHARACTER*(*)`
`MPI_ARGV_NULL` and `MPI_ARGVS_NULL` are used as arguments of
`MPI_COMM_SPAWN` and `MPI_COMM_SPAWN_MULTIPLE` respectively, and the
corresponding arguments `argv` and `array_of_argv` are defined as below
for the `USE mpi_f08` binding in MPI-3.1.
```
CHARACTER(LEN=*), INTENT(IN) :: argv(*)
CHARACTER(LEN=*), INTENT(IN) :: array_of_argv(count, *)
```
Defining them as `INTEGER` in the `mpi_f08` module causes a compilation
error in user programs, such as
"There is no specific subroutine for the generic 'mpi_comm_spawn'".
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
MPI_AINT_ADD and MPI_AINT_DIFF are functions and must be declared as
externals with the proper return type. This is already done properly
in the mpi and mpi_f08 modules; the declarations were only missing
from mpif.h (i.e., mpif-externals.h).
Thanks to Aboorva Devarajan (@AboorvaDevarajan) for the bug report.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The direct modex operation is slow, especially at scale for even modestly-connected applications. Likewise, blocking in MPI_Init while we wait for a full modex to complete takes too long. However, as George pointed out, there is a middle ground here. We could kick off the modex operation in the background, and then trap any modex_recv's until the modex completes and the data is delivered. For most non-benchmark apps, this may prove to be the best of the available options, as they are likely to perform other (non-communicating) setup operations after MPI_Init, and so there is a reasonable chance that the modex will actually be done before the first modex_recv gets called.
Once we get instant-on-enabled hardware, this won't be necessary. Clearly, zero time will always outperform the time spent doing a modex. However, this provides a decent compromise in the interim.
This PR changes the default settings of a few relevant params to make "background modex" the default behavior:
* pmix_base_async_modex -> defaults to true
* pmix_base_collect_data -> continues to default to true (no change)
* async_mpi_init -> defaults to true. Note that the prior code attempted to base the default setting of this value on the setting of pmix_base_async_modex. Unfortunately, the pmix value isn't set prior to setting async_mpi_init, and so that attempt failed to accomplish anything.
The logic in MPI_Init is as follows (see the sketch after this list):
* if async_modex AND collect_data are set, AND we have a non-blocking fence available, then we execute the background modex operation
* if async_modex is set, but collect_data is false, then we simply skip the modex entirely - no fence is performed
* if async_modex is not set, then we block until the fence completes (regardless of collecting data or not)
* if we do NOT have a non-blocking fence (e.g., we are not using PMIx), then we always perform the full blocking modex operation.
* if we do perform the background modex, and the user requested the barrier be performed at the end of MPI_Init, then we check to see if the modex has completed when we reach that point. If it has, then we execute the barrier. However, if the modex has NOT completed, then we block until the modex does complete and skip the extra barrier. So we never perform two barriers in that case.
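Condensed into code, that decision tree looks roughly like this sketch (the helper names are invented; the real logic lives in MPI_Init):
```
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the real PMIx fence / modex operations. */
void background_modex(void) { puts("background modex; recvs trap until done"); }
void blocking_fence(bool collect) { printf("blocking fence, collect=%d\n", collect); }

void select_modex(bool async_modex, bool collect_data, bool have_nb_fence)
{
    if (!have_nb_fence) {
        blocking_fence(true);          /* no PMIx: always the full blocking modex */
    } else if (!async_modex) {
        blocking_fence(collect_data);  /* block until the fence completes */
    } else if (collect_data) {
        background_modex();            /* kick off the modex in the background */
    }
    /* else: async modex without data collection -> skip the modex entirely */
}

int main(void)
{
    select_modex(true, true, true);    /* the new default settings */
    return 0;
}
```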
HTH
Ralph
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Force only procs that are participating in the new Comm to decide what
CID is appropriate. This has two advantages:
* Speed up Comm creation for small communicators: non-participating procs
will not interfere.
* Reduce CID fragmentation: non-overlapping groups will be allowed to use
the same CID.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
This PR renames the common library for OFI libfabric from
libfabric to ofi. There are a number of reasons this
is good to do:
1) it is shorter, replacing nine characters with three in function
names, for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
the term "ofi" internally and also in their configure options
relevant to OFI libfabric (e.g., MPICH/CH4, Intel MPI, Sandia SHMEM)
There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.
This PR also adds synonyms for the --with-libfabric(-libdir)
configury options: --with-ofi and --with-ofi-libdir.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Since Open MPI now requires a C99 compiler, and the ptrdiff_t type is
part of C99, there is no longer a need for the abstract OPAL_PTRDIFF_TYPE type.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
* Complete rewrite of opal_pointer_array
Instead of a cache-oblivious linear search, use a bit array
to speed up the management of the free space. As a result we
slightly increase the memory used by the structure, but we get a
significant boost in performance.
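The gist of the bit-array approach, as a self-contained sketch (this is not the actual opal_pointer_array code, and it assumes the GCC/Clang __builtin_ctzll builtin):
```
#include <stdint.h>
#include <stddef.h>

#define NUM_WORDS 4  /* 4 * 64 = 256 slots, illustrative */

/* One bit per slot; a set bit means the slot is free. */
uint64_t free_map[NUM_WORDS] = {
    UINT64_MAX, UINT64_MAX, UINT64_MAX, UINT64_MAX
};

/* Find and claim a free slot by scanning 64 slots per comparison,
 * instead of walking the pointer array element by element. */
int claim_free_slot(void)
{
    for (size_t w = 0; w < NUM_WORDS; w++) {
        if (free_map[w] != 0) {
            int bit = __builtin_ctzll(free_map[w]);  /* lowest set bit */
            free_map[w] &= ~(UINT64_C(1) << bit);    /* mark it used */
            return (int)(w * 64 + bit);
        }
    }
    return -1;  /* array is full */
}
```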
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Do not register datatypes in the f2c translation table.
The registration is now pushed up into the Fortran layer, by
forcing a call to MPI_Type_c2f.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Array sizes of `array_of_gsizes`, `array_of_distribs`, `array_of_dargs`,
and `array_of_psizes` parameters of the `ompi_datatype_create_darray`
function (and `MPI_TYPE_CREATE_DARRAY`) are all `ndims`.
`ndims` is `i[2]`, not `i[0]`. See MPI-3.1 p.122.
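For reference, decoding the darray combiner's integer array per that layout looks like this (the variable names are ours):
```
/* Integer-argument layout for MPI_TYPE_CREATE_DARRAY (MPI-3.1 p.122):
 *   i[0] = size, i[1] = rank, i[2] = ndims,
 *   followed by four ndims-long arrays, then order. */
void decode_darray_args(const int *i)
{
    int ndims = i[2];                        /* not i[0]! */
    const int *gsizes   = &i[3];
    const int *distribs = &i[3 + ndims];
    const int *dargs    = &i[3 + 2 * ndims];
    const int *psizes   = &i[3 + 3 * ndims];
    int order = i[3 + 4 * ndims];
    (void)gsizes; (void)distribs; (void)dargs; (void)psizes; (void)order;
}
```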
Because this function `__ompi_datatype_create_from_args` is used by
the pt2pt OSC component, using a datatype created by
`MPI_TYPE_CREATE_DARRAY` for `MPI_(R)(GET_)ACCUMULATE` caused a
segmentation fault or similar failure on the target process.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>