openmpi

Автор	SHA1	Сообщение	Дата
Gilles Gouaillardet	5e9be7667b	Merge pull request #3600 from ggouaillardet/topic/osc_rdma_get_segment osc/rdma: fix osc_rdma_get_remote_segment() length parameter	2017-06-01 13:09:14 +09:00
Nathan Hjelm	e1a997c0cb	Merge pull request #3593 from hjelmn/bug_3575 osc/rdma: fix typo in ompi_osc_rdma_lock_acquire_exclusive	2017-05-31 08:54:40 -06:00
Gilles Gouaillardet	e622ca8c1c	osc/rdma: fix osc_rdma_get_remote_segment() length parameter a buffer defined by (buf, count, dt) will have data starting at buf+offset and ending len bytes later with len = opal_datatype_span(&dt.super, count, &offset); Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-05-29 11:08:03 +09:00
Nathan Hjelm	b83c5dbee5	osc/rdma: fix typo in ompi_osc_rdma_lock_acquire_exclusive Fixes #3575 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-05-26 14:21:08 -06:00
Josh Hursey	4bfb0fcddd	Merge pull request #3577 from markalle/pr/osc_rdma_rangecheck fix for buffer length check (rdma osc w/ odd datatypes)	2017-05-26 10:44:33 -05:00
Gilles Gouaillardet	0f79259b94	osc/rdma: use extent of the appropriate datatype in ompi_osc_rdma_rget_accumulate_internal() origin_datatype and target_datatype might be different and hence have different extent, so use either origin_extent or target_extent when appropriate. Refs open-mpi/ompi#3569 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-05-26 13:59:38 +09:00
Mark Allen	df14cbf039	fix for buffer length check (rdma osc w/ odd datatypes) The osc_rdma_get_remote_segment() has the 3rd and 4th args as * target_disp * length which it uses to determine if the rdma falls within the bounds of the window or not (actually it only checks the upper bound, but I'm okay with that). Anyway the caller previously was passing in the length argument as target_datatype->super.size * target_count which which doesn't really represent the number of bytes after target_disp for which data exists. In particular I could create a datatype as { disp -4, len 4 } and use target_disp 4 and that would be bytes 0-3 of the window where the original code would think it was bytes 4-7 and could abort at the range check. Ive changed it to use the opal_datatype_span() function. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-05-24 19:10:39 -04:00
Mark Allen	c9f31a8d39	fix for 1sided with some hosts single rank See bug report https://github.com/open-mpi/ompi/issues/3548 If a 1sided test is launched -host hostA:2,hostB:1 some of the ranks call allocate_state_single() and others call allocate_state_shared(). These functions were producing different values for module->state_size but that's used when they lookup peer info from each other in ompi_osc_rdma_peer_setup() so they need to all have matching module->state_offset values. This change adds a few unused bytes in the memory allocate_state_single() creates so it matches. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-05-22 15:10:49 -04:00
Mark Allen	482d84b6e5	fixes for Dave's get/set info code The expected sequence of events for processing info during object creation is that if there's an incoming info arg, it is opal_info_dup()ed into the obj at obj->s_info first. Then interested components register callbacks for keys they want to know about using opal_infosubscribe_infosubscribe(). Inside info_subscribe_subscribe() the specified callback() is called with whatever matching k/v is in the object's info, or with the default. The return string from the callback goes into the new k/v stored in info, and the input k/v is saved as __IN_<key>/<val>. It's saved the same way whether the input came from info or whether it was a default. A null return from the callback indicates an ignored key/val, and no k/v is stored for it, but an __IN_<key>/<val> is still kept so we still have access to the original. At MPI__set_info() time, opal_infosubscribe_change_info() is used. That function calls the registered callbacks for each item in the provided info. If the callback returns non-null, the info is updated with that k/v, or if the callback returns null, that key is deleted from info. An __IN_<key>/<val> is saved either way, and overwrites any previously saved value. When MPI__get_info() is called, opal_info_dup_mpistandard() is used, which allows relatively easy changes in interpretation of the standard, by looking at both the <key>/<val> and __IN_<key>/<val> in info. Right now it does 1. includes system extras, eg k/v defaults not expliclty set by the user 2. omits ignored keys 3. shows input values, not callback modifications, eg not the internal values Currently the callbacks are doing things like return some_condition ? "true" : "false" that is, returning static strings that are not to be freed. If the return strings start becoming more dynamic in the future I don't see how unallocated strings could support that, so I'd propose a change for the future that the callback()s registered with info_subscribe_subscribe() do a strdup on their return, and we change the callers of callback() to free the strings it returns (there are only two callers). Rough outline of the smaller changes spread over the less central files: comm.c initialize comm->super.s_info to NULL copy into comm->super.s_info in comm creation calls that provide info OBJ_RELEASE comm->super.s_info at free time comm_init.c initialize comm->super.s_info to NULL file.c copy into file->super.s_info if file creation provides info OBJ_RELEASE file->super.s_info at free time win.c copy into win->super.s_info if win creation provides info OBJ_RELEASE win->super.s_info at free time comm_get_info.c file_get_info.c win_get_info.c change_info() if there's no info attached (shouldn't happen if callbacks are registered) copy the info for the user The other category of change is generally addressing compiler warnings where ompi_info_t and opal_info_t were being used a little too interchangably. An ompi_info_t* contains an opal_info_t*, at &(ompi_info->super) Also this commit updates the copyrights. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-05-17 01:12:49 -04:00
David Solt	50aa143ab6	Major structural changes to data types: .super infosubscriber ompi_communicator_t, ompi_win_t, ompi_file_t all have a super class of type opal_infosubscriber_t instead of a base/super type of opal_object_t (in previous code comm used c_base, but file used super). It may be a bit bold to say that being a subscriber of MPI_Info is the foundational piece that ties these three things together, but if you object, then I would prefer to turn infosubscriber into a more general name that encompasses other common features rather than create a different super class. The key here is that we want to be able to pass comm, win and file objects as if they were opal_infosubscriber_t, so that one routine can heandle all 3 types of objects being passed to it. MPI_INFO_NULL is still an ompi_predefined_info_t type since an MPI_Info is part of ompi but the internal details of the underlying information concept is part of opal. An ompi_info_t type still exists for exposure to the user, but it is simply a wrapper for the opal object. Routines such as ompi_info_dup, etc have all been moved to opal_info_dup and related to the opal directory. Fortran to C translation tables are only used for MPI_Info that is exposed to the application and are therefore part of the ompi_info_t and not the opal_info_t The data structure changes are primarily in the following files: communicator/communicator.h ompi/info/info.h ompi/win/win.h ompi/file/file.h The following new files were created: opal/util/info.h opal/util/info.c opal/util/info_subscriber.h opal/util/info_subscriber.c This infosubscriber concept is that communicators, files and windows can have subscribers that subscribe to any changes in the info associated with the comm/file/window. When xxx_set_info is called, the new info is presented to each subscriber who can modify the info in any way they want. The new value is presented to the next subscriber and so on until all subscribers have had a chance to modify the value. Therefore, the order of subscribers can make a difference but we hope that there is generally only one subscriber that cares or modifies any given key/value pair. The final info is then stored and returned by a call to xxx_get_info. The new model can be seen in the following files: ompi/mpi/c/comm_get_info.c ompi/mpi/c/comm_set_info.c ompi/mpi/c/file_get_info.c ompi/mpi/c/file_set_info.c ompi/mpi/c/win_get_info.c ompi/mpi/c/win_set_info.c The current subscribers where changed as follows: mca/io/ompio/io_ompio_file_open.c mca/io/ompio/io_ompio_module.c mca/osc/rmda/osc_rdma_component.c (This one actually subscribes to "no_locks") mca/osc/sm/osc_sm_component.c (This one actually subscribes to "blocking_fence" and "alloc_shared_contig") Signed-off-by: Mark Allen <markalle@us.ibm.com> Conflicts: AUTHORS ompi/communicator/comm.c ompi/debuggers/ompi_mpihandles_dll.c ompi/file/file.c ompi/file/file.h ompi/info/info.c ompi/mca/io/ompio/io_ompio.h ompi/mca/io/ompio/io_ompio_file_open.c ompi/mca/io/ompio/io_ompio_file_set_view.c ompi/mca/osc/pt2pt/osc_pt2pt.h ompi/mca/sharedfp/addproc/sharedfp_addproc.h ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c ompi/mpi/c/lookup_name.c ompi/mpi/c/publish_name.c ompi/mpi/c/unpublish_name.c opal/mca/mpool/base/mpool_base_alloc.c opal/util/Makefile.am	2017-05-12 14:41:05 -04:00
Gilles Gouaillardet	fa5cd0dbe5	use ptrdiff_t instead of OPAL_PTRDIFF_TYPE since Open MPI now requires a C99, and ptrdiff_t type is part of C99, there is no more need for the abstract OPAL_PTRDIFF_TYPE type. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-19 13:41:56 +09:00
Nathan Hjelm	12b52b2b2c	osc/pt2pt: fix infinite frag allocation loop Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-04-10 16:30:47 -06:00
Nathan Hjelm	fad0803920	osc/rdma: fix typo in atomic code Fixes #3267 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-04-03 15:54:28 -06:00
Nathan Hjelm	c72fb30eb5	osc/pt2pt: fix typo Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2017-03-23 09:00:21 -06:00
Jeff Squyres	760db0d5ce	osc/pt2pt: fix compiler warning Remove unused variable. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-16 05:46:11 -07:00
Valentin Petrov	fe069c9570	Fixes the coll_allgather usage bug One should use the correct module object when calling c_coll.coll_allgather. Otherwise there will be a segfault in the case, for example, when hcoll is used. In that case c_coll.coll_allgather = mca_coll_hcoll_allgather while c_coll.coll_gather_module = tuned. Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2017-03-14 09:47:39 +02:00
Nathan Hjelm	0195d15401	osc/pt2pt: flush pending fragments on lock ack This commit addresses an issue that can occur in cases where a lot of fragments are outstanding. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-06 13:58:46 -07:00
Josh Hursey	b1c4e50500	Merge pull request #2934 from jjhursey/topic/coll-comm-restructure Move coll structure outside of the communicator	2017-02-28 08:45:18 -06:00
Nathan Hjelm	032bcf915a	osc/rdma: fix compile warning Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-27 16:26:00 -07:00
George Bosilca	366d64b7e5	Move the collective structure outside the communicator. As we changed the ABI (forcing a major release), we can limit the size of the predefined communicators by moving the collective structure outside the communicator. This might have a minimal, but unnoticeable, impact on performance. This approach has been discussed during the January 2017 devel meeting. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-27 11:54:17 -06:00
Nathan Hjelm	581bff9871	Merge pull request #3034 from hjelmn/osc_rdma_atomic osc/rdma: make locking code more robust	2017-02-27 08:46:52 -07:00
Nathan Hjelm	4707c7c5e0	osc/rdma: make locking code more robust Under heavy load the locking code could fail if the underlying btl module started to return OPAL_ERR_OUT_OF_RESOURCE on atomic operations. This commit updates the code to gracefully handle btl errors. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-27 00:01:26 -07:00
Gilles Gouaillardet	af0b5cffb4	asm: rename the AMD64 into X86_64 in this context, AMD64 really means amd64 or em64t, so let's rename this into X86_64 in order to avoid any confusion Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-27 15:10:50 +09:00
Gilles Gouaillardet	4184c01be5	Merge pull request #2393 from bosilca/topic/no_predefined_ddt_refcount Don't refcount the predefined datatypes.	2017-02-21 09:38:11 +09:00
Todd Kordenbrock	048f757d9f	osc-portals4: add support for noncontiguous datatypes This commit implements onesided operations for noncontiguous datatypes using two different algorithms. * If the result and/or origin datatype is noncontiguous and the target datatype is contiguous, then an iovec MD is created for the result and origin. The operation is performed using a single Portals4 call (unless it exceeds the max message size). * If the target datatype is noncontigous, then an algorithm similar to the one in osc-rdma is used to loop over the contiguous blocks of each datatype. The operation is performed using multiple Portals4 calls. This commit ensures that individual operations do not exceed the max atomic size or the max message size supported by the device. Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2017-02-15 16:17:13 -06:00
Gilles Gouaillardet	cd4537193c	osc/sm: fix MPI_Win_allocate_shared() alignment add padding so the memory allocated by MPI_Win_allocate_shared() is 64 bytes aligned. Thanks Joseph Schuchart for the bug report Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-15 13:40:48 +09:00
Nathan Hjelm	cc4a0fabcf	Merge pull request #2727 from hjelmn/osc_rdma osc/rdma: fix typo in check for MPI_MODE_NOCHECK	2017-02-14 10:50:33 -07:00
Geoff Paulsen	4917e44a7d	Merge pull request #2832 from jjhursey/topic/ibm/osc-base-dt-abort osc/base: Detect unsupported data types and abort	2017-02-05 04:26:04 -06:00
Nathan Hjelm	362ac8b87e	osc/pt2pt: fix threading issues This commit fixes a number of threading issues discovered in osc/pt2pt. This includes: - Lock the synchronization object not the module in osc_pt2pt_start. This fixes a race between the start function and processing post messages. - Always lock before calling cond_broadcast. Fixes a race between the waiting thread and signaling thread. - Make all atomically updated values volatile. - Make the module lock recursive to protect against some deadlock conditions. Will roll this back once the locks have been re-designed. - Mark incoming complete after completing an accumulate not before. This was causing an incorrect answer under certain conditions. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-01 10:33:01 -07:00
Nysal Jan K.A	94f92f6b49	osc/base: Detect unsupported data types and abort Using MPI_MINLOC or MPI_MAXLOC with the following data types leads to data corruption: * MPI_DOUBLE_INT * MPI_LONG_INT * MPI_SHORT_INT * MPI_LONG_DOUBLE_INT Detect this print a error message and abort. This workaround should be removed once the following issue is resolved: * https://github.com/open-mpi/ompi/issues/1666 Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-01-25 15:28:28 -06:00
Nathan Hjelm	0497ec0b70	osc/rdma: fix typo in check for MPI_MODE_NOCHECK This commit fixes two typos in the lock_all path that inverted the MPI_MODE_NOCHECK flag. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-01-12 11:28:11 -07:00
George Bosilca	c2cd717f82	Don't refcount the predefined datatypes. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-01-11 16:48:59 -05:00
Ralph Castain	dadc6fbaf6	Merge pull request #2448 from thananon/remove_request_lock Completely removed ompi_request_lock and ompi_request_cond	2017-01-03 19:31:46 -08:00
Mark Allen	eec1d5bf2e	osc/pt2pt: Fix hang with Put and Win_lock_all * When using `MPI_Put` with `MPI_Win_lock_all` a hang is possible since the `put` is waiting on `eager_send_active` to become `true` but that variable might not be reset in the case of `MPI_Win_lock_all` depending on other incoming events (e.g., `post` or ACKs of lock requests. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2016-12-16 11:52:53 -05:00
Mark Allen	0d1336b4a8	osc/pt2pt: Fix Lock/Unlock and Get wrong answer * When using `MPI_Lock`/`MPI_Unlock` with `MPI_Get` and non-contiguous datatypes is is possible that the unlock finishes too early before the data is actually present in the recv buffer. * We need to wait for the irecv to complete before unlocking the target. This commit waits for the outgoing fragment counts to become equal before unlocking. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2016-12-16 11:52:51 -05:00
Mark Allen	1ebf9fd3a4	osc/pt2pt: Fix PSCW after Fence wrong answer. * If the user uses PSCW synchronization after a Fence then the previous epoch is not reset which can cause the PSCW to transfer data before it is ready leading to wrong answers. * This commit resets the `eager_send_active` in the start call. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2016-12-16 11:52:49 -05:00
Ralph Castain	585540bcee	Reduce the flood of warnings due to uninitialized variables, mismatched types, and unused things to a more bearable trickle Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-14 16:33:50 -08:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.	2016-11-22 15:03:20 -08:00
Thananon Patinyasakdikul	b25a8c3fa5	Completely removed ompi_request_lock and ompi_request_cond as we dont need them anymore. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2016-11-22 17:58:31 -05:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Gilles Gouaillardet	bd364d29f7	osc/sm: plug an other memory leak in ompi_osc_sm_free Fixes open-mpi/ompi@f1b473ee63 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-14 23:19:07 -07:00
Gilles Gouaillardet	f1b473ee63	osc/sm: plug a memory leak in ompi_osc_sm_free Thanks Joseph Schuchart for the report. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-14 22:22:43 -07:00
Gilles Gouaillardet	958e29f929	osc/rdma: silence a warning declare a local variable volatile and silence CID 1372692	2016-10-13 16:10:07 +09:00
Nathan Hjelm	e8ef503bee	osc/rdma: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-10-12 10:17:25 -06:00
Nathan Hjelm	432d79046b	Merge pull request #2197 from tkordenbrock/topic/master/osc-rdma.put.use.true_extent osc-rdma: fix datatype lower bound errors in ompi_osc_rdma_master()	2016-10-11 10:42:02 -06:00
Todd Kordenbrock	05f86b5df7	osc-rdma: fix datatype lower bound errors in ompi_osc_rdma_master() Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent() to get the local and remote lower bound. For derived types like subarray, true_lb is the correct offset for RDMA operations.	2016-10-10 06:45:28 -05:00
Todd Kordenbrock	cc863ff9fb	osc-portals4: fix datatype errors in put() Instead of ompi_datatype_get_extent(), use ompi_datatype_get_true_extent() to get the origin and target lower bound. For derived types like subarray, true_lb is the correct offset for RDMA operations. Also, instead of the extent use the size of the datatype.	2016-10-10 06:45:14 -05:00
Todd Kordenbrock	c536e11cf3	osc-portals4: fix offset bug in raccumulate() This commit fixes a bug where the remote offset was used as both the local and remote offset. Thanks to @PDeveze for the patch.	2016-10-04 09:09:17 -05:00
Nathan Hjelm	59bae1a330	osc/rdma: fix typo in compare-and-swap This commit fixes a typo in compare-and-swap when retrieving the memory region associated with a displacement. It was erroneously 8 bytes instead of the datatype size. This can cause an incorrect RMA range error when the compare-and-swap is less than 4 bytes from the end of the region. Fixed open-mpi/ompi#2080 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-14 16:49:42 -06:00
Nathan Hjelm	7c8e7691a7	Merge pull request #2045 from hjelmn/osc_rdma_atomics osc/rdma: add support for network AMOs	2016-09-08 11:21:49 -06:00

1 2 3 4 5 ...

568 Коммитов