openmpi

Author	SHA1	Message	Date
Ralph Castain	79fd359848	Merge pull request #3713 from rhc54/topic/ofi Enable use of OFI fabrics for launch and other collective operations.…	2017-06-25 11:47:40 -07:00
Ralph Castain	ed85512a7c	Update to track PMIx v2.0.1 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-25 07:29:32 -07:00
Ralph Castain	ef56c7d47a	Correctly transfer size_t data fields Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-24 20:11:54 -07:00
Ralph Castain	f4411c4393	Enable use of OFI fabrics for launch and other collective operations. Update the PMIx repo to the latest master to get the required support for the server to "push" modex info, and to retrieve all its own "modex" values for sending back to mpirun. Have mpirun cache them in its local modex hash as OFI goes point-to-point direct and doesn't route - so the remote daemons don't need a copy of this connection info. Remove the opal_ignore from the RML/OFI component, but disable that component unless the user specifically requests it via the "rml_ofi_desired=1" MCA param. This will let us test compile in various environments without interfering with operations while we continue to debug Fix an error when computing the number of infos during server init Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-23 19:57:21 -07:00
Ralph Castain	8263efff65	Fix uninitialized variables Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-23 11:12:26 -07:00
Ralph Castain	6ec2ad5288	Fix the pmix_query API when it asks for something that returns an array of pmix_info_t. Protect the PMIX_INFO_FREE macro from NULL arrays. Update the mpi_memprobe scaling test Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-22 20:11:36 -07:00
Nathan Hjelm	4252258338	Merge pull request #3721 from hjelmn/list_cleanup opal: use opal_list_t convenience macros	2017-06-22 09:12:23 -06:00
Ralph Castain	3e78f84093	Silence Coverity warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-21 13:19:51 -07:00
Ralph Castain	cba127bc43	Update the ext2x component to match the internal one Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-20 11:42:14 -07:00
Nathan Hjelm	ffd8ee2dfd	opal: use opal_list_t convenience macros This commit cleans up code in opal to use OPAL_LIST_FOREACH(_SAFE), OPAL_LIST_DESTRUCT, and OPAL_LIST_RELEASE. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-06-20 12:37:12 -06:00
Ralph Castain	952726c121	Update to latest PMIx master - equivalent to 2.0rc2. Update the thread support in the opal/pmix framework to protect the framework-level structures. This now passes the loop test, and so we believe it resolves the random hangs in finalize. Changes in PMIx master that are included here: * Fixed a bug in the PMIx_Get logic * Fixed self-notification procedure * Made pmix_output functions thread safe * Fixed a number of thread safety issues * Updated configury to use 'uname -n' when hostname is unavailable Work on cleaning up the event handler thread safety problem Rarely used functions, but protect them anyway Fix the last part of the intercomm problem Ensure we don't cover any PMIx calls with the framework-level lock. Protect against NULL argv comm_spawn Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-20 09:02:15 -07:00
Ralph Castain	8f09929469	Fix rank-file mapper launch by correctly setting up the remote map from the provided data Put a simple protection for the case where procs fail while we are trying to deregister handlers Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-15 08:33:29 -07:00
KAWASHIMA Takahiro	b5b6b22848	Merge pull request #3678 from kawashima-fj/pr/signal-abort-delay Apply `opal_abort_delay` to the OPAL signal handler	2017-06-12 10:35:11 +09:00
Ralph Castain	548cd24e4e	Forward-port changes proposed for v3.0 to master from PR #3677 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-09 07:51:21 -07:00
KAWASHIMA Takahiro	362445d486	Use same prefix format for `[host:pid]` Hostname and PID are output as a message prefix in many places in our code. Their printf-formats were either `[%s:%d]` or `[%s:%05d]`. This commit changes `[%s:%d]` to `[%s:%05d]`. The latter was more widely used in our code (including OPAL output system and the signal handler). Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2017-06-08 19:35:03 +09:00
Ralph Castain	2d65908184	Correct the external pmix configury Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-07 00:33:29 -07:00
Ralph Castain	bd1793ad17	Get the pmix/ext2x component to work. Fix a minor problem in the libevent external component. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-06 20:06:28 -07:00
Ralph Castain	c3e6dc2022	Update to pmix v2.0.0rc1, including thread safety fixes Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-06 15:16:34 -07:00
Ralph Castain	93cf3c7203	Update OPAL and ORTE for thread safety (I swear, if I look this over one more time, I'll puke) Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-06 12:30:57 -07:00
Ralph Castain	2f85d10600	Update to PMIx master Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-06 08:19:25 -07:00
Ralph Castain	8f526968c2	Do not hang if we cannot relay messages. Eliminate extra error log message Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-06-05 06:35:19 -07:00
Ralph Castain	9d6b929894	Fix uninitialized variable. Set exit codes for failed launch so we get pretty error messages Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-31 07:38:37 -07:00
Ralph Castain	26d96061aa	Roll in latest PMIx updates Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-30 21:35:35 -07:00
Ralph Castain	9f1f9d6606	Update to PMIx v2.0.0rc1 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-28 10:30:58 -07:00
Ralph Castain	9f60cd0fe7	Update the connect/accept support so we check to see if we have the proper infrastructure and RTE support, including whether we have ompi-server available if the connect/accept spans multiple applications. Print pretty help messages in all cases where we do not have support Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-27 10:47:08 -07:00
Nathan Hjelm	33d59886e1	Merge pull request #3587 from hjelmn/event_abstraction pmix/pmix2x: fix errors in event abstraction	2017-05-26 10:44:18 -06:00
Nathan Hjelm	a512b8962d	pmix/pmix2x: fix errors in event abstraction Parts of the pmix2x component called the event_* functions directly instead of the opal_event_* wrappers. This is fine as long as we are using libevent but becomes a problem with other event libraries. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-05-26 09:49:11 -06:00
Ralph Castain	2f721a3366	Merge pull request #3585 from rhc54/topic/pmix20 Update to pmix v2.0beta	2017-05-26 06:05:44 -07:00
Ralph Castain	e1e264711a	Update to pmix v2.0beta Fix atomics - again Fix initialization of notification ring buffer Fix wait_sync definitions Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-26 03:33:18 -07:00
Ralph Castain	657e701c65	Add debug verbosity to the orte data server and pmix pub/lookup functions Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it). Remove unneeded test Fix memory corruption by re-initializing variable to NULL in loop Resolve the race condition identified by @ggouaillardet by resetting the mapped flag within the same event where it was set. There is no need to retain the flag beyond that point as it isn't used again. Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them. Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers. Have the mindist module add procs to the job's proc array as it is a fully described module Protect the hnp-not-in-allocation case Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-25 18:41:27 -07:00
Thananon Patinyasakdikul	bf7534d32c	btl/usnic: changed fi_ep_bind flags for AV from NULL to 0 due to compiler warning. This commit fixed compiler warning generated from earlier commit : `ddbe1726c5` Signed-off-by: Thananon Patinyasakdikul <apatinya@cisco.com>	2017-05-22 10:09:43 -07:00
Geoff Paulsen	50f9287c03	Merge pull request #2941 from markalle/pr/mpi-info-update2 Finally Merging this in. MPI_*_get_info/set_info(). Targeting v3.1 release. @hjelmn were you interested in switching some internal pieces to begin using this? Should we target v3.1 (or whatever we call the Oct 15th release?)	2017-05-22 09:22:04 -05:00
Mark Allen	482d84b6e5	fixes for Dave's get/set info code The expected sequence of events for processing info during object creation is that if there's an incoming info arg, it is opal_info_dup()ed into the obj at obj->s_info first. Then interested components register callbacks for keys they want to know about using opal_infosubscribe_infosubscribe(). Inside info_subscribe_subscribe() the specified callback() is called with whatever matching k/v is in the object's info, or with the default. The return string from the callback goes into the new k/v stored in info, and the input k/v is saved as __IN_<key>/<val>. It's saved the same way whether the input came from info or whether it was a default. A null return from the callback indicates an ignored key/val, and no k/v is stored for it, but an __IN_<key>/<val> is still kept so we still have access to the original. At MPI__set_info() time, opal_infosubscribe_change_info() is used. That function calls the registered callbacks for each item in the provided info. If the callback returns non-null, the info is updated with that k/v, or if the callback returns null, that key is deleted from info. An __IN_<key>/<val> is saved either way, and overwrites any previously saved value. When MPI__get_info() is called, opal_info_dup_mpistandard() is used, which allows relatively easy changes in interpretation of the standard, by looking at both the <key>/<val> and __IN_<key>/<val> in info. Right now it does 1. includes system extras, eg k/v defaults not expliclty set by the user 2. omits ignored keys 3. shows input values, not callback modifications, eg not the internal values Currently the callbacks are doing things like return some_condition ? "true" : "false" that is, returning static strings that are not to be freed. 
If the return strings start becoming more dynamic in the future I don't see how unallocated strings could support that, so I'd propose a change for the future that the callback()s registered with info_subscribe_subscribe() do a strdup on their return, and we change the callers of callback() to free the strings it returns (there are only two callers). Rough outline of the smaller changes spread over the less central files: comm.c initialize comm->super.s_info to NULL copy into comm->super.s_info in comm creation calls that provide info OBJ_RELEASE comm->super.s_info at free time comm_init.c initialize comm->super.s_info to NULL file.c copy into file->super.s_info if file creation provides info OBJ_RELEASE file->super.s_info at free time win.c copy into win->super.s_info if win creation provides info OBJ_RELEASE win->super.s_info at free time comm_get_info.c file_get_info.c win_get_info.c change_info() if there's no info attached (shouldn't happen if callbacks are registered) copy the info for the user The other category of change is generally addressing compiler warnings where ompi_info_t and opal_info_t were being used a little too interchangably. An ompi_info_t* contains an opal_info_t*, at &(ompi_info->super) Also this commit updates the copyrights. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-05-17 01:12:49 -04:00
Thananon Patinyasakdikul	a705f2cf7b	usNIC: fix fi_ep_bind flag. FI_RECV should not be associated with av. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2017-05-16 18:22:28 -04:00
Jeff Squyres	23325c31d3	Merge pull request #3338 from jjhursey/topic/ompi_info_show_failed `ompi_info --show-failed` feature	2017-05-16 17:08:43 -04:00
David Solt	50aa143ab6	Major structural changes to data types: .super infosubscriber ompi_communicator_t, ompi_win_t, ompi_file_t all have a super class of type opal_infosubscriber_t instead of a base/super type of opal_object_t (in previous code comm used c_base, but file used super). It may be a bit bold to say that being a subscriber of MPI_Info is the foundational piece that ties these three things together, but if you object, then I would prefer to turn infosubscriber into a more general name that encompasses other common features rather than create a different super class. The key here is that we want to be able to pass comm, win and file objects as if they were opal_infosubscriber_t, so that one routine can heandle all 3 types of objects being passed to it. MPI_INFO_NULL is still an ompi_predefined_info_t type since an MPI_Info is part of ompi but the internal details of the underlying information concept is part of opal. An ompi_info_t type still exists for exposure to the user, but it is simply a wrapper for the opal object. Routines such as ompi_info_dup, etc have all been moved to opal_info_dup and related to the opal directory. Fortran to C translation tables are only used for MPI_Info that is exposed to the application and are therefore part of the ompi_info_t and not the opal_info_t The data structure changes are primarily in the following files: communicator/communicator.h ompi/info/info.h ompi/win/win.h ompi/file/file.h The following new files were created: opal/util/info.h opal/util/info.c opal/util/info_subscriber.h opal/util/info_subscriber.c This infosubscriber concept is that communicators, files and windows can have subscribers that subscribe to any changes in the info associated with the comm/file/window. When xxx_set_info is called, the new info is presented to each subscriber who can modify the info in any way they want. The new value is presented to the next subscriber and so on until all subscribers have had a chance to modify the value. 
Therefore, the order of subscribers can make a difference but we hope that there is generally only one subscriber that cares or modifies any given key/value pair. The final info is then stored and returned by a call to xxx_get_info. The new model can be seen in the following files: ompi/mpi/c/comm_get_info.c ompi/mpi/c/comm_set_info.c ompi/mpi/c/file_get_info.c ompi/mpi/c/file_set_info.c ompi/mpi/c/win_get_info.c ompi/mpi/c/win_set_info.c The current subscribers where changed as follows: mca/io/ompio/io_ompio_file_open.c mca/io/ompio/io_ompio_module.c mca/osc/rmda/osc_rdma_component.c (This one actually subscribes to "no_locks") mca/osc/sm/osc_sm_component.c (This one actually subscribes to "blocking_fence" and "alloc_shared_contig") Signed-off-by: Mark Allen <markalle@us.ibm.com> Conflicts: AUTHORS ompi/communicator/comm.c ompi/debuggers/ompi_mpihandles_dll.c ompi/file/file.c ompi/file/file.h ompi/info/info.c ompi/mca/io/ompio/io_ompio.h ompi/mca/io/ompio/io_ompio_file_open.c ompi/mca/io/ompio/io_ompio_file_set_view.c ompi/mca/osc/pt2pt/osc_pt2pt.h ompi/mca/sharedfp/addproc/sharedfp_addproc.h ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c ompi/mpi/c/lookup_name.c ompi/mpi/c/publish_name.c ompi/mpi/c/unpublish_name.c opal/mca/mpool/base/mpool_base_alloc.c opal/util/Makefile.am	2017-05-12 14:41:05 -04:00
Gilles Gouaillardet	026f3dd2dd	pmix2x: plug a misc memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-05-10 14:57:44 +09:00
Ralph Castain	0afcb1a448	Update to support server self-notifications Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-08 10:04:50 -07:00
Ralph Castain	ef0e0171c9	Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c). Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-08 10:04:50 -07:00
Ralph Castain	3bca715780	Fix pmix configury so that libpmix is still emitted when --with-devel-headers is given, even under static builds Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-05-05 11:15:32 -07:00
Jeff Squyres	eb03679d7f	Merge pull request #3444 from jsquyres/pr/fix-pmix-static-devel-header-builds pmix/configure.m4: always use embedded mode	2017-05-04 14:25:28 -04:00
Jeff Squyres	af336ac0e8	pmix/configure.m4: always use embedded mode Looks like embedded mode was mistakenly disabled when --with-devel-headers was specified. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-05-04 10:01:41 -07:00
Ralph Castain	a737d0f963	Merge pull request #3430 from bosilca/topic/tcp_hostname Use the OPAL function to get the hostname.	2017-05-03 06:42:02 -07:00
Brian Barrett	3b991498be	btl tcp: Don't set socket buffer size by default Set the default send and receive socket buffer size to 0, which means Open MPI will not try to set a buffer size during startup. The default behavior since near day one of the TCP BTL has been to set the send and receive socket buffer sizes to 128 KiB. A number that works great on 1 GbE, but not so great on 10 GbE fabrics of any real size. Modern TCP stacks, particularly on Linux, have gotten much smarter about buffer sizes and are much less efficient if a buffer size is set (even if set to something large). Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2017-04-28 14:14:49 -07:00
George Bosilca	2d8943d920	Use the OPAL function to get the hostname.	2017-04-28 02:48:15 -04:00
Nathan Hjelm	387467c358	btl/ugni: remove erroneous mca_btl_ugni_frag_return call Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-04-27 09:14:51 -06:00
Ralph Castain	8b1f01dfe6	Set the default modex parameters back to full blocking modex while we continue to test and debug the slow modex - it seems to be having issues on the Cray Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-22 15:19:46 -07:00
Howard Pritchard	f2a27cc991	Merge pull request #3396 from hppritcha/topic/swat_compiler_warning btl/sm: swat a compiler warning	2017-04-22 14:31:21 -06:00
Ralph Castain	f2ed293ecd	Merge pull request #3398 from rhc54/topic/modex Implement a background fence that collects all data during modex operation	2017-04-21 15:15:49 -07:00
Ralph Castain	9fc3079ac2	Implement a background fence that collects all data during modex operation The direct modex operation is slow, especially at scale for even modestly-connected applications. Likewise, blocking in MPI_Init while we wait for a full modex to complete takes too long. However, as George pointed out, there is a middle ground here. We could kickoff the modex operation in the background, and then trap any modex_recv's until the modex completes and the data is delivered. For most non-benchmark apps, this may prove to be the best of the available options as they are likely to perform other (non-communicating) setup operations after MPI_Init, and so there is a reasonable chance that the modex will actually be done before the first modex_recv gets called. Once we get instant-on-enabled hardware, this won't be necessary. Clearly, zero time will always out-perform the time spent doing a modex. However, this provides a decent compromise in the interim. This PR changes the default settings of a few relevant params to make "background modex" the default behavior: * pmix_base_async_modex -> defaults to true * pmix_base_collect_data -> continues to default to true (no change) * async_mpi_init - defaults to true. Note that the prior code attempted to base the default setting of this value on the setting of pmix_base_async_modex. Unfortunately, the pmix value isn't set prior to setting async_mpi_init, and so that attempt failed to accomplish anything. The logic in MPI_Init is: * if async_modex AND collect_data are set, AND we have a non-blocking fence available, then we execute the background modex operation * if async_modex is set, but collect_data is false, then we simply skip the modex entirely - no fence is performed * if async_modex is not set, then we block until the fence completes (regardless of collecting data or not) * if we do NOT have a non-blocking fence (e.g., we are not using PMIx), then we always perform the full blocking modex operation. 
* if we do perform the background modex, and the user requested the barrier be performed at the end of MPI_Init, then we check to see if the modex has completed when we reach that point. If it has, then we execute the barrier. However, if the modex has NOT completed, then we block until the modex does complete and skip the extra barrier. So we never perform two barriers in that case. HTH Ralph Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-04-21 10:29:23 -07:00

3112 Commits