openmpi

Author	SHA1	Message	Date
rhc54	52acd5b7ee	Merge pull request #1354 from rhc54/topic/sing Add support for Singularity containers	2016-02-13 06:57:47 -08:00
Jeff Squyres	7bc62e8f4c	Merge pull request #1356 from hjelmn/get_address Fix MPI_Get_address (MPI_BOTTOM, ...)	2016-02-13 08:27:18 -05:00
Ralph Castain	aa9e5a1a27	Add support for Singularity containers, including a .m4 file for checking if Singularity is available and an orte/schizo component for setting the proper support if a container was given as the executable Cleanup the configury so we properly check for Singularity under the various typical use-cases Bring the Singularity support online. We have to turn "off" the sm BTL as it segfaults from inside the container - root cause remains unclear. Also turned "off" the various OPAL shmem components in case they are involved and someone else tries to use them. Happily, the vader BTL works just fine!	2016-02-13 04:40:22 -08:00
Nathan Hjelm	064a67f5b9	Fix MPI_Get_address (MPI_BOTTOM, ...) Nowhere in the standard does it say that it is invalid to pass MPI_BOTTOM to MPI_Get_address yet we were returning an error. This commit removes the error check on NULL == location. Fixes open-mpi/ompi#1355. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-12 16:34:21 -07:00
yohann	67ce4a080a	mtl/ofi: FI_AV_MAP support only.	2016-02-12 10:06:52 -08:00
yohann	b3d8ead76e	mtl/ofi: Fix dynamic add_procs.	2016-02-12 10:05:52 -08:00
Jeff Squyres	d98616b9ed	Merge pull request #1337 from ggouaillardet/poc/f08_fn mpi_f08: correctly implements MPI_{COMM,TYPE,WIN}_{DUP,NULL_{COPY,DEL…	2016-02-11 12:27:29 -05:00
Nathan Hjelm	39b44d0652	Merge pull request #1345 from ggouaillardet/topic/sentinel_proc_name_conversion Topic/sentinel proc name conversion	2016-02-10 19:08:33 -07:00
Gilles Gouaillardet	96310f439b	sentinel: fix 32 bits arch since a sentinel is only made from the current job, only store the first 31 bits of the vpid into the sentinel.	2016-02-10 15:44:07 +09:00
Gilles Gouaillardet	b55b9e6aee	sentinel: fix sentinel to proc_name conversion converting an opal_process_name_t means the loss of one bit, it was decided to restrict the local job id to 15 bits, so the useful information of an opal_process_name_t can fit in 63 bits.	2016-02-10 15:44:07 +09:00
Gilles Gouaillardet	030a5f2054	sentinel: use type uintptr_t for sentinel MSB is now automatically cleared when right shifting Thanks George for pointing this	2016-02-10 11:28:56 +09:00
Jeff Squyres	7850517215	brucks: rename the "brks" component to be "brucks" After hearing the 3rd person ask what "brks" stood for, I'm renaming this component to be "brucks" (because it uses a Bruck-based algorithm).	2016-02-09 13:17:11 -08:00
Jeff Squyres	d537ee9f26	Merge pull request #1340 from jsquyres/pr/decrease-mpi_add_procs_cutoff RFC: ompi_mpi_params.c: set mpi_add_procs_cutoff default to 0	2016-02-09 13:36:43 -05:00
Jeff Squyres	902b477aac	ompi_mpi_params.c: set mpi_add_procs_cutoff default to 0 Decrease the default value of the "mpi_add_procs_cutoff" MCA param from 1024 to 0.	2016-02-09 09:41:36 -08:00
Jeff Squyres	8558def858	opal.pc.in: fix typo; use the right AC_SUBST'ed variable As reported by @marksantcroos, this substitution in opal.pc was incorrect -- it left @{libdir} in the string (vs. ${libdir}). The fix is simple: use the proper substitution variable in opal.pc (it was never updated to reflect the new/correct name that was created just for the pkg-config files). Fixes open-mpi/ompi#1343.	2016-02-08 10:55:18 -08:00
Ralph Castain	3fbad2e2bd	Transfer across the -host number of slots	2016-02-08 10:38:03 -08:00
George Bosilca	7c574a3530	Typo.	2016-02-07 07:22:22 +02:00
Jeff Squyres	8d0a592563	usnic: update a few verbose reachability messages	2016-02-06 03:28:48 -08:00
Jeff Squyres	87dbe6ce01	usnic: add high-verbose reachability messages	2016-02-06 03:28:47 -08:00
Jeff Squyres	dac2fe1589	usnic: ensure to use ntohl() for network-order values	2016-02-06 03:28:47 -08:00
Jeff Squyres	51240394a7	usnic: ensure to init module->av_eq_num	2016-02-06 03:28:47 -08:00
Nathan Hjelm	a1e784d76f	Merge pull request #1341 from hjelmn/osc_pt2pt_fixes osc/pt2pt: bug fixes	2016-02-04 19:10:03 -07:00
Nathan Hjelm	5b9c82a964	osc/pt2pt: bug fixes This commit fixes several bugs identified by @ggouaillardet and MTT: - Fix SEGV in long send completion caused by missing update to the request callback data. - Add an MPI_Barrier to the fence short-cut. This fixes potential semantic issues where messages may be received before fence is reached. - Ensure fragments are flushed when using request-based RMA. This allows MPI_Test/MPI_Wait/etc to work as expected. - Restore the tag space back to 16-bits. It was intended that the space be expanded to 32-bits but the required change to the fragment headers was not committed. The tag space may be expanded in a later commit. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-04 16:59:39 -07:00
rhc54	f38ad4adf3	Merge pull request #1335 from rhc54/topic/gcom Cleanup grpcomm race conditions	2016-02-04 05:47:51 -08:00
Ralph Castain	68912d04a8	Fix the grpcomm operations at scale. Restore the direct component to be the default, and to execute a rollup collective. This may in fact be faster than the alternatives, and something appears broken at scale when using brks in particular. Turn off the rcd and brks components as they don't work at scale right now - they can be restored at some future point when someone can debug them. Adjust to Jeff's quibbles Fixes open-mpi/mpi#1215	2016-02-04 05:42:29 -08:00
Gilles Gouaillardet	6eac6a8b00	osc/sm: create datafile into the per proc directory in order to make it unique per communicator Thanks Peter Wind for the report	2016-02-03 10:12:37 +09:00
Jeff Squyres	89eea51075	usnic: fix calculation for number of blocks	2016-02-02 16:56:34 -08:00
Nathan Hjelm	615b27ca82	Merge pull request #1339 from hjelmn/osc_pt2pt_fixes osc/pt2pt: various threading fixes	2016-02-02 16:47:09 -07:00
Jeff Squyres	d812695201	verbs: fix typo	2016-02-02 14:23:45 -08:00
Jeff Squyres	2cf9b26d34	verbs_usnic: previous commit missed a symbol 0715802f52c24c236700ac085090d5441524644c missed that there is a call to a common/verbs_usnic symbol in the common/verbs component. This call needs to be compiled out when the common/verbs_usnic component is not built.	2016-02-02 14:05:59 -08:00
Nathan Hjelm	a016c17714	Merge pull request #1338 from hjelmn/ugni_threading uGNI threading fixes	2016-02-02 13:22:57 -07:00
Nathan Hjelm	519fffb65e	osc/pt2pt: eager sends are always active if MPI_MODE_NOCHECK is used This commit fixes open-mpi/ompi#1299. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:44:17 -07:00
Nathan Hjelm	d7264aa613	osc/pt2pt: various threading fixes This commit fixes several bugs identified by a new multi-threaded RMA benchmarking suite. The following bugs have been identified and fixed: - The code that signaled the actual start of an access epoch changed the eager_send_active flag on a synchronization object without holding the object's lock. This could cause another thread waiting on eager sends to block indefinitely because the entirety of ompi_osc_pt2pt_sync_expected could execute between the check of eager_send_active and the condition wait of ompi_osc_pt2pt_sync_wait. - The bookkeeping of fragments could get screwed up when performing long put/accumulate operations from different threads. This was caused by the fragment flush code at the end of both put and accumulate. This code was put in place to avoid sending a large number of unexpected messages to a peer. To fix the bookkeeping issue we now 1) wait for eager sends to be active before starting any large isends, and 2) keep track of the number of large isends associated with a fragment. If the number of large isends reaches 32 the active fragment is flushed. - Use atomics to update the large receive/send tag counters. This prevents duplicate tags from being used. The tag space has also been updated to use the entire 16-bits of the tag space. These changes should also fix open-mpi/ompi#1299. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:33:33 -07:00
Jeff Squyres	0715802f52	verbs_usnic: do not build by default This component is a workaround to a bug in libibverbs that prints a dire warning that usNIC devices are not supported (of course not -- usNIC devices provide functionality through libfabric, not libibverbs). This component was written before a better workaround was created: a "no op" libibverbs plugin for usNIC devices (https://github.com/cisco/libusnic_verbs, and is also available in binary form on cisco.com). Hence, this component no longer builds by default. It's still available if a user specifically asks for it (e.g., if they do not want to install the "no op" libibverbs plugin), but it's not the default. This component also has the side-effect of making libopen-pal.so depend on libibverbs.so, which can be annoying for packagers (which is another reason it isn't built by default any more).	2016-02-02 11:22:04 -08:00
Nathan Hjelm	cd11fc3081	btl/ugni: fix race condition that causes completions to be dropped The send code in the ugni btl has an optimization that enables it to return 1 (fragment gone) in some cases. This optimization involved removing the btl ownership and callback flags to ensure the fragment stuck around long enough for its completion flag to be checked. This works fine for the single-threaded case but not in the multi-threaded case. It is possible that a fragment will be completed by another thread while a thread is in mca_btl_ugni_send. This competition can lead to a leaked fragment, missed callback, or both. To fix the issue without removing the optimization a reference count has been added to the fragment. Callbacks and fragment release will not be made until the fragment reference count has reached 0. The count is incremented before sending the frag and decremented after the completion flag has been checked. The fix has been verified to work using a multi-threaded RMA benchmark with the osc/pt2pt component. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:14:31 -07:00
Nathan Hjelm	14704201e2	btl/ugni: fix race condition when adding endpoint to wait list This commit fixes a race condition that can cause an endpoint to be added to the wait list multiple times. To fix the issue an additional check has been added to ensure the endpoint is not on the wait list after the wait list lock is held. The wait list processing code has also been updated to keep the wait list lock until all wait listed endpoints have been handled. This reduces the chance that an endpoint that is being processed by the wait list code is re-added to the list by a competing send. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:13:49 -07:00
Gilles Gouaillardet	cda094afc7	mpi_f08: correctly implements MPI_{COMM,TYPE,WIN}_{DUP,NULL_{COPY,DELETE}}_FN Fixes open-mpi/ompi#1323	2016-02-02 13:38:01 +09:00
Gilles Gouaillardet	728a97c558	use-mpi-f08: remove duplicates from Makefile.am	2016-02-02 13:33:07 +09:00
Jeff Squyres	910eca751f	Merge pull request #1327 from ggouaillardet/poc/mpi_xxx_dup_yyy_no_bind f08: do not BIND(C) to subroutines with LOGICAL parameters	2016-02-01 17:56:27 -05:00
Jeff Squyres	9f3ed00125	usnic: minor updates from code review Three minor updates from the code review of https://github.com/open-mpi/ompi-release/pull/933: * Remove an extra blank line in a show_help message * We no longer allow -1 for the MCA param btl_usnic_av_eq_num, so change the flag to REGINT_GE_ONE * Change "num_blocks" definition to be in terms of block_len (not eq_size)	2016-02-01 11:14:30 -08:00
Jeff Squyres	c2615a4732	usnic: change retrans timeout to 5ms A bunch of empirical testing has shown that increasing the retransmit timeout from 1ms to 5ms doesn't adversely affect performance, yet decreases the number of gratuitous retransmissions.	2016-01-30 10:49:14 -08:00
Jeff Squyres	797d5026c8	usnic: better av_eq_num default value handling	2016-01-30 10:46:14 -08:00
Jeff Squyres	db825abc00	usnic: don't overrun the fi_av_insert() EQ Add endpoints in a blocked manner so that we don't overrun the fi_av_insert() event queue. Also make the AV EQ length an MCA param, and report it in mca_btl_base_verbose >=5 output.	2016-01-30 08:33:48 -08:00
Jeff Squyres	d624e0d60f	usnic: fix wraparound sequence number issue Sequence numbers will wrap around; it is not sufficient to check for (seq-1) -- must use the SEQ_DIFF macro to properly handle the wraparound. This bug wasn't serious; it just meant we might retransmit one or two extra times when retransmits were triggered and the sequence numbers wrapped around their sliding windows.	2016-01-30 08:32:13 -08:00
Jeff Squyres	4de4a263f5	usnic: ensure all messages are sent on the data channel Messages should go on the data channel, even if they're short. Only ACKs go on the priority channel.	2016-01-30 08:31:21 -08:00
Gilles Gouaillardet	d529951206	hwloc: correctly count cores with at least one allowed PU when SMT is enabled, a core must be counted as long as one of its hwthread is allowed Thanks Ben Menadue for the report. This fixes a regression from open-mpi/ompi@6d149554a7	2016-01-29 11:54:34 +09:00
Edgar Gabriel	3f7fff5780	Merge pull request #1331 from edgargabriel/solaris-statfs-fix Solaris statfs fix	2016-01-28 20:16:33 -06:00
Gilles Gouaillardet	f5a53b5f1e	pmix: fix Makefile.am to correctly exclude autogenerated file from tarball (back-ported from pmix/master@73daf58ee5)	2016-01-28 11:42:03 +09:00
Gilles Gouaillardet	69ba2a9b6b	ddt: fix support of MPI_COMBINER_RESIZED in __ompi_datatype_create_from_args Thanks James Ramsey for the report	2016-01-28 11:32:29 +09:00
Nathan Hjelm	e564c69769	Merge pull request #1330 from hjelmn/osc_rdma_fix osc/rdma: fix typo in ompi_osc_rdma_complete_atomic	2016-01-26 19:26:59 -07:00
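
Commit d624e0d60f above notes that sequence numbers wrap around and that a plain comparison against (seq-1) is not sufficient. The general technique can be sketched as below. This is a minimal illustration, not the usnic BTL's actual SEQ_DIFF macro: the 16-bit width, the type name seq_t, and the helper names seq_diff and seq_lt are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sequence number type. The 16-bit width is an assumption
 * for this sketch, not taken from the usnic source. */
typedef uint16_t seq_t;

/* Signed distance from b to a, computed modulo 2^16.
 * The subtraction is reduced to 16 bits first, then reinterpreted as a
 * signed value, so a number just "after" a wrap still compares as newer.
 * The final conversion assumes a two's complement target, which holds on
 * all mainstream platforms. */
static inline int16_t seq_diff(seq_t a, seq_t b)
{
    return (int16_t)(uint16_t)(a - b);
}

/* True if a is older than b within half the sequence space. */
static inline bool seq_lt(seq_t a, seq_t b)
{
    return seq_diff(a, b) < 0;
}
```

With these helpers, sequence number 0 counts as one step after 65535, whereas a naive `a < b` test would treat 0 as the older of the two. The comparison is only meaningful while the two numbers are less than half the sequence space apart, which a sliding window normally guarantees.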


24452 commits