openmpi

Author	SHA1	Message	Date
Brian Barrett	e9e4d2a4bc	Handle asprintf errors with opal_asprintf wrapper The Open MPI code base assumed that asprintf always behaved like the FreeBSD variant, where ptr is set to NULL on error. However, the C standard (and Linux) only guarantee that the return code will be -1 on error and leave ptr undefined. Rather than fix all the usage in the code, we use opal_asprintf() wrapper instead, which guarantees the BSD-like behavior of ptr always being set to NULL. In addition to being correct, this will fix many, many warnings in the Open MPI code base. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-08 16:43:53 -07:00
Nathan Hjelm	000f9eed4d	opal: add types for atomic variables This commit updates the entire codebase to use specific opal types for all atomic variables. This is a change from the prior atomic support which required the use of the volatile keyword. This is the first step towards implementing support for C11 atomics as that interface requires the use of types declared with the _Atomic keyword. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-09-14 10:48:55 -06:00
Nathan Hjelm	bd5cd62df9	btl/ugni: fix up some warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-03 16:30:44 -06:00
Nathan Hjelm	d8916a4672	btl/ugni: fix race condition in completing frags The descriptor flags field in a fragment was being read after the fragment may have been freed. This commit reads the flags before calling the user callback. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-03 10:48:54 -06:00
Nathan T. Weeks	08f9ae97ee	btl/ugni: update BTL_VERBOSE argument list Signed-off-by: Nathan T. Weeks <weeks@iastate.edu>	2018-07-02 09:23:30 -06:00
Nathan Hjelm	b0ac6276a6	btl/ugni: improve multi-threaded RDMA performance This commit improves the injection rate and latency for RDMA operations. This is done by the following improvements: - If C11's _Thread_local keyword is available then always use the same virtual device index for the same thread when using RDMA. If the keyword is not available then attempt to use any device that isn't already in use. The binding support is enabled by default but can be disabled via the btl_ugni_bind_devices MCA variable. - When posting FMA and RDMA operations always attempt to reap completions after posting the operation. This allows us to better balance the work of reaping completions across all application threads. - Limit the total number of outstanding BTE transactions. This fixes a performance bug when using many threads. - Split out RDMA and local SMSG completion queue sizes. The RDMA queue size is better tuned for performance with RMA-MT. - Split out put and get FMA limits. The old btl_ugni_fma_limit MCA variable is deprecated. The new variable names are: btl_ugni_fma_put_limit and btl_ugni_fma_get_limit. - Change how post descriptors are handled. They are no longer allocated separately from the RDMA endpoints. - Some cleanup to move error code out of the critical path. - Disable the FMA sharing flag on the CDM when we detect that there should be enough FMA descriptors for the number of virtual devices we plan to create. If the user sets this flag we will not unset it. This change should improve the small-message RMA performance by ~ 10%. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-26 11:31:35 -06:00
Howard Pritchard	64de269cc3	topo/treematch - quash compiler warning quash a compiler warning showing up with gcc 7.3 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2018-06-13 16:34:17 -05:00
Nathan Hjelm	1282e98a01	opal/asm: rename existing arithmetic atomic functions This commit renames the arithmetic atomic operations in opal to indicate that they return the new value not the old value. This naming differentiates these routines from new functions that return the old value. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Nathan Hjelm	9d0b3fe9f4	opal/asm: remove opal_atomic_bool_cmpset functions This commit eliminates the old opal_atomic_bool_cmpset functions. They have been replaced by the opal_atomic_compare_exchange_strong functions. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Nathan Hjelm	3ff34af355	opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset* This commit renames the atomic compare-and-swap functions to indicate the return value. This is in preparation for adding support for a compare-and-swap that returns the old value. At the same time the return type has been changed to bool. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-10-31 12:47:23 -06:00
Joshua Hursey	e1d079544b	mca: Dynamic components link against project lib * Resolves #3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la MCA components in orte/ $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la MCA components in opal/ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la ``` Note: The changes in this commit were automated by the `libadd_mca_comp_update.py` script added in the commit that precedes it. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-08-24 11:56:16 -04:00
Howard Pritchard	12a5aacdfd	btl/ugni: swat compiler warning Signed-off-by: Howard Pritchard <hppritcha@gmail.com>	2017-08-01 12:21:57 -06:00
Ralph Castain	31130a4bee	Replace syntax with something less strictly C99 Fixes #3809 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-07-05 16:54:36 -07:00
Nathan Hjelm	387467c358	btl/ugni: remove erroneous mca_btl_ugni_frag_return call Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-04-27 09:14:51 -06:00
Ralph Castain	ecc8000136	Silence a flood of warnings when compiling with gcc on Cray Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-24 13:37:11 -06:00
Nathan Hjelm	6b210fa2c4	btl/ugni: do not return a frag from sendi if an endpoint is waitlisted This fixes a hang that can occur when running bandwidth tests. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-14 10:14:13 -06:00
Nathan Hjelm	2e42b0afbd	btl/ugni: move connection check into sync event This commit makes datagram checks time based and reduces their frequency when only the wildcard datagram is posted. This change improves latency on knl systems. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-14 10:10:05 -06:00
Nathan Hjelm	d5aaeb74b6	btl/ugni: return a descriptor from sendi Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-13 14:56:54 -06:00
Nathan Hjelm	a19e7023d1	btl/ugni: always check local SMSG CQ This commit removes the local operation count check from the local SMSG completion queue. This check was leading to hangs due to an undocumented feature of the ugni library. The local SMSG CQ is used to send credit return messages back to the sender. The ugni library never checks for the completion itself but relies on the SMSG user to periodically check the CQ. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-13 14:56:54 -06:00
Nathan Hjelm	d5cdeb81d0	btl/ugni: improve multi-threaded performance This commit updates the ugni btl to make use of multiple device contexts to improve the multi-threaded RMA performance. This commit contains the following: - Cleanup the endpoint structure by removing unnecessary fields. The structure now also contains all the fields originally handled by the common/ugni endpoint. - Clean up the fragment allocation code to remove the need to initialize the my_list member of the fragment structure. This member is not initialized by the free list initializer function. - Remove the (now unused) common/ugni component. btl/ugni no longer needs the component. common/ugni was originally split out of btl/ugni to support bcol/ugni. As that component no longer exists there is no reason to keep this component. - Create wrappers for the ugni functionality required by btl/ugni. This was done to ease supporting multiple device contexts. The wrappers are thread safe and currently use a spin lock instead of a mutex. This produces better performance when using multiple threads spread over multiple cores. In the future this lock may be replaced by another serialization mechanism. The wrappers are located in a new file: btl_ugni_device.h. - Remove unnecessary device locking from serial parts of the ugni btl. This includes the first add-procs and module finalize. - Clean up fragment wait list code by moving enqueue into common function. - Expose the communication domain flags as an MCA variable. The defaults have been updated to reflect the recommended setting for knl and haswell. - Avoid allocating fragments for communication with already overloaded peers. - Allocate RDMA endpoints dynamically. This is needed to support spreading RMA operations across multiple contexts. - Add support for spreading RMA communication over multiple ugni device contexts. This should greatly improve the threading performance when communicating with multiple peers. By default the number of virtual devices depends on 1) whether opal_using_threads() is set, 2) how many local processes are in the job, and 3) how many bits are available in the pid. The last is used to ensure that each CDM is created with a unique id. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-13 14:46:06 -06:00
Nathan Hjelm	12bf38a25c	btl/ugni: add MPI_T performance variables for ugni counters This commit exposes ugni statistics for use with MPI_T. There is no overhead to providing these counters. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-13 14:42:58 -06:00
Howard Pritchard	09f47fcf8e	btl/ugni:vader swat some compiler warnings Swat some compiler warnings. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-21 14:58:34 -06:00
Gilles Gouaillardet	c92e9a5406	use the new OPAL_HASH_TABLE_FOREACH convenience macro	2016-10-08 16:58:20 +09:00
Nathan Hjelm	f93c1f2106	btl/ugni: fix erroneous warning message This commit prevents the connection code from trying to connect an endpoint if the directed datagram has been posted but not received. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-02 09:17:44 -06:00
Nathan Hjelm	83062db7cb	btl/ugni: actually make the endpoint lock recursive Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-24 10:36:08 -06:00
Nathan Hjelm	adb668209b	btl/ugni: fix another connection race This commit fixes a race that can occur when two threads are in the ugni progress function at the same time. This race occurs when one thread calls GNI_PostDataProbeById then goes to sleep then another thread calls GNI_PostDataProbeById then GNI_EpPostDataWaitById before the other thread wakes up. If this happens the first thread will print a warning on GNI_EpPostDataWaitById about no matching post. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-08 15:38:11 -06:00
Nathan Hjelm	14b36d4503	btl/ugni: protect against re-entry and races in connections This commit fixes two issues that can occur during a connection: - Re-entry to connection progress from modex lookup. Added an additional endpoint state that will keep the code from re-entering the common endpoint create. - Fixed a race between a process posting a directed datagram through a send and a connection being progressed through opal_progress(). The progress code was not obtaining the endpoint lock before attempting to update the endpoint. To limit the amount of code changed for 2.0.1 this commit makes the endpoint lock recursive. In a future update this may be changed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-04 16:08:01 -06:00
Nathan Hjelm	9f43b23725	Merge pull request #1710 from hjelmn/ugni_atomics Additional ugni atomics	2016-06-02 18:25:49 -06:00
Nathan Hjelm	bf10d79914	btl/ugni: remove erroneous unlock The endpoint lock was being released twice in mca_btl_ugni_get_ep. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:53 -06:00
Nathan Hjelm	cc96097873	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:09 -06:00
Nathan Hjelm	28dfa36a3f	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	c19426ac1b	btl/ugni: add support for additional atomic operations This commit adds support for Cray Aries atomic operations. This includes 32-bit and floating point support. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	99627319f0	btl/ugni: reduce overhead of progress function This commit reduces the overhead of calling the ugni progress function. It does the following: - Check for new connections once every eight calls. - Do not call remote smsg progress unless we are connected to at least one remote peer. - Do not call rdma progress unless at least one rdma fragment is outstanding. - Check endpoint wait list size before obtaining a lock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 14:27:34 -06:00
Nathan Hjelm	c2b6fbb124	opal/memory: move initialization to first rcache creation Because of the removal of the linux memory component it is no longer necessary to initialize the memory component in opal_init(). This commit moves the initialization to the creation of the first rcache component. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:21:46 -06:00
Nathan Hjelm	d7874920aa	btl/ugni: set the frag reference count in the eager get path This commit adds code that sets the fragment reg_cnt to 1 when sending the completion message for an eager get. Without this the btl will either hang or abort. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-02 12:10:22 -06:00
Nathan Hjelm	d4afb16f5a	opal: rework mpool and rcache frameworks This commit rewrites both the mpool and rcache frameworks. Summary of changes: - Before this change a significant portion of the rcache functionality lived in mpool components. This meant that it was impossible to add a new memory pool to use with rdma networks (ugni, openib, etc) without duplicating the functionality of an existing mpool component. All the registration functionality has been removed from the mpool and placed in the rcache framework. - All registration cache mpool components (udreg, grdma, gpusm, rgpusm) have been changed to rcache components. rcaches are allocated and released in the same way mpool components were. - It is now valid to pass NULL as the resources argument when creating an rcache. At this time the gpusm and rgpusm components support this. All other rcache components require non-NULL resources. - A new mpool component has been added: hugepage. This component supports huge page allocations on linux. - Memory pools are now allocated using "hints". Each mpool component is queried with the hints and returns a priority. The current hints supported are NULL (uses posix_memalign/malloc), page_size=x (huge page mpool), and mpool=x. - The sm mpool has been moved to common/sm. This reflects that the sm mpool is specialized and not meant for any general allocations. This mpool may be moved back into the mpool framework if there is any objection. - The opal_free_list_init arguments have been updated. The unused0 argument is now used to pass in the registration cache module. The mpool registration flags are now rcache registration flags. - All components have been updated to make use of the new framework interfaces. As this commit makes significant changes to both the mpool and rcache frameworks both versions have been bumped to 3.0.0. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-14 10:50:41 -06:00
Nathan Hjelm	cd11fc3081	btl/ugni: fix race condition that causes completions to be dropped The send code in the ugni btl has an optimization that enables it to return 1 (fragment gone) in some cases. This optimization involved removing the btl ownership and callback flags to ensure the fragment stuck around long enough for its completion flag to be checked. This works fine for the single-threaded case but not in the multi-threaded case. It is possible that a fragment will be completed by another thread while a thread is in mca_btl_ugni_send. This competition can lead to a leaked fragment, missed callback, or both. To fix the issue without removing the optimization a reference count has been added to the fragment. Callbacks and fragment release will not be made until the fragment reference count has reached 0. The count is incremented before sending the frag and decremented after the completion flag has been checked. The fix has been verified to work using a multi-threaded RMA benchmark with the osc/pt2pt component. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:14:31 -07:00
Nathan Hjelm	14704201e2	btl/ugni: fix race condition when adding endpoint to wait list This commit fixes a race condition that can cause an endpoint to be added to the wait list multiple times. To fix the issue an additional check has been added to ensure the endpoint is not on the wait list after the wait list lock is held. The wait list processing code has also been updated to keep the wait list lock until all wait listed endpoints have been handled. This reduces the chance that an endpoint that is being processed by the wait list code is re-added to the list by a competing send. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:13:49 -07:00
Howard Pritchard	eaba98ce5d	btl/ugni: fix very poor aries bw problem The handling of RDMA get alignment in ugni BTL for Aries (cray xc) was wrong, resulting in very poor bandwidth for ugni BTL on aries. Verified using osu_bw now gives sensible bandwidth on Aries. Fixes #1005 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2015-10-13 16:01:17 -05:00
Nathan Hjelm	59aa93e1b6	opal/mpool: add support for passing access flags to register This commit adds an access_flags argument to the mpool registration function. This flag indicates what kind of access is being requested: local write, remote read, remote write, and remote atomic. The values of the registration access flags in the btl are tied to the new flags in the mpool. All mpools have been updated to include the new argument but only the grdma and udreg mpools have been updated to make use of the access flags. In both mpools existing registrations are checked for sufficient access before being returned. If a registration does not contain sufficient access it is marked as invalid and a new registration is generated. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-05 13:53:55 -06:00
Nathan Hjelm	3c33a8e94b	btl/ugni: adjust exclusivity below sm and vader Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-29 15:40:35 -06:00
Nathan Hjelm	408da16d50	ompi/proc: add proc hash table for ompi_proc_t objects This commit adds an opal hash table to keep track of mapping between process identifiers and ompi_proc_t's. This hash table is used by the ompi_proc_by_name() function to look up (in O(1) time) a given process. This can be used by a BTL or other component to get an ompi_proc_t when handling an incoming message from an as yet unknown peer. Additionally, this commit adds a new MCA variable to control the new add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in the process falls below the threshold an ompi_proc_t is created for every process. If the number of ranks is above the threshold then an ompi_proc_t is only created for the local rank. The code needed to generate additional ompi_proc_t's for a communicator is not yet complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-10 08:55:54 -06:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Howard Pritchard	d9f080b0c7	btl/ugni: silence common symbol squawk Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2015-05-16 10:23:06 -05:00
Nathan Hjelm	855d422e62	Merge pull request #408 from hjelmn/btl_3_0_mod btl: expose local registration thresholds	2015-02-26 12:57:43 -07:00
Nathan Hjelm	8a17e69067	btl/ugni: fix typos introduced by free list update	2015-02-25 12:43:05 -07:00
Nathan Hjelm	5f1254d710	Update code base to use the new opal_free_list_t Use of the old ompi_free_list_t and ompi_free_list_item_t is deprecated. These classes will be removed in a future commit. This commit updates the entire code base to use opal_free_list_t and opal_free_list_item_t. Notes: OMPI_FREE_LIST__MT -> opal_free_list_ (uses opal_using_threads ()) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-24 10:05:45 -07:00
Howard Pritchard	bf89131f9e	add owner files to opal/ompi/orte mca directories This commit adds an owner file in each of the component directories for each framework. This allows for a simple script to parse the contents of the files and generate, among other things, tables to be used on the project's wiki page. Currently there are two "fields" in the file, an owner and a status. A tool to parse the files and generate tables for the wiki page will be added in a subsequent commit.	2015-02-22 15:10:23 -07:00
Nathan Hjelm	cc750b00a6	btl: export local registration thresholds Some BTLs do not require local registration for some rdma transactions. For example: inline put on openib, fma put on ugni. This commit adds code to expose the local registration thresholds to BTL users. Optimized code can take advantage of this information to improve rdma performance.	2015-02-19 16:13:37 -07:00
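
Commit e9e4d2a4bc above describes wrapping asprintf so that the output pointer is always NULL on error, since only the -1 return code is guaranteed on Linux. The idea can be sketched roughly as follows. This is a minimal illustration, not the actual opal_asprintf() implementation: the name `safe_asprintf` is hypothetical, and the sketch assumes a platform that provides `vasprintf` (glibc or a BSD libc).

```c
#define _GNU_SOURCE /* exposes vasprintf() on glibc */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper (not Open MPI's opal_asprintf()): returns the
 * formatted length, or -1 on failure. On failure *out is forced to NULL
 * so callers can test the pointer, as they could with BSD asprintf. */
int safe_asprintf(char **out, const char *fmt, ...)
{
    va_list ap;
    int len;

    assert(out != NULL && fmt != NULL);

    va_start(ap, fmt);
    len = vasprintf(out, fmt, ap);
    va_end(ap);

    if (len < 0) {
        /* Contents of *out are unspecified here; normalize them. */
        *out = NULL;
    }
    return len;
}
```

A caller would write `char *s; safe_asprintf(&s, "%s-%d", "ugni", 3);`, check `s` against NULL, and release it with `free(s)` when done.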


87 commits