openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	eef7590e58	wrappers: add the $(EXEEXT) extension to the installed symbolic links	2014-10-28 16:42:51 +09:00
Ralph Castain	4f0c1ae8d9	Continue cleanup of the PMI config code. Eliminate the multiple calls to check for pmi1 and pmi2 - we must check it only once to get the pmix components to build only in the correct situations. Ensure we set the wrapper flags so we handle static builds correctly.	2014-10-27 20:37:33 -07:00
Gilles Gouaillardet	b4e445afb5	btl/sm: fix a typo in the error message	2014-10-28 11:25:42 +09:00
Gilles Gouaillardet	62bde1fcb5	opal/util/proc.c: handle unaligned opal_process_name_t parameters	2014-10-27 14:40:10 +09:00
Jeff Squyres	9334abc474	Makefile: fix problems with static linking Avoid a problem with double-dereference of a variable macro name (i.e., a macro with part of its name from an AC_SUBST, such as ```$(foo@BAR@baz)```). In what might be a bug in Automake 1.14.1, if you do a pattern like this: ```makefile lib_LTLIBRARIES = lib@A_PREFIX@a_lib.la noinst_LTLIBRARIES = lib@A_PREFIX@a_noinst.la lib@A_PREFIX@a_lib_la_SOURCES = a.c lib@A_PREFIX@a_noinst_la_SOURCES = $(lib@A_PREFIX@a_lib_la_SOURCES) ``` Then in the resulting Makefile, the value of ```$(lib@A_PREFIX@a_lib_la_OBJECTS)``` will be blank (when it really should be ```a.o```). To work around this potential bug, I've simply avoided doing double-dereferences like this, and effectively set the second ```_SOURCES``` line equal to ```a.c``` (just like the first ```_SOURCES``` line). Fixes #250.	2014-10-24 16:27:54 -07:00
rolfv	9134f48d4c	Do not use sendi path with GPU buffer	2014-10-24 13:35:01 -07:00
rolfv	b5eec888e5	Merge branch 'master' of github.com:open-mpi/ompi	2014-10-24 12:40:45 -07:00
rolfv	3e29eaf1d6	Fix CUDA compile error	2014-10-24 12:28:03 -07:00
Nathan Hjelm	56a8687c2a	shmem/mmap: do not use O_CREAT in shared memory attach	2014-10-24 11:02:04 -06:00
Jeff Squyres	5207429734	help-opal-shmem-mmap.txt: trivial typo fix	2014-10-24 03:23:53 -07:00
Nathan Hjelm	d72fc7a05f	btl/vader: more updates to the help messages	2014-10-23 08:48:54 -06:00
Gilles Gouaillardet	55a5c99ff0	btl/vader: fix typos in the help file	2014-10-23 19:28:09 +09:00
Gilles Gouaillardet	248acbbc3b	pmix/slurm: correctly set locality of the local ranks as "not found"	2014-10-23 17:02:07 +09:00
Ralph Castain	894acb0aa8	configury: new OPAL_SET_MCA_PREFIX/ORTE_SET_MCA_CMD_LINE_ID macros These two macros set the MCA prefix and MCA cmd line id, respectively. Specifically, MCA parameters will be named PREFIX<foo> in the environment, and the cmd line will use -ID foo bar. These macros must be called during configure.ac and a value supplied. In the case of Open MPI, the values given are PREFIX=OMPI_MCA_ and ID=mca. Other projects (such as ORCM) will call these macros with their own unique values. For example, ORCM uses PREFIX=ORCM_MCA_ and ID=omca This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running OMPI applications under ORCM, we need the MCA params passed to the ORCM daemons to be separated from those recognized by the OMPI application.	2014-10-22 18:57:40 -07:00
Ralph Castain	5059077510	Sigh - revert changes to file that shouldn't have been included in the prior commit	2014-10-22 14:06:14 -07:00
Ralph Castain	2ec59acac4	Silence a slew of warnings when --enable-memchecker is given. Reviewed by Jeff	2014-10-22 13:59:08 -07:00
Nathan Hjelm	e1bc2de853	btl/vader: defensive programming: use an actual function for the dummy btl_get and btl_put	2014-10-22 14:57:55 -06:00
Nathan Hjelm	19fbe868b8	btl/sm: defensive programming: use an actual function for the dummy btl_get	2014-10-22 14:57:55 -06:00
Aurélien Bouteiller	f232e94c02	Merge branch 'master' of github.com:open-mpi/ompi	2014-10-22 16:56:06 -04:00
Aurélien Bouteiller	55e49470de	Patch from Nathan outlined with a crash the mishandling of the case where CMA is requested but not available.	2014-10-22 16:55:18 -04:00
Nathan Hjelm	998e69a6fa	btl/sm: add some protection for the use_knem = -1 case Need to unset the dummy btl_get and remove the MCA_BTL_FLAGS_GET flag if neither knem nor cma can be used.	2014-10-22 13:57:01 -06:00
Nathan Hjelm	d7c7bb3993	btl/sm: re-enable the use of CMA and knem At some point we added a sanity check to the btl base to ensure that the btl flags match the available functions (this prevents users from specifying get or put when no function exists). This check was disabling get for the sm btl since at the time of the check there is no btl_get function. The simplest fix is to set a dummy value to btl_get that will be overwritten with the proper value on btl initialization. Closes #239.	2014-10-22 13:30:27 -06:00
Jeff Squyres	ec4268b59c	usnic: do not send zero-length modex message If there are no usnic BTL modules, then just avoid sending any modex message at all (other BTLs do this; it's safe to do). The change is smaller than it looks: I added a "if 0 ==..." check at the top to return immediately if there are no BTL modules. Then I removed some now-unnecessary conditionals and un-indented as appropriate. Fixes #248	2014-10-22 11:11:58 -07:00
Jeff Squyres	e415c8f9a8	vader: Remove stale comment	2014-10-22 10:32:33 -07:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
Jeff Squyres	01fd96bfa5	Revert "Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build." This reverts commit `63f619f871`.	2014-10-22 10:32:11 -07:00
Gilles Gouaillardet	75e8387a4e	vader: vader_add_procs report the error if init_vader_endpoint fails	2014-10-22 19:11:54 +09:00
Gilles Gouaillardet	7508c6f3ad	pmix: correctly handle NULL OPAL_BYTE_OBJECT object	2014-10-22 17:15:21 +09:00
Nathan Hjelm	1a3734ae57	btl/vader: fix compilation on OS X	2014-10-21 09:27:36 -06:00
Gilles Gouaillardet	f56169cee6	btl/vader: silence warning correctly check HAVE_SYS_PRCTL_H	2014-10-21 19:51:29 +09:00
Gilles Gouaillardet	d60f0cbd88	btl/vader: report an error when a segment cannot be attached	2014-10-21 10:42:22 +09:00
Nathan Hjelm	13643f5b6e	btl/vader: improved single-copy support This commit makes the following changes: - Add support for the knem single-copy mechanism. Initially vader will only support the synchronous copy mode. Asynchronous copy support may be added in the future. - Improve Linux cross memory attach (CMA) when using restrictive ptrace settings. This will allow Open MPI to use CMA without modifying the system settings to support ptrace attach (see /etc/sysctl.d/10-ptrace.conf). - Allow runtime selection of the single copy mechanism. The default behavior is to use the best available. The priority list of single-copy mechanisms is as follows: xpmem, cma, and knem. - Allow disabling support for kernel-assisted single copy. - Some tuning and bug fixes.	2014-10-20 11:44:52 -06:00
Nadezhda Kogteva	2bce929330	MTL MXM cleanup: unnecessary OMPI_MTL_MXM_CONNECT_ON_FIRST_COMM variable removed	2014-10-20 10:29:47 +03:00
Aurélien Bouteiller	e3be1fb9a5	Quick pass over the sm-knem code, indent fixes	2014-10-17 10:38:35 -04:00
Jeff Squyres	43aff4d8b3	btl sm: error if knem support is requested and cannot be activated Restore the functionality to error out (and show a helpful message) if knem support is requested but is either not compiled in or cannot be activated. Thanks to Gus Correa for bringing the matter to our attention.	2014-10-16 20:01:26 -07:00
Jeff Squyres	b04a2634c6	btl sm: restore btl_sm_have_knem_support MCA param Somehow, this MCA param was accidentally dropped after v1.6.5. Thanks to Gus Correa for bringing this matter to our attention. Also moving some MCA params down from level 9 to levels 4/5.	2014-10-16 19:48:21 -07:00
Ralph Castain	b6aa691e0a	Fix incorrect implementation of new MCA param mca_base_env_list - it was not picking up envars and forwarding them, but only worked if you explicitly set a value for the envar. Ensure it works for both direct and indirect launch modes. Remove stale code as this replaced orte_forward_envars. Ensure it doesn't get passed to the ORTE daemons.	2014-10-16 12:58:56 -07:00
Gilles Gouaillardet	27dcca0bb2	pmi/s1: fix large keys do not overwrite the PMI key when pushing a message that does not fit within 255 bytes	2014-10-16 13:29:32 +09:00
Gilles Gouaillardet	b5aea782ce	Revert "Fix heterogeneous support" Per the discussion at http://www.open-mpi.org/community/lists/devel/2014/10/16050.php This reverts commit `c9c5d4011b`.	2014-10-16 12:24:38 +09:00
George Bosilca	63ba754f3f	Remove unnecessary includes from the datatype	2014-10-15 21:49:32 -04:00
George Bosilca	7541c03b4c	Mark all instances where atomic operations are used but their return value is unnecessary	2014-10-15 21:47:32 -04:00
Jeff Squyres	dc66e197cc	var: fix segv in deprecated file var show_help() Ensure to include the new variable filename in the show_help() output when we load a deprecated MCA param from a file. Fixes #236	2014-10-15 08:07:31 -07:00
Jeff Squyres	51027a6635	usnic: fix minor typo Change harmless-but-weird comma to semicolon. Found during code review.	2014-10-15 05:32:36 -07:00
Gilles Gouaillardet	c9c5d4011b	Fix heterogeneous support * redefine orte_process_name_t so it can be converted between host and network format as an opal_identifier_t aka uint64_t by the OPAL layer. * correctly send OPAL_DSTORE_ARCH key	2014-10-15 17:19:13 +09:00
Gilles Gouaillardet	5c81658d58	pmix: fix big endian arch use the appropriate 64 bits type otherwise data gets incorrectly truncated on big endian arch	2014-10-15 17:17:09 +09:00
Ralph Castain	3ef94a0675	Per email thread on devel list: Revert "OPAL: drop dead with core on bad flow. rarely happens with helloworld on large scale." This reverts commit `86f1d5af3e`. Will be reconsidered via RFC as it represents a significant change in behavior	2014-10-12 21:13:42 -07:00
Ralph Castain	4d27eb70f2	Extend the dstore framework to include a new "update_handle" API so the attributes of an existing handle can be changed. We can't just open a new handle as the upper layers won't know where to find the info. :-(	2014-10-10 12:40:32 -07:00
Ralph Castain	1ae34da5e5	Add an attributes parameter to the dstore.open function so we can pass directives to the active storage component. This can, for example, include the backing file info for a new shared memory segment.	2014-10-10 12:13:25 -07:00
Ralph Castain	63f619f871	Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build.	2014-10-10 11:39:08 -07:00
Nathan Hjelm	a31cf3b740	btl/vader: missing include	2014-10-09 13:57:21 -06:00
Nathan Hjelm	9e0c07e4ce	btl/ugni: improve the handling of eager get fragments when the btl runs out of preregistered buffers Before this change eager gets were retried on each progress loop. This commit modifies the protocol to only retry eager gets when another eager get has completed. This commit also cleans up some callback code that is no longer needed.	2014-10-09 13:57:21 -06:00
Howard Pritchard	ebc368d26b	remove GNI_RDMAMODE_FENCE bit in GNI_PostRdma The GNI_RDMAMODE_FENCE bit was a left over from async progress work that is not needed at this point in the gni BTL. Removing the bit also allows for the removal of the GNI_CDM_MODE_BTE_SINGLE_CHANNEL bit from the GNI_CdmCreate call.	2014-10-09 12:41:19 -06:00
Ralph Castain	ce8e33447f	Silence warning	2014-10-09 10:45:25 -07:00
Joshua Ladd	1cabd73522	Adding a new OPAL hash table routine. Please read the algorithm description in opal/class/opal_hash_table.c for more precise details on the design and implementation. This algorithm was contributed by David Linden of H.P. in partnership with Mellanox Technologies. This contribution achieves two objectives: 1. It's actually hashing now, whereas the old OPAL hash table was not. Thus, it is a bug fix for and, as such, should be included in the 1.8 series. 2. It is dynamic and can grow and shrink the number of buckets in accordance with job size, whereas the old OPAL hash table had a fixed number of buckets which resulted in poor retrieval performance at large scale. This scheme has been deployed in the field on very large H.P./Mellanox systems and has been demonstrated to significantly decrease job start-up time (~ 20% improvement) when launching applications directly with srun in SLURM environments. However, neither SLURM nor direct launch are prerequisites to take advantage of this change as any entity that utilizes OPAL hash table objects can benefit (at least partially) from this contribution.	2014-10-09 17:24:23 +02:00
Elena	c905fe9b78	pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff	2014-10-09 06:15:31 +02:00
Howard Pritchard	9947758d98	initial thread safety for ugni btl This commit adds initial ugni thread safety support. With this commit, sun thread tests (excepting MPI-2 RMA) pass with various process counts and threads/process. Also osu_latency_mt passes.	2014-10-08 10:13:22 -06:00
Jeff Squyres	a422d893b8	memchecker: per RFC, use calloc for OBJ_NEW With --enable-memchecker builds, use calloc(3) for OBJ_NEW instead of malloc(3). This cuts down on a lot of valgrind/memory checker false positive output. Also make a minor change in the valgrind configure.m4; have it assign 0xf to a char. The prior assignment (of 0xff) was warning about an overflow. This didn't really matter, but we might as well make the test not have a gratuitous warning in it.	2014-10-07 09:55:54 -07:00
Mike Dubman	86f1d5af3e	OPAL: drop dead with core on bad flow. rarely happens with helloworld on large scale.	2014-10-07 14:07:41 +03:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Howard Pritchard	5428301c81	Remove catamount timer support With the 1.9 release, support for catamount is being dropped. Hence, removing catamount timer support.	2014-10-03 14:53:09 -06:00
rolfv	697b18db63	Making async copy the default	2014-10-03 06:42:18 -07:00
Gilles Gouaillardet	5c5453b8b1	pmix: fix test in native_get_attr	2014-10-03 11:54:08 +09:00
Jeff Squyres	413e775dbf	version configury: make dist now works Update the VERSION file scheme: * Remove "want_repo_rev". * Add "tarball_version". All values are now always included (major, minor, release, greek, repo_rev). However, configure.ac now runs "opal_get_version.sh ... --tarball", which will return the value of tarball_version (if it is non-empty) or the "full" version string (i.e., "major.minor.releasegreek").	2014-10-02 11:32:54 -07:00
Jeff Squyres	8468424f45	distscript: remove configure.params and autogen.subdirs kruft Remove configure.params support: configure.params hasn't been used in years. Also remove autogen.subdirs support; those should really be handled by their respective Makefile.am's.	2014-10-02 11:32:54 -07:00
Jeff Squyres	54544f64b3	wrappers: update URLs for GitHub	2014-10-01 14:37:17 -07:00
Ralph Castain	9e35f80ab6	Don't multiply define WANT_PMI_SUPPORT and friends. Turns out they weren't being used anywhere anyway, so no point in defining them at all This commit was SVN r32822.	2014-09-30 20:43:25 +00:00
Howard Pritchard	8da51fab81	cray pmi equivalent to commit 5eb65b24 This commit was SVN r32820.	2014-09-30 19:25:00 +00:00
Ralph Castain	8d0b4f222a	The pmix.get functions should not be returning "success" if the requested info isn't found. Fix the macros and the component functions so they correctly return "not found" in that situation, and set the data regions and size to NULL and 0, respectively. This commit was SVN r32818.	2014-09-30 18:03:12 +00:00
Jeff Squyres	d4e2809531	version: always use all 3 version numbers In all previous releases, the version number would be "A.B.C" unless C was 0, in which case it would be "A.B". This commit changes that scheme to always be "A.B.C", even if C==0. Hence, v1.9.0 will be the first release where this new scheme is evident. This commit was SVN r32816.	2014-09-30 15:54:18 +00:00
Howard Pritchard	1df933ea27	remove ompi/runtime/params.h include in ugni btl This commit was SVN r32813.	2014-09-29 19:26:33 +00:00
Howard Pritchard	201d4ec3ad	fix setting of PMIX_NODE_RANK in cray pmix comp. Per discussions with pmix folks, it was determined that the way the cray pmi pmix component was computing the PMIX_NODE_RANK attribute for a process was incorrect. This commit fixes the problem. This commit was SVN r32810.	2014-09-29 16:55:31 +00:00
Rolf vandeVaart	399dc3db43	Code to check for managed memory. Configure support also. This commit was SVN r32801.	2014-09-26 16:24:45 +00:00
Rolf vandeVaart	35858f837a	Revert r32713. Have different code for this. This commit was SVN r32800. The following SVN revision numbers were found above: r32713 --> open-mpi/ompi@9a2bab0e27	2014-09-26 14:56:18 +00:00
Nathan Hjelm	e0eb1f2e73	btl/vader: make vader registration lookup/caching thread safe This commit was SVN r32798.	2014-09-25 22:24:06 +00:00
George Bosilca	53e012ae97	Fix typo. This commit was SVN r32795.	2014-09-25 17:18:27 +00:00
Nathan Hjelm	aba87f3776	btl/vader:silence warning This commit was SVN r32788.	2014-09-24 22:10:23 +00:00
Nathan Hjelm	79881ca892	btl/vader: prevent double-destruction of endpoints and move endpoint teardown code into destructor This commit was SVN r32779.	2014-09-23 21:51:15 +00:00
Nathan Hjelm	2d8fba0861	btl/vader: silence warning This commit was SVN r32778.	2014-09-23 21:33:45 +00:00
Nathan Hjelm	8bd3160432	btl/vader: fix several typos in vader update This commit was SVN r32775.	2014-09-23 20:25:36 +00:00
Nathan Hjelm	12bfd13150	btl/vader: improve performance for both single and multiple threads This is a large update that does the following: - Only allocate fast boxes for a peer if a send count threshold has been reached (default: 16). This will greatly reduce the memory usage with large numbers of local peers. - Improve performance by limiting the number of fast boxes that can be allocated per peer (default: 32). This will reduce the amount of time spent polling for fast box messages. - Provide new MCA variables to configure the size, maximum count, and send count thresholds for fast boxes allocations. - Updated buffer design to increase the range of message sizes that can be sent with a fast box. - Add thread protection around fast box allocation (locks). When spin locks are available this should be updated to use spin locks. - Various fixes and cleanup. This commit was SVN r32774.	2014-09-23 18:11:22 +00:00
Howard Pritchard	1508a01325	Fixes to enable mpirun to work again on Cray The ess pmi module was not handling aprun launched daemons. All daemons were thinking they were vpid 1. Also, turns out that on cray systems using MOM nodes for launched jobs, just detecting whether or not a process is in a PAGG container is not sufficient. Crank up the priority of the alps PLM component in the event that the configure detected the presence of both slurm and alps. Have the ESS pmi component open the pmix framework and select a pmix component. This commit was SVN r32773.	2014-09-23 15:37:26 +00:00
Artem Polyakov	f2e586980b	Fix timing framework: 1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php) 2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file. 3. Improve mpirun_prof: 1) adapt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options. This commit was SVN r32772.	2014-09-23 12:59:54 +00:00
Ralph Castain	70896550bf	Per input from Artem, update the copyrights on these files, ensuring to include all the licensing info for the files brought over from the mpiperf project. This commit was SVN r32770.	2014-09-20 14:54:24 +00:00
Artem Polyakov	70587d1804	Remove outdated OPAL parameter "opal_pmi_version". Now PMI selection is handled by PMIx MCA. This commit was SVN r32767.	2014-09-20 02:30:23 +00:00
Ralph Castain	dfb952fa78	[Contribution from Artem - moved it to svn from git for him] Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup. This commit was SVN r32738.	2014-09-15 18:00:46 +00:00
Vasily Filipov	e26af91a64	BTL/OPENIB: set "max_lmc" param to be "1" and not "all available values" by default. cmr=v1.8.3:reviewer=miked This commit was SVN r32736.	2014-09-15 13:56:41 +00:00
Alex Mikheev	31d0724a08	OMPI: btl openib: fix detection of max registerable memory Deal with the case when mlx4 module is loaded but device is not present cmr=v1.8.3:reviewer=miked This commit was SVN r32734.	2014-09-15 12:17:23 +00:00
Ralph Castain	fad4384463	Not sure how we could get to this point without having already detected the error, but just to be safe - check for end-of-array and return if error. Refs trac:4897 This commit was SVN r32731. The following Trac tickets were found above: Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897	2014-09-13 02:23:30 +00:00
Jeff Squyres	66aeadacff	opal_search_libs: correctly AC_DEFINE results of search 1. It is not sufficient to put the result of m4_toupper() in a variable and use that variable as the variable name in AC_DEFINE_UNQUOTED. Instead, just use m4_toupper() directly in AC_DEFINE_UNQUOTED. Also, save the result value in a "permanent" variable that isn't erased, just in case autoconf decides to be lazy about instantiating the body AC_DEFINE_UNQUOTED and move it later (this is probably overkill :-) ). 1. Use the OMPI Way of always defining macros (to 0 or 1). Then also slightly change the logic in util/basename.c to just check OPAL_HAVE_DIRNAME (because it will always be defined). Refs trac:4894 This commit was SVN r32723. The following Trac tickets were found above: Ticket 4894 --> https://svn.open-mpi.org/trac/ompi/ticket/4894	2014-09-13 00:28:30 +00:00
Ralph Castain	7269dae2da	Per patch from Samuel Thibault, silence warning from Clang This commit was SVN r32720.	2014-09-12 22:22:11 +00:00
Ralph Castain	0445052a1c	Check for multiple declarations of a given MCA param and error out if detected as that can create an ambiguous definition of the param value. Refs trac:4897 This commit was SVN r32719. The following Trac tickets were found above: Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897	2014-09-12 22:21:30 +00:00
Jeff Squyres	d244b7b860	mca_base_var: fix possibility of unaligned variable assignments Add a debugging check that ensures that the registered storage is aligned appropriately for the type that is specified. When we know that the storage is properly aligned, we can cast the mbv_storage to the appropriate type and then simply do the assignment. We used to do this assignment via a union, but clang's -fsanitize=alignment complained about this. This commit was SVN r32716.	2014-09-11 23:02:49 +00:00
Ralph Castain	1f2c5863f0	Revert r32675 in favor of a different solution proposed by Brice This commit was SVN r32715. The following SVN revision numbers were found above: r32675 --> open-mpi/ompi@916f98a3ee	2014-09-11 21:58:48 +00:00
Howard Pritchard	e43715574a	remove ignored restrict return type qualifier The use of restrict in the return type qualifier for mca_btl_vader_reserve_fbox is being ignored by the GNU compiler. For newer gcc, one sees this warning only with -Wignored-qualifiers set, but for older variants of gcc it was reported that numerous warning messages about this ignored qualifier were being generated as vader is being compiled. The warning reported by gcc is btl_vader_fbox.h:53:47: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] static inline mca_btl_vader_fbox_t * restrict mca_btl_vader_reserve_fbox (struct mca_btl_base_endpoint_t *ep, const size_t size) This commit was SVN r32714.	2014-09-11 21:12:41 +00:00
Rolf vandeVaart	9a2bab0e27	Add support for detecting CUDA managed memory. Disabled for now. This commit was SVN r32713.	2014-09-11 21:07:17 +00:00
Howard Pritchard	820b34e5d2	Fix bad cut/paste for commit c19e7369 This commit was SVN r32712.	2014-09-11 21:00:04 +00:00
Howard Pritchard	d07c5674a3	Fix potential double free in cray pmi cray_fini This commit was SVN r32711.	2014-09-11 20:30:40 +00:00
Ralph Castain	cb2ad98f57	Silence an unused function warning This commit was SVN r32704.	2014-09-10 17:36:34 +00:00
Ralph Castain	a7c5b77d70	Just because the openib BTL can't reach a process doesn't mean it is a job-ending error. If we have other methods for reaching the process (e.g., sm for a local proc), then that's okay. If there is no method for reaching a proc, then that's an error - but the BML will report that situation. The question of whether or not the openib BTL supports loopback is a separate question. It may be more appropriate to make the modex be PMIX_GLOBAL for cases where openib can support loopback so someone can run without a shared memory component. I'll leave that decision to the IB vendors. This commit was SVN r32702.	2014-09-10 17:02:16 +00:00
Ralph Castain	93948f0c4e	Resolve alignment issues when unpacking buffers cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32698.	2014-09-10 10:19:16 +00:00
Ralph Castain	e671620ac7	Per request from Jeff: tune up the help messages for binding options Refs trac:4898 This commit was SVN r32691. The following Trac tickets were found above: Ticket 4898 --> https://svn.open-mpi.org/trac/ompi/ticket/4898	2014-09-09 22:39:22 +00:00
Ralph Castain	4207b4c4ad	Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32687.	2014-09-08 21:03:51 +00:00
Ralph Castain	4df1aa63f7	Since we've run into the situation where someone puts a script wrapper around a launcher such as srun, we need to always protect MCA cmd line params with quotes. This means we also need to protect the backend from quotes coming into the system as part of a value, or else the parser gets confused. So add a new function for wrapping MCA arguments, and tell the backend parser to ignore/remove leading/trailing quotes. cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32686.	2014-09-08 20:38:46 +00:00
Ralph Castain	5649841e26	Provide missing include file - generates errors when used with Intel compilers This commit was SVN r32685.	2014-09-08 19:04:40 +00:00
Ralph Castain	e32d541c8d	Bring over a slight modification to the opal_init_test routine This commit was SVN r32676.	2014-09-07 15:46:53 +00:00
Ralph Castain	916f98a3ee	Rename an HWLOC member of a union in the diff.h file to avoid a naming conflict with an external library - it isn't that HWLOC did something wrong, but rather that the name being used is so close to a type name that other folks have a tendency to #define it as well. We could argue with those folks that what they are doing is incorrect, but it is just easier to make a slight change and resolve the problem. This commit was SVN r32675.	2014-09-07 15:42:05 +00:00
Ralph Castain	6323b226c7	Bring over some updates from the PMIx branch - mostly just minor cleanups. Make the direct grpcomm component no longer be the default. For now, we seem to be having problems with non-blocking fence operations, so make them not be the default under any scenario (e.g., when sm is the only btl in operation). This commit was SVN r32673.	2014-09-06 19:19:44 +00:00
Ralph Castain	f1a33b6476	Use the accessor function to get the jobid and vpid This commit was SVN r32672.	2014-09-06 19:18:21 +00:00
Howard Pritchard	fe2ea1f0fb	fix handling of OPAL_DSTORE_LOCALITY and ref cnt This commit was SVN r32671.	2014-09-05 21:36:19 +00:00
Ralph Castain	ec51cbab9f	We are failing to use the system dirname function because we are not correctly flagging that we found it. Modify opal_search_libs_core to set an "opal_have_foo" flag to indicate that we found the specified function, and then modify the have_dirname check to look for it. cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32669.	2014-09-04 16:10:38 +00:00
Ralph Castain	41c6058153	Bring over changes to MXM from pmix branch: MTL MXM: establish endpoint connection on the first communication when direct_modex used This commit was SVN r32668.	2014-09-03 18:22:11 +00:00
Ralph Castain	a51d1d7a97	find_last_path_separator returns NULL if the filename doesn't contain a path separator in it - i.e., it's just a local file. So protect the loop to avoid a segfault cmr=v1.8.3:reviewer=rolfv This commit was SVN r32667.	2014-09-03 18:13:42 +00:00
Ralph Castain	3fed455bbc	If something goes wrong in add_procs, let's not segfault during finalize This commit was SVN r32665.	2014-09-03 17:27:31 +00:00
Ralph Castain	b372cd02d0	Ensure the hwloc headers get installed when --with-devel-headers is given This commit was SVN r32663.	2014-09-02 19:58:25 +00:00
Ralph Castain	d13fb37ef9	Add array types to opal_value_t This commit was SVN r32656.	2014-08-31 08:07:03 +00:00
Ralph Castain	9500939042	Fix abstraction violation This commit was SVN r32655.	2014-08-31 08:06:35 +00:00
Ralph Castain	60eb7124ab	Upgrade to hwloc 1.9.1 This commit was SVN r32652.	2014-08-31 03:13:06 +00:00
Ralph Castain	5cdbc00136	Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others. This commit was SVN r32650.	2014-08-30 19:33:46 +00:00
Ralph Castain	9ac75451ff	Nathan had requested this before as he needs to know the #procs in the job to optimize the UGNI btl. Add the fetch for that data - the native pmix component already provides it, but ensure the Slurm PMI-1 support does too. If not found, fall back to the non-optimized number This commit was SVN r32648.	2014-08-29 22:53:35 +00:00
Ralph Castain	f865ef61ab	Need local_size returned by the Slurm components This commit was SVN r32646.	2014-08-29 22:23:27 +00:00
Howard Pritchard	9a2891f2d6	handle PMIX_LOCAL_SIZE attr arg in cray pmix This commit was SVN r32645.	2014-08-29 21:18:02 +00:00
Ralph Castain	8faabed2cd	Add some further initialization and protection for zero-byte messages This commit was SVN r32644.	2014-08-29 17:24:55 +00:00
Gilles Gouaillardet	6916bfc368	btl/openib: fix use of mca_btl_openib_component.default_recv_qps - do not have mca_btl_openib_component.default_recv_qps point to the stack - do not reset mca_btl_openib_component.default_recv_qps in btl_openib_component_open cmr=v1.8.3:reviewer=miked This commit was SVN r32642.	2014-08-29 04:41:34 +00:00
Gilles Gouaillardet	b8a2e90f2d	btl/openib: fix a typo cmr=v1.8.3:reviewer=miked This commit was SVN r32639.	2014-08-29 04:23:42 +00:00
Ralph Castain	730e28349e	Some minor uninitialized variable cleanups This commit was SVN r32629.	2014-08-29 02:21:13 +00:00
Jeff Squyres	733316372b	usnic: remove suggestion of enabling no-drop in the fabric Reviewed by Reese Faucette cmr=v1.8.3:reviewer=ompi-rm1.8 This commit was SVN r32628.	2014-08-28 23:56:56 +00:00
Howard Pritchard	2a12fd833d	Fix compile problem from pmix merge This commit was SVN r32626.	2014-08-28 22:14:12 +00:00
Gilles Gouaillardet	d743da18bf	pmix: fix process name parsing on 32-bit systems opal_process_name_t is a uint64_t, which is not equivalent to an unsigned long on 32-bit systems. It is now parsed as an unsigned long long. This commit was SVN r32592.	2014-08-25 03:08:02 +00:00
Ralph Castain	f00af81c1d	Little more cleanup under the abort cases cited by Gilles. All seem to be working now This commit was SVN r32585.	2014-08-22 19:57:57 +00:00
Ralph Castain	b1a7375192	Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Cleanup some of the pmix stuff when running corner cases of errors This commit was SVN r32584.	2014-08-22 19:20:45 +00:00
Joshua Ladd	97abb7c727	Backing out the new Opal Hash table until the legal issues are addressed by H.P. Refs trac:4872 This commit was SVN r32583. The following Trac tickets were found above: Ticket 4872 --> https://svn.open-mpi.org/trac/ompi/ticket/4872	2014-08-22 19:10:09 +00:00
Ralph Castain	6ff2a60829	Handle the non-blocking fence case correctly, and ensure we always at least pass back the hostname of the process whose info is being requested so that the ompi_proc_t can correctly initialize it when we are in a non-blocking fence with np < cutoff scenario This commit was SVN r32578.	2014-08-22 14:26:24 +00:00
Ralph Castain	8f1b9b463e	Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound. This commit was SVN r32577.	2014-08-22 05:17:51 +00:00
Jeff Squyres	b0dfb9f401	usnic: avoid a possible race condition Per #4874, code review revealed a possible race condition in the module struct and the connectivity agent. Move the setup of the connectivity agent listener until the module struct has been fully setup. This commit was SVN r32573.	2014-08-22 02:34:24 +00:00
Jeff Squyres	a896f90712	btl_base_select: fix faulty/incorrect show_help message When no components were able to be found, btl_base_select() was showing the wrong help message -- one that indicated that a specific component could not be found. And it left off a string argument, so the end of the help message was garbage. This commit creates a new help message for this case and updates the show_help call to use the new message. This commit was SVN r32572.	2014-08-22 01:53:38 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint. * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Mike Dubman	c3beb0472e	openib/btl: better detect max reg memory. OFED has no runtime versioning API :( based on http://www.open-mpi.org/community/lists/users/2014/08/25048.php reviewed by AlexM cmr=v1.8.2:reviewer=ompi-rm1.8 This commit was SVN r32569.	2014-08-21 12:12:43 +00:00
Gilles Gouaillardet	cfc0773c8c	btl/scif: use safe syntax Thanks to Ashley Pittman for pointing out that the modex should now be zeroed cmr=v1.8.2:ticket=trac:4871 This commit was SVN r32568. The following Trac tickets were found above: Ticket 4871 --> https://svn.open-mpi.org/trac/ompi/ticket/4871	2014-08-21 10:32:05 +00:00
Mike Dubman	acd5a9acac	udcm: psn should be 24 bit, new OFED actually checks and fails if it is not 24 bit. This commit was SVN r32567.	2014-08-21 08:49:43 +00:00
Gilles Gouaillardet	59caebe3ea	new opal hash table Decrease the hash table size when an element is removed cmr=v1.8.2:ticket=trac:4872 This commit was SVN r32566. The following Trac tickets were found above: Ticket 4872 --> https://svn.open-mpi.org/trac/ompi/ticket/4872	2014-08-21 06:51:11 +00:00
Ralph Castain	ea94659bd9	Silence warning Refs trac:4872 This commit was SVN r32565. The following Trac tickets were found above: Ticket 4872 --> https://svn.open-mpi.org/trac/ompi/ticket/4872	2014-08-20 22:27:03 +00:00
Joshua Ladd	84d0cc27a2	Adding a new OPAL hash table routine. Contributed by David Linden of H.P. in partnership with Mellanox Technologies. This should be added to cmr=v1.8.2:subject=New OPAL hash table:reviewer=rhc This commit was SVN r32564.	2014-08-20 21:40:28 +00:00
Gilles Gouaillardet	3c1944054e	btl/scif: use safe syntax PGI compilers 2013 and older do not support the following syntax : mca_btl_scif_modex_t modex = {.port_id = mca_btl_scif_module.port_id}; so split it on two lines cmr=v1.8.2:reviewer=hjelmn This commit was SVN r32555.	2014-08-20 02:48:47 +00:00
Jeff Squyres	0a398c155f	opal MCA params: Move (and adapt) help message to opal help file This commit was SVN r32547.	2014-08-16 11:54:41 +00:00
Jeff Squyres	ac7c907f8d	usnic: ensure to have a safe destruction of an opal_list_item_t It turns out that we ''can'' get to the endpoint destructor with the endpoint still on the "endpoints needing ACKs" list. So if it's on the list, remove it first, and then DESTRUCT the opal_list_item_t. This prevents an assert() fail in debug builds. We'd like to let this soak over the weekend. cmr=v1.8.2:reviewer=dgoodell This commit was SVN r32546.	2014-08-15 21:52:36 +00:00
Jeff Squyres	1cdcb7290b	usnic: no need to check before calling this function This function is intentionally always safe to call -- no need for a double redundant check. This commit was SVN r32545.	2014-08-15 21:39:29 +00:00
Jeff Squyres	082ab15d19	usnic: increase the listen() backlog size Rarely -- but it happens -- the connectivity client gets ECONNREFUSED because the connectivity agent listen() backlog is too small. Rather than put in a loop on the client side, take the simple way out for now: increase the backlog size to an arbitrarily-large number. Reviewed by Dave Goodell. cmr=v1.8.2:reviewer=ompi-rm1.8 This commit was SVN r32543.	2014-08-15 19:12:18 +00:00
Jeff Squyres	9373d6420e	usnic: when a module is finalized, "unlisten" the connectivity checker Instead of waiting to destroy the connectivity agent during component shutdown, have the module shutdown send an "unlisten" command to the cagent that will tell it to stop listening on a given interface. This commit was SVN r32536.	2014-08-15 00:52:43 +00:00
Jeff Squyres	6b592d3016	usnic: convert some BTL_ERRORs to more descriptive show_help messages 1. After we receive N abnormally-short messages (meaning: corrupted), print a show_help message about it. N defaults to 25. N can be set to 0 to disable the message via btl_usnic_max_short_packets. 2. If we receive a completion error for something other than a receive, display a show_help message. Reviewed by Dave Goodell. CMR'ing to v1.8.3, but it will require a custom patch because of the OMPI->OPAL BTL move. cmr=v1.8.3 This commit was SVN r32522.	2014-08-13 15:01:20 +00:00
Mike Dubman	5b90af601c	btl/openib: add missing definition for ConnectX3 card This commit was SVN r32521.	2014-08-13 13:56:34 +00:00
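
Commit d743da18bf in the list above notes that a uint64_t process name cannot be parsed as an unsigned long on 32-bit systems, and that it is now parsed as an unsigned long long. The following is a minimal sketch of that idea, not the actual Open MPI code: the helper name parse_name64 and its return convention are hypothetical, and it assumes the name arrives as a decimal string.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not an Open MPI function): parse a decimal string
 * into a 64-bit value. strtoull() works on unsigned long long, which is
 * at least 64 bits wide on every platform, whereas strtoul() works on
 * unsigned long, which is only 32 bits wide on typical 32-bit systems.
 * Returns 0 on success and -1 on failure. */
static int parse_name64(const char *str, uint64_t *out)
{
    char *end = NULL;
    unsigned long long value;

    assert(NULL != out);
    if (NULL == str || '\0' == *str) {
        return -1;
    }

    errno = 0;
    value = strtoull(str, &end, 10);
    /* Reject range errors, empty conversions, and trailing characters. */
    if (0 != errno || end == str || '\0' != *end) {
        return -1;
    }

    *out = (uint64_t) value;
    return 0;
}
```

Under these assumptions, calling parse_name64("4294967296", &name) succeeds and stores a value one larger than the biggest 32-bit unsigned integer, which is exactly the kind of input that an unsigned long cannot hold on a 32-bit system.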

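Commit acd5a9acac in the list above states that the udcm PSN (packet sequence number) should be 24 bits wide because newer OFED versions reject wider values. A minimal sketch of clamping a value to 24 bits follows. The names PSN_MASK and psn_from_seed are hypothetical and do not come from the Open MPI source, and how the real code obtains its starting value is not shown in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the low 24 bits of a 32-bit word. */
#define PSN_MASK 0xFFFFFFu

/* Hypothetical helper: reduce an arbitrary 32-bit seed (for example a
 * random number) to a value that fits in 24 bits by discarding the
 * upper 8 bits. */
static uint32_t psn_from_seed(uint32_t seed)
{
    uint32_t psn = seed & PSN_MASK;

    assert(psn <= PSN_MASK);  /* always holds after masking */
    return psn;
}
```

Masking is one simple way to guarantee the width. Taking the seed modulo 2^24 gives the same result for unsigned values.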

2900 Commits