openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	a01a5487a8	opal/util/ethtool: use system ethtool_cmd_speed when available Refs: open-mpi/ompi#1679	2016-05-20 09:05:09 +09:00
Jeff Squyres	87233aae49	ethtool: better handle portability Be sure to handle the case where we don't have ethtool support at all. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-19 10:57:14 -07:00
Gilles Gouaillardet	fd93d236b1	opal/util/ethtool: fix compilation on older Linux when struct ethtool_cmd has no speed_hi field Refs: open-mpi/ompi#1628	2016-05-19 11:58:04 +09:00
Karol Mroz	31e33a64f9	opal/util: add function to obtain interface speed If kernel ethtool_cmd_speed() is not available, use copies if possible. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-05-18 16:25:51 +02:00
Jeff Squyres	265e5b9795	Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1 ompi/opal/orte/oshmem/test: max hostname length cleanup	2016-05-02 09:44:18 -04:00
Ralph Castain	6ac7929bd0	Extend the schizo framework to allow definition of CLI options by environment. Refactor orterun to mesh with the orted_submit code, thus improving code reuse. Eliminate the orte-submit tool as orterun can now meet that need. Cleanups per @jjhursey review	2016-05-01 11:30:25 -07:00
Karol Mroz	e1c64e6e59	opal: standardize on max hostname length Define OPAL_MAXHOSTNAMELEN to be either: (MAXHOSTNAMELEN + 1) or (limits.h:HOST_NAME_MAX + 1) or (255 + 1) For pmix code, define above using PMIX_MAXHOSTNAMELEN. Fixup opal layer to use the new max. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-04-24 08:19:47 +02:00
Nathan Hjelm	27f8a4e806	opal: add code patcher framework This commit adds a framework to abstract runtime code patching. Components in the new framework can provide functions for either patching a named function or a function pointer. The latter functionality is not being used but may provide a way to allow memory hooks when dlopen functionality is disabled. This commit adds two different flavors of code patching. The first is provided by the overwrite component. This component overwrites the first several instructions of the target function with code to jump to the provided hook function. The hook is expected to provide the full functionality of the hooked function. The linux patcher component is based on the memory hooks in ucx. It only works on linux and operates by overwriting function pointers in the symbol table. In this case the hook is free to call the original function using the function pointer returned by dlsym. Both components restore the original functions when the patcher framework closes. Changes had to be made to support Power/PowerPC with the Linux dynamic loader patcher. Some of the changes: - Move code necessary for powerpc/power support to the patcher base. The code is needed by both the overwrite and linux components. - Move patch structure down to base and move the patch list to mca_patcher_base_module_t. The structure has been modified to include a function pointer to the function that will unapply the patch. This allows the mixing of multiple different types of patches in the patch_list. - Update linux patching code to keep track of the matching between got entry and original (unpatched) address. This allows us to completely clean up the patch on finalize. All patchers keep track of the changes they made so that they can be reversed when the patcher framework is closed. At this time there are bugs in the Linux dynamic loader patcher so its priority is lower than the overwrite patcher.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:16:13 -06:00
Nathan Hjelm	4cac623aeb	opal/patch: add call to check if binary patching is supported Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:16:12 -06:00
Nathan Hjelm	7aa03d66b3	opal/memory: add support for patch based memory hooks This commit adds support for runtime binary patching. The support is broken down into two parts: util/opal_patcher.[ch] which contains the functionality for runtime patching of symbols, and mca/memory/patcher which patches the various symbols needed to provide support for memory hooks. This work is preliminary and is based off work donated by IBM. The patcher code is disabled if dlopen is disabled. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-13 17:14:31 -06:00
Ralph Castain	8c14df2328	Revert "Modify singularity support per patch from Greg Kurtzer" This reverts commit open-mpi/ompi@f7257a8310. Ensure that we properly cleanup the session directory tree. Prior code had issues with symlinks, especially if the file that the link points to was already removed as we traverse the tree. Also found that the dirent checks for directory type weren't fully portable, and so fall back to the stat-based approach which is known to be portable. Fix singularity singletons by detecting we are in a container and properly setting the pmix selection to pick the isolated component. Remove a stale restriction blocking use of the sm btl	2016-03-24 11:27:18 -07:00
Nathan Hjelm	607be72de9	opal/keval_parse: fix conditional ordering Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-08 10:06:14 -07:00
Nathan Hjelm	63bac9a4e0	opal/util: fix bug in key value parser This commit fixes a bug in the opal key value parser that might cause the filename parser to go past the beginning of the string. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-07 14:51:29 -07:00
Nathan Hjelm	32236736a4	Fix parsing of envvars in MCA files This commit fixes a memory corruption bug when parsing lines of the form: -x FOO=bar The code was making changes to the size of the buffer allocated for key_buffer without making the appropriate changes to key_buffer_len. This was causing subsequent calls to save_param_name to write to invalid memory. This commit makes the following changes: - Fix the above bug by modifying trim_name to move the string within the buffer instead of re-allocating space for the trimmed string. - Cleaned up both trim_name and save_param_name. Both functions took a prefix and suffix to trim. Problem was the prefix was not treated like a prefix. Instead the "prefix" was located inside the string using strstr then the trimmed value started after the substring (even in the middle of the string). To allow trimming both -x and --x (as well as -mca and --mca) trim_name is now called with each prefix. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-17 14:58:05 -07:00
Ralph Castain	06c3dfc052	Refactor the ORTE DVM code so that external codes can submit multiple jobs using only a single connection to the HNP. * Clean up the DVM so it continues to run even when applications error out and we would ordinarily abort the daemons. * Create a new errmgr component for the DVM to handle the differences. * Cleanup the DVM state component. * Add ORTE bindings directory and brief README * Pass a local tool index around to match jobs. * Pass the jobid on job completion. * Fix initialization logic. * Add framework for python wrapper. * Fix terminate-with-non-zero-exit behavior so it properly terminates only the indicated procs, notifies orte-submit, and orte-dvm continues executing. * Add some missing options to orte-dvm * Fix a bug in -host processing that caused us to ignore the #slots designator. Add a new attribute to indicate "do not expand the DVM" when submitting job spawn requests. * It actually makes no sense that we treat the termination of all children differently than terminating the children of a specific job - it only creates confusion over the difference in behavior. So terminate children the same way regardless. Extend the cmd_line utility to easily allow layering of command line definitions Catch up with ORTE interface change and make build more generic. Disable "fixed dvm" logic for now. Add another cmd_line function to merge a table of cmd line options with another one, reporting as errors any duplicate entries. Use this to allow orterun to reuse the orted_submit code Fix the "fixed_dvm" logic by ensuring we reset num_new_daemons to zero. Also ensure that the nidmap is sent with the first job so the downstream daemons get the node info. Remove a duplicate cmd line entry in orterun. Revise the DVM startup procedure to pass the nidmap only once, at the startup of the DVM. This reduces the overhead on each job launch and ensures that the nidmap doesn't get overwritten. Add new commands to get_orted_comm_cmd_str(). 
Move ORTE command line options to orte_globals.[ch]. Catch up with extra orte_submit_init parameter. Add example code. Add documentation. Bump version. The nidmap and routing data must be updated prior to propagating the xcast or else the xcast will fail. Fix the return code so it is something more expected when an error occurs. Ensure we get an error returned to us when we fail to launch for some reason. In this case, we will always get a launch_cb as we did indeed attempt to spawn it. The error code will be returned in the complete_cb. Fix the return code from orte_submit_job - it was returning the tracker index instead of "success". Take advantage of ORTE's pretty-print capabilities to provide a nice error output explaining why we failed to launch. Ensure we always get a launch_cb when we fail to launch, but no complete_cb as the job never launched. Extend the error reporting capability to job completion as well. Add index parameter to orte_submit_job(). Add orte_job_cancel and implement ORTE_DAEMON_TERMINATE_JOB_CMD. Factor out dvm termination. Parse the terminate option at tool level. Add error string for ORTE_ERR_JOB_CANCELLED. Add some safeguards. Cleanup and/of comments. Enable the return. Properly ORTE_DECLSPEC orte_submit_halt. Add orte_submit_halt and orte_submit_cancel to interface. Use the plm interface to terminate the job	2016-02-13 08:10:44 -08:00
Edgar Gabriel	722aab92e6	- extend opal_path_nfs to retrieve the file system type - use opal_path_nfs in the fs_base function to avoid code duplication.	2016-01-26 13:36:21 -06:00
Ralph Castain	4dad5de8ff	Silence a couple of warnings - strncpy returns a char*, not an int	2016-01-16 09:44:52 -08:00
Gilles Gouaillardet	1d38430e43	opal: replace opal_convert_jobid_to_string with opal_snprintf_jobid	2016-01-14 10:39:03 +09:00
KAWASHIMA Takahiro	2dcb2d711b	Makefile: Move fd.c to `SOURCES` from `headers`. And reorder fd.h and few.h in alphabetical order.	2015-11-04 11:28:43 +09:00
Nathan Hjelm	6d3041335f	opal/keyval: reset buffer pointer/size in finalize Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-20 13:10:44 -06:00
Gilles Gouaillardet	dc883cff8d	opal/util: fix parse_ipv4_dots prototype	2015-10-01 14:03:08 +09:00
Nathan Hjelm	408da16d50	ompi/proc: add proc hash table for ompi_proc_t objects This commit adds an opal hash table to keep track of mapping between process identifiers and ompi_proc_t's. This hash table is used by the ompi_proc_by_name() function to lookup (in O(1) time) a given process. This can be used by a BTL or other component to get a ompi_proc_t when handling an incoming message from an as yet unknown peer. Additionally, this commit adds a new MCA variable to control the new add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in the process falls below the threshold a ompi_proc_t is created for every process. If the number of ranks is above the threshold then a ompi_proc_t is only created for the local rank. The code needed to generate additional ompi_proc_t's for a communicator is not yet complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-10 08:55:54 -06:00
Ralph Castain	d97bc29102	Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given	2015-09-04 16:54:40 -07:00
Ralph Castain	0d5814b5ca	Cleanup Coverity issues	2015-08-29 21:19:27 -07:00
Ralph Castain	cf6137b530	Integrate PMIx 1.0 with OMPI. Bring Slurm PMI-1 component online Bring the s2 component online Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways. Bring the OMPI pubsub/pmi component online Get comm_spawn working again Ensure we always provide a cpuset, even if it is NULL pmix/cray: adjust cray pmix component for pmix Make changes so cray pmix can work within the integrated ompi/pmix framework. Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet Cleanup comm_spawn - procs now starting, error in connect_accept Complete integration	2015-08-29 16:04:10 -07:00
Ralph Castain	023936e84b	Silence coverity warnings	2015-07-29 07:28:08 -07:00
Ralph Castain	8d128fe090	Remove the non-null attributes from the cmd_line parser as this isn't something we can guarantee, and the optimization isn't worth the potential for error	2015-06-25 13:26:20 -07:00
Nathan Hjelm	4d92c9989e	more c99 updates This commit does two things. It removes checks for C99 required headers (stdlib.h, string.h, signal.h, etc). Additionally it removes definitions for required C99 types (intptr_t, int64_t, int32_t, etc). Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-06-25 10:14:13 -06:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Gilles Gouaillardet	58d1b3f4d0	opal_os_dirpath_create: fix TOCTOU as reported by Coverity with CID 70396	2015-06-17 11:17:54 +09:00
Gilles Gouaillardet	de66447ebb	opal_cmd_line_get_usage_msg: silence warning as reported by Coverity with CID 1269967	2015-06-17 11:17:54 +09:00
Gilles Gouaillardet	f2f66e6e63	opal_daemon_init: silence warning as reported by Coverity with CID 710642	2015-06-17 11:17:53 +09:00
Gilles Gouaillardet	8427e87ee9	opal_argv_delete: silence warning as reported by Coverity with CID 71914	2015-06-17 11:17:53 +09:00
Gilles Gouaillardet	bcdb2d1380	add missing #include sscanf requires stdio.h fixes commit open-mpi/ompi@6ca57724c4	2015-06-08 09:13:11 +09:00
Jeff Squyres	0acec2b676	opal/util/net.c: remove stale comment Also wrap a long "if" statement -- but make no code logic changes.	2015-06-06 10:17:20 -07:00
Jeff Squyres	6ca57724c4	opal/util/net.c: remove superfluous #include	2015-06-06 10:17:20 -07:00
Nathan Hjelm	0e3c32a98a	opal/sys_limits: fix coverity issue CID 996175 Dereference before null check (REVERSE_NULL) If lims is NULL then we ran out of memory. Return an error and remove the NULL check at cleanup. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-05-28 08:38:10 -06:00
Nathan Hjelm	f5389cbb03	opal/keyval: fix coverity issues CID 1292738 Dereference after null check (FORWARD_NULL) It is an error if NULL is passed for val in add_to_env_str. Removed the NULL-check @ keyval_parse.c:253 and added a NULL check and an error return. CID 1292737 Logically dead code (DEADCODE) Coverity is correct, the error code at the end of parse_line_new is never reached. This means we fail to report parsing errors when parsing -x and -mca lines in keyval files. I moved the error code into the loop and removed the checks @ keyval_parse.c:314. I also named the parse state enum type and updated parse_line_new to use this type. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-05-28 08:38:09 -06:00
Nathan Hjelm	9caffa5dd8	mca/base: fix source file name bug for synonyms This commit fixes synonyms so the source file is correctly printed out by ompi_info. This commit also adds support for printing out the line number where the variable is set. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-05-12 09:52:31 -06:00
Gilles Gouaillardet	c809aace47	initialize common symbols from opal A few uninitialized common symbols are remaining: common symbols generated by flex : * opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng * opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext * opal/util/show_help_lex.l: opal_show_help_yyleng * opal/util/show_help_lex.l: opal_show_help_yytext common symbol generated by "external" hwloc library: * opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map	2015-05-08 09:48:51 +09:00
Ralph Castain	9cb2fcfa5c	Cleanup the qos code when --enable-timings is given	2015-05-06 20:24:27 -07:00
Nadezhda Kogteva	116169c38a	opal timing: added ability to choose the timer type	2015-04-17 11:15:55 +03:00
Nathan Hjelm	75f210fdb9	opal/util/error: check for existing convertor for error range This commit fixes a bug when opal_error_init is called with the same values multiple times. If opal_error_init is called too many times it will start failing with OPAL_ERR_OUT_OF_RESOURCE. To fix the problem check if an existing convertor matching the requested one and return that one instead. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-09 11:51:36 -06:00
Nathan Hjelm	9cd955badf	opal: fix multiple bugs in MCA and opal This commit fixes the following bugs: - opal_output_finalize did not properly set internal state. This caused problems when calling the sequence opal_output_init (), opal_output_finalize (), opal_output_init (). - opal_info support called mca_base_open () but never called the matching mca_base_close (). mca_base_open () and mca_base_close () have been updated to use a open count instead of an open flag to allow mca_base_open to be called through multiple paths (as may be the case when MPI_T is in use). - orte_info support did not register opal variables. This can cause orte-info to not return opal variables. - opal_info, orte_info, and ompi_info support have been updated to use a register count. - When opening the dl framework the reference count was added to ensure the framework stuck around. The framework being closed prematurely was a bug in the MCA base that has since been corrected. The increment (and associated decrement) have been removed. - dl/dlopen did not set the value of mca_dl_dlopen_component.filename_suffixes_mca_storage on each call to register. Instead the value was set in the component structure. This caused the value to be lost when re-loading the component. Fixed by setting the default value in register. - Reset shmem framework state on close to avoid returning a stale component after reloading opal/shmem. - MCA base parameters were not properly deregistered when the MCA base was closed. This commit may fix #374. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-07 19:13:20 -06:00
Elena	90f5b2bb84	Introduce -tune command line option to set env vars and mca params from file	2015-03-26 18:33:53 +02:00
Gilles Gouaillardet	dc0bc756dc	iof/base: fix misc memory leak as reported by Coverity with CID 1196732	2015-03-10 14:37:53 +09:00
Jeff Squyres	0a2767a5d3	opal lt_interface: remove in favor of opal_dl interface	2015-03-09 08:18:13 -07:00
Gilles Gouaillardet	3511475e29	opal/util: fix misc memory leak as reported by Coverity with CID 996174	2015-02-27 19:19:46 +09:00
Jeff Squyres	9d7171e8f1	convert: remove unnecessary/unused opal_size2int() function The comments in the file even said "This file will hopefully not last long in the tree...".	2015-02-16 07:17:33 -08:00
Gilles Gouaillardet	ccbdf64de4	opal/util: fix memory leak in opal_util_init_sys_limits as reported by Coverity with CID 996174 previous commit (open-mpi/ompi@ca3a275823) did not fix this CID	2015-02-16 11:05:35 +09:00
Gilles Gouaillardet	ca3a275823	opal/util: fix misc memory leaks reported by Coverity fixes CID 996174, 996920, 1196735, 1196769 and 1196770	2015-02-13 14:28:59 +09:00
Jeff Squyres	a1037cd70a	if.c: fix minor memory leak This was CID 1269846.	2015-02-12 13:41:29 -08:00
Jeff Squyres	29794af0e9	cmd_line.c: use strncat() instead of strcat() Be safe about appending to the end of strings. This was CID 71932 (and probably also others).	2015-02-12 13:41:29 -08:00
Jeff Squyres	e188c75edc	opal_environ.c: ensure "value" is a valid string for the setenv() case This was CID 1269764.	2015-02-12 13:41:29 -08:00
Jeff Squyres	167d72ec68	net.c: ensure to free the args in the error case This was CID 710643.	2015-02-12 10:24:02 -08:00
Jeff Squyres	08285c6361	lt_interface: properly check OPAL_HAVE_LTDL_ADVISE	2015-02-11 12:25:20 -08:00
Mike Dubman	da5b8c6879	OPAL: skip comparison when fs=autofs in mtab, because we are looking for the real fs type	2014-12-18 21:42:25 +02:00
Artem Polyakov	01601f3284	Merge pull request #305 from artpol84/timing Timing framework improvement	2014-12-16 15:13:48 +06:00
Mike Dubman	2fbe87defe	Merge pull request #314 from miked-mellanox/topic/fix_opal_path_nfs add support for autofs and make check pass. jenkins: check,src_rpm	2014-12-15 20:52:52 +02:00
Mike Dubman	42f3fa0d1e	OPAL: add support for autofs magic type	2014-12-13 20:27:47 +02:00
Jeff Squyres	9e6b157cb6	opal: minor update to guess_strlen This is a minor update to open-mpi/ompi@c52601f0c5. If we have vsnprintf(), we might as well not have the rest of the guess_strlen() routine. Also document the nifty trick/behavior of vsnprintf() that enables this shortcut (it was new to me!).	2014-12-13 08:09:34 -05:00
Ralph Castain	c52601f0c5	It looks like the guess_len function in our local printf.c has some questionable code in it. Now that we are checking in configure for vsnprintf, take advantage of that check to use the far simpler method if it is available. Given that we no longer support such ancient systems where this might not be available, one suspects the other questionable code may no longer be required - but set that aside for another day.	2014-12-12 17:47:17 -08:00
Artem Polyakov	8ffad75a0a	Introduce timing interval measurement facility in timing framework	2014-12-10 16:47:49 +06:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Ralph Castain	4e4920a0fd	Fix stupid typo	2014-11-05 08:56:40 -08:00
Ralph Castain	2c9987b7d1	Update the opal_environ code so it behaves correctly with the environ if setenv is not available	2014-11-05 08:54:06 -08:00
Ralph Castain	907b4606c5	Check for the presence of setenv. If it is present, then use it in opal_setenv when setting values in the environ	2014-11-04 16:11:54 -08:00
Gilles Gouaillardet	62bde1fcb5	opal/util/proc.c: handle unaligned opal_process_name_t parameters	2014-10-27 14:40:10 +09:00
Gilles Gouaillardet	b5aea782ce	Revert "Fix heterogeneous support" Per the discussion at http://www.open-mpi.org/community/lists/devel/2014/10/16050.php This reverts commit `c9c5d4011b`.	2014-10-16 12:24:38 +09:00
Gilles Gouaillardet	c9c5d4011b	Fix heterogeneous support * redefine orte_process_name_t so it can be converted between host and network format as an opal_identifier_t aka uint64_t by the OPAL layer. * correctly send OPAL_DSTORE_ARCH key	2014-10-15 17:19:13 +09:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Artem Polyakov	f2e586980b	Fix timing framework: 1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php) 2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file. 3. Improve mpirun_prof: 1) adopt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options. This commit was SVN r32772.	2014-09-23 12:59:54 +00:00
Ralph Castain	70896550bf	Per input from Artem, update the copyrights on these files, ensuring to include all the licensing info for the files brought over from the mpiperf project. This commit was SVN r32770.	2014-09-20 14:54:24 +00:00
Ralph Castain	dfb952fa78	[Contribution from Artem - moved it to svn from git for him] Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup. This commit was SVN r32738.	2014-09-15 18:00:46 +00:00
Jeff Squyres	66aeadacff	opal_search_libs: correctly AC_DEFINE results of search 1. It is not sufficient to put the result of m4_toupper() in a variable and use that variable as the variable name in AC_DEFINE_UNQUOTED. Instead, just use m4_toupper() directly in AC_DEFINE_UNQUOTED. Also, save the result value in a "permanent" variable that isn't erased, just in case autoconf decides to be lazy about instantiating the body AC_DEFINE_UNQUOTED and move it later (this is probably overkill :-) ). 1. Use the OMPI Way of always defining macros (to 0 or 1). Then also slightly change the logic in util/basename.c to just check OPAL_HAVE_DIRNAME (because it will always be defined). Refs trac:4894 This commit was SVN r32723. The following Trac tickets were found above: Ticket 4894 --> https://svn.open-mpi.org/trac/ompi/ticket/4894	2014-09-13 00:28:30 +00:00
Ralph Castain	ec51cbab9f	We are failing to use the system dirname function because we are not correctly flagging that we found it. Modify opal_search_libs_core to set an "opal_have_foo" flag to indicate that we found the specified function, and then modify the have_dirname check to look for it. cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32669.	2014-09-04 16:10:38 +00:00
Ralph Castain	a51d1d7a97	find_last_path_separator returns NULL if the filename doesn't contain a path separator in it - i.e., it's just a local file. So protect the loop to avoid a segfault cmr=v1.8.3:reviewer=rolfv This commit was SVN r32667.	2014-09-03 18:13:42 +00:00
Ralph Castain	8f1b9b463e	Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound. This commit was SVN r32577.	2014-08-22 05:17:51 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs.
This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Jeff Squyres	eefa17026d	windows: effectively revert r32449 The _strdup usage in opal/util/basename looks like it was a product of Windows compatibility (see r11336), which we don't care about any more. Further, opal/win32/win_compat.h, which we still maintain for cygwin compatibility, #define's strdup to _strdup (which is what Microsoft wants you to use). So this old _strdup in opal/util/basename.c (and its corresponding check in configure.ac) should just be removed. This commit was SVN r32450. The following SVN revision numbers were found above: r11336 --> open-mpi/ompi@a28b025150 r32449 --> open-mpi/ompi@d5a3448b8b	2014-08-08 11:36:45 +00:00
Gilles Gouaillardet	d5a3448b8b	Fix missing prototype for _strdup _strdup is not part of any include file i could find on Solaris 10. manually add the _strdup prototype if needed. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r32449.	2014-08-08 02:51:56 +00:00
Gilles Gouaillardet	3c2e75c6b7	Fix OPAL_PROCESS_NAME_xTOy for heterogeneous support This commit was SVN r32425.	2014-08-05 05:22:50 +00:00
Howard Pritchard	4beab705aa	different way to fix opal_config compile problem This commit was SVN r32415.	2014-08-04 17:37:12 +00:00
Ralph Castain	61bf7af9d2	Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related. Refs trac:4826 This commit was SVN r32407. The following Trac tickets were found above: Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826	2014-08-02 18:38:16 +00:00
George Bosilca	1e37b67e5d	No more assert in the proc destructor. This commit was SVN r32401.	2014-08-01 16:36:23 +00:00
Ralph Castain	daeb9b6c4f	Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain. Thanks to Gilles for pointing out some of the discrepancies. This commit was SVN r32398.	2014-08-01 14:44:11 +00:00
George Bosilca	f39abb9e69	Reverting r32355: a number of processes is not a notion that a low level communication library should use to initialize itself. Ralph will champion this change back with an RFC if there is a realistic need/use case from the community. This commit was SVN r32361. The following SVN revision numbers were found above: r32355 --> open-mpi/ompi@c903917f47	2014-07-30 20:11:35 +00:00
Ralph Castain	c903917f47	Expose the num_procs information to the opal layer as the info is needed in several BTLs This commit was SVN r32355.	2014-07-30 09:33:41 +00:00
George Bosilca	a3feb627cf	Move some of the ompi_process_info down in OPAL. This commit was SVN r32324.	2014-07-26 21:43:34 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
George Bosilca	ed3d98a76d	Up to strlen and not to sizeof. This is guaranteed to work as in the worst case we just forced a \0 at the end of the string. This commit was SVN r32238.	2014-07-15 05:03:06 +00:00
George Bosilca	a648fcdeb0	Upon close reset the search_dirs. This commit was SVN r32237.	2014-07-15 05:02:19 +00:00
Nathan Hjelm	1d1cef76df	opal: fix leaks Two leaks are fixed by this commit: - opal_dss.lookup_data_type returns an allocated string. Free it. - opal_ifaddrtokindex was leaking a struct addrinfo. Ensure that is released before returning. cmr=v1.8.2:reviewer=rhc This commit was SVN r31777.	2014-05-15 15:59:41 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	11faab1091	The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees. This commit was SVN r31679.	2014-05-08 02:01:35 +00:00
Ralph Castain	f9d892b7a4	As Nathan pointed out, C99 reserves all _foo identifiers, so rename _WORD_MASK as OPAL_CRC_WORD_MASK This commit was SVN r31615.	2014-05-02 17:21:28 +00:00
Jeff Squyres	790cdb5cc7	Sigh. It helps when you commit the right version of the finished code. This commit fixes minor errors in the incorrectly-committed r31513 (new fd close-on-exec convenience function). Refs trac:4550 This commit was SVN r31514. The following SVN revision numbers were found above: r31513 --> open-mpi/ompi@e1655ae68d The following Trac tickets were found above: Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550	2014-04-24 13:20:32 +00:00
Jeff Squyres	e1655ae68d	opal/util/fd.c: add new convenience function for setting FD_CLOEXEC Paul Hargrove pointed out that Stevens tells us that we should FD_GETFL before FD_SETFL. And so we shall. Make a new convenience function to do this (opal_fd_set_cloexec()), just so that we don't have to litter this 2-step process throughout the code. Refs trac:4550 This commit was SVN r31513. The following Trac tickets were found above: Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550	2014-04-24 13:04:49 +00:00
Ralph Castain	ac421c931d	The random number generator changes were incomplete (typo errors) in some places, and is missing the required declspec's for visibility. cmr=v1.7.5:reviewer=jsquyres This commit was SVN r31053.	2014-03-12 22:37:27 +00:00
Jeff Squyres	3f845edfdd	* Prefix the preprocessor macro used to protect the file * Include opal_stdint.h so that we have uint32_t cmr=v1.7.5:ticket=trac:4298 This commit was SVN r30890. The following Trac tickets were found above: Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298	2014-02-28 16:56:38 +00:00
Ralph Castain	fea8a52983	Cleanup trailing spaces and use of tab instead of spaces Refs trac:4298 This commit was SVN r30827. The following Trac tickets were found above: Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298	2014-02-25 23:41:55 +00:00
Joshua Ladd	9ea9bec4ad	Addressing Jeff's comments: 1. Changed rng_buff_t --> opal_rng_buff_t 2. All global variables obey the prefix rule 3. Old code has been removed 4. Found a couple of unnecessary includes Refs trac:4298 This commit was SVN r30807. The following Trac tickets were found above: Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298	2014-02-24 23:18:35 +00:00
Joshua Ladd	e39d9f4080	Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG This commit was SVN r30801.	2014-02-23 21:41:38 +00:00
Ralph Castain	d246d190ed	Fix typo - thanks to Andreas Schwab for the patch RM-approved cmr:v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r30751.	2014-02-17 19:36:16 +00:00
Ralph Castain	86e8a147c6	Resolve uninitialized variables on some systems. Thanks to Paul Hargrove for finding the problem and suggesting the patch. cmr=v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r30656.	2014-02-10 21:17:34 +00:00
Jeff Squyres	ab31428bd3	opal_path_nfs(): If we get EPERM, just give up. Also fix the wording in a comment. This is worth fixing, but not worth holding up 1.7.4. cmr=v1.7.5:reviewer=rhc This commit was SVN r30307.	2014-01-17 14:28:12 +00:00
Jeff Squyres	9950471df7	Fixes for opal_path_nfs(): * Fix some typos in macro names. * Add case for OS's that have statfs() but no struct statfs (!). * Add case for NetBSD with struct statvfs.f_fstypename. Many thanks to Paul Hargrove who developed the majority of this patch. Reviewed by Jeff Squyres. cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r30255.	2014-01-11 01:07:10 +00:00
Jeff Squyres	023c50e864	Fix typo in macro name (#$%@#$% defined-or-not macros!!) Refs trac:4079 This commit was SVN r30206. The following Trac tickets were found above: Ticket 4079 --> https://svn.open-mpi.org/trac/ompi/ticket/4079	2014-01-09 23:47:36 +00:00
Jeff Squyres	c67c8e8187	Make the use of statfs()/statvfs() be more robust. As noted by Paul Hargrove, the #if's surrounding the use of statfs() and statvfs() in opal/util/path.c have apparently gotten stale (e.g., modern flavors of BSD OSs no longer define __BSD). Changes: * Add statfs and statvfs to the AC_CHECK_FUNCS in configure.ac * Add a sanity check to ensure that we have at least one of statfs() or statvfs(). Add a similar sanity check in opal/util/path.c, just as defensive programming. * Use AC_CHECK_MEMBERS in configure.ac to check for specific struct statfs/struct statvfs members that we use in opal/util/path.c * In path.c, add some #includes as listed on the OS man page for statfs(2) (OS X 10.8.5/Mountain Lion) * The previous code used statvfs() on Solaris and statfs() everywhere else. Attempting to replicate this with behavior-based configure testing led to fairly complicated if/else logic, so the new code uses whichever of the two are available (i.e., it might actually use both -- OS X 10.8.5 and RHEL 6.5 have both statfs() and statvfs()). The rationale here is that we don't really care which of the two functions report the answer; we'll take the answer regardless of where it comes from. For example, if one function returns a failure and the other does not, we'll use the results from the successful function and ignore the failed one. This new code seems to work on OS X and Linux. We'll have to see what happens with MTT and future Paul Hargrove testing... cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Make statfs/statvfs more robust This commit was SVN r30198.	2014-01-09 21:28:52 +00:00
Ralph Castain	2b92fccfd1	Looks like this code was intended to separate Sun's vfs struct from everyone else's, yet the #elif can make it fail on some systems that actually support the capability. So just make it an #else to cover the range of systems we now support and move on. cmr=v1.7.4:reviewer=jsquyres:subject=correct opal_path_df logic This commit was SVN r30172.	2014-01-09 04:10:26 +00:00
Jeff Squyres	13b29cff2c	This commit complements/completes r30140. r30140 made all the configury/Makefile.am changes; this commit renames the internal installdirs.h framework struct field names to match the configury macro names: * pkgdatdir -> ompidatadir * pkglibdir -> ompilibdir * pkgincludedir -> ompiincludedir This commit was SVN r30145. The following SVN revision numbers were found above: r30140 --> open-mpi/ompi@8b778903d8	2014-01-07 23:36:33 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
George Bosilca	24879f9def	Code cleanup while chasing valgrind complaints. This commit was SVN r30048.	2013-12-21 23:28:14 +00:00
Ralph Castain	7cf0fc5578	One more round of sys_limit fixes...sigh Refs trac:4010 This commit was SVN r30011. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:44:51 +00:00
Ralph Castain	e49c16b975	Grrr....use #if instead of #ifdef Refs trac:4010 This commit was SVN r30010. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:24:26 +00:00
Ralph Castain	6e6351959d	Check for all the RLIMIT_foo constants that we use, and update the limit checks to use the new #define values. Fix a bug where failure of some might lead to incorrect bracketing. Refs trac:4010 This commit was SVN r30009. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:09:43 +00:00
Jeff Squyres	090ce4187a	Fix compiler errors on Solaris, NetBSD, and OpenBSD: * Per http://www.open-mpi.org/community/lists/devel/2013/12/13504.php, protect usage of struct ifreq->ifr_hwaddr * Per http://www.open-mpi.org/community/lists/devel/2013/12/13503.php, avoid #define conflict with the token "if_mtu" * Also fix some whitespace and string naming issues in opal/util/if.c Tested by Paul Hargrove. Refs trac:4010 This commit was SVN r30006. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 11:17:30 +00:00
Ralph Castain	f15b0c9863	Add protections around the various system limits to protect code on unusual systems Thanks to Paul Hargrove for reporting it on OpenBSD-5 cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30003.	2013-12-20 03:18:07 +00:00
Ralph Castain	79af9825ac	Update of patch from Takahiro Kawashima Refs trac:3986 This commit was SVN r29984. The following Trac tickets were found above: Ticket 3986 --> https://svn.open-mpi.org/trac/ompi/ticket/3986	2013-12-19 17:22:37 +00:00
Jeff Squyres	42e3e5cd4b	Fixes trac:3990: ensure we don't SIGBUS on SPARC by forcing a memory copy and preventing access to potentially unaligned data. Reviewed by Dave Goodell. Tested by Siegmarr Gross. cmr=v1.7.4:reviewer=ompi-rm1.7:subject=fix SPARC SIGBUS in opal net code This commit was SVN r29983. The following Trac tickets were found above: Ticket 3990 --> https://svn.open-mpi.org/trac/ompi/ticket/3990	2013-12-19 16:51:34 +00:00
Ralph Castain	77553f72be	Per this email thread: http://www.open-mpi.org/community/lists/devel/2013/12/13412.php fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch This commit was SVN r29955.	2013-12-18 17:57:37 +00:00
Jeff Squyres	0ab48ad0d2	Fix some annoying flex warnings that have been there for years. Many thanks to Tom Fogal for the initial patch. cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings This commit was SVN r29904.	2013-12-14 00:36:12 +00:00
Jeff Squyres	ad51705891	Fix compiler warnings about signed/unsigned comparisons Change static opal_setlimit() function to return its value in an OUT parameter and return the usual int error code indicating success or failure. The OUT param and return code need to be separated because the OUT param is an unsigned type, but opal_setlimit() was returning -1 upon failure. Hence, the caller could not know that it had failed because the return type was previously an unsigned type. cmr=v1.7.4:reviewer=rhc:subject=Fix opal sys_limits.c signed/unsigned warnings This commit was SVN r29685.	2013-11-13 15:40:34 +00:00
Ralph Castain	8c5c7d0db4	Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface. Refs trac:3696 This commit was SVN r29522. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-10-26 00:47:14 +00:00
Nathan Hjelm	50b4b92758	hostname may not NULL-terminate the string if the buffer is too small. Thanks to Kevin M. Hildebrand for catching this. cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29412.	2013-10-09 15:49:18 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Ralph Castain	10ca1c1b04	Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature". This commit was SVN r28789.	2013-07-14 18:57:20 +00:00
Ralph Castain	bd65937bf3	If we enable ipv6, we resolve a host's addresses and check them all against our local interfaces to determine if the given host is us. However, if we don't enable ipv6, we only checked the first address returned. This can cause us to incorrectly identify a hostname as "not us". Make --disable-ipv6 behave the same as --enable-ipv6 by checking all the returned addresses. This commit was SVN r28716.	2013-07-03 21:41:36 +00:00
Jeff Squyres	089c632cce	Remove a bunch of dead code: gcc 4.7 warns of set-but-unused variables. So get rid of them. This commit was SVN r28538.	2013-05-17 21:45:49 +00:00
Ralph Castain	1ec13d530c	Allow simple way to request comparison to full address regardless of addr family This commit was SVN r28519.	2013-05-14 22:08:39 +00:00
Ralph Castain	eb2edb4b2b	Silence warning This commit was SVN r28516.	2013-05-14 22:00:01 +00:00
Ralph Castain	37088f23d8	When ipv6 disabled, we still have getaddrinfo, so use it when checking common networks for resolving to kindex This commit was SVN r28496.	2013-05-14 15:54:46 +00:00
Ralph Castain	3fc1bafd82	fix typo This commit was SVN r28490.	2013-05-14 12:36:45 +00:00
Ralph Castain	f4f07bdb21	Ensure the opal_ifaddrtokindex function considers the full range of address space by using the netmask This commit was SVN r28487.	2013-05-14 03:37:44 +00:00
Ralph Castain	b73f25e839	Add a function to return the kernel index of the corresponding interface from an IPv4/6 string or hostname This commit was SVN r28397.	2013-04-25 19:40:34 +00:00
Ralph Castain	cef639f578	Ahem....cleanup a copy/paste error in naming of these functions This commit was SVN r28395.	2013-04-25 15:21:53 +00:00
Jeff Squyres	c722440411	Add public functions for retrieving the MAC and MTU (paired with r28344). This commit was SVN r28345. The following SVN revision numbers were found above: r28344 --> open-mpi/ompi@e88881c25f	2013-04-17 22:32:32 +00:00
Ralph Castain	1f011bef99	Cleanup the updated sys limits capability. Fix a few copy/paste bugs (my bad). Shift the limit set to the ODLS default module so that we set the limits for all apps, even those that don't call opal_init. Leave it in opal_init as well to support direct-launch apps, but ensure we only set the limits once by removing the envar after launch by ODLS. Provide some nice error messages if we fail to set the limits. Since the user had to specifically request we set the limit, treat failure as an error-out situation. This commit was SVN r28288.	2013-04-04 16:00:17 +00:00
Ralph Castain	d09a9e8096	Upgrade the system limit code to support a broader range of parameters. For now, we support stack size, #open files, #children, and file size we can create. Continue to support the old "1" or "0" options for backward compatibility. This commit was SVN r28282.	2013-04-03 18:57:53 +00:00
Nathan Hjelm	365cf48db5	Update OPAL frameworks to use the MCA framework system. This commit was SVN r28239.	2013-03-27 21:11:47 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file can not be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int , or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few component actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). 
opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
George Bosilca	a856f926de	Remove a bunch of unused variables. This commit was SVN r28213.	2013-03-26 14:34:29 +00:00
Ralph Castain	b7f0e46319	Provide a nicer error message when someone gives a bad signal number to opal_signal cmr:v1.7.1 This commit was SVN r28188.	2013-03-20 15:30:59 +00:00
Jeff Squyres	7f34dc266b	Add missing unlocks. Fixes CID 967022 (which covers the unlock on line 627; there's probably another CID for the unlock added on line 537). This commit was SVN r28179.	2013-03-18 23:19:25 +00:00
Ralph Castain	a4b6fb241f	Remove all remaining vestiges of the Windows integration This commit was SVN r28137.	2013-02-28 17:31:47 +00:00
Ralph Castain	e71b40fdcb	If we are redirecting to files, ensure we don't create duplicate file descriptors for output streams going to the same file. If we do, then the output gets completely jumbled - best to avoid that problem. This commit was SVN r28136.	2013-02-28 17:21:53 +00:00
Brian Barrett	33cb4d21fe	Need to include libltdl's includes so that the lt wrappers can compile This commit was SVN r28042.	2013-02-12 00:41:03 +00:00
Rolf vandeVaart	6843f02b37	Add wrapper functions to LTDL functions so other parts of the library can access the LTDL functionality. Reviewed by jsquyres. This commit was SVN r28041.	2013-02-11 15:11:47 +00:00
Rolf vandeVaart	82fb093955	Revert changeset 28011. This can break the build on some systems. This commit was SVN r28017.	2013-02-01 20:41:47 +00:00
Rolf vandeVaart	79b623d7e3	Add wrapper interface to LTDL functions so that other parts of the library can access the LTDL functionality. Reviewed by jsquyres. This commit was SVN r28011.	2013-02-01 14:11:39 +00:00

666 commits