openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	ca3a275823	opal/util: fix misc memory leaks reported by Coverity fixes CID 996174, 996920, 1196735, 1196769 and 1196770	2015-02-13 14:28:59 +09:00
Jeff Squyres	a1037cd70a	if.c: fix minor memory leak This was CID 1269846.	2015-02-12 13:41:29 -08:00
Jeff Squyres	29794af0e9	cmd_line.c: use strncat() instead of strcat() Be safe about appending to the end of strings. This was CID 71932 (and probably also others).	2015-02-12 13:41:29 -08:00
Jeff Squyres	e188c75edc	opal_environ.c: ensure "value" is a valid string for the setenv() case This was CID 1269764.	2015-02-12 13:41:29 -08:00
Jeff Squyres	167d72ec68	net.c: ensure to free the args in the error case This was CID 710643.	2015-02-12 10:24:02 -08:00
Jeff Squyres	08285c6361	lt_interface: properly check OPAL_HAVE_LTDL_ADVISE	2015-02-11 12:25:20 -08:00
Mike Dubman	da5b8c6879	OPAL: skip comparison when fs=autofs in mtab, because we are looking for the real fs type	2014-12-18 21:42:25 +02:00
Artem Polyakov	01601f3284	Merge pull request #305 from artpol84/timing Timing framework improvement	2014-12-16 15:13:48 +06:00
Mike Dubman	2fbe87defe	Merge pull request #314 from miked-mellanox/topic/fix_opal_path_nfs add support for autofs and make check pass. jenkins: check,src_rpm	2014-12-15 20:52:52 +02:00
Mike Dubman	42f3fa0d1e	OPAL: add support for autofs magic type	2014-12-13 20:27:47 +02:00
Jeff Squyres	9e6b157cb6	opal: minor update to guess_strlen This is a minor update to open-mpi/ompi@c52601f0c5. If we have vsnprintf(), we might as well not have the rest of the guess_strlen() routine. Also document the nifty trick/behavior of vsnprintf() that enables this shortcut (it was new to me!).	2014-12-13 08:09:34 -05:00
Ralph Castain	c52601f0c5	It looks like the guess_len function in our local printf.c has some questionable code in it. Now that we are checking in configure for vsnprintf, take advantage of that check to use the far simpler method if it is available. Given that we no longer support such ancient systems where this might not be available, one suspects the other questionable code may no longer be required - but set that aside for another day.	2014-12-12 17:47:17 -08:00
Artem Polyakov	8ffad75a0a	Introduce timing interval measurement facility in timing framework	2014-12-10 16:47:49 +06:00
Ralph Castain	780c93ee57	Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL. We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.	2014-11-11 17:00:42 -08:00
Ralph Castain	4e4920a0fd	Fix stupid typo	2014-11-05 08:56:40 -08:00
Ralph Castain	2c9987b7d1	Update the opal_environ code so it behaves correctly with the environ if setenv is not available	2014-11-05 08:54:06 -08:00
Ralph Castain	907b4606c5	Check for the presence of setenv. If it is present, then use it in opal_setenv when setting values in the environ	2014-11-04 16:11:54 -08:00
Gilles Gouaillardet	62bde1fcb5	opal/util/proc.c: handle unaligned opal_process_name_t parameters	2014-10-27 14:40:10 +09:00
Gilles Gouaillardet	b5aea782ce	Revert "Fix heterogeneous support" Per the discussion at http://www.open-mpi.org/community/lists/devel/2014/10/16050.php This reverts commit `c9c5d4011b`.	2014-10-16 12:24:38 +09:00
Gilles Gouaillardet	c9c5d4011b	Fix heterogeneous support * redefine orte_process_name_t so it can be converted between host and network format as an opal_identifier_t aka uint64_t by the OPAL layer. * correctly send OPAL_DSTORE_ARCH key	2014-10-15 17:19:13 +09:00
Ralph Castain	fd6a044b7f	Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages. Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.	2014-10-03 16:02:57 -06:00
Artem Polyakov	f2e586980b	Fix timing framework: 1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php) 2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file. 3. Improve mpirun_prof: 1) adapt to the environment (SLURM/TORQUE); 2) recognize some nodeset-related mpirun options. This commit was SVN r32772.	2014-09-23 12:59:54 +00:00
Ralph Castain	70896550bf	Per input from Artem, update the copyrights on these files, ensuring to include all the licensing info for the files brought over from the mpiperf project. This commit was SVN r32770.	2014-09-20 14:54:24 +00:00
Ralph Castain	dfb952fa78	[Contribution from Artem - moved it to svn from git for him] Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup. This commit was SVN r32738.	2014-09-15 18:00:46 +00:00
Jeff Squyres	66aeadacff	opal_search_libs: correctly AC_DEFINE results of search 1. It is not sufficient to put the result of m4_toupper() in a variable and use that variable as the variable name in AC_DEFINE_UNQUOTED. Instead, just use m4_toupper() directly in AC_DEFINE_UNQUOTED. Also, save the result value in a "permanent" variable that isn't erased, just in case autoconf decides to be lazy about instantiating the body of AC_DEFINE_UNQUOTED and moves it later (this is probably overkill :-) ). 2. Use the OMPI Way of always defining macros (to 0 or 1). Then also slightly change the logic in util/basename.c to just check OPAL_HAVE_DIRNAME (because it will always be defined). Refs trac:4894 This commit was SVN r32723. The following Trac tickets were found above: Ticket 4894 --> https://svn.open-mpi.org/trac/ompi/ticket/4894	2014-09-13 00:28:30 +00:00
Ralph Castain	ec51cbab9f	We are failing to use the system dirname function because we are not correctly flagging that we found it. Modify opal_search_libs_core to set an "opal_have_foo" flag to indicate that we found the specified function, and then modify the have_dirname check to look for it. cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32669.	2014-09-04 16:10:38 +00:00
Ralph Castain	a51d1d7a97	find_last_path_separator returns NULL if the filename doesn't contain a path separator in it - i.e., it's just a local file. So protect the loop to avoid a segfault cmr=v1.8.3:reviewer=rolfv This commit was SVN r32667.	2014-09-03 18:13:42 +00:00
Ralph Castain	8f1b9b463e	Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound. This commit was SVN r32577.	2014-08-22 05:17:51 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs.
This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint. * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand. This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Jeff Squyres	eefa17026d	windows: effectively revert r32449 The _strdup usage in opal/util/basename looks like it was a product of Windows compatibility (see r11336), which we don't care about any more. Further, opal/win32/win_compat.h, which we still maintain for cygwin compatibility, #define's strdup to _strdup (which is what Microsoft wants you to use). So this old _strdup in opal/util/basename.c (and its corresponding check in configure.ac) should just be removed. This commit was SVN r32450. The following SVN revision numbers were found above: r11336 --> open-mpi/ompi@a28b025150 r32449 --> open-mpi/ompi@d5a3448b8b	2014-08-08 11:36:45 +00:00
Gilles Gouaillardet	d5a3448b8b	Fix missing prototype for _strdup. _strdup is not part of any include file I could find on Solaris 10. Manually add the _strdup prototype if needed. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r32449.	2014-08-08 02:51:56 +00:00
Gilles Gouaillardet	3c2e75c6b7	Fix OPAL_PROCESS_NAME_xTOy for heterogeneous support This commit was SVN r32425.	2014-08-05 05:22:50 +00:00
Howard Pritchard	4beab705aa	different way to fix opal_config compile problem This commit was SVN r32415.	2014-08-04 17:37:12 +00:00
Ralph Castain	61bf7af9d2	Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related. Refs trac:4826 This commit was SVN r32407. The following Trac tickets were found above: Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826	2014-08-02 18:38:16 +00:00
George Bosilca	1e37b67e5d	No more assert in the proc destructor. This commit was SVN r32401.	2014-08-01 16:36:23 +00:00
Ralph Castain	daeb9b6c4f	Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain. Thanks to Gilles for pointing out some of the discrepancies. This commit was SVN r32398.	2014-08-01 14:44:11 +00:00
George Bosilca	f39abb9e69	Reverting r32355: a number of processes is not a notion that a low level communication library should use to initialize itself. Ralph will champion this change back with an RFC if there is a realistic need/use case from the community. This commit was SVN r32361. The following SVN revision numbers were found above: r32355 --> open-mpi/ompi@c903917f47	2014-07-30 20:11:35 +00:00
Ralph Castain	c903917f47	Expose the num_procs information to the opal layer as the info is needed in several BTLs This commit was SVN r32355.	2014-07-30 09:33:41 +00:00
George Bosilca	a3feb627cf	Move some of the ompi_process_info down in OPAL. This commit was SVN r32324.	2014-07-26 21:43:34 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down into OPAL. All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
George Bosilca	ed3d98a76d	Up to strlen and not to sizeof. This is guaranteed to work as in the worst case we just forced a \0 at the end of the string. This commit was SVN r32238.	2014-07-15 05:03:06 +00:00
George Bosilca	a648fcdeb0	Upon close reset the search_dirs. This commit was SVN r32237.	2014-07-15 05:02:19 +00:00
Nathan Hjelm	1d1cef76df	opal: fix leaks Two leaks are fixed by this commit: - opal_dss.lookup_data_type returns an allocated string. Free it. - opal_ifaddrtokindex was leaking a struct addrinfo. Ensure that is released before returning. cmr=v1.8.2:reviewer=rhc This commit was SVN r31777.	2014-05-15 15:59:41 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	11faab1091	The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees. This commit was SVN r31679.	2014-05-08 02:01:35 +00:00
Ralph Castain	f9d892b7a4	As Nathan pointed out, C99 reserves all _foo identifiers, so rename _WORD_MASK as OPAL_CRC_WORD_MASK This commit was SVN r31615.	2014-05-02 17:21:28 +00:00
Jeff Squyres	790cdb5cc7	Sigh. It helps when you commit the right version of the finished code. This commit fixes minor errors in the incorrectly-committed r31513 (new fd close-on-exec convenience function). Refs trac:4550 This commit was SVN r31514. The following SVN revision numbers were found above: r31513 --> open-mpi/ompi@e1655ae68d The following Trac tickets were found above: Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550	2014-04-24 13:20:32 +00:00
Jeff Squyres	e1655ae68d	opal/util/fd.c: add new convenience function for setting FD_CLOEXEC Paul Hargrove pointed out that Stevens tells us that we should F_GETFD before F_SETFD. And so we shall. Make a new convenience function to do this (opal_fd_set_cloexec()), just so that we don't have to litter this 2-step process throughout the code. Refs trac:4550 This commit was SVN r31513. The following Trac tickets were found above: Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550	2014-04-24 13:04:49 +00:00
Ralph Castain	ac421c931d	The random number generator changes were incomplete (typo errors) in some places, and were missing the required declspec's for visibility. cmr=v1.7.5:reviewer=jsquyres This commit was SVN r31053.	2014-03-12 22:37:27 +00:00
Jeff Squyres	3f845edfdd	* Prefix the preprocessor macro used to protect the file * Include opal_stdint.h so that we have uint32_t cmr=v1.7.5:ticket=trac:4298 This commit was SVN r30890. The following Trac tickets were found above: Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298	2014-02-28 16:56:38 +00:00
Ralph Castain	fea8a52983	Cleanup trailing spaces and use of tab instead of spaces Refs trac:4298 This commit was SVN r30827. The following Trac tickets were found above: Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298	2014-02-25 23:41:55 +00:00
Joshua Ladd	9ea9bec4ad	Addressing Jeff's comments: 1. Changed rng_buff_t --> opal_rng_buff_t 2. All global variables obey the prefix rule 3. Old code has been removed 4. Found a couple of unnecessary includes Refs trac:4298 This commit was SVN r30807. The following Trac tickets were found above: Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298	2014-02-24 23:18:35 +00:00
Joshua Ladd	e39d9f4080	Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use it, please include the following header in your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example of how to seed with opal_srand and invoke the generator with opal_rand. This should be added to cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG This commit was SVN r30801.	2014-02-23 21:41:38 +00:00
Ralph Castain	d246d190ed	Fix typo - thanks to Andreas Schwab for the patch RM-approved cmr:v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r30751.	2014-02-17 19:36:16 +00:00
Ralph Castain	86e8a147c6	Resolve uninitialized variables on some systems. Thanks to Paul Hargrove for finding the problem and suggesting the patch. cmr=v1.7.5:reviewer=ompi-gk1.7 This commit was SVN r30656.	2014-02-10 21:17:34 +00:00
Jeff Squyres	ab31428bd3	opal_path_nfs(): If we get EPERM, just give up. Also fix the wording in a comment. This is worth fixing, but not worth holding up 1.7.4. cmr=v1.7.5:reviewer=rhc This commit was SVN r30307.	2014-01-17 14:28:12 +00:00
Jeff Squyres	9950471df7	Fixes for opal_path_nfs(): * Fix some typos in macro names. * Add case for OS's that have statfs() but no struct statfs (!). * Add case for NetBSD with struct statvfs.f_fstypename. Many thanks to Paul Hargrove who developed the majority of this patch. Reviewed by Jeff Squyres. cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r30255.	2014-01-11 01:07:10 +00:00
Jeff Squyres	023c50e864	Fix typo in macro name (#$%@#$% defined-or-not macros!!) Refs trac:4079 This commit was SVN r30206. The following Trac tickets were found above: Ticket 4079 --> https://svn.open-mpi.org/trac/ompi/ticket/4079	2014-01-09 23:47:36 +00:00
Jeff Squyres	c67c8e8187	Make the use of statfs()/statvfs() more robust. As noted by Paul Hargrove, the #if's surrounding the use of statfs() and statvfs() in opal/util/path.c have apparently gotten stale (e.g., modern flavors of BSD OSs no longer define __BSD). Changes: * Add statfs and statvfs to the AC_CHECK_FUNCS in configure.ac * Add a sanity check to ensure that we have at least one of statfs() or statvfs(). Add a similar sanity check in opal/util/path.c, just as defensive programming. * Use AC_CHECK_MEMBERS in configure.ac to check for specific struct statfs/struct statvfs members that we use in opal/util/path.c * In path.c, add some #includes as listed on the OS man page for statfs(2) (OS X 10.8.5/Mountain Lion) * The previous code used statvfs() on Solaris and statfs() everywhere else. Attempting to replicate this with behavior-based configure testing led to fairly complicated if/else logic, so the new code uses whichever of the two are available (i.e., it might actually use both -- OS X 10.8.5 and RHEL 6.5 have both statfs() and statvfs()). The rationale here is that we don't really care which of the two functions reports the answer; we'll take the answer regardless of where it comes from. For example, if one function returns a failure and the other does not, we'll use the results from the successful function and ignore the failed one. This new code seems to work on OS X and Linux. We'll have to see what happens with MTT and future Paul Hargrove testing... cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Make statfs/statvfs more robust This commit was SVN r30198.	2014-01-09 21:28:52 +00:00
Ralph Castain	2b92fccfd1	Looks like this code was intended to separate Sun's vfs struct from everyone else's, yet the #elif can make it fail on some systems that actually support the capability. So just make it an #else to cover the range of systems we now support and move on. cmr=v1.7.4:reviewer=jsquyres:subject=correct opal_path_df logic This commit was SVN r30172.	2014-01-09 04:10:26 +00:00
Jeff Squyres	13b29cff2c	This commit complements/completes r30140. r30140 made all the configury/Makefile.am changes; this commit renames the internal installdirs.h framework struct field names to match the configury macro names: * pkgdatadir -> ompidatadir * pkglibdir -> ompilibdir * pkgincludedir -> ompiincludedir This commit was SVN r30145. The following SVN revision numbers were found above: r30140 --> open-mpi/ompi@8b778903d8	2014-01-07 23:36:33 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
George Bosilca	24879f9def	Code cleanup while chasing valgrind complaints. This commit was SVN r30048.	2013-12-21 23:28:14 +00:00
Ralph Castain	7cf0fc5578	One more round of sys_limit fixes...sigh Refs trac:4010 This commit was SVN r30011. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:44:51 +00:00
Ralph Castain	e49c16b975	Grrr....use #if instead of #ifdef Refs trac:4010 This commit was SVN r30010. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:24:26 +00:00
Ralph Castain	6e6351959d	Check for all the RLIMIT_foo constants that we use, and update the limit checks to use the new #define values. Fix a bug where failure of some might lead to incorrect bracketing. Refs trac:4010 This commit was SVN r30009. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:09:43 +00:00
Jeff Squyres	090ce4187a	Fix compiler errors on Solaris, NetBSD, and OpenBSD: * Per http://www.open-mpi.org/community/lists/devel/2013/12/13504.php, protect usage of struct ifreq->ifr_hwaddr * Per http://www.open-mpi.org/community/lists/devel/2013/12/13503.php, avoid #define conflict with the token "if_mtu" * Also fix some whitespace and string naming issues in opal/util/if.c Tested by Paul Hargrove. Refs trac:4010 This commit was SVN r30006. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 11:17:30 +00:00
Ralph Castain	f15b0c9863	Add protections around the various system limits to protect code on unusual systems Thanks to Paul Hargrove for reporting it on OpenBSD-5 cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30003.	2013-12-20 03:18:07 +00:00
Ralph Castain	79af9825ac	Update of patch from Takahiro Kawashima Refs trac:3986 This commit was SVN r29984. The following Trac tickets were found above: Ticket 3986 --> https://svn.open-mpi.org/trac/ompi/ticket/3986	2013-12-19 17:22:37 +00:00
Jeff Squyres	42e3e5cd4b	Fixes trac:3990: ensure we don't SIGBUS on SPARC by forcing a memory copy and preventing access to potentially unaligned data. Reviewed by Dave Goodell. Tested by Siegmar Gross. cmr=v1.7.4:reviewer=ompi-rm1.7:subject=fix SPARC SIGBUS in opal net code This commit was SVN r29983. The following Trac tickets were found above: Ticket 3990 --> https://svn.open-mpi.org/trac/ompi/ticket/3990	2013-12-19 16:51:34 +00:00
Ralph Castain	77553f72be	Per this email thread: http://www.open-mpi.org/community/lists/devel/2013/12/13412.php fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch This commit was SVN r29955.	2013-12-18 17:57:37 +00:00
Jeff Squyres	0ab48ad0d2	Fix some annoying flex warnings that have been there for years. Many thanks to Tom Fogal for the initial patch. cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings This commit was SVN r29904.	2013-12-14 00:36:12 +00:00
Jeff Squyres	ad51705891	Fix compiler warnings about signed/unsigned comparisons Change static opal_setlimit() function to return its value in an OUT parameter and return the usual int error code indicating success or failure. The OUT param and return code need to be separated because the OUT param is an unsigned type, but opal_setlimit() was returning -1 upon failure. Hence, the caller could not know that it had failed because the return type was previously an unsigned type. cmr=v1.7.4:reviewer=rhc:subject=Fix opal sys_limits.c signed/unsigned warnings This commit was SVN r29685.	2013-11-13 15:40:34 +00:00
Ralph Castain	8c5c7d0db4	Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface. Refs trac:3696 This commit was SVN r29522. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-10-26 00:47:14 +00:00
Nathan Hjelm	50b4b92758	hostname may not NULL-terminate the string if the buffer is too small. Thanks to Kevin M. Hildebrand for catching this. cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29412.	2013-10-09 15:49:18 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Ralph Castain	10ca1c1b04	Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature". This commit was SVN r28789.	2013-07-14 18:57:20 +00:00
Ralph Castain	bd65937bf3	If we enable ipv6, we resolve a host's addresses and check them all against our local interfaces to determine if the given host is us. However, if we don't enable ipv6, we only checked the first address returned. This can cause us to incorrectly identify a hostname as "not us". Make --disable-ipv6 behave the same as --enable-ipv6 by checking all the returned addresses. This commit was SVN r28716.	2013-07-03 21:41:36 +00:00
Jeff Squyres	089c632cce	Remove a bunch of dead code: gcc 4.7 warns of set-but-unused variables. So get rid of them. This commit was SVN r28538.	2013-05-17 21:45:49 +00:00
Ralph Castain	1ec13d530c	Allow simple way to request comparison to full address regardless of addr family This commit was SVN r28519.	2013-05-14 22:08:39 +00:00
Ralph Castain	eb2edb4b2b	Silence warning This commit was SVN r28516.	2013-05-14 22:00:01 +00:00
Ralph Castain	37088f23d8	When ipv6 disabled, we still have getaddrinfo, so use it when checking common networks for resolving to kindex This commit was SVN r28496.	2013-05-14 15:54:46 +00:00
Ralph Castain	3fc1bafd82	fix typo This commit was SVN r28490.	2013-05-14 12:36:45 +00:00
Ralph Castain	f4f07bdb21	Ensure the opal_ifaddrtokindex function considers the full range of address space by using the netmask This commit was SVN r28487.	2013-05-14 03:37:44 +00:00
Ralph Castain	b73f25e839	Add a function to return the kernel index of the corresponding interface from an IPv4/6 string or hostname This commit was SVN r28397.	2013-04-25 19:40:34 +00:00
Ralph Castain	cef639f578	Ahem....cleanup a copy/paste error in naming of these functions This commit was SVN r28395.	2013-04-25 15:21:53 +00:00
Jeff Squyres	c722440411	Add public functions for retrieving the MAC and MTU (paired with r28344). This commit was SVN r28345. The following SVN revision numbers were found above: r28344 --> open-mpi/ompi@e88881c25f	2013-04-17 22:32:32 +00:00
Ralph Castain	1f011bef99	Cleanup the updated sys limits capability. Fix a few copy/paste bugs (my bad). Shift the limit-setting to the ODLS default module so that we set the limits for all apps, even those that don't call opal_init. Leave it in opal_init as well to support direct-launch apps, but ensure we only set the limits once by removing the envar after launch by ODLS. Provide some nice error messages if we fail to set the limits. Since the user had to specifically request we set the limit, treat failure as an error-out situation. This commit was SVN r28288.	2013-04-04 16:00:17 +00:00
Ralph Castain	d09a9e8096	Upgrade the system limit code to support a broader range of parameters. For now, we support stack size, #open files, #children, and the file size we can create. Continue to support the old "1" or "0" options for backward compatibility. This commit was SVN r28282.	2013-04-03 18:57:53 +00:00
Nathan Hjelm	365cf48db5	Update OPAL frameworks to use the MCA framework system. This commit was SVN r28239.	2013-03-27 21:11:47 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file cannot be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int *, or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE flag enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few components actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). 
opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
George Bosilca	a856f926de	Remove a bunch of unused variables. This commit was SVN r28213.	2013-03-26 14:34:29 +00:00
Ralph Castain	b7f0e46319	Provide a nicer error message when someone gives a bad signal number to opal_signal cmr:v1.7.1 This commit was SVN r28188.	2013-03-20 15:30:59 +00:00
Jeff Squyres	7f34dc266b	Add missing unlocks. Fixes CID 967022 (which covers the unlock on line 627; there's probably another CID for the unlock added on line 537). This commit was SVN r28179.	2013-03-18 23:19:25 +00:00
Ralph Castain	a4b6fb241f	Remove all remaining vestiges of the Windows integration This commit was SVN r28137.	2013-02-28 17:31:47 +00:00
Ralph Castain	e71b40fdcb	If we are redirecting to files, ensure we don't create duplicate file descriptors for output streams going to the same file. If we do, then the output gets completely jumbled - best to avoid that problem. This commit was SVN r28136.	2013-02-28 17:21:53 +00:00
Brian Barrett	33cb4d21fe	Need to include libltdl's includes so that the lt wrappers can compile This commit was SVN r28042.	2013-02-12 00:41:03 +00:00
Rolf vandeVaart	6843f02b37	Add wrapper functions to LTDL functions so other parts of the library can access the LTDL functionality. Reviewed by jsquyres. This commit was SVN r28041.	2013-02-11 15:11:47 +00:00
Rolf vandeVaart	82fb093955	Revert changeset 28011. This can break the build on some systems. This commit was SVN r28017.	2013-02-01 20:41:47 +00:00
Rolf vandeVaart	79b623d7e3	Add wrapper interface to LTDL functions so that other parts of the library can access the LTDL functionality. Reviewed by jsquyres. This commit was SVN r28011.	2013-02-01 14:11:39 +00:00
Brian Barrett	29aaa21c5a	Fix some warnings when we don't have sockets or syslog This commit was SVN r27973.	2013-01-29 23:02:26 +00:00
Brian Barrett	fc3df11e08	Remove the (only two) Fortran constants from OPAL. The only places that actually care if opal_pointer_array is limited to handle_max already pass that in as the max_size during init, so we don't need it there. The arch constant was a bit more difficult, so pass that in during MPI init and leave empty otherwise. This is to help with the effort to allow building ompi against an external opal or orte. This commit was SVN r27817.	2013-01-15 01:27:36 +00:00
Nathan Hjelm	3e1b13b13a	Re-add support for old flex (2.5.4a and earlier) while still cleaning up properly in new flex. This commit was SVN r27657.	2012-12-07 00:12:43 +00:00
Ralph Castain	fdf7633cff	Per Jeff's suggestion, set the default answer when asking for IP aliases in case we don't find any This commit was SVN r27620.	2012-11-16 14:28:30 +00:00
Ralph Castain	a52071a17d	Add a function to return the aliases (based on IP addrs) for the current node This commit was SVN r27618.	2012-11-16 04:02:29 +00:00
Ralph Castain	f9f07e9535	Add a function to test if a string is in the form of an IP address - doesn't test for validity of the address This commit was SVN r27583.	2012-11-10 14:01:12 +00:00
Nathan Hjelm	e0f5137e46	add prototypes for lex destroy functions This commit was SVN r27580.	2012-11-09 22:00:27 +00:00
Nathan Hjelm	8658bbc902	instead of relying on yyterminate to clean up the lex context call the destroy functions directly (after closing the file) This commit was SVN r27577.	2012-11-09 16:10:55 +00:00
Nathan Hjelm	7fb5caea92	Remove the finish_parsing function from various .l files. The function is incomplete (doesn't clean up the lex state) and should be replaced by *_yylex_destroy, which correctly cleans up the state. Checked with flex 2.5.35. Verified with valgrind that this fixes several "still reachable" leaks. cmr:v1.7 This commit was SVN r27571.	2012-11-06 19:26:14 +00:00
Ralph Castain	a1c51dc1d6	Wow - fix an error that has been around for a long time. opal_path_access requires a NULL pointer, not an empty string, to correctly operate. Thanks to Marco Atzeri for chasing this down! cmr:v1.6,v1.7 This commit was SVN r27539.	2012-10-31 14:10:51 +00:00
Nathan Hjelm	2acd0f83de	Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter". It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occurring before the command line parser changes but it appears to be resolved now. This commit was SVN r27527. The following SVN revision numbers were found above: r27451 --> open-mpi/ompi@d59034e6ef r27456 --> open-mpi/ompi@ecdbf34937	2012-10-30 19:45:18 +00:00
Ralph Castain	094d6f3143	Add a new "distributed file system" capability to support file access operations across nodes that do not have a network file system attached to them. Add a set of URI create/parse utilities This commit was SVN r27483.	2012-10-25 17:15:17 +00:00
Ralph Castain	e6014bf2e1	Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter This commit was SVN r27477. The following SVN revision numbers were found above: r27451 --> open-mpi/ompi@d59034e6ef r27456 --> open-mpi/ompi@ecdbf34937	2012-10-24 18:38:44 +00:00
Nathan Hjelm	d59034e6ef	MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions. cmr:v1.7 This commit was SVN r27451.	2012-10-17 20:17:37 +00:00
Nathan Hjelm	47fff80a56	remove unused, deprecated function opal_cmd_line_make_opt This commit was SVN r27437.	2012-10-11 18:50:11 +00:00
Samuel Gutierrez	1f24f1d305	Update the data types used in opaldf to minimize the chance of overflow when determining the amount of available space. Thanks to Eugene for pointing out the issue. This commit was SVN r27436.	2012-10-11 16:11:23 +00:00
Samuel Gutierrez	21be553e21	Add Windows support to opaldf and shmem/windows -- thanks Shiqing. Next commit will fix issues found by Eugene. This commit was SVN r27435.	2012-10-11 14:49:41 +00:00
Samuel Gutierrez	0461826a4b	Fix bus errors caused by an inadequate amount of space during opal_shmem_segment_create by testing whether or not the target mount has enough space to accommodate the shared-memory backing store. Fixes trac:2827. Will work with Shiqing to add Windows support (if required). This commit was SVN r27433. The following Trac tickets were found above: Ticket 2827 --> https://svn.open-mpi.org/trac/ompi/ticket/2827	2012-10-09 20:48:04 +00:00
Ralph Castain	cb48fd52d4	Implement the MPI_Info part of MPI-3 Ticket 313. Add an MPI_Info object MPI_INFO_GET_ENV that contains a number of run-time related pieces of info. This includes all the required ones in the ticket, plus a few that specifically address recent user questions: "num_app_ctx" - the number of app_contexts in the job "first_rank" - the MPI rank of the first process in each app_context "np" - the number of procs in each app_context Still need clarification on the MPI_Init portion of the ticket. Specifically, does the ticket call for returning an error if someone calls MPI_Init more than once in a program? We set a flag to tell us that we have been initialized, but currently never check it. This commit was SVN r27005.	2012-08-12 01:28:23 +00:00
George Bosilca	f7528bb404	Remove unused variables. This commit was SVN r26966.	2012-08-08 12:43:13 +00:00
George Bosilca	2303cd0bdb	Remove initialized but unused variables. This commit was SVN r26959.	2012-08-07 12:05:25 +00:00
Jeff Squyres	0b7b3feba9	Minor fix for the command line parser: we didn't previously distinguish between unknown ''options'' (i.e., command line options that are registered and have some meaning) and unknown ''tokens'' (i.e., strings that do not begin with "-"). Hence, if you did: mpirun --fo my_mpi_program (when perhaps you meant to type "--foo"), mpirun would complain that no such executable "--fo" existed. That is ''correct,'' but perhaps not completely useful. It is more accurate for mpirun to report that there is no such "--fo" option. This change to cmd_line.c makes it so that we will ''always'' report errors regarding tokens that begin with "-". This commit was SVN r26953.	2012-08-06 17:13:08 +00:00
Ralph Castain	bd8b4f7f1e	Sorry for mid-day commit, but I had promised on the call to do this upon my return. Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code. Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch. This commit was SVN r26242.	2012-04-06 14:23:13 +00:00
Jeff Squyres	97b3603036	A bunch of fixes and improvements to Open MPI's various command line tools. * fixed some bugs where "unknown" tokens were allowed on the command line (which should really only be used for orterun). * if an unknown token is encountered, print a short error to stderr and quit with a nonzero exit status * if we don't find the right number of parameters to an option, print a short error to stderr and quit with a nonzero exit status * when --help is given, print the help message to stdout (not stderr) and quit with a zero exit status * added --showme:help option to the wrapper compilers * updated docs in opal/util/cmd_line.h * other small/miscellaneous CLI parsing bugs in various tools I won't bore you with what we did before. :-) Here are some examples of what the new behavior looks like: {{{ % ompi_info --bogus ompi_info: Error: unknown option "--bogus" Type 'ompi_info --help' for usage. % ompi_info --param bogus ompi_info: Error: option "--param" did not have enough parameters (2) Type 'ompi_info --help' for usage. % }}} This commit was SVN r26072.	2012-02-29 17:52:38 +00:00
Ralph Castain	8446673dc4	Update the cmd line parser to return an error if someone forgets to include a numeric parameter to a cmd line option that requires one. Can't do anything about options that require strings, but we can at least bark when someone forgets the "-np N" argument. This commit was SVN r26068.	2012-02-28 20:33:53 +00:00
Jeff Squyres	5f9ac93455	Fix suggested by Paul Hargrove to eliminate a dangerous trailing context This commit was SVN r25983.	2012-02-21 13:29:58 +00:00
Jeff Squyres	6bb98f072f	Fix typo; could hypothetically fix the problem reported by Paul Hargrove: http://www.open-mpi.org/community/lists/devel/2012/02/10483.php This commit was SVN r25982.	2012-02-21 13:19:09 +00:00
Ralph Castain	47c64ec837	Roll in Java bindings per telecon discussion. Man pages still under revision This commit was SVN r25973.	2012-02-20 22:12:43 +00:00
Jeff Squyres	435aea9ccd	A better solution -- just look for !__linux__. This commit was SVN r25841.	2012-01-31 20:27:33 +00:00
Jeff Squyres	538cdce8fb	Add checks for !__linux and !__linux__, per Paul Hargrove's analysis: http://www.open-mpi.org/community/lists/devel/2012/01/10283.php. Also remove some unused #defines. This commit was SVN r25836.	2012-01-31 16:45:50 +00:00
Jeff Squyres	6fbbfd0f7a	Gah! r25545 accidentally included ''waaaay'' more stuff than it was supposed to. I.e., half-baked/not complete stuff. This commit backs out all of r25545. Sorry folks! This commit was SVN r25546. The following SVN revision numbers were found above: r25545 --> open-mpi/ompi@7f9ae11faf	2011-11-29 23:24:52 +00:00
Jeff Squyres	7f9ae11faf	Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php , to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS X, we need to use the following compiler (linker) flag: -Wl,-commons,use_dylibs So if we're compiling on OS X, test to see if that flag works with the compiler. If so, add it to the wrapper FFLAGS and FCFLAGS (note that per a future update, we'll only have one Fortran compiler anyway). Fixes trac:1982. This commit was SVN r25545. The following Trac tickets were found above: Ticket 1982 --> https://svn.open-mpi.org/trac/ompi/ticket/1982	2011-11-29 23:05:54 +00:00
Jeff Squyres	21dc0b44e1	Fix minor typo in comment This commit was SVN r25542.	2011-11-29 20:39:53 +00:00
Christopher Yeoh	7e7701e7fc	Removes misleading debug warning from opal_free when a NULL pointer is passed to it. Fixes trac:2884 This commit was SVN r25430. The following Trac tickets were found above: Ticket 2884 --> https://svn.open-mpi.org/trac/ompi/ticket/2884	2011-11-03 23:57:26 +00:00
Rainer Keller	4e6a6fc146	- Check whether the compiler supports __builtin_clz (count leading zeroes); if so, use it for bit-operations like opal_cube_dim and opal_hibit. Implement two versions of power-of-two. In the case of opal_next_poweroftwo, this reduces the average execution time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlining, measured with rdtsc, with a loop over 2^27 values). Numbers for other functions are similar (but of course heavily depend on the usage, e.g. opal_hibit() with a start of 4 does not save much). The bsr instruction on AMD Opteron is also not as fast. - Replace various places where the next power-of-two is computed. Tested on Intel Nehalem Cluster with openib, compilers GNU-4.6.1 and Intel-12.0.4 using mpi_testsuite -t "Collective" with 128 processes. This commit was SVN r25270.	2011-10-11 22:49:01 +00:00
Swen Boehm	08b4322a1a	patched the lex files to not issue the following compiler warning: 'yyunput' defined but not used This commit was SVN r25246.	2011-10-10 18:13:04 +00:00
George Bosilca	649af6c925	Mixing an enumerated type with another type (int) is tolerated but easily fixed. This commit was SVN r25241.	2011-10-09 03:54:52 +00:00
Ralph Castain	da9bbf68ec	Fix the output of error strings. Every convertor is returning OPAL_SUCCESS, so you have to check each convertor to find which one this error belongs to, and then run ONLY that convertor. This commit was SVN r25009.	2011-08-08 04:10:40 +00:00
Ralph Castain	2af867d26f	Don't segfault if show_help is called prior to calling opal_init_util This commit was SVN r24825.	2011-06-27 16:35:19 +00:00
Ralph Castain	4c06c9c07c	Simplify the code a little bit by recognizing that end=start isn't an error, but just indicates a partial address typical of CIDR notation. This commit was SVN r24757.	2011-06-07 11:33:22 +00:00
Ralph Castain	666fdeab8f	Okay to return an error on end=start of string conversion so long as the strlen > 0, so restore that error check. This commit was SVN r24756.	2011-06-07 03:20:01 +00:00
Ralph Castain	f3cae3d6f3	Cleanup the handling of if_include and if_exclude arguments based on CIDR notation. Fix a bug in the new code that prevented the system from correctly matching addresses. Remove comments in the show-help text indicating that we would continue in the face of incorrect specifications - leave that to the calling layer to decide. Modify the new opal_ifmatches so it returns error codes letting the caller better understand the result. Modify the oob to ensure we abort if we don't find interfaces matching specified constraints, and that we do so without multiple error messages. NOTE: we have a conflict in our standards. We have been using comma-delimited lists of interfaces for all our params. However, one param - opal_net_private_ipv4 - now uses semicolons instead of comma separators. No idea why, but it is confusing. This commit was SVN r24755.	2011-06-07 02:09:11 +00:00
George Bosilca	910a289e97	Remove the explicit "attempt to continue". This commit was SVN r24754.	2011-06-07 01:27:08 +00:00
George Bosilca	7ebd094ecf	Cleanup the IPv4 address parsing, and correct the error message. This commit was SVN r24750.	2011-06-06 03:08:02 +00:00
Ralph Castain	1491d52bd7	Extend the parsing capability of the oob tcp module's if_include and if_exclude options to support subnet+mask notation, and to handle virtual IP addresses (it was previously having problems distinguishing between "eth1" and "eth1.3"). This commit was SVN r24747.	2011-06-05 19:16:42 +00:00
Ralph Castain	486041f89d	Get rid of the annoying error messages when setrlimit fails, which seems to be a constant problem on the Mac. Don't use the changed values for max limits if the setrlimit call failed. This commit was SVN r24703.	2011-05-17 03:27:43 +00:00
Ralph Castain	a3e43594a4	Extend node stats to include additional memory info. Change "darwin" pstat module to "test" as we don't really know how to get all the stat info for darwin. Add a new OPAL_ERROR_LOG macro similar to the ORTE_ERROR_LOG one. This commit was SVN r24692.	2011-05-08 14:45:16 +00:00
George Bosilca	34abbce82c	More accurate and trustworthy descriptions of the netmask exist. Interested readers can quench their curiosity either with one of the Richard Stevens books (ISBN 9780201633467) or the Wikipedia page (http://en.wikipedia.org/wiki/Subnetwork). This commit was SVN r24680.	2011-05-03 21:59:51 +00:00
Ralph Castain	257473ebca	Remove an extra "break" - thanks to Rainer for pointing it out. This commit was SVN r24667.	2011-05-02 12:20:37 +00:00
Ralph Castain	7b29a6153e	Cover all the netmask values This commit was SVN r24665.	2011-04-29 17:56:15 +00:00
Shiqing Fan	4490fdbd34	Add the initial support for MinGW and MSYS. Correctly check the dependencies of MSYS env. Set up configure include and lib path for building the package. update a few more CMake scripts. This commit was SVN r24663.	2011-04-29 14:42:07 +00:00
Jeff Squyres	16d8e9216b	Ran across this comment about i18n support, so I figured I'd update it. :-) This commit was SVN r24631.	2011-04-22 12:14:20 +00:00
George Bosilca	eb8383802e	ret might have been used uninitialized. Not anymore. This commit was SVN r24452.	2011-02-24 03:02:48 +00:00
Shiqing Fan	baad4e1844	fix a non if-controlled brace. This commit was SVN r24428.	2011-02-22 11:45:43 +00:00
Ralph Castain	e22262602e	Extend the opal output code to support systems that cannot allow stdout/err to be output to console or files. This occurs in some embedded environments where file systems are in flash and consoles are redirected to NULL. Add three new envars (not MCA params!) that control this behavior (see output.h for explanation). This commit was SVN r24422.	2011-02-21 21:42:59 +00:00
Ralph Castain	bf1cff3711	Plug a couple of additional memory leaks - try to highlight a little better that strings returned from reg_string_name must be freed by caller This commit was SVN r24383.	2011-02-14 20:58:22 +00:00
Ralph Castain	b5de068533	Clean up an error in r24371 - can't use a const parameter as target in asprintf as it changes the value of the address. Add some new proc/job states Rename a constant to reflect coming change - remove the arbitrary difference between restarting a proc locally and relocating it to another node in terms of the number of restarts allowed. Add pretty-print of signals for "proc aborted due to signal" reports. This commit was SVN r24378. The following SVN revision numbers were found above: r24371 --> open-mpi/ompi@93d28a5792	2011-02-14 19:29:09 +00:00
Abhishek Kulkarni	93d28a5792	Change opal_err2str_fn_t to return the error string as an argument. This means that the converters (opal_err2str, orte_err2str) can now return NULL as a "silent error". The return value of opal_err2str_fn_t is the status of the operation (OPAL_SUCCESS or OPAL_ERROR). This fixes the "Unknown error" message issues on the trunk. This commit was SVN r24371.	2011-02-13 16:09:17 +00:00
Nysal Jan	92e06b0a1f	Missed this change suggested by Terry This commit was SVN r24364.	2011-02-08 04:06:52 +00:00
Nysal Jan	a31025bb48	Fix pty setup code on AIX This commit was SVN r24363.	2011-02-08 02:54:47 +00:00
Abhishek Kulkarni	d711c5a4b1	SOS fix for the Studio compilers (Thanks to Terry for spotting this). This commit was SVN r24355.	2011-02-03 22:36:28 +00:00
Abhishek Kulkarni	3243b16bb3	Decode SOS error code before checking it with the native error code. This commit was SVN r24281.	2011-01-20 23:21:38 +00:00
Ralph Castain	ac1853b5d8	Took me a couple of days, but finally tracked this one down. Some compilers/glibc's don't like composite test statements in a return and just randomly pick one of the two options. So....don't do that!!! This commit was SVN r24212.	2011-01-10 16:29:42 +00:00
Jeff Squyres	a525e70f46	Convert "opal_show_help" to be a global variable pointer. It is statically initialized to the real back-end OPAL show_help function. During orte_show_help_init(), the variable is re-assigned with the value of the back-end ORTE show_help function (the one that does error message aggregation). Therefore, anything that calls opal_show_help() after a certain point in orte_init() will have their show_help messages be aggregated. w00t! Even code down in OPAL -- that has no knowledge of ORTE -- will have their messages aggregated. '''Double w00t!''' During orte_show_help_finalize(), we restore the original pointer value so that if something calls opal_show_help() after orte_finalize(), it'll still work properly (but it won't be aggregated). This commit was SVN r24185.	2010-12-16 23:00:25 +00:00
Terry Dontje	b3f2ac8d46	removed direct include of stdbool.h from event.h that was causing studio C++ issues. Also removed include of stdbool.h in a couple other places since it was already being pulled in via opal_config_bottom.h. This commit was SVN r23963.	2010-10-27 20:47:42 +00:00
Ralph Castain	bab990d812	Revert r23928 as being the incorrect fix. The correct fix is not to include ipv6 interfaces when ipv6 support was not requested. This commit was SVN r23930. The following SVN revision numbers were found above: r23928 --> open-mpi/ompi@7394f6d167	2010-10-25 14:31:18 +00:00
Ralph Castain	7394f6d167	Silence warnings about IPV6 sa_family not known when ipv6 support is not enabled in configure This commit was SVN r23928.	2010-10-25 13:56:23 +00:00
Jeff Squyres	73bcc4a36b	Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not specify --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not specify --with-devel-headers. This commit was SVN r23801. The following SVN revision numbers were found above: r23764 --> open-mpi/ompi@40a2bfa238	2010-09-24 22:53:28 +00:00
Ralph Castain	3631e4e936	Revert remaining svn kruft from r23764 This commit was SVN r23786. The following SVN revision numbers were found above: r23764 --> open-mpi/ompi@40a2bfa238	2010-09-22 01:11:40 +00:00
Ralph Castain	40a2bfa238	WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues. This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change. Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation. This commit was SVN r23764.	2010-09-17 23:04:06 +00:00
Ralph Castain	e96b5f486f	Reorganize the opal interface code in opal/util/if.c per prior emails and telecon discussions. Move the interface discovery code into a framework so that configuration logic can separate it out (instead of the prior #if-#else confusion). All interface APIs for accessing the info remain unchanged in opal/util/if.c. This has been tested on Mac, Linux, and NetBSD. Nobody else seemed interested in testing it, so there may be some future problems revealed as people try it on other OSs. This commit was SVN r23743.	2010-09-13 01:58:51 +00:00
Rainer Keller	5eb571c458	- As suggested in CMR #2558, attribute-macros should be tested on function pointers and assigned accordingly, instead of using the pre-processor in the header files. A functional change is (re-) specifying __opal_attribute_noreturn__ on orte_errmgr_base_abort(): All modules in the errmgr framework either use this function, or define their own abort function, which sets __opal_attribute_noreturn__. This attribute was taken out with the errmgr overhaul in r22872. This commit was SVN r23689. The following SVN revision numbers were found above: r22872 --> open-mpi/ompi@e4f2d03d28	2010-08-31 10:28:51 +00:00
Brad Benton	09c4f4d95c	Added copyright notices for the files modified in r23669. This commit was SVN r23687. The following SVN revision numbers were found above: r23669 --> open-mpi/ompi@271cfa8c9a	2010-08-30 17:46:47 +00:00
Nysal Jan	271cfa8c9a	Fix the opal_path_nfs test for GPFS. Reported by Paul H. Hargrove This commit was SVN r23669.	2010-08-26 10:10:16 +00:00
Jeff Squyres	a5ce58f098	Define that we return OPAL_ERR_TIMEOUT if the other end of the socket closes in an opal_fd_read(). This commit was SVN r23650.	2010-08-24 19:07:04 +00:00
Ralph Castain	51833bfe6c	Not -everyone- wants to ignore loopback devices. Give us a choice. This commit was SVN r23637.	2010-08-24 02:37:05 +00:00
Rainer Keller	14aad075eb	- On Jaguar, we don't have pretty printed stackframe, aka no opal_stackframe_output* This commit was SVN r23602.	2010-08-12 14:44:56 +00:00
Rolf vandeVaart	3d9b05ba2b	Fix bug introduced by r23463. We now handle positive error codes correctly again. Also fix a typo. Reviewed by Jeff Squyres. This commit was SVN r23531. The following SVN revision numbers were found above: r23463 --> open-mpi/ompi@2af3e6e5ae	2010-07-28 19:19:27 +00:00
Jeff Squyres	245dc1a86d	Add a cast to avoid compiler warnings on BSD. This commit was SVN r23502.	2010-07-27 14:14:37 +00:00
Jeff Squyres	0ce1a82cde	This commit looks much bigger than it is. There are only 2 substantive changes in this commit; the rest are minor style changes: 1. Change an OBJ_NEW(opal_list_item_t) to OBJ_NEW(opal_if_t). This was causing memory corruption in the BSD code paths. 1. Move some local variables from the top of opal_if_init() to inside the non-BSD code paths so that we avoid bunches of warnings about unused variables when compiling on BSD. In doing so, I indented the whole non-BSD section one level deeper, making the commit look huge. I also added a few {} around 1-line blocks, added some spaces, broke a few lines, re-formatted a few comments, ...etc. Trivial stuff. This commit was SVN r23501.	2010-07-27 13:46:55 +00:00
Shiqing Fan	71d2749b6b	Fix a header problem on Windows. This commit was SVN r23483.	2010-07-23 07:52:34 +00:00
Jeff Squyres	29c1ad4196	Forgot BEGIN/END C_DECLS. This commit was SVN r23453.	2010-07-21 11:05:08 +00:00
Jeff Squyres	b3952e4f07	Use const for the opal_fd_write() function, just to be nice. This commit was SVN r23452.	2010-07-21 11:01:16 +00:00
Jeff Squyres	ab5fc1b570	Add trivial functions to loop over read()'ing and write()'ing with a file descriptor (i.e., read and write complete messages, transparently handling partial reads/writes, EAGAIN, and EINTR). This code effectively already exists in a few places in the code base; this is mainly a consolidation. This commit was SVN r23450.	2010-07-20 19:53:49 +00:00
Ralph Castain	f325ac030a	Add a function to prepend a string to the beginning of an argv array; useful when building app_contexts from user input. This commit was SVN r23303.	2010-06-24 15:52:36 +00:00
Shiqing Fan	43bd92272a	Remove an unnecessary inline definition, in order to solve the conflict of function exporting on Windows. This commit was SVN r23230.	2010-06-01 15:44:46 +00:00
Shiqing Fan	857f1669e2	Solve a few compilation problems on Windows. This commit was SVN r23193.	2010-05-21 14:30:15 +00:00
Abhishek Kulkarni	0b3e5f5d79	Silence an opal_sos compiler warning. This commit was SVN r23163.	2010-05-17 23:14:44 +00:00
Abhishek Kulkarni	afbe3e99c6	* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) as (OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be an SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes an SOS-encoded error and returns the native error code. * Since OPAL_SUCCESS is preserved by SOS, also change all checks of the form (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to decode 'ret' to get the native error code. This commit was SVN r23162.	2010-05-17 23:08:56 +00:00
Abhishek Kulkarni	b0e963299a	Adding a new function to return the stack trace (not including the call to the function itself) as a string (which must be freed by the caller). This commit was SVN r23160.	2010-05-17 22:57:42 +00:00
Abhishek Kulkarni	5e05546194	Adding SOS headers and package data to the Makefile. This commit was SVN r23159.	2010-05-17 22:53:33 +00:00
Abhishek Kulkarni	4e33e6aeaa	Merge OPAL SOS into the trunk. The OPAL SOS framework tries to meet the following objectives: * reduce the cascading error messages and the amount of code needed to print an error message. * build and aggregate stacks of encountered errors and associate related individual errors with each other. * allow registration of custom callbacks to intercept error events. For more information, refer to https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages This commit was SVN r23158.	2010-05-17 22:51:52 +00:00
Jeff Squyres	8a85c4617f	Fixes trac:2366: dragonboy noticed that the PGI compiler is picky about #if directives -- had to change a pair of #if conditionals in opal/util/stacktrace.c to make the PGI compiler accept it. This commit was SVN r22923. The following Trac tickets were found above: Ticket 2366 --> https://svn.open-mpi.org/trac/ompi/ticket/2366	2010-04-01 17:04:06 +00:00
Jeff Squyres	c26dae01ce	Update the if.c code to properly use the OBJ_* system. This commit was SVN r22869.	2010-03-23 20:37:06 +00:00
Rainer Keller	814fb9399f	- Further patches for NetBSD (and DragonFly) support by Aleksej Saushev: don't use bash or bashisms in shell scripts, and use POSIX setpgid(0,0), which is equivalent to setpgrp(). This commit was SVN r22829.	2010-03-15 05:33:42 +00:00
Josh Hursey	e9b5162d79	Fix the configure logic for --with-ft so that it properly takes a comma-separated list. Many of the OPAL_ENABLE_FT uses should be OPAL_ENABLE_FT_CR, so fix those. The OPAL layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname, which may have changed. This commit was SVN r22824.	2010-03-12 23:57:50 +00:00
Ralph Castain	8c7f3a0c44	Silence warnings by correctly identifying when we are on a Mac. This commit was SVN r22724.	2010-02-27 08:15:49 +00:00
Iain Bason	7445b23e0d	Fixed a minor typo. This commit was SVN r22706.	2010-02-24 19:05:19 +00:00
Terry Dontje	cfe37fb5a1	Fixed an issue with detecting the root dir and used the appropriate defines for Solaris detection. This commit was SVN r22686.	2010-02-23 15:58:49 +00:00
Rainer Keller	a46cecf4f2	- Use strrchr instead of loop for '/' as Nysal suggests. This commit was SVN r22649.	2010-02-17 23:40:08 +00:00
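
The SVN r23450 commit above adds helpers that loop over read() and write() until a complete message has been transferred, transparently handling partial transfers, EAGAIN, and EINTR. The C sketch below illustrates that technique under stated assumptions: the names fd_read_full(), fd_write_full(), and the FD_* return codes are hypothetical and are not the actual opal_fd_read()/opal_fd_write() signatures. Echoing the r23650 entry, a peer that closes before the full message arrives gets its own return code instead of a generic error.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical return codes for this sketch; not the OPAL error constants. */
enum { FD_OK = 0, FD_ERR = -1, FD_CLOSED = -2 };

/* Read exactly len bytes from fd into buffer. Partial reads are resumed,
 * EINTR/EAGAIN are retried, and end-of-file before len bytes is reported
 * as FD_CLOSED. Any other failure returns FD_ERR with errno preserved. */
int fd_read_full(int fd, void *buffer, size_t len)
{
    char *ptr = buffer;
    while (len > 0) {
        ssize_t n = read(fd, ptr, len);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;       /* interrupted or would block: retry */
            return FD_ERR;      /* real failure; errno describes it */
        }
        if (n == 0)
            return FD_CLOSED;   /* peer closed before the message completed */
        ptr += n;               /* partial read: advance and keep going */
        len -= (size_t)n;
    }
    return FD_OK;
}

/* Write exactly len bytes from buffer to fd, with the same retry rules. */
int fd_write_full(int fd, const void *buffer, size_t len)
{
    const char *ptr = buffer;
    while (len > 0) {
        ssize_t n = write(fd, ptr, len);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;       /* interrupted or would block: retry */
            return FD_ERR;      /* real failure; errno describes it */
        }
        ptr += n;               /* partial write: advance and keep going */
        len -= (size_t)n;
    }
    return FD_OK;
}
```

Retrying immediately on EAGAIN busy-waits on a non-blocking descriptor; a production version might wait with poll() before retrying.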

666 Commits