openmpi

Author	SHA1	Message	Date
Jeff Squyres	9950471df7	Fixes for opal_path_nfs(): * Fix some typos in macro names. * Add case for OS's that have statfs() but no struct statfs (!). * Add case for NetBSD with struct statvfs.f_fstypename. Many thanks to Paul Hargrove who developed the majority of this patch. Reviewed by Jeff Squyres. cmr=v1.7.4:reviewer=ompi-rm1.7 This commit was SVN r30255.	2014-01-11 01:07:10 +00:00
Jeff Squyres	023c50e864	Fix typo in macro name (#$%@#$% defined-or-not macros!!) Refs trac:4079 This commit was SVN r30206. The following Trac tickets were found above: Ticket 4079 --> https://svn.open-mpi.org/trac/ompi/ticket/4079	2014-01-09 23:47:36 +00:00
Jeff Squyres	c67c8e8187	Make the use of statfs()/statvfs() be more robust. As noted by Paul Hargrove, the #if's surrounding the use of statfs() and statvfs() in opal/util/path.c have apparently gotten stale (e.g., modern flavors of BSD OSs no longer define __BSD). Changes: Add statfs and statvfs to the AC_CHECK_FUNCS in configure.ac * Add a sanity check to ensure that we have at least one of statfs() or statvfs(). Add a similar sanity check in opal/util/path.c, just as defensive programming. * Use AC_CHECK_MEMBERS in configure.ac to check for specific struct statfs/struct statvfs members that we use in opal/util/path.c * In path.c, add some #includes as listed on the OS man page for statfs(2) (OS X 10.8.5/Mountain Lion) * The previous code used statvfs() on Solaris and statfs() everywhere else. Attempting to replicate this with behavior-based configure testing led to fairly complicated if/else logic, so the new code uses whichever of the two are available (i.e., it might actually use both -- OS X 10.8.5 and RHEL 6.5 have both statfs() and statvfs()). The rationale here is that we don't really care which of the two functions report the answer; we'll take the answer regardless of where it comes from. For example, if one function returns a failure and the other does not, we'll use the results from the successful function and ignore the failed one. This new code seems to work on OS X and Linux. We'll have to see what happens with MTT and future Paul Hargrove testing... cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Make statfs/statvfs more robust This commit was SVN r30198.	2014-01-09 21:28:52 +00:00
Ralph Castain	2b92fccfd1	Looks like this code was intended to separate Sun's vfs struct from everyone else's, yet the #elif can make it fail on some systems that actually support the capability. So just make it an #else to cover the range of systems we now support and move on. cmr=v1.7.4:reviewer=jsquyres:subject=correct opal_path_df logic This commit was SVN r30172.	2014-01-09 04:10:26 +00:00
Jeff Squyres	13b29cff2c	This commit complements/completes r30140. r30140 made all the configury/Makefile.am changes; this commit renames the internal installdirs.h framework struct field names to match the configury macro names: * pkgdatdir -> ompidatadir * pkglibdir -> ompilibdir * pkgincludedir -> ompiincludedir This commit was SVN r30145. The following SVN revision numbers were found above: r30140 --> open-mpi/ompi@8b778903d8	2014-01-07 23:36:33 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
George Bosilca	24879f9def	Code cleanup while chasing valgrind complaints. This commit was SVN r30048.	2013-12-21 23:28:14 +00:00
Ralph Castain	7cf0fc5578	One more round of sys_limit fixes...sigh Refs trac:4010 This commit was SVN r30011. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:44:51 +00:00
Ralph Castain	e49c16b975	Grrr....use #if instead of #ifdef Refs trac:4010 This commit was SVN r30010. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:24:26 +00:00
Ralph Castain	6e6351959d	Check for all the RLIMIT_foo constants that we use, and update the limit checks to use the new #define values. Fix a bug where failure of some might lead to incorrect bracketing. Refs trac:4010 This commit was SVN r30009. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 14:09:43 +00:00
Jeff Squyres	090ce4187a	Fix compiler errors on Solaris, NetBSD, and OpenBSD: * Per http://www.open-mpi.org/community/lists/devel/2013/12/13504.php, protect usage of struct ifreq->ifr_hwaddr * Per http://www.open-mpi.org/community/lists/devel/2013/12/13503.php, avoid #define conflict with the token "if_mtu" * Also fix some whitespace and string naming issues in opal/util/if.c Tested by Paul Hargrove. Refs trac:4010 This commit was SVN r30006. The following Trac tickets were found above: Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010	2013-12-20 11:17:30 +00:00
Ralph Castain	f15b0c9863	Add protections around the various system limits to protect code on unusual systems Thanks to Paul Hargrove for reporting it on OpenBSD-5 cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30003.	2013-12-20 03:18:07 +00:00
Ralph Castain	79af9825ac	Update of patch from Takahiro Kawashima Refs trac:3986 This commit was SVN r29984. The following Trac tickets were found above: Ticket 3986 --> https://svn.open-mpi.org/trac/ompi/ticket/3986	2013-12-19 17:22:37 +00:00
Jeff Squyres	42e3e5cd4b	Fixes trac:3990: ensure we don't SIGBUS on SPARC by forcing a memory copy and preventing access to potentially unaligned data. Reviewed by Dave Goodell. Tested by Siegmarr Gross. cmr=v1.7.4:reviewer=ompi-rm1.7:subject=fix SPARC SIGBUS in opal net code This commit was SVN r29983. The following Trac tickets were found above: Ticket 3990 --> https://svn.open-mpi.org/trac/ompi/ticket/3990	2013-12-19 16:51:34 +00:00
Ralph Castain	77553f72be	Per this email thread: http://www.open-mpi.org/community/lists/devel/2013/12/13412.php fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch This commit was SVN r29955.	2013-12-18 17:57:37 +00:00
Jeff Squyres	0ab48ad0d2	Fix some annoying flex warnings that have been there for years. Many thanks to Tom Fogal for the initial patch. cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings This commit was SVN r29904.	2013-12-14 00:36:12 +00:00
Jeff Squyres	ad51705891	Fix compiler warnings about signed/unsigned comparisons Change static opal_setlimit() function to return its value in an OUT parameter and return the usual int error code indicating success or failure. The OUT param and return code need to be separated because the OUT param is an unsigned type, but opal_setlimit() was returning -1 upon failure. Hence, the caller could not know that it had failed because the return type was previously an unsigned type. cmr=v1.7.4:reviewer=rhc:subject=Fix opal sys_limits.c signed/unsigned warnings This commit was SVN r29685.	2013-11-13 15:40:34 +00:00
Ralph Castain	8c5c7d0db4	Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface. Refs trac:3696 This commit was SVN r29522. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-10-26 00:47:14 +00:00
Nathan Hjelm	50b4b92758	hostname may not NULL-terminate the string if the buffer is too small. Thanks to Kevin M. Hildebrand for catching this. cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29412.	2013-10-09 15:49:18 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Ralph Castain	10ca1c1b04	Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature". This commit was SVN r28789.	2013-07-14 18:57:20 +00:00
Ralph Castain	bd65937bf3	If we enable ipv6, we resolve a host's addresses and check them all against our local interfaces to determine if the given host is us. However, if we don't enable ipv6, we only checked the first address returned. This can cause us to incorrectly identify a hostname as "not us". Make --disable-ipv6 behave the same as --enable-ipv6 by checking all the returned addresses. This commit was SVN r28716.	2013-07-03 21:41:36 +00:00
Jeff Squyres	089c632cce	Remove a bunch of dead code: gcc 4.7 warns of set-but-unused variables. So get rid of them. This commit was SVN r28538.	2013-05-17 21:45:49 +00:00
Ralph Castain	1ec13d530c	Allow simple way to request comparison to full address regardless of addr family This commit was SVN r28519.	2013-05-14 22:08:39 +00:00
Ralph Castain	eb2edb4b2b	Silence warning This commit was SVN r28516.	2013-05-14 22:00:01 +00:00
Ralph Castain	37088f23d8	When ipv6 disabled, we still have getaddrinfo, so use it when checking common networks for resolving to kindex This commit was SVN r28496.	2013-05-14 15:54:46 +00:00
Ralph Castain	3fc1bafd82	fix typo This commit was SVN r28490.	2013-05-14 12:36:45 +00:00
Ralph Castain	f4f07bdb21	Ensure the opal_ifaddrtokindex function considers the full range of address space by using the netmask This commit was SVN r28487.	2013-05-14 03:37:44 +00:00
Ralph Castain	b73f25e839	Add a function to return the kernel index of the corresponding interface from an IPv4/6 string or hostname This commit was SVN r28397.	2013-04-25 19:40:34 +00:00
Ralph Castain	cef639f578	Ahem....cleanup a copy/paste error in naming of these functions This commit was SVN r28395.	2013-04-25 15:21:53 +00:00
Jeff Squyres	c722440411	Add public functions for retrieving the MAC and MTU (paired with r28344). This commit was SVN r28345. The following SVN revision numbers were found above: r28344 --> open-mpi/ompi@e88881c25f	2013-04-17 22:32:32 +00:00
Ralph Castain	1f011bef99	Cleanup the updated sys limits capability. Fix a few copy/paste bugs (my bad). Shift the limit set to the ODLS default module so that we set the limits for all apps, even those that don't call opal_init. Leave it in opal_init as well to support direct-launch apps, but ensure we only set the limits once by removing the envar after launch by ODLS. Provide some nice error messages if we fail to set the limits. Since the user had to specifically request we set the limit, treat failure as an error-out situation. This commit was SVN r28288.	2013-04-04 16:00:17 +00:00
Ralph Castain	d09a9e8096	Upgrade the system limit code to support a broader range of parameters. For now, we support stack size, #open files, #children, and file size we can create. Continue to support the old "1" or "0" options for backward compatibility. This commit was SVN r28282.	2013-04-03 18:57:53 +00:00
Nathan Hjelm	365cf48db5	Update OPAL frameworks to use the MCA framework system. This commit was SVN r28239.	2013-03-27 21:11:47 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file can not be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int , or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few component actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). 
opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
George Bosilca	a856f926de	Remove a bunch of unused variables. This commit was SVN r28213.	2013-03-26 14:34:29 +00:00
Ralph Castain	b7f0e46319	Provide a nicer error message when someone gives a bad signal number to opal_signal cmr:v1.7.1 This commit was SVN r28188.	2013-03-20 15:30:59 +00:00
Jeff Squyres	7f34dc266b	Add missing unlocks. Fixes CID 967022 (which covers the unlock on line 627; there's probably another CID for the unlock added on line 537). This commit was SVN r28179.	2013-03-18 23:19:25 +00:00
Ralph Castain	a4b6fb241f	Remove all remaining vestiges of the Windows integration This commit was SVN r28137.	2013-02-28 17:31:47 +00:00
Ralph Castain	e71b40fdcb	If we are redirecting to files, ensure we don't create duplicate file descriptors for output streams going to the same file. If we do, then the output gets completely jumbled - best to avoid that problem. This commit was SVN r28136.	2013-02-28 17:21:53 +00:00
Brian Barrett	33cb4d21fe	Need to include libltdl's includes so that the lt wrappers can compile This commit was SVN r28042.	2013-02-12 00:41:03 +00:00
Rolf vandeVaart	6843f02b37	Add wrapper functions to LTDL functions so other parts of the library can access the LTDL functionality. Reviewed by jsquyres. This commit was SVN r28041.	2013-02-11 15:11:47 +00:00
Rolf vandeVaart	82fb093955	Revert changeset 28011. This can break the build on some systems. This commit was SVN r28017.	2013-02-01 20:41:47 +00:00
Rolf vandeVaart	79b623d7e3	Add wrapper interface to LTDL functions so that other parts of the library can access the LTDL functionality. Reviewed by jsquyres. This commit was SVN r28011.	2013-02-01 14:11:39 +00:00
Brian Barrett	29aaa21c5a	Fix some warnings when we don't have sockets or syslog This commit was SVN r27973.	2013-01-29 23:02:26 +00:00
Brian Barrett	fc3df11e08	Remove the (only two) fortran constants from OPAL. The only places that actually care if opal_pointer_array is limited to handle_max already passes that in as the max_size during init, so don't need it there. The arch constant was a bit more difficult, so pass that in during MPI init and leave empty otherwise. This is to help with the effort to allow building ompi against an external opal or orte. This commit was SVN r27817.	2013-01-15 01:27:36 +00:00
Nathan Hjelm	3e1b13b13a	Re-add support for old flex (2.5.4a and earlier) while still cleaning up properly in new flex. This commit was SVN r27657.	2012-12-07 00:12:43 +00:00
Ralph Castain	fdf7633cff	Per Jeff's suggestion, set the default answer when asking for IP aliases in case we don't find any This commit was SVN r27620.	2012-11-16 14:28:30 +00:00
Ralph Castain	a52071a17d	Add a function to return the aliases (based on IP addrs) for the current node This commit was SVN r27618.	2012-11-16 04:02:29 +00:00
Ralph Castain	f9f07e9535	Add a function to test if a string is in the form of an IP address - doesn't test for validity of the address This commit was SVN r27583.	2012-11-10 14:01:12 +00:00
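
Commit 50b4b92758 above notes that hostname lookup may leave the buffer without a terminating NUL when the buffer is too small. A minimal sketch of the defensive pattern, assuming a POSIX system; safe_gethostname is a hypothetical helper name used only for illustration and is not taken from the Open MPI sources:

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI code): fetch the host name and
 * guarantee that the buffer holds a NUL-terminated string afterwards,
 * whether or not the name had to be truncated. */
static int safe_gethostname(char *buf, size_t len)
{
    int rc;

    if (buf == NULL || len == 0) {
        return -1;
    }

    rc = gethostname(buf, len);

    /* POSIX leaves termination unspecified when the name is truncated,
     * so force a terminator in the last byte in every case. */
    buf[len - 1] = '\0';

    if (rc != 0) {
        /* On failure, hand back an empty string rather than garbage. */
        buf[0] = '\0';
        return -1;
    }
    return 0;
}
```

A caller would pass a fixed-size array, for example safe_gethostname(name, sizeof(name)), and can then treat name as a C string regardless of the return code.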


460 Commits
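
Commit ad51705891 in the list describes separating an unsigned result from the error code: the value goes into an OUT parameter and the function returns an int status, because -1 cannot be told apart from a valid value once it is converted to an unsigned type. The sketch below illustrates that pattern only; parse_limit is a hypothetical name and the parsing rules are assumptions, not the actual opal_setlimit() logic:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper (not the real opal_setlimit()): parse a decimal
 * limit string. The unsigned result is written to *out; the int return
 * value carries success (0) or failure (-1), so the two never mix. */
static int parse_limit(const char *text, rlim_t *out)
{
    char *end = NULL;
    unsigned long long value;

    /* Require a leading digit: strtoull() would otherwise accept
     * whitespace and a minus sign and silently wrap the value. */
    if (text == NULL || out == NULL || !isdigit((unsigned char) text[0])) {
        return -1;
    }

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;   /* overflow or trailing junk; *out is left untouched */
    }

    *out = (rlim_t) value;
    return 0;
}
```

The caller checks the return code first and reads the OUT parameter only on success, which avoids the signed/unsigned comparison that the commit message mentions.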