
340 Commits

Author SHA1 Message Date
Ralph Castain
6d24b34940 Extend the dpm framework API to support persistent accept/connect operations:
* paccept - establish a persistent listening port for async connect requests

* pconnect - async connect to a remote process that has posted a paccept port. Provides a timeout mechanism, and allows the underlying implementation to retry until the timeout expires

* pclose - shuts down a prior paccept posting

Includes example programs paccept.c and pconnect.c in orte/test/mpi. New MPI extension interfaces coming...
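
The retry-until-timeout behavior described for pconnect can be illustrated with a minimal sketch. This is not the dpm framework code: `retry_until_timeout`, `attempt_fn`, and `fake_connect` are hypothetical names invented for this example, and the loop uses whole-second `time()` granularity and busy-retries, where a real implementation would presumably be event-driven.

```c
#include <time.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*attempt_fn)(void *ctx);

/* Retry `attempt` until it succeeds or `timeout_sec` seconds have elapsed.
 * At least one attempt is always made.
 * Returns 0 on success, -1 if the timeout expired first. */
int retry_until_timeout(attempt_fn attempt, void *ctx, double timeout_sec)
{
    time_t start = time(NULL);
    do {
        if (attempt(ctx) == 0) {
            return 0;
        }
        /* Assumption: a real implementation would wait or back off here
         * instead of retrying immediately. */
    } while (difftime(time(NULL), start) < timeout_sec);
    return -1;
}

/* Illustrative stand-in for a connect attempt: fails while the counter
 * pointed to by ctx is positive, then succeeds. */
int fake_connect(void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```

Passing the attempt as a callback keeps the timeout policy separate from whatever transport actually performs the connect.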

This commit was SVN r29063.
2013-08-23 18:02:50 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.
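
As a rough illustration of the "cycle across the progress function until the flag becomes false" pattern, here is a minimal sketch. It is not the actual OMPI_WAIT_FOR_COMPLETION definition from ompi/mca/rte/rte.h: `SKETCH_WAIT_FOR_COMPLETION`, `sketch_progress`, and the counters are hypothetical stand-ins, with `sketch_progress` playing the role of opal_progress.

```c
#include <stdbool.h>

/* Hypothetical state: a request that completes after three progress steps. */
int  pending_steps  = 3;
bool request_active = true;
int  progress_calls = 0;

/* Hypothetical stand-in for opal_progress(): each call advances the
 * pending work by one step and clears the flag once the work is done. */
void sketch_progress(void)
{
    progress_calls++;
    if (--pending_steps == 0) {
        request_active = false;
    }
}

/* Spin on the progress function until `flag` becomes false. */
#define SKETCH_WAIT_FOR_COMPLETION(flag) \
    do {                                 \
        while ((flag)) {                 \
            sketch_progress();           \
        }                                \
    } while (0)

/* Block until the request completes; returns the total number of
 * progress calls made so far. */
int wait_demo(void)
{
    SKETCH_WAIT_FOR_COMPLETION(request_active);
    return progress_calls;
}
```

Once the flag is already false, the macro falls through without calling the progress function again.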

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whichever NIC is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module can reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.
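
The address/mask reachability test mentioned above for multi-address NICs can be sketched as follows. This is a minimal IPv4-only illustration, not the OOB/TCP code: `if_addr_t`, `ipv4`, `prefix_to_mask`, and `nic_reaches_peer` are names invented for this example, and addresses are kept in host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One address/mask pair of a NIC (hypothetical layout). */
typedef struct {
    uint32_t addr;        /* IPv4 address, host byte order */
    unsigned prefix_len;  /* number of leading mask bits, 0..32 */
} if_addr_t;

/* Pack four octets into a host-order IPv4 address. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* Build a netmask from a prefix length. */
uint32_t prefix_to_mask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    /* Shifting a 32-bit value by 32 is undefined, so handle /0 separately. */
    return prefix_len == 0 ? 0u
                           : (uint32_t)(UINT32_MAX << (32 - prefix_len));
}

/* A peer counts as reachable if it falls inside the subnet of any of the
 * NIC's address/mask pairs. */
bool nic_reaches_peer(const if_addr_t *pairs, size_t n_pairs,
                      uint32_t peer_addr)
{
    for (size_t i = 0; i < n_pairs; i++) {
        uint32_t mask = prefix_to_mask(pairs[i].prefix_len);
        if ((pairs[i].addr & mask) == (peer_addr & mask)) {
            return true;
        }
    }
    return false;
}
```

A pure subnet comparison like this cannot see peers that are only reachable through a gateway.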


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC
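
The per-peer sequence numbering listed as a still-needed feature in the last item could look roughly like the sketch below. It is only an illustration of in-order delivery on the receiving side, not code from this commit: `peer_rx_t`, `rx_accept`, and the fixed `RX_WINDOW` are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reorder window; must be a power of two so that
 * seq % RX_WINDOW stays consistent across 32-bit wraparound. */
#define RX_WINDOW 8

/* Receive-side state kept per peer. */
typedef struct {
    uint32_t next_expected;    /* next sequence number to deliver */
    bool     held[RX_WINDOW];  /* out-of-order arrivals awaiting delivery */
} peer_rx_t;

/* Record the arrival of message `seq` from this peer.
 * Returns the number of messages that can now be delivered in order
 * (0 if the message had to be held), or -1 if it is a duplicate or
 * lies outside the window. */
int rx_accept(peer_rx_t *p, uint32_t seq)
{
    uint32_t offset = seq - p->next_expected;  /* unsigned wrap is defined */
    if (offset >= RX_WINDOW || p->held[seq % RX_WINDOW]) {
        return -1;
    }
    p->held[seq % RX_WINDOW] = true;

    /* Release the contiguous run starting at next_expected. */
    int delivered = 0;
    while (p->held[p->next_expected % RX_WINDOW]) {
        p->held[p->next_expected % RX_WINDOW] = false;
        p->next_expected++;
        delivered++;
    }
    return delivered;
}
```

With this scheme, a message resent over a different NIC after a failure is either slotted into its place or rejected as a duplicate, regardless of which transport carried it.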

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00
Jeff Squyres
089c632cce Remove a bunch of dead code: gcc 4.7 warns of set-but-unused
variables.  So get rid of them.

This commit was SVN r28538.
2013-05-17 21:45:49 +00:00
Ralph Castain
04b11accd3 Silence a few warnings
This commit was SVN r28515.
2013-05-14 21:58:40 +00:00
Nathan Hjelm
c041156f60 Update ORTE frameworks to use the MCA framework system.
This commit was SVN r28240.
2013-03-27 21:14:43 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few components actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
147c6ff9e7 Clean out the cruft leftover from the use_common_ports experiment
cmr:v1.7

This commit was SVN r28184.
2013-03-20 15:07:43 +00:00
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00
Ralph Castain
cf9796accd Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes
This commit was SVN r28134.
2013-02-28 01:35:55 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Ralph Castain
8e8e95ca6b Silence error report - just because someone only defines ipv4 static ports doesn't make a fatal error
This commit was SVN r27976.
2013-01-29 23:48:22 +00:00
Ralph Castain
b403ca5bd8 Silence warning
This commit was SVN r27897.
2013-01-23 22:17:08 +00:00
Ralph Castain
82f1ba0ea8 Fix static port usage, ensure that both ipv4 and ipv6 are given if ipv6 was enabled
This commit was SVN r27682.
2012-12-18 03:59:49 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream, so it is a little unexpected that mca_base_components_close closes one. To add to this, several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function, and others closed their output a second time. This change improves the semantics of mca_base_components_open/close, as they are now symmetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Jeff Squyres
c8cee23ee7 Priorities really shouldn't be less than 0.
This commit was SVN r27098.
2012-08-21 15:47:15 +00:00
Ralph Castain
dacb07000d Turn udcm and ud oob off by default, but allow them to build and be used if someone wants to test them
cmr:v1.7

This commit was SVN r27097.
2012-08-21 15:18:34 +00:00
Nathan Hjelm
4557e15c18 oob/ud fix compile error
This commit was SVN r26933.
2012-07-31 21:50:34 +00:00
Jeff Squyres
88cbe9c780 .ompi_ignore this component until it can be fixed.
This commit was SVN r26930.
2012-07-31 21:02:06 +00:00
Nathan Hjelm
980692804d oob/ud: don't start listening for ud requests unless we have one usable port
This commit was SVN r26929.
2012-07-31 19:00:18 +00:00
Ralph Castain
23c2a315a9 Add missing line to set flag indicating at least one port found
This commit was SVN r26914.
2012-07-30 17:54:38 +00:00
George Bosilca
772ec212eb Fix another compiler warning.
This commit was SVN r26775.
2012-07-10 15:57:42 +00:00
George Bosilca
ec760454a6 Cleaning ...
This commit was SVN r26747.
2012-07-04 21:22:13 +00:00
Ralph Castain
6ae5776904 Cleanup IPV6 build
This commit was SVN r26738.
2012-07-04 00:03:50 +00:00
Ralph Castain
0dfe29b1a6 Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.

Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.

This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Jeff Squyres
148ae6d6e3 This commit unifies the configury of some verbs-lovin' components.
* Add new configure command line options and deprecate some old ones:
   * --with-verbs replaces --with-openib
   * --with-verbs-libdir replaces --with-openib-libdir
 * If you specify --with-openib[-libdir] without
   --with-verbs[-libdir], you'll get a "these options have been
   deprecated!" warning, but then they'll act just like
   --with-verbs[-libdir].

  '''Sidenote:''' Note that we are not renaming any components at this
  time, nor are we renaming the top-level OMPI_CHECK_OPENIB m4 macro
  (which is pretty strongly tied to the openib BTL and is bastardized
  by the ofud BTL).  Note that there will likely be more changes in
  this area coming soon (next week?) when some long-standing changes
  move to the SVN trunk: some openib BTL infrastructure will move to
  ompi/mca/common, and its configury gets split up / refactored.

We extend the philosophy of our other --with-<foo> configure options to
--with-verbs, applying it to ''all'' verbs-lovin' components:

 * If you specify --with-verbs, then all verbs-lovin' components must
   configure successfully (or abort).  This currently means: OOB ud,
   BTL ofud, BTL openib.
 * If you specify --with-verbs=DIR, then all verbs-lovin' components
   must configure successfully (or abort), and will use DIR to find
   verbs headers and libraries.
 * If you specify --without-verbs, then all verbs-lovin' components
   will be ignored.

This commit also fixes a problem where the --with-openib=DIR form
would not use DIR for ''all'' verbs-lovin' components (I think only
BTL openib and BTL ofud used that DIR).  Now all of them do, as does
hwloc (because hwloc has some OpenFabrics helper functions that
require ibv types from verbs.h).

There's a little new m4 infrastructure worth mentioning:

 * If you create a new verbs-lovin' component (i.e., a component that
   needs verbs), your configure.m4 should
   AC_REQUIRE([OPAL_CHECK_VERBS_DIR]). 
 * You can then use three global shell variables: $opal_want_verbs,
   $opal_verbs_dir, $opal_verbs_libdir, which will be set as follows:
   * opal_want_verbs will be "yes" and opal_verbs_dir and
     opal_verbs_libdir will both be set to directory values, '''OR'''
   * opal_want_verbs will be "no" and opal_verbs_dir and
     opal_verbs_libdir will both be set empty

This commit was SVN r26640.
2012-06-22 19:53:56 +00:00
Ralph Castain
e6f3586415 Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Ralph Castain
96c778656a Improve launch performance on clusters that use dedicated nodes by instructing the orteds to use the same port as the HNP, thus allowing them to "rollup" their initial callback via the routed network. This substantially reduces the HNP bottleneck and the number of ports opened by the HNP.
Restore enable-static-ports option by default - the Cray will have to disable it to get around their library issues, but that's just a warning problem as opposed to blocking the build.

This commit was SVN r26606.
2012-06-15 10:15:07 +00:00
Ralph Castain
269cb2b8d9 Some cleanup to remove calls to opal_progress when running with orte progress threads, and to ensure that all orte-related events are in the orte event base.
This commit was SVN r26591.
2012-06-11 19:59:53 +00:00
Ralph Castain
2812579246 Just because we find an IB device does not mean we can get a QP on it. Check to see if we can before we select the UD OOB module for use.
This commit was SVN r26587.
2012-06-10 01:42:51 +00:00
Ralph Castain
0442a807c0 Default the OOB to the "ud" component IFF the HNP finds itself on a node with a supported Infiniband device. Ensure that the daemons all pick the matching component by dictating the selection via mca param on the orted cmd line.
This commit was SVN r26582.
2012-06-08 01:23:08 +00:00
Nathan Hjelm
625c8078c3 oob/ud: fix typo
This commit was SVN r26569.
2012-06-07 19:21:23 +00:00
Jeff Squyres
99c5afb397 Remove clang compiler warnings.
This commit was SVN r26523.
2012-05-29 23:36:06 +00:00
Ralph Castain
7fb49b1559 Silence warning
This commit was SVN r26480.
2012-05-23 13:59:41 +00:00
Nathan Hjelm
6eeca66475 Add an option to enable static ports. Disabled by default
This commit was SVN r26462.
2012-05-21 19:56:15 +00:00
Ralph Castain
83d69b6c95 Enable the ORTE progress thread for apps (not needed in the tools as they already continuously loop in the event lib). This appears to be working, at least for MPI apps that only use shared memory (a simple "hello"). More testing is required to identify where problems will occur - this is only intended to allow further development.
In order to use the progress thread, you must configure with:

--enable-orte-progress-threads --enable-event-thread-support

This commit was SVN r26457.
2012-05-20 15:14:43 +00:00
Jeff Squyres
46f47e08b6 Remove typo/extra brackets and parens.
This commit was SVN r26351.
2012-04-27 13:48:43 +00:00
Jeff Squyres
9d0df5a9a6 Update configury in the new oob ud component: actually check to see if
it succeeds and run $1 or $2, accordingly.  This allows "make dist" to
run properly on machines that do not have OpenFabrics stuff installed
(e.g., the nightly tarball build machine).

There's still more to be done here -- it doesn't check for non-uniform
directories where the OpenFabrics headers/libraries might be
installed.  We might need to re-tool/combine
ompi/config/ompi_check_openib.m4 (which checks for way more than
oob/ud needs) and move it up to config/ompi_check_ofa.m4, or
something...?

This commit was SVN r26350.
2012-04-27 11:32:56 +00:00
Jeff Squyres
9829d2279f System-level includes should be at the top of the file, before most
OPAL/ORTE/OMPI includes.

This commit was SVN r26349.
2012-04-27 11:29:22 +00:00
Nathan Hjelm
e1e0d466e5 Merge ssh://ct-fe1/usr/projects/hpctools/hjelmn/ompi-trunk-git into HEAD
This commit was SVN r26344.
2012-04-26 22:06:12 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Jeff Squyres
cdc783925e (Re-)Add oob_tcp_if_(in|ex)clude functionality to allow CIDR notation,
just like the btl_tcp_if_(in|ex)clude MCA param.

This commit was SVN r25953.
2012-02-17 15:38:42 +00:00
Jeff Squyres
3e22450345 Fix the oob_tcp_verbose MCA param; make it actually apply to the OOB
TCP verbose handle (not the generic/0 handle).

This commit was SVN r25942.
2012-02-16 22:28:11 +00:00
Ralph Castain
9b59d8de6f This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build.
Brian dealt with this in the past by creating platform files and using "no-build" to block the components. This was clunky, but acceptable when only one organization was using that option. However, that number has now expanded to at least two more locations.

Accordingly, make --without-rte-support actually work by adding appropriate configury to prevent components from building when they shouldn't. While doing so, remove two frameworks (db and rmcast) that are no longer used as ORCM comes to a close (besides, they belonged in ORCM now anyway). Do some minor cleanups along the way.

This commit was SVN r25497.
2011-11-22 21:24:35 +00:00
George Bosilca
1000af1c48 No need to abort there, returning an error triggers the
abort at the upper level.

This commit was SVN r25494.
2011-11-18 19:07:26 +00:00
Wesley Bland
4e7ff0bd5e By popular demand the epoch code is now disabled by default.
To enable the epochs and the resilient orte code, use the configure flag:

--enable-resilient-orte

This will define both:

ORTE_ENABLE_EPOCH
ORTE_RESIL_ORTE

This commit was SVN r25093.
2011-08-26 22:16:14 +00:00
Jeff Squyres
1cbfb53801 r24976 wasn't quite right -- you now actually get a warning if you
specify btl_tcp_if_include because btl_tcp_if_exclude is defaulted to
the loopback devices.

This commit does a few things:

 * Introduce a new OPAL MCA base function:
   mca_base_param_check_exclusive_string().  It checks to see that the
   ''user'' does not set two MCA parameters that are mutually
   exclusive by checking the source of those MCA param values.
 * Use the above function in many BTLs (and the OOB TCP) to ensure
   that <foo>_if_include and <foo>_if_exclude are not both specified
   ''by the user''.
 * Re-arrange many of these BTLs to move their MCA registration code
   into a separate component_register() function (vs. the
   component_open() function).
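
The idea of checking the ''source'' of a parameter rather than its value can be shown with a short sketch. This is not the real mca_base_param_check_exclusive_string() signature: `param_t`, `param_source_t`, and `check_exclusive` are hypothetical names used only to illustrate why a defaulted `<foo>_if_exclude` should not conflict with a user-supplied `<foo>_if_include`.

```c
/* Where a parameter's current value came from (hypothetical enum). */
typedef enum {
    SRC_DEFAULT,   /* built-in default, not chosen by the user */
    SRC_ENV,       /* environment variable */
    SRC_FILE,      /* parameter file */
    SRC_CMDLINE    /* command line */
} param_source_t;

/* Minimal view of a parameter for this sketch. */
typedef struct {
    const char    *name;
    const char    *value;
    param_source_t source;
} param_t;

/* Returns 0 if at most one of the two parameters was set by the user,
 * -1 if the user set both. A non-empty default value does not count. */
int check_exclusive(const param_t *a, const param_t *b)
{
    int a_user = (a->source != SRC_DEFAULT);
    int b_user = (b->source != SRC_DEFAULT);
    return (a_user && b_user) ? -1 : 0;
}
```

Comparing values alone would flag a conflict whenever the exclude list carries its loopback default, which is the spurious warning described at the top of this commit message.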

This code has been nominally reviewed and checked by Ralph, George,
Terry, and Shiqing.

This commit was SVN r25043.

The following SVN revision numbers were found above:
  r24976 --> open-mpi/ompi@8f4ac54336
2011-08-10 17:24:36 +00:00
Ralph Castain
1ee7c39982 Fix some major bit-rot on scalable launch. If static ports are provided, then daemons can connect back to the HNP via the routed connection tree instead of doing so directly. In order to do that at scale, the node list must be passed as a regular expression - otherwise, the orted command line gets too long.
Over the course of time, usage of static ports got corrupted in several places, the "parent" info got incorrectly reset, etc. So correct all that and get the regex-based wireup going again.

Also, don't pass node lists if static ports aren't enabled - they are of no value to the orted and just create the possibility of overly-long cmd lines.

This commit was SVN r24860.
2011-07-07 18:54:30 +00:00
Wesley Bland
e1ba09ad51 Add resilience to ORTE. Allows the runtime to continue after a process (or
ORTED) failure. Note that more work will be necessary to allow the MPI layer to
take advantage of this.

Per RFC:
http://www.open-mpi.org/community/lists/devel/2011/06/9299.php

This commit was SVN r24815.
2011-06-23 20:38:02 +00:00
Josh Hursey
20339a7900 Minor coding style and indentation fixes.
This commit was SVN r24764.
2011-06-09 14:16:06 +00:00
Ralph Castain
f3cae3d6f3 Cleanup the handling of if_include and if_exclude arguments based on CIDR notation.
Fix a bug in the new code that prevented the system from correctly matching addresses.

Remove comments in the show-help text indicating that we would continue in the face of incorrect specifications - leave that to the calling layer to decide.

Modify the new opal_ifmatches so it returns error codes letting the caller better understand the result.

Modify the oob to ensure we abort if we don't find interfaces matching specified constraints, and that we do so without multiple error messages.

NOTE: we have a conflict in our standards. We have been using comma-delimited lists of interfaces for all our params. However, one param - opal_net_private_ipv4 - now uses semicolons instead of comma separators. No idea why, but it is confusing.

This commit was SVN r24755.
2011-06-07 02:09:11 +00:00