1
1
Граф коммитов

20 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
9ea2b196ce Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.

Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.

This commit was SVN r23966.
2010-10-28 15:22:46 +00:00
Ralph Castain
86c7365e8e Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.

This commit was SVN r23943.
2010-10-26 02:41:42 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Rainer Keller
97511912ec - Fixup several functions, that cannot return
- Add one instance where we do not use a parameter in a function
 - Fix a buglet in commit r23689, where the attribute-for-function ptrs
   was applied.

This commit was SVN r23690.

The following SVN revision numbers were found above:
  r23689 --> open-mpi/ompi@5eb571c458
2010-08-31 12:21:13 +00:00
Jeff Squyres
64cb8f5d7f Another round of man page cleanups from Debian mantainer Manuel
Prinz.  Many thanks!

This commit was SVN r23445.
2010-07-20 14:07:18 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
Josh Hursey
e9b5162d79 Fix the configure logic for --with-ft so that it properly takes a comma separated list.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.

The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.

This commit was SVN r22824.
2010-03-12 23:57:50 +00:00
Jeff Squyres
16b100219d A patch from UTK to allow orte_init(), opal_init(), and associated
friends also receive &argc and &argv (George asked Jeff to Ralph to
review before committing).  The thought is that passing argv and argc
to opal/orte_init be useful to other projects outside of OMPI that are
using OPAL and/or ORTE (especially in conjunction with some other
bootstrapping code where it is helpful to modify argv).  It's such a
small thing that it's easy to apply here to make others' lives a
little easier.

Ask George for more details; I'm just the messenger.  :-)

Judging by the copyrights on this patch, it's been around for a
while.  :-)

This commit was SVN r22260.
2009-12-04 00:51:15 +00:00
Ralph Castain
4be24521aa Modify the orte_process_info structure to handle a broader range of process types by replacing the individual booleans with a 32-bit bitmap. Use a set of #define's to define the individual bits, and a set of matching macros to test for them. Update the orte code base to use the macros instead of the booleans.
Minor mod to the ompi layer to use the new #define's - just one-line name replacements.

This commit was SVN r21144.
2009-05-04 11:07:40 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Terry Dontje
4b43911c6a Remove superfluous spaces in manpages that were causing catman to
generate mangled windex files.  Made ompi-top.1 and ompi-iof.1 build
by default.  Also, added the orte-top synonym to the ompi-top manpage.

This commit was SVN r20915.
2009-04-01 14:40:27 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Rainer Keller
b356e90fa1 - Get rid of include orte/util/proc_info.h, if not needed
Only proc_info.h-internal include file is opal/dss/dss_types.h
 - In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   works fine, no errors.

   Again, let's have MTT the last word.

This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Ralph Castain
0597fdd778 Ensure that orte-iof barks when given an unrecognized cmd line option
This commit was SVN r20397.
2009-02-02 14:10:54 +00:00
Ralph Castain
5e6d3ba289 Initial implementation of static ports. Provide an mca param to specify static port ranges to the OOB - can provide an
y combination of comma-separated values and ranges. Daemons will use the first port in the range, MPI procs will use the other ports in the range assuming that they know their node rank in time and enough ports were specified.

NOTE: this capability only works under specific conditions. I will outline more about this in a note to devel as the remainder of the implementation progresses. For now, the only environment where this works is slurm. The linear routed module has also been adjusted to work with static ports so that all messaging flows strictly through the topology, including the initial daemon callback - thus limiting the number of sockets opened by mpirun.

This commit was SVN r20390.
2009-01-30 18:31:43 +00:00
Ralph Castain
fd5e15ea58 Since parsing comma-delimited, range-capable options is being used in multiple places, create a new utility that consolidates that code.
Have orte-iof use it.

This commit was SVN r20346.
2009-01-25 17:16:25 +00:00
Ralph Castain
7e3ddb09d3 As requested by Aurelien at the July design meeting - long time coming, but finally got around to it.
Enable one mpirun to act as the server for another mpirun when doing MPI_Publish_name and its associated operations. The user is responsible, of course, for ensuring that the mpirun acting as a server outlives any mpiruns using it in that capacity.

Add a cmd line option to mpirun --report-pid that prints out mpirun's pid. Allow the --ompi-server option to now take pid:# (or PID:#) of the mpirun to be used as the server, and then look that pid up by searching the local mpirun contact infos for it.

This commit was SVN r20102.
2008-12-10 17:10:39 +00:00
Ralph Castain
03257ebb9f Since orte-iof doesn't yet know how to handle multiple ranks, adjust the help message and the code to correctly handle one specified rank
This commit was SVN r19886.
2008-11-03 14:26:01 +00:00
Brad Penoff
d7b0fdfe5c small fix to compile trunk on FreeBSD 7
This commit was SVN r19817.
2008-10-28 03:44:23 +00:00
Ralph Castain
6e5d844c36 Roll in the revamped IOF subsystem. Per the devel mailing list email, this is a complete rewrite of the iof framework designed to simplify the code for maintainability, and to support features we had planned to do, but were too difficult to implement in the old code. Specifically, the new code:
1. completely and cleanly separates responsibilities between the HNP, orted, and tool components.

2. removes all wireup messaging during launch and shutdown.

3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol.

4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0.

5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none".

6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout.

7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output"

This is not intended for the 1.3 release as it is a major change requiring considerable soak time.

This commit was SVN r19767.
2008-10-18 00:00:49 +00:00