1
1
Граф коммитов

66 Коммитов

Автор SHA1 Сообщение Дата
Josh Hursey
1e678c3f55 per conversation with Ralph and Jeff take out the opal_init_only logic.
This commit moves the initalization/finalization of opal_event and opal_progress
to opal_init/finalize. These were previously init/final in ORTE which is an
abstraction violation. After talking about it we concluded that there are no
ordering issues that require these to be init/final in ORTE instead of OPAL.

I ran the IBM test suite against this commit and it didn't turn up any new
failures so I think it is good to go.

Let us know if this causes problems.

This commit was SVN r14773.
2007-05-24 21:54:58 +00:00
Ralph Castain
4fff584a68 Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that *did* start.
The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system.

Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed.

Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief.

With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn.

Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put".

This commit was SVN r14711.
2007-05-21 18:31:28 +00:00
Brian Barrett
a25ce44dc1 Clean up the preconnect code:
* Don't need the 2 process case -- we'll send an extra message, but
    at very little cost and less code is better.
  * Use COMPLETE sends instead of STANDARD sends so that the connection
    is fully established before we move on to the next connection.  The
    previous code was still causing minor connection flooding for huge
    numbers of processes.
  * mpi_preconnect_all now connects both OOB and MPI layers.  There's
    also mpi_preconnect_mpi and mpi_preconnect_oob should you want to
    be more specific.
  * Since we're only using the MCA parameters once at the beginning
    of time, no need for global constants.  Just do the quick param
    lookup right before the parameter is needed.  Save some of that
    global variable space for the next guy.

Fixes trac:963

This commit was SVN r14553.

The following Trac tickets were found above:
  Ticket 963 --> https://svn.open-mpi.org/trac/ompi/ticket/963
2007-05-01 04:49:36 +00:00
Ralph Castain
18b2dca51c Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly.
There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd *really* have to try).

This also involved a slight change to the oob.xcast API, so propagated that as required.

Note: this has *only* been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-)

Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately.

This commit was SVN r14475.
2007-04-23 18:41:04 +00:00
Rich Graham
f481722bdf move the code that sets the thread level information before the btl are
initialized, so that the btl's have this information for correct setup.

This commit was SVN r14258.
2007-04-07 05:06:47 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Tim Prins
dbe82c70d6 Get rid of stale file, and remove the (unused) references to it.
This commit was SVN r13771.
2007-02-23 13:50:39 +00:00
Brian Barrett
8b28e5b33d Allow the OOB to connect between all MPI applications during MPI_INIT
without also establishing MPI connectivity. 

This commit was SVN r13595.
2007-02-09 20:17:37 +00:00
Brian Barrett
262cbbc5c9 Back out r13593, which contained a change that shouldn't be committed.
This commit was SVN r13594.

The following SVN revision numbers were found above:
  r13593 --> open-mpi/ompi@81472363ea
2007-02-09 20:13:02 +00:00
Brian Barrett
81472363ea Allow the OOB to connect between all MPI applications during MPI_INIT
without also establishing MPI connectivity.

This commit was SVN r13593.
2007-02-09 20:11:40 +00:00
Ralph Castain
c7be9a7121 Complete backout of prior sched_yield and paffinity changes
This commit was SVN r13530.
2007-02-07 14:22:37 +00:00
Jeff Squyres
0732c555de Refs trac:853
Add new function opal_get_num_processors() that will return the number
of processors on the local host.  Does the Right thing in POSIX
environments (to include a special case for OS X), and will shortly do
the Right Thing for Windows (this commit includes a change to
configure, so I wanted to get that in before the US workday -- the
Windows code can some shortly because it won't involve configury
changes).

This commit was SVN r13506.

The following Trac tickets were found above:
  Ticket 853 --> https://svn.open-mpi.org/trac/ompi/ticket/853
2007-02-06 12:03:56 +00:00
Ralph Castain
3daf8b341b Fix the sched_yield problem for generic environments. We now determine and set sched_yield during mpi_init based on the following logical sequence:
1. if the user has specified sched_yield, we simply do what we are told

2. if they didn't specify anything, try to get the number of processors on this node. Note that we already now get the number of local procs in our job that are sharing this node - that now comes in through the proc callback and is stored in the ompi_proc_t structures.

3. if we can get the number of processors, compare that to the number of local procs from my job that are sharing my node. If the number of local procs exceeds the number of processors, then set sched_yield to true. If not, then be a hog and set sched_yield to false

4. if we can't get the number of processors, default to conservative behavior and set sched_yield to true.

Note that I have not yet dealt with the need to dynamically adjust this setting as more processes are added via comm_spawn. So far, we are *only* looking within our own job. Given that we have now moved this logic to mpi_init (and away from the orteds), it isn't yet clear to me how a process will be informed about the number of procs in *other* jobs that are also sharing this node.

Something to continue to ponder.

This commit was SVN r13430.
2007-02-01 19:31:44 +00:00
Brian Barrett
49bcec64e4 More sm startup time reduction
* The real fix, don't leave the OOB in blocking mode during comm_dyn_init(),
    as it means no progressing MPI events while the event library is waiting
    for TCP stuff to come in.
  * Add many comments explaining the reasons for the current ordering

This commit was SVN r13422.
2007-02-01 18:47:43 +00:00
Jeff Squyres
0ce78f22fa Compliments r13351 -- use the new field on the ompi_proc_t struct to
know what my local rank is, and therefore set my paffinity ID as
appropriate.  Specifically, we're no longer relying on the
special/secret mpi_paffinity_processor MCA parameter that the orted
would set for us.

This allows processor affinity to be used in environments where the
orted is not used (e.g., bproc, and someday in the hopefully not
too-distant future, SLURM).

This commit was SVN r13352.

The following SVN revision numbers were found above:
  r13351 --> open-mpi/ompi@a338b7e533
2007-01-29 21:53:04 +00:00
Edgar Gabriel
1359ba9b13 Rewriting much of the errorcode and errorclass code, since
- we have to be able to attach a string to an error class, not just to an
 error code
 - according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated
  everytime you add a new code or a new class. Thus, you have to have single
  list for both. 

Thus, we got rid of the error_class structure. In the error-code structure, we
can distinguish whether we are dealing with an error code or an error class by
looking at the err->code element of the structure. In case its value is
MPI_UNDEFINED, the according entry is a class, else it is an error code. All
predefined error codes have the code and the class field set to the same
value.

The test MPI_Add_error_class1 passes now.

Fixes trac:418

This commit was SVN r12764.

The following Trac tickets were found above:
  Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418
2006-12-05 19:07:02 +00:00
Ralph Castain
79bd8a842e Add xcast timing in mpi_init since we cannot directly measure it (tree-based comm).
This commit was SVN r12706.
2006-11-30 15:06:40 +00:00
Ralph Castain
bc4e97a435 First stage in the move to a faster startup. Change the ORTE stage gate xcast into a binary tree broadcast (away from a linear broadcast). Also, removed the timing report in the gpr_proxy component that printed out the number of bytes in the compound command message as the answer was "not much" - reduces the clutter in the data.
This commit was SVN r12679.
2006-11-28 00:06:25 +00:00
Brian Barrett
33320b7165 Rework the opal_progress interface to better support dynamic processes and at
the same time, remove some of the MPI-related options from OPAL:

  - provide mechanism to change at runtime whether sched_yield() should 
    be called when the progress engine is idle
  - provide mechanism for changing the rate at which the event engine
    is called when there are "no" users of the event engine (ie, when
    using MPI but not TCP)
  - fix some function names in the progress engine to better match
    their intended use (and remove MPI naming scheme)
  - remove progress_mpi_enable / progress_mpi_disable because 
    we can now use the functions to set the sched_yield and
    tick rate interfaces
  - rename opal_progress_events() to opal_progress_set_event_flag()
    because the first really isn't descriptive of what the function
    does and I always got confused by it

This commit was SVN r12645.
2006-11-22 02:06:52 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
Ralph Castain
4e50cdae52 This commit accomplishes two things:
1. Fix the "hang" condition when an application isn't found. It turned out that the ODLS had some difficulty with the process actually not having been started - hence, it never called the waitpid callback. As a result, the "terminated" trigger didn't fire, and so mpirun didn't wake up. With this change, the HNP's errmgr forces the issue by causing the trigger to fire itself when an abort condition occurs.

2. Shift the recording of the pid and the nodename from mpi_init to the orted launcher. This allows programs such as Eclipse PTP to get the pids even for non-MPI applications. In the case of bproc, the pls handles this chore since we don't use orteds in that system.

This commit was SVN r12558.
2006-11-11 04:03:45 +00:00
Ralph Castain
c77f6c605e Update timing reports:
1. Remove timing of xcast from mpi_init

2. Add timing report from oob_xcast on how long it took to send the message

This commit was SVN r12428.
2006-11-03 18:55:05 +00:00
Ralph Castain
60e27c77e7 Add some additional timing reporting:
1. Added reporting points around the xcasts in MPI_Init. Note that these times will include time spent waiting for a trigger to fire, which is why the times between stage gates did NOT include these times initially. The inter-stage-gate times still do NOT include the xcast time - the xcast time is reported separately.

2. Added the process vpid on the MPI_Init timing reports for clarity.

3. Added a report from the xcast function on the HNP that outputs the number of bytes in the message being sent to the processes.

This commit was SVN r12422.
2006-11-03 16:04:40 +00:00
Ralph Castain
36d4511143 Bring the timing instrumentation to the trunk.
If you want to look at our launch and MPI process startup times, you can do so with two MCA params:

OMPI_MCA_orte_timing: set it to anything non-zero and you will get the launch time for different steps in the job launch procedure. The degree of detail depends on the launch environment. rsh will provide you with the average, min, and max launch time for the daemons. SLURM block launches the daemon, so you only get the time to launch the daemons and the total time to launch the job. Ditto for bproc. TM looks more like rsh. Only those four environments are currently supported - anyone interested in extending this capability to other environs is welcome to do so. In all cases, you also get the time to setup the job for launch.

OMPI_MCA_ompi_timing: set it to anything non-zero and you will get the time for mpi_init to reach the compound registry command, the time to execute that command, the time to go from our stage1 barrier to the stage2 barrier, and the time to go from the stage2 barrier to the end of mpi_init. This will be output for each process, so you'll have to compile any statistics on your own. Note: if someone develops a nice parser to do so, it would be really appreciated if you could/would share!

This commit was SVN r12302.
2006-10-25 15:27:47 +00:00
Ralph Castain
0411f9772e Begin instrumenting for scalability tests.
I have added a new MCA param (hey, you can't have too many!) called OMPI_MCA_orte_timing. If set to anything other than zero, the system will report out critical timing loops. At the moment, this includes three measurements:

1. Time spent going through the RDS->RAS->RMAPS, setting up triggers, etc. prior to calling the actual PLS launch function. This is reported out as time to setup job.

2. Time spent in MPI_Init from start of that function (well, right after opal_init) to the place where we send all of our info the registry. Reported out as time from start to exec_compound_cmd

3. Time actually spent executing the compound cmd. Reported out as time to exec_compound_cmd.

A few additional timing points will be added shortly.

These may eventually be removed or (better) setup with a conditional compile flag.

This commit was SVN r11892.
2006-09-29 13:19:44 +00:00
Rainer Keller
80166a9516 - fix typos
This commit was SVN r11703.
2006-09-19 07:55:41 +00:00
Ralph Castain
8c7f0ed9ae Change the SOH to the new State Monitoring and Reporting (SMR) framework. New API's will be appearing in the new framework shortly - this just gets the name change into the system.
Other changes:

1. Remove the old xcpu components as they are not functional.

2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.

This will require an autogen/configure, I'm afraid.

This commit was SVN r11228.
2006-08-16 16:35:09 +00:00
Brian Barrett
cdffc3158d * only set threads if not running at thread single
This commit was SVN r11193.
2006-08-15 15:55:53 +00:00
Jeff Squyres
b6c6d9a2b7 Bring over r10877 and r10881 from the /tmp/tbird branch:
r10877:
add warm up connection option.. of course this only warms up the first
eager btl but this should be adequate for now..

r10881:
Consulted with Galen and did a few things:

- Fix the algorithm to actually make the connections that we want
- Rename the MCA param to mpi_preconnect_all
- Cleanup the code a bit:
  - move the logic to a separate .c file
  - check return codes properly

This commit was SVN r11114.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r10877
  r10877
  r10881
  r10881
2006-08-04 14:41:31 +00:00
Brian Barrett
a84e557815 Add new loop mode OPAL_EVLOOP_ONELOOP that behaved like OPAL_EVLOOP_ONCE
did pre-libevent update.  The problem is that the behavior of 
OPAL_EVLOOP_ONCE was changed by the OMPI team, which them broke things
during the update, so it had to be reverted to the old meaning of
loop until one event occurs.  OPAL_EVLOOP_ONELOOP will go through the
event loop once (like EVLOOP_NONBLOCK) but will pause in the event
library for a bit (like EVLOOP_ONCE).

fixes trac:234

This commit was SVN r11081.

The following Trac tickets were found above:
  Ticket 234 --> https://svn.open-mpi.org/trac/ompi/ticket/234
2006-08-01 22:23:57 +00:00
Jeff Squyres
520147f209 Clean up the Fortran MPI sentinel values per problem reported on the
users mailing list:

  http://www.open-mpi.org/community/lists/users/2006/07/1680.php

Warning: this log message is not for the weak.  Read at your own
risk.

The problem was that we had several variables in Fortran common blocks
of various types, but their C counterparts were all of a type
equivalent to a fortran double complex.  This didn't seem to matter
for the compilers that we tested, but we never tested static builds
(which is where this problem seems to occur, at least with the Intel
compiler: the linker compilains that the variable in the common block
in the user's .o file was of one size/alignment but the one in the C
library was a different size/alignment).

So this patch fixes the sizes/types of the Fortran common block
variables and their corresponding C instantiations to be of the same
sizes/types. 

But wait, there's more.

We recently introduced a fix for the OSX linker where some C versions
of the fortran common block variables (e.g.,
_ompi_fortran_status_ignore) were not being found when linking
ompi_info (!).  Further research shows that the code path for
ompi_info to require ompi_fortran_status_ignore is, unfortunately,
necessary (a quirk of how various components pull in different
portions of the code base -- nothing in ompi_info itself requires
fortran or MPI knowledge, of course).

Hence, the real problem was that there was no code path from ompi_info
to the portion of the code base where the C globals corresponding to
the Fortran common block variables were instantiated.  This is because
the OSX linker does not automatically pull in .o files that only
contain unintialized global variables; the OSX linker typically only
pulls in a .o file from a library if it either has a function that is
used or have a global variable that is initialized (that's the short
version; lots of details and corner cases omitted).  Hence, we changed
the global C variables corresponding to the fortran common blocks to
be initialized, thereby causing the OSX linker to pull them in
automatically -- problem solved.  At the same time, we moved the
constants to another .c file with a function, just for good measure.

However, this didn't really solve the problem:

1. The function in the file with the C versions of the fortran common
   block variables (ompi/mpi/f77/test_constants_f.c) did not have a
   code path that was reachable from ompi_info, so the only reason
   that the constants were found (on OSX) was because they were
   initialized in the global scope (i.e., causing the OSX compiler to
   pull in that .o file).

2. Initializing these variable in the global scope causes problems for
   some linkers where -- once all the size/type problems mentioned
   above were fixed -- the alignments of fortran common blocks and C
   global variables do not match (even though the types of the Fortran
   and C variables match -- wow!).  Hence, initializing the C
   variables would not necessarily match the alignment of what Fortran
   expected, and the linker would issue a warning (i.e., the alignment
   warnings referenced in the original post).

The solution is two-fold:

1. Move the Fortran variables from test_constants_f.c to
   ompi/mpi/runtime/ompi_mpi_init.c where there are other global
   constants that *are* initialized (that had nothing to do with
   fortran, so the alignment issues described above are not a factor),
   and therefore all linkers (including the OSX linker) will pull in
   this .o file and find all the symbols that it needs.

2. Do not initialize the C variables corresponding to the Fortran
   common blocks in the global scope.  Indeed, never initialize them
   at all (because we never need their *values* - we only check for
   their *locations*).  Since nothing is ever written to these
   variables (particularly in the global scope), the linker does not
   see any alignment differences during initialization, but does make
   both the C and Fortran variables have the same addresses (this
   method has been working in LAM/MPI for over a decade).

There were some comments here in the OMPI code base and in the LAM
code base that stated/implied that C variables corresponding to
Fortran common blocks had to have the same alignment as the Fortran
common blocks (i.e., 16).  There were attempts in both code bases to
ensure that this was true.  However, the attempts were wrong (in both
code bases), and I have now read enough Fortran compiler documentation
to convince myself that matching alignments is not required (indeed,
it's beyond our control).  As long as C variables corresponding to
Fortran common blocks are not initialized in the global scope, the
linker will "figure it out" and adjust the alignment to whatever is
required (i.e., the greater of the alignments).  Specifically (to
counter comments that no longer exist in the OMPI code base but still
exist in the LAM code base):

- there is no need to make attempts to specially align C variables
  corresponding to Fortran common blocks
- the types and sizes of C variables corresponding to Fortran common
  blocks should match, but do not need to be on any particular
  alignment 

Finally, as a side effect of this effort, I found a bunch of
inconsistencies with the intent of status/array_of_statuses
parameters.  For all the functions that I modified they should be
"out" (not inout).

This commit was SVN r11057.
2006-07-31 15:07:09 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Brian Barrett
b1d2424013 Merge in present work on the MPI-2 onesided chapter. The current code is not
complete, but stable enough that it will have no impact on general development,
so into the trunk it goes.  Changes in this commit include:

 - Remove the --with option for disabling MPI-2 onesided support.  It
   complicated code, and has no real reason for existing
 - add a framework osc (OneSided Communication) for encapsulating
   all the MPI-2 onesided functionality
 - Modify the MPI interface functions for the MPI-2 onesided chapter
   to properly call the underlying framework and do the required
   error checking
 - Created an osc component pt2pt, which is layered over the BML/BTL
   for communication (although it also uses the PML for long message
   transfers).  Currently, all support functions, all communication
   functions (Put, Get, Accumulate), and the Fence synchronization
   function are implemented.  The PWSC active synchronization
   functions and Lock/Unlock passive synchronization functions are
   still not implemented

This commit was SVN r8836.
2006-01-28 15:38:37 +00:00
Brian Barrett
60ac1cb5f4 print stack traces (when available) for opal and orte processes, as well as
ompi processes.  Also add SIGABRT to the list of signals that are intercepted
to print out pretty messages.

This commit was SVN r8672.
2006-01-11 04:36:39 +00:00
Brian Barrett
45a3eccec6 re-enable the stack-trace functionality (it had been inadvertently disabled
in a merge a long time ago).  Set default signals to print a stack trace from
none to SIGBUS,SIGSEGV.

This commit was SVN r8603.
2005-12-22 19:39:50 +00:00
George Bosilca
58ba8f033d Handle the case when there is no F77 support available for us.
This commit was SVN r8238.
2005-11-22 21:52:14 +00:00
Jeff Squyres
9812694227 Ensure to instantiate MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE.
Thanks to Anthony Chan for pointing this out.

Note that these will only work for the Fortran compiler that Open MPI
was configured with -- since these values, are, by definition,
single-value, they cannot support all 4 values that Open MPI may
generate for the different Fortran name-mangling schemes.  A lengthy
comment in ompi_mpi_init.c explains this in more detail.  Added to the
README to explain this situation, as well as the forthcoming
.TRUE. Fortran fixes.

This commit was SVN r8231.
2005-11-22 15:24:39 +00:00
George Bosilca
932c67aeb3 MPI_COMM_WORLD should be the first communicator who get created even before MPI_COMM_SELF and MPI_COMM_NULL.
This commit was SVN r8132.
2005-11-12 03:47:17 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Edgar Gabriel
3633605010 moving op_init further up in ompi_mpi_init, since it is required when
quering some of the collective components. Up to now, it just worked
somehow, but now with correct reference counting for ops in place, it
refused :-)

This commit was SVN r7866.
2005-10-25 18:33:48 +00:00
Jeff Squyres
a459659a33 Print the string name of the return code
This commit was SVN r7789.
2005-10-17 20:47:44 +00:00
David Daniel
e4985c2a07 Moving totalview spin to the very end of mpi_init
This commit was SVN r7444.
2005-09-20 15:22:15 +00:00
Galen Shipman
d932cfd342 merge of rcache work into the trunk.. lotsa fun ;-)..
I regression tested before the merge, I will regression test tonight and
correct issues that might have crept in. 

This commit was SVN r7329.
2005-09-12 22:28:23 +00:00
George Bosilca
948683215b And more fixes ...
This commit was SVN r7321.
2005-09-12 20:25:01 +00:00
Ralph Castain
96f4bb7a63 Hey, sports fans!! Guess what??
Here's the huge registry check-in you've all been waiting for with baited breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.

In addition, the commit contains the following:

1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe

2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well.

In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).

This commit was SVN r7118.
2005-09-01 01:07:30 +00:00
David Daniel
227947fc51 Moving MPI-side TotalView support into a separate directory
ompi/debuggers/

so that compilation options can be more easily controlled.

This commit was SVN r7113.
2005-08-31 20:35:15 +00:00
David Daniel
995641c1e6 Don't initialize proctable more than once (since the stage gate 1 trigger
seems to get fired at least twice).

This commit was SVN r7101.
2005-08-31 00:21:55 +00:00
David Daniel
a5d9199e7f Adding a simple hook for TotalView that is activated if a particular MCA
parameter is set.

orterun/MPI integration still not quite working.

This commit was SVN r7097.
2005-08-30 17:34:23 +00:00
Jeff Squyres
c9cdb36b0b Finally get this right: move orte_sys_info.[ch] back into the orte
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
  already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
  and util/os_create_dirpath.c (they only used path_sep, anyway --
  easily changed to #defines)

This commit was SVN r7059.
2005-08-26 21:03:41 +00:00
Josh Hursey
4eefb33182 Some param changes:
- Change orte_base_infrastructre to orte_infrastructre to conform with 
  ompi_info's needs
- Move MCA Param registration in ORTE to a centralized function that is 
  called first in orte_init_stage1
- Set the infrastructre flag as an argument to orte_init
- Adjust initalization functions to properly pass down the infrastructre
  flag.

This commit was SVN r7053.
2005-08-26 20:13:35 +00:00