
49 Commits

Author SHA1 Message Date
Ralph Castain
d21c0027df Well, we are getting closer to resolving the comm_spawn problem. For the benefit of those that haven't been in the midst of this discussion, the problem is that this is the first case where the process starting a set of processes has not been mpirun and is not guaranteed to be alive throughout the lifetime of the spawned processes. This sounds simple, but actually has some profound impacts.
Most of this checkin consists of more debugging stuff. Hopefully, you won't see any printf's that aren't protected by debug flags - if you do, let me know and I'll take them out with my apologies.

Outside of debugging, the biggest change was a revamp of the shutdown process. For several reasons, we had chosen to have all processes "wait" for a shutdown message before exiting. This message is typically generated by mpirun, but in the case of comm_spawn we needed to do something else. We have decided that the best way to solve this problem is to:

(a) replace the shutdown message (which needed to be generated by somebody - usually mpirun) with an oob_barrier call. This still requires that the rank 0 process be alive. However, we terminate all processes if one abnormally terminates anyway, so this isn't a problem (with the standard or our implementation); and

(b) have the state-of-health monitoring subsystem issue the call to clean up the job from the registry. Since the state-of-health subsystem isn't available yet, we have temporarily assigned that responsibility to the rank 0 process. Once the state-of-health subsystem is available, we will have it monitor the job for all-processes-complete, and then it can tell the registry to clean up the job (i.e., remove all data relating to this job).
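
As a rough illustration of points (a) and (b), the following single-process sketch models the ordering only: every process arrives at a barrier, and the registry cleanup is performed by rank 0 once the barrier has released. All names here (example_job_t, example_barrier_arrive, example_after_barrier) are hypothetical and are not taken from the code base; the real oob_barrier call and registry interface are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one job; not a type from the code base. */
typedef struct example_job_t {
    int nprocs;            /* number of processes in the job            */
    int arrived;           /* how many have reached the barrier so far  */
    int registry_entries;  /* stand-in for the job's data in a registry */
} example_job_t;

/* One process reaches the barrier.
 * Returns 1 when this arrival completes the barrier, 0 otherwise. */
int example_barrier_arrive(example_job_t *job)
{
    assert(job != NULL && job->arrived < job->nprocs);
    job->arrived++;
    return job->arrived == job->nprocs;
}

/* Called by a process after the barrier. Only rank 0 removes the job's
 * registry data, and only once every process has arrived.
 * Returns 1 if the cleanup was performed by this call, 0 otherwise. */
int example_after_barrier(example_job_t *job, int rank)
{
    assert(job != NULL);
    if (job->arrived != job->nprocs) {
        return 0;  /* barrier has not released yet */
    }
    if (rank != 0) {
        return 0;  /* cleanup is temporarily rank 0's responsibility */
    }
    job->registry_entries = 0;
    return 1;
}
```

In this model nothing depends on a shutdown message from a launcher process, which is the property the message above is after.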

Hope that helps a little. I'll put all this into the design docs soon.

This commit was SVN r3754.
2004-12-08 21:44:41 +00:00
Ralph Castain
e259750296 Clean-up a few things - mostly adding a lot of diagnostics to chase down the comm_spawn issue. Believe this may fix that problem, but still checking that for sure. At least now it starts processes correctly again!
This commit was SVN r3738.
2004-12-07 19:38:42 +00:00
Jeff Squyres
616269a9be Add HLRS copyright
This commit was SVN r3665.
2004-11-28 20:09:25 +00:00
Jeff Squyres
e9ed717748 First cut at copyrights: IU, UTK, and some OSU. LANL and HLRS still
pending.

This commit was SVN r3655.
2004-11-22 01:38:40 +00:00
George Bosilca
9659288e74 I hate waiting at airports, so I started doing something useful ...
I removed a lot of interdependence and now use the struct_t type.
BEWARE: not all the functions are ready.

This commit was SVN r3524.
2004-11-05 07:52:30 +00:00
Edgar Gabriel
16abfe9ed7 group_null, and comm_null have to be accessible also in comm_dyn.c, therefore I had to add an 'extern' statement in the header file.
This commit was SVN r3344.
2004-10-26 17:25:49 +00:00
Edgar Gabriel
17fd5308a6 moving all internal communication to negative tags
This commit was SVN r3340.
2004-10-26 15:06:51 +00:00
Edgar Gabriel
22acad7c5c adding the routine to handle the proper disconnect of all dynamic communicators in MPI_Finalize (in case the user did not explicitly disconnect).
This commit was SVN r3338.
2004-10-26 14:54:23 +00:00
Edgar Gabriel
ee8b9e1897 first cut on the new disconnect infrastructure. More to come,
especially focusing on Finalize.

This commit was SVN r3331.
2004-10-26 11:37:58 +00:00
Prabhanjan Kambadur
4257467fec This is the big Windows commit. More things have gone into this than I can remember, but basically we are looking at:
1. header file and source file protections using #ifdef WIN32
2. new files and directories to support Windows functionality
3. appropriate linkage symbols added (OMPI_DECLSPEC) for Windows
4. some functions are unimplemented on the Windows side. This is mostly
because there might not be a need to implement them in Windows land, e.g., forking
a daemon off
5. introduced locking mechanisms for Windows
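
Items 1, 3 and 4 can be pictured with a small sketch. It assumes a hypothetical export macro EXAMPLE_DECLSPEC standing in for OMPI_DECLSPEC (whose actual expansion is not shown in this log), a hypothetical function example_spawn_daemon, and made-up return codes; it does not reproduce any code from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linkage macro in the spirit of OMPI_DECLSPEC:
 * it expands to an export attribute on Windows and to nothing elsewhere. */
#if defined(WIN32) || defined(_WIN32)
#  define EXAMPLE_DECLSPEC __declspec(dllexport)
#else
#  define EXAMPLE_DECLSPEC
#endif

/* Made-up return codes for the sketch. */
#define EXAMPLE_SUCCESS             0
#define EXAMPLE_ERR_NOT_IMPLEMENTED (-1)

/* The declaration carries the linkage symbol. */
EXAMPLE_DECLSPEC int example_spawn_daemon(const char *name);

int example_spawn_daemon(const char *name)
{
    assert(name != NULL);
#if defined(WIN32) || defined(_WIN32)
    /* Left unimplemented on Windows: report it instead of failing to link. */
    return EXAMPLE_ERR_NOT_IMPLEMENTED;
#else
    /* A real implementation would fork() here; the sketch only reports
     * that this code path exists on this platform. */
    return EXAMPLE_SUCCESS;
#endif
}
```

Callers can then check the return code rather than being guarded by their own #ifdef blocks.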

This commit was SVN r3286.
2004-10-22 16:06:05 +00:00
Prabhanjan Kambadur
1e7ac1194e This somehow got left out. If you folks come across similar files, please protect the extern symbols from name mangling.
This commit was SVN r3256.
2004-10-21 00:33:54 +00:00
Prabhanjan Kambadur
dac14aaf94 committing the header file fixes for protection against C++ name mangling. This is a huge commit. Please make sure that your files are protected correctly. There is some redundant protection, in that the protection has been added right at the beginning and at the end in some cases, even though typedefs are not required to be protected. But this was done in order to make the minimal change to the code base.
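
For readers unfamiliar with the technique, the usual shape of such a guard is sketched below. The header name, the type example_communicator_t, and the symbols example_comm_null and example_comm_is_null are hypothetical; the sketch only shows the guard wrapping the whole header body, typedefs included, as the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* ---- what a protected header might look like (hypothetical names) ---- */
#ifndef EXAMPLE_COMMUNICATOR_H
#define EXAMPLE_COMMUNICATOR_H

#if defined(c_plusplus) || defined(__cplusplus)
extern "C" {
#endif

/* The typedef does not strictly need C linkage, but wrapping the whole
 * header body keeps the change to each file minimal. */
typedef struct example_communicator_t {
    int c_contextid;
} example_communicator_t;

/* extern symbols are what actually needs protecting from name mangling. */
extern example_communicator_t example_comm_null;
int example_comm_is_null(const example_communicator_t *comm);

#if defined(c_plusplus) || defined(__cplusplus)
}
#endif

#endif /* EXAMPLE_COMMUNICATOR_H */

/* ---- what the matching C source file might contain ---- */
example_communicator_t example_comm_null = { -1 };

int example_comm_is_null(const example_communicator_t *comm)
{
    assert(comm != NULL);
    return comm == &example_comm_null;
}
```

A C compiler never sees the extern "C" lines, while a C++ compiler including the same header links against the unmangled C symbols.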
This commit was SVN r3246.
2004-10-20 22:31:03 +00:00
Jeff Squyres
3b5f8c87a2 Fix for bug 1007 -- save the selected coll component on the
communicator so that it can be passed down through MPI_COMM_DUP as the
"preferred" component during coll selection.

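The idea of carrying the selected component through a duplicate can be sketched as follows. This is a simplified model under stated assumptions, not the coll framework's selection logic: the component names "alpha" and "beta", the priority field, and the functions example_coll_select and example_comm_dup are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical component descriptor: a name and a selection priority. */
typedef struct example_coll_component_t {
    const char *name;
    int priority;
} example_coll_component_t;

/* Hypothetical communicator that remembers which component was selected. */
typedef struct example_comm_t {
    const example_coll_component_t *c_coll_selected;
} example_comm_t;

static const example_coll_component_t example_components[] = {
    { "alpha", 10 },
    { "beta",  30 },
};

/* Pick the preferred component if it is available; otherwise fall back
 * to the available component with the highest priority. */
const example_coll_component_t *example_coll_select(const char *preferred)
{
    const size_t n = sizeof(example_components) / sizeof(example_components[0]);
    const example_coll_component_t *best = NULL;

    for (size_t i = 0; i < n; ++i) {
        const example_coll_component_t *c = &example_components[i];
        if (preferred != NULL && strcmp(c->name, preferred) == 0) {
            return c;
        }
        if (best == NULL || c->priority > best->priority) {
            best = c;
        }
    }
    return best;
}

/* Duplicate: hand the parent's selection down as the preferred component. */
void example_comm_dup(const example_comm_t *oldcomm, example_comm_t *newcomm)
{
    assert(oldcomm != NULL && newcomm != NULL);
    newcomm->c_coll_selected = example_coll_select(
        oldcomm->c_coll_selected != NULL ? oldcomm->c_coll_selected->name : NULL);
}
```

Without the saved selection, the duplicate in this model would get "beta" by priority even when its parent had been using "alpha".
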
This commit was SVN r3179.
2004-10-15 22:32:48 +00:00
Prabhanjan Kambadur
4c9f883a9b Changing the logic in ompi_comm_split to pass on the topo information only when required. Also had to change other files to go along with this. Tested with the relevant Intel test suite tests (MPI_Cart_sub_c, MPI_Comm_split1_c, MPI_Comm_split2_c, ...); everything seems to work fine.
This commit was SVN r2982.
2004-10-07 19:14:27 +00:00
Prabhanjan Kambadur
4655fe3943 Correcting code which allows topo to function properly ... also adding code to make comm_split and comm_dup work properly
This commit was SVN r2933.
2004-10-05 16:32:52 +00:00
Edgar Gabriel
05a28efd1f first cut on the comm_spawn mechanism. It doesn't work yet
(and I don't know why), but it also doesn't seem to break anything else...

This commit was SVN r2874.
2004-09-29 12:41:55 +00:00
Edgar Gabriel
e1406a1d5d adding the code to manage the communicator cid allocation in multithreaded scenarios. The code seems to work in a single threaded scenario (and thus should not generate any errors hopefully), the multithreaded case still to be tested.
This commit was SVN r2792.
2004-09-21 18:39:06 +00:00
Edgar Gabriel
65e4b61ec2 fixes to make comm_join work
This commit was SVN r2745.
2004-09-17 16:28:58 +00:00
Edgar Gabriel
e14bbf5fc4 adapting some routines to use proc_find_and_add
activating name_publishing (also tested, testcode will be checked in soon).
adapting the port-stuff to the new format.

This commit was SVN r2741.
2004-09-17 10:11:22 +00:00
Edgar Gabriel
718be11bdb fixing a problem for error handlers as discussed with Jeff a couple of weeks ago. The macros used in the MPI functions have not changed, thus the modifications should be transparent to all other functions.
This commit was SVN r2522.
2004-09-06 12:06:27 +00:00
Edgar Gabriel
1ca0115ff7 adding in finalize a verification, whether we still have some communicators there...which should not happen in a correct MPI code.
This commit was SVN r2124.
2004-08-13 18:55:07 +00:00
Edgar Gabriel
b8d6fd5d50 introducing a 'hidden' communicator flag. The communicators marked hidden will not be exported to totalview (e.g. internal communicators of the MagPIe coll framework).
This commit was SVN r1925.
2004-08-06 14:33:23 +00:00
Edgar Gabriel
2e43e4980e a couple of changes:
- because fragments could arrive for a communicator that was not
   yet set up on all procs, which caused problems, we introduced
   a comm_activate function call, which executes a kind of barrier (using
   the allreduce functions used for the comm_cid allocation).
   Setting up the coll-component has moved *after* this barrier, since 
   some coll-modules (e.g. the MagPIe component) might want to communicate
   using this communicator already (e.g. a comm_split).
-  adding a new file comm_dyn.c, which basically abstracts the required functionality
   for connect-accept, and therefore is the 'magic' code (from the MPI point of view)
   for all dynamically created communicators.

This commit was SVN r1900.
2004-08-05 16:31:30 +00:00
Edgar Gabriel
167e046ee0 sorry, deactivating the comm-mutex to avoid problems.
This commit was SVN r1878.
2004-08-04 19:55:45 +00:00
Edgar Gabriel
177d6ddae2 adding a mutex for locking name and attributes in a communicator.
Adding a function abstracting the setting of the name.

This commit was SVN r1875.
2004-08-04 19:46:29 +00:00
Edgar Gabriel
20a512a9b7 restructuring some of the communicator code.
with a couple of internal tricks, intercomm_create works now.

This commit was SVN r1855.
2004-08-03 22:07:45 +00:00
Jeff Squyres
eb8cba98af - massive change for module<-->component name fixes throughout the
code base.
  - many (most) mca type names have "component" or "module" in them,
    as relevant, just to further distinguish the difference between
    component data/actions and module data/actions.  All developers
    are encouraged to perpetuate this convention when you create
    types that are specific to a framework, component, or module
  - did very little to entire framework (just the basics to make it
    compile) because it's just about to be almost entirely replaced
  - ditto for io / romio
  - did not work on elan or ib components; have to commit and then
    convert those on a different machine with the right libraries and
    headers
- renamed a bunch of *_module.c files to *_component.c and *module*c
  to *component*c (a few still remain, e.g., ptl/ib, ptl/elan, etc.)
- modified autogen/configure/build process to match new filenames
  (e.g., output static-components.h instead of static-modules.h)
- removed DOS-style cr/lf stuff in ns/ns.h
- added newline to end of file src/util/numtostr.h
- removed some redundant error checking in the top-level topo
  functions
- added a few {} here and there where people "forgot" to put them in
  for 1 line blocks ;-)
- removed a bunch of MPI_* types from mca header files (replaced with
  corresponding ompi_* types)
- all the ptl components had version numbers in their structs; removed
- converted a few more elements in the MCA base to use the OBJ
  interface -- removed some old manual reference counting kruft

This commit was SVN r1830.
2004-08-02 00:24:22 +00:00
Prabhanjan Kambadur
cae7d0afcb Changing a lot of stuff in topo. This commit is just the integration
of what has been done in tmp/anju-topo-work branch. For more detailed 
information on the commits, please see those logs

This commit was SVN r1787.
2004-07-20 22:21:47 +00:00
Jeff Squyres
a5a712b31f Lots of changes in this commit, mostly having to do with the first
real commit of the collectives.  MPI_SCAN and MPI_EXSCAN are still not
implemented, but lots of other things are in the critical path and
holding up other people, so it's ok to commit without them:

- better checks for sizes in configure, and add defaults for fortran
  sizes if we don't have a fortran compiler
- fix some logic that was accidentally broken for size checks for the
  file type offset_t
- add some C equivalent types for fortran's complex and double complex
  (for use in internal reduction/op functions)
- additions to and slight reorganization of ompi_mpi_init() and
  ompi_mpi_finalize()
- fully implement all top-level MPI collective calls, including all
  param checking for both intra- and inter-communicators (woof)
- change the communicator_t type for stuff that we need in coll, and
  update all references throughout the code base to match
- all kinds of updates to the coll framework base
- next cut of the basic coll module -- has all intracommunicator
  collectives implemented except scan and exscan (see note above).
  All intercommunicator functions return ERR_NOT_IMPLEMENTED.
- MPI_Op is a fixed implementation -- not component-ized yet.  So
  there are generic C loops for all implementations.

This commit was SVN r1491.
2004-06-29 00:02:25 +00:00
Tim Woodall
2ce7ca725b - cleanup of some of the c bindings
- for threaded case - cleanup event libraries progress thread
- cleanup of request handling for persistent sends
- added support for buffered sends

This commit was SVN r1461.
2004-06-24 16:47:00 +00:00
Edgar Gabriel
deec2c1435 preparing the comm_cid allocation routine for dynamic process management
This commit was SVN r1336.
2004-06-16 22:37:03 +00:00
Edgar Gabriel
2598915dc7 adding functions for handling the name publishing. Currently ifdefed out,
just blank routines are provided for keeping the linker happy.

This commit was SVN r1334.
2004-06-16 21:34:42 +00:00
Edgar Gabriel
f8294ab099 extracting the creation of comm_cid from the comm_set routine.
The reason is that we had to pass four arguments to comm_set, which were just passed on to comm_cid. However, for the dynamic case, even these four arguments are not enough, so I extracted it. A typical sequence for a comm-creation will therefore be:

 newcomm = ompi_comm_set (...);
 ompi_comm_nextcid (newcomm, oldcomm,...);
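
The two-step sequence above can be modelled in a few lines. This is a single-process sketch with hypothetical names (example_comm_t, example_comm_set, example_comm_nextcid) and a simplified rule of taking the lowest free ID locally; how the actual routine agrees on a context ID across processes is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_CID   8
#define EXAMPLE_CID_UNSET (-1)

/* Hypothetical communicator: only the fields the sketch needs. */
typedef struct example_comm_t {
    int cid;   /* context ID, assigned in a separate second step */
    int size;  /* number of processes                            */
} example_comm_t;

/* Which context IDs are taken; local to this process in the sketch. */
static int example_cid_in_use[EXAMPLE_MAX_CID];

/* Step 1: fill in the communicator, leaving the context ID unset. */
void example_comm_set(example_comm_t *newcomm, int size)
{
    assert(newcomm != NULL);
    newcomm->size = size;
    newcomm->cid = EXAMPLE_CID_UNSET;
}

/* Step 2: assign the lowest free context ID.
 * Returns 0 on success, -1 if no ID is left. */
int example_comm_nextcid(example_comm_t *newcomm)
{
    assert(newcomm != NULL);
    for (int cid = 0; cid < EXAMPLE_MAX_CID; ++cid) {
        if (!example_cid_in_use[cid]) {
            example_cid_in_use[cid] = 1;
            newcomm->cid = cid;
            return 0;
        }
    }
    return -1;
}
```

Keeping the ID allocation out of the set routine means the second step can take whatever extra arguments the dynamic case needs without widening the first.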

This commit was SVN r1312.
2004-06-16 15:40:52 +00:00
Edgar Gabriel
bdef2b4a09 fixed the prototype of ompi_comm_dump
- added a function for implementing an allgather on the local_comm for inter-communicators. Will just be used from MPI_Comm_split for inter-communicators

This commit was SVN r1283.
2004-06-15 21:29:29 +00:00
Edgar Gabriel
df91640da6 just minor modifications to match the modifications in the group-struct.
This commit was SVN r1258.
2004-06-15 00:09:40 +00:00
David Daniel
563ac2a338 First pass of lam -> ompi conversion
This commit was SVN r1191.
2004-06-07 15:33:53 +00:00
Edgar Gabriel
536c279529 adding most of the required functionality for handling MPI-1 and most of the MPI-2 communicator functions (except dynamic process management). The CIDs are not yet calculated properly, although the functions are checked in.
Still to do:
- make the CID allocation routine thread safe
- add the ACK in lam_comm_free
- fix a bug in lam_comm_split for inter-communicators (in this
  case we cannot have allgather_intra and allgather_inter at the
  same time on these communicators, which is however what
  the current implementation assumes).

Reviewed by Jeff, Rich and Tim.

This commit was SVN r1148.
2004-05-21 19:36:19 +00:00
Vishal Sahay
ec7b437428 Add more functionality in comm
This commit was SVN r1121.
2004-05-07 23:15:10 +00:00
Prabhanjan Kambadur
ca48b3962b Changing some prototypes and also changing the functions. There were some spelling mistakes and other problems. Also committing the MPI topology functions.
This commit was SVN r1075.
2004-04-21 20:55:54 +00:00
Prabhanjan Kambadur
a8d9e4ac7d Pushing c_cube_dim to communicator since it is used by more than one type
This commit was SVN r1068.
2004-04-21 00:16:05 +00:00
Prabhanjan Kambadur
2f7aeb2d62 Changing some stuff in topo module ...
This commit was SVN r1043.
2004-04-16 20:54:48 +00:00
Edgar Gabriel
6932553a35 fixing the "no MPI-objects underneath the MPI-library functions" problem.
Exchanged therefore MPI_Comm by lam_communicator_t* and MPI_Group by lam_group_t* .

This commit was SVN r1000.
2004-03-28 14:24:43 +00:00
Edgar Gabriel
44a20d832a first set of communicator functions. up to now
supporting MPI-1 intra-communicators.
 
The getnextcid function still to be done properly.

This commit was SVN r982.
2004-03-26 20:02:42 +00:00
Vishal Sahay
af4a977a21 Change references to keyhash according to the comm/datatype/win object -- append a prefix to keyhash in the respective structure
This commit was SVN r938.
2004-03-19 16:34:09 +00:00
Jeff Squyres
19629774f0 First cut at error handlers. Still have more to go.
This commit was SVN r934.
2004-03-19 06:12:43 +00:00
Jeff Squyres
b195c85f60 First cut of the errhandler stuff (invocation of errorhandlers, to be
invoked by top-level MPI functions).  See doxygen comments for
explanations of which macros to use.

This commit was SVN r926.
2004-03-19 00:00:09 +00:00
David Daniel
e572fbb128 Removing explicit references to stdint.h
This commit was SVN r882.
2004-03-17 19:22:14 +00:00
David Daniel
7f8c2c3714 Updating header file names after the great directory reorganization.
This commit was SVN r877.
2004-03-17 18:45:16 +00:00
Jeff Squyres
1b67211597 Massive directory reorganization :-)
This commit was SVN r872.
2004-03-17 17:42:19 +00:00