40 Commits

Author SHA1 Message Date
Ralph Castain
bf9087d9d1 The merged main trunk and gpr integration branch. Tested on Mac only so far - will check out and test on Linux. If that has a problem, will back all changes out (again), but I think we have this one correct. Will send out a more complete change notice once testing is complete.
This commit was SVN r3644.
2004-11-20 19:12:43 +00:00
Brian Barrett
23a6d5bb60 * roll back r3584 (gpr changes to reduce floods) as it appears to cause
some instability on Linux

This commit was SVN r3587.

The following SVN revision numbers were found above:
  r3584 --> open-mpi/ompi@52add381d0
2004-11-17 02:30:07 +00:00
Brian Barrett
52add381d0 * Merge over the gpr changes Ralph has made on the gpr-integration branch.
This may trigger a complete rebuild :(.  Short overview of changes:

  - reduce number of network slams at startup
  - prevent gpr from hanging when doing process death code
  - general gpr cleanups

This commit was SVN r3584.
2004-11-16 22:53:33 +00:00
Brian Barrett
ff5ca38dce * start of abort() and cntl-c support.
  - register non-blocking recv for process starter whenever a new spawn
    occurs.
  - send kill message when rte_kill_job or kill_proc is called
  - pcm does its mojo to result in the death of the processes

This commit was SVN r3458.
2004-11-01 16:05:31 +00:00
Ralph Castain
3e19906b95 Begin to "leak" the changes to the registry and supporting subsystems to resolve the flood situation and support abnormal terminations. These changes just define a new message structure for returning all startup/shutdown information in a single broadcast-like transmission. Shouldn't have any impact on existing code as the message object isn't used yet.
This commit was SVN r3311.
2004-10-25 13:36:09 +00:00
Prabhanjan Kambadur
4257467fec this is the big windows commit. there are more things which have gone into this than i can remember. but basically, we are looking for
1. header file and source file protections using #ifdef WIN32
2. new files and directories to support windows functionality
3. appropriate linkage symbols added (OMPI_DECLSPEC) for windows
4. some functions are unimplemented on the windows side. this is mostly
because there might not be need to implement it in windows land. eg., forking
a daemon off
5. Introduced locking mechanisms for windows

This commit was SVN r3286.
2004-10-22 16:06:05 +00:00
Prabhanjan Kambadur
dac14aaf94 committing the header file fixes for protection against C++ name mangling. This is a huge commit. Please make sure that your files are protected right. There is some redundant protection in that the protection has been added right at the beginning and at the end in some cases even though typedefs are not required to be protected. But this was done in order to have the minimal change to the code base
This commit was SVN r3246.
2004-10-20 22:31:03 +00:00
Brian Barrett
cc44f2abc2 * Make the spawn constants slightly more clear in meaning
* fix typo in error message for spawning processes
* Remove the name field from the global ompi_process_info struct, replacing
  usage with calls to ompi_rte_get_self().  Cleaned up the resulting logic
  in ompi_rte_init() to make it slightly simpler when dealing with the
  singleton case.  Reduces data duplication and I believe fixes bug
  #1009 as a nice side effect.

This commit was SVN r3230.
2004-10-20 02:24:40 +00:00
Jeff Squyres
d324a7725c - Add #if protection around non-portable system .h files
- Add #include "ompi_config.h" to all .c files, and ensure that it's
  the first #included file
- remove a few useless #if HAVE_CONFIG_H checks

This commit was SVN r3229.
2004-10-20 01:03:09 +00:00
Ralph Castain
3c92d18fc7 Consolidate the RTE startup sequence into a single function call for simpler maintenance. We seem to have this debugged enough now to commonize the startup across the various programs. Modify mpi_init, mpirun, openmpi, ompid, and ompiconsole accordingly.
This commit was SVN r2910.
2004-10-01 22:22:21 +00:00
Brian Barrett
d5f4ebde71 * add some comments about what the spawn selection constraints mean
* memory leak cleanups
* implement rsh's kill_proc and kill_job for the case where we
  keep the ssh connections alive.  At least, I think this will work.
  Need to test some more.

This commit was SVN r2884.
2004-09-29 21:29:51 +00:00
Ralph Castain
57a224c985 ka-ching
This commit was SVN r2841.
2004-09-23 14:40:05 +00:00
Brian Barrett
bc6ecff582 * improve interface description for ompi_rte_allocate_resources
* make hostfile llm properly deal with over subscribe situation.  Rather
  than returning smaller than requested (which is no longer possible as
  it made for a bookkeeping nightmare and no one was paying attention
  to it anyway), we just over subscribe the nodes.  In the future, we
  need to add a flag to allocate resources as to whether to allow
  over subscription (if the resource allocator permits - clearly rsh 
  does, rms not so much).

This commit was SVN r2808.
2004-09-22 22:27:40 +00:00
Brian Barrett
68407587ba * Make it easier to get just the number of peer processes by allowing NULL
to be passed for the first argument of ompi_rte_get_peers().  Also
  cleaned up the documentation for that function.

This commit was SVN r2801.
2004-09-22 16:05:33 +00:00
Brian Barrett
782d3af2b9 * Move to using a lazy selection for pcms so that we can have multiple PCM
sets running at once - requires an additional step in spawning to get a
  handle (that will contain multiple pcms when we support multi-cell)
* change the selection logic of the pcms to not care about setting threads,
  but instead to select based on the selected thread level, since it
  would be a little late by the time we did the selection for pcms.
* started the long process of cleaning up the rsh pcm so that it
  actually kills processes and things.  Still doesn't do anything useful,
  but getting to the point where that might be possible

This commit was SVN r2794.
2004-09-21 20:27:41 +00:00
Brian Barrett
2dc55f12da * add more selection criteria for the pcm selection code
* remove the ns param switch - always use the ns at this point
* clean up some of the evil rms code that wasn't multi-pcm safe.  still
  have some work on this front

This commit was SVN r2779.
2004-09-20 18:25:00 +00:00
George Bosilca
efc09dfc94 increase timeout
This commit was SVN r2778.
2004-09-20 17:29:29 +00:00
Ralph Castain
069682e046 A bunch of minor changes, mostly adding diagnostics. Just wanted to checkpoint so I can start fresh since there now seem to be problems in the tree with mpirun2.
Fixed ompid so it reissues the non-blocking receive - should now be close to ready for primetime. Fixed some logic in the svc framework that wasn't checking properly for action flags. 

This commit was SVN r2660.
2004-09-14 14:21:04 +00:00
Ralph Castain
c6cbe33d50 Some of these didn't really change - I was just in/out of them for diagnostics while chasing a bug. Got caught by my good buddy Tim again :) on his parse_contact_info function, which requires that the space for the answer be allocated in advance. Sigh. Anyway, mpirun2 now works again. My apologies if you tried it in the last few hours and found it didn't.
Also removed the mpirun3 directory since we are basically dragging mpirun2 along with us - no need to create a new version after all.

Made a few changes to the universe info structure, eliminating the "webserver" and "socket" fields since we will do those contacts through the oob channel. Also changed the "silent_mode" field to "console" since silent mode is the default - the flag needs to tell you to turn the console on, not off.

Parse environ function now gets the ns and gpr replica contact info and loads it in the proper places to hand it off to the respective components, thus allowing me to check connection to them as part of determining if the named universe already exists. Changed the local_universe_exists function accordingly and gave it a new name (since the replicas may not be local). This name will shortly be changed to "ompi_rte_join_universe" as I complete the logic for doing that function.

Please let me know if you see any problems. I successfully ran some trivial multi-process functions in both mpirun2 and singleton modes, and ran the seed daemon as well, so I think it should all be okay.

This commit was SVN r2611.
2004-09-11 12:56:52 +00:00
Brian Barrett
c8b03b0897 * change the pcm selection to allow for both multiple components to be loaded
at the same time and multiple modules of the same component to be loaded
  at the same time (but not launching procs in the same job).
  - add a "this" pointer to all the PCM functions
  - make base select() function return a list of selected pcms, based on
    given criteria bitmask
  - update all the pcms to match
* Add an insert before position function to the ompi_list code

This commit was SVN r2590.
2004-09-10 04:54:17 +00:00
Ralph Castain
106e07f759 Some reorganization of the startup process functions that is transparent to anyone using mpirun2 and/or running as a singleton. Please note that the old mpirun script may well not work any more - I have not been trying to keep that one running.
For those of you looking into the guts of these functions, the most visible changes are:

- raising the assignment of the process name to a higher level, taking it out of the "hole" it had fallen into. We've been having problems with multiple functions assigning the process name. This is understandable - lots of workarounds were implemented in the early development stages. However, it was becoming hard to determine WHEN the name was being defined - it was being hidden under too many layers of function calls. Hence, it is now assigned in the three primary programs in a very visible fashion. Hopefully, we can now chase down all the other places and get rid of them.

- similarly, I raised the visibility of when the session directory gets constructed to ensure it doesn't get done at the wrong time and/or multiple times.

- created a new function that parses all the non-mca level environmental variables and assigns the info into the corresponding structures. I have also included notes in this function and in the various ompi_rte_init_stage functions about proper ordering.

- modified the rte cmd line parsers to store the options they find into the environment so they can be passed along later

That about does it.

This commit was SVN r2589.
2004-09-10 03:21:03 +00:00
Ralph Castain
02f1342291 Checkpoint the new openmpi/daemon development. Shouldn't impact anyone out there.
This commit was SVN r2523.
2004-09-07 02:58:49 +00:00
Ralph Castain
6b05166b75 Split ompi_rte_init into two stages to allow detection of universe existence prior to starting remainder of rte. Adjust mpirun2 and mpi_init to accommodate.
This commit was SVN r2499.
2004-09-03 21:17:33 +00:00
Ralph Castain
36c6cf574e Fix bad initialization in proc_info, add a pid field into the universe info and save it in the universe info file.
This commit was SVN r2491.
2004-09-03 19:26:49 +00:00
Brian Barrett
1a100e65c1 * rework the pcm/llm interface to be more non-rsh friendly. Push the
host / cpu information down into a handle that need not exist when
  the llm isn't being used.  Fix all the test cases and whatnot to match

This commit was SVN r2490.
2004-09-03 19:19:59 +00:00
Ralph Castain
3c94af6021 Bring tree back to ability to cleanly compile. These changes should be transparent to virtually everyone, but the silly tree insists on compiling everything. Mostly modified things in preparation for releasing the openmpi program, and to begin knitting the next (hopefully final) version of mpirun.
Let me know if it has any impact on you - it shouldn't.

This commit was SVN r2487.
2004-09-03 16:26:15 +00:00
Ralph Castain
3cffa6b549 Inserts some debugging output for use by Tim to help debug a startup problem. Should be transparent to everyone else.
This commit was SVN r2453.
2004-09-02 18:39:42 +00:00
Tim Woodall
16d250b376 - integration of gpr/ns/oob w/ mpirun2
This commit was SVN r2344.
2004-08-28 01:15:19 +00:00
Brian Barrett
5540dc37bc * Change from ompi_list_t to ompi_list_t* in the schedule and allocation
structures to make it easier to swap around lists when doing process ->
  resource mapping
* Fix spawn interface to take an ompi_list_t* instead of an ompi_list_t
  since you can't pass an ompi_list_t by value
* Change allocate_resource to return an ompi_list_t* instead of having
  an ompi_list_t** as an argument, since it's a bit cleaner and makes
  who should call OBJ_NEW much more clear
* Clean up deallocation in error cases for the llm_base_allocate function
* Update test case for llm to not depend on current environment for
  correctness

This commit was SVN r2126.
2004-08-13 19:39:06 +00:00
Brian Barrett
8e8ef21ae8 * Move the PCM and LLM types into src/runtime/runtime_types.h
* Add more of the mpirun shell - still far from functional
* Expand the src/runtime interface to include the parts of the pcm needed
  for mpirun

This commit was SVN r1998.
2004-08-10 03:48:41 +00:00
Brian Barrett
fd27aa08fc * As requested, move mpiruntime to mpi/runtime
This commit was SVN r1960.
2004-08-08 05:20:32 +00:00
Brian Barrett
a27f749134 * Move the MPI runtime code from src/runtime to src/mpiruntime to make the
abstraction a little clearer.
* Include mpiruntime.h instead of runtime.h in errhandler.h since only the
  MPI stuff was needed - speeds compile times greatly when working on the
  RTE...

This commit was SVN r1948.
2004-08-07 00:53:56 +00:00
Brian Barrett
e8c5a60cc9 * add (useless?) timing comment
This commit was SVN r1938.
2004-08-06 21:18:34 +00:00
Brian Barrett
3be27734c0 * add Doxy comments for the init / finalize functions
This commit was SVN r1894.
2004-08-05 14:35:38 +00:00
Jeff Squyres
c9cb40d102 Remove now-useless macro.
This commit was SVN r1818.
2004-07-30 03:00:01 +00:00
David Daniel
563ac2a338 First pass of lam -> ompi conversion
This commit was SVN r1191.
2004-06-07 15:33:53 +00:00
Tim Woodall
ada9c63e79 initialize request type in base class
This commit was SVN r981.
2004-03-26 16:21:13 +00:00
Tim Woodall
c1ee4fec23 - initial integration with datatypes
- p2p mpi i/f functions
- adding doxygen comments

This commit was SVN r976.
2004-03-26 14:15:20 +00:00
David Daniel
0993f49ffe More fixes from directory reorg
This commit was SVN r886.
2004-03-17 20:00:24 +00:00
Jeff Squyres
1b67211597 Massive directory reorganization :-)
This commit was SVN r872.
2004-03-17 17:42:19 +00:00