Commit graph

138 commits

Author SHA1 Message Date
Ralph Castain
3e19906b95 Begin to "leak" the changes to the registry and supporting subsystems to resolve the flood situation and support abnormal terminations. These changes just define a new message structure for returning all startup/shutdown information in a single broadcast-like transmission. Shouldn't have any impact on existing code as the message object isn't used yet.
This commit was SVN r3311.
2004-10-25 13:36:09 +00:00
George Bosilca
0cc12d9b93 Just remove unused variables.
This commit was SVN r3296.
2004-10-23 15:31:20 +00:00
Jeff Squyres
dcd086fd4c Some versions of gcc don't like \n in a string
This commit was SVN r3287.
2004-10-22 17:45:06 +00:00
Prabhanjan Kambadur
4257467fec This is the big Windows commit. More things have gone into this than I can remember, but basically we are looking for:
1. header file and source file protections using #ifdef WIN32
2. new files and directories to support Windows functionality
3. appropriate linkage symbols added (OMPI_DECLSPEC) for Windows
4. some functions are unimplemented on the Windows side. This is mostly
because there might not be a need to implement them in Windows land, e.g., forking
a daemon off
5. introduced locking mechanisms for Windows

This commit was SVN r3286.
2004-10-22 16:06:05 +00:00
Prabhanjan Kambadur
dac14aaf94 Committing the header file fixes for protection against C++ name mangling. This is a huge commit. Please make sure that your files are protected right. There is some redundant protection in that the protection has been added right at the beginning and at the end in some cases, even though typedefs are not required to be protected. But this was done in order to have the minimal change to the code base.
This commit was SVN r3246.
2004-10-20 22:31:03 +00:00
Brian Barrett
cc44f2abc2 * Make the spawn constants slightly more clear in meaning
* fix typo in error message for spawning processes
* Remove the name field from the global ompi_process_info struct, replacing
  usage with calls to ompi_rte_get_self().  Cleaned up the resulting logic
  in ompi_rte_init() to make it slightly simpler when dealing with the
  singleton case.  Reduces data duplication and I believe fixes bug
  #1009 as a nice side effect.

This commit was SVN r3230.
2004-10-20 02:24:40 +00:00
Jeff Squyres
d324a7725c - Add #if protection around non-portable system .h files
- Add #include "ompi_config.h" to all .c files, and ensure that it's
  the first #included file
- remove a few useless #if HAVE_CONFIG_H checks

This commit was SVN r3229.
2004-10-20 01:03:09 +00:00
Brian Barrett
dd9726963c * fixes to make runtime code compile with a C++ compiler
This commit was SVN r3197.
2004-10-18 16:08:52 +00:00
Brian Barrett
d7528d1fd3 * remove the pre-ANSI C vararg code, only have the ANSI-C stdarg code
This commit was SVN r3138.
2004-10-14 19:39:21 +00:00
Brian Barrett
4289d0608f * Fix for bug 1019. Really bad idea to call ompi_event_loop() from a thread
that isn't the progress thread when running a threaded build.

This commit was SVN r3097.
2004-10-13 23:08:47 +00:00
Jeff Squyres
ae60cdcafa Fix some symbols highlighted by the illegal symbol report
This commit was SVN r2926.
2004-10-05 11:53:45 +00:00
Ralph Castain
8882983073 Update the sequencing to eliminate generation of unnecessary jobid's. This sequencing appears to work and be stable.
This commit was SVN r2917.
2004-10-04 16:53:53 +00:00
Ralph Castain
b2cc35056c Remove a debug output.
This commit was SVN r2911.
2004-10-01 22:37:29 +00:00
Ralph Castain
3c92d18fc7 Consolidate the RTE startup sequence into a single function call for simpler maintenance. We seem to have this debugged enough now to commonize the startup across the various programs. Modify mpi_init, mpirun, openmpi, ompid, and ompiconsole accordingly.
This commit was SVN r2910.
2004-10-01 22:22:21 +00:00
Brian Barrett
550469cb0b * move waitpid shutdown code into ompi_rte_finalize
* remove now unneeded ompi_event_fini from mpirun

This commit was SVN r2891.
2004-09-30 16:23:08 +00:00
Tim Woodall
a222c702ec cleanup of finalize code
- unregister all event handlers from event library
- cancel pending non-blocking receives with oob

This commit was SVN r2887.
2004-09-30 15:09:29 +00:00
Ralph Castain
b42a361302 Patch a few things that were causing trouble for programs that re-entered the registry during a callback function. Also fixed a timing problem in rte_monitor - ensured that we were in fact already waiting on a condition before generating a wakeup signal. Adjusted the timing of mpirun to ensure that the synchro to alert mpirun of all-processes-done got registered before they completed.
This commit was SVN r2885.
2004-09-29 21:54:57 +00:00
Brian Barrett
d5f4ebde71 * add some comments about what the spawn selection constraints mean
* memory leak cleanups
* implement rsh's kill_proc and kill_job for the case where we
  keep the ssh connections alive.  At least, I think this will work.
  Need to test some more.

This commit was SVN r2884.
2004-09-29 21:29:51 +00:00
Brian Barrett
452e5fd0f7 * want portable signal.h not non-portable sys/signal.h
This commit was SVN r2877.
2004-09-29 18:40:46 +00:00
Ralph Castain
b5e21eaac3 Fix a missing include file in ompi_rte_wait that caused the build to fail.
Minor change to oob_base_init - point oob_name_self at correct name.

This commit was SVN r2868.
2004-09-28 10:33:09 +00:00
Brian Barrett
a6963be12e * back out parts of r2864, moving calls to ompi_event_fini() back into
MPI_Finalize and mpirun so that we shut down the event library before
  the TCP PTL.  This needs to change before release so that the RTE 
  components can deregister properly, but we need to run in the meantime

This commit was SVN r2867.

The following SVN revision numbers were found above:
  r2864 --> open-mpi/ompi@57ca18ce88
2004-09-28 01:38:16 +00:00
Brian Barrett
57ca18ce88 * move ompi_event_fini() from mpirun/MPI_Finalize to ompi_rte_finalize to
match where ompi_event_init() lived
* initialize and shutdown the code to allow child process wait callbacks
* add comment about few() usage in rte-enabled jobs (short answer:
  don't).

This commit was SVN r2864.
2004-09-27 19:38:23 +00:00
Brian Barrett
40c0b6b12d * code to deal with getting callbacks / waiting for SIGCHLD. These should
only be used if the RTE init functions have been called.  Not quite as
  flexible as the real waitpid() function (no -1 support), but all I need
  for the SSH / BProc / RMS pcms.  This code is not yet turned on by
  default (need to add the init / finalize calls to ompi_rte_init?? and
  ompi_rte_finalize())

This commit was SVN r2860.
2004-09-26 17:43:35 +00:00
Ralph Castain
ff63f1f1c8 Fix a race condition on the last process to complete.
This commit was SVN r2844.
2004-09-23 16:12:45 +00:00
Ralph Castain
57a224c985 ka-ching
This commit was SVN r2841.
2004-09-23 14:40:05 +00:00
Ralph Castain
0cc082780f ka-ching
This commit was SVN r2826.
2004-09-23 14:33:28 +00:00
Ralph Castain
65fc61e212 ka-ching goes the little counter...how I love that sound!
This commit was SVN r2825.
2004-09-23 14:33:06 +00:00
Ralph Castain
ad395fa825 First commit of the revised startup system.
Having noted the existence of the wondrous Open MPI "statistics" tracker, I feel compelled to commit these changes one file at a time. This will, of course, provide me with wonderful statistics for the number of commits I have done, thus ensuring that those who watch such things become truly impressed by the magnitude of my contribution.

Of course, I will also do a commit for each time I correct a typo in my own software, and each time I add a comment to a file - a comment that, ordinarily, one might expect to have already been in place before the first commit. But then....I wouldn't look as impressive if I did it that way! No, no...far better to add the comments - and do a commit after each one - separately!

So, enjoy all.
Ralph
aka. The longtime Don Quixote crusader against the asinine use of meaningless statistics in place of true performance metrics.

This commit was SVN r2824.
2004-09-23 14:32:31 +00:00
Brian Barrett
bc6ecff582 * improve interface description for ompi_rte_allocate_resources
* make hostfile llm properly deal with over subscribe situation.  Rather
  than returning smaller than requested (which is no longer possible as
  it made for a bookkeeping nightmare and no one was paying attention
  to it anyway), we just over subscribe the nodes.  In the future, we
  need to add a flag to allocate resources as to whether to allow
  over subscription (if the resource allocator permits - clearly rsh 
  does, rms not so much).

This commit was SVN r2808.
2004-09-22 22:27:40 +00:00
Brian Barrett
68407587ba * Make it easier to get just the number of peer processes by allowing NULL
to be passed for the first argument of ompi_rte_get_peers().  Also
  cleaned up the documentation for that function.

This commit was SVN r2801.
2004-09-22 16:05:33 +00:00
Jeff Squyres
201611ad3d Add <errno.h> for linux
This commit was SVN r2797.
2004-09-22 08:08:10 +00:00
Brian Barrett
3795754dc8 * random fixups from last commit
This commit was SVN r2796.
2004-09-22 05:34:41 +00:00
Brian Barrett
782d3af2b9 * Move to using a lazy selection for pcms so that we can have multiple PCM
sets running at once - requires an additional step in spawning to get a
  handle (that will contain multiple pcms when we support multi-cell)
* change the selection logic of the pcms to not care about setting threads,
  but instead to select based on the selected thread level, since it
  would be a little late by the time we did the selection for pcms.
* started the long process of cleaning up the rsh pcm so that it
  actually kills processes and things.  Still doesn't do anything useful,
  but getting to the point where that might be possible

This commit was SVN r2794.
2004-09-21 20:27:41 +00:00
Brian Barrett
75e6f7dac5 * remove the can_spawn functions from the pcm. when there was one pcm at
a time and no pcmclient, this made sense.  Now, the selection logic will
  implicitly do this for us.

This commit was SVN r2783.
2004-09-20 20:12:04 +00:00
Brian Barrett
41e17e2758 * rename pack.{c,h} to bufpack.{c,h} because there was already a pack.c in
src/mpi/c and you can't have two object files with the same name in
  the same library

This commit was SVN r2782.
2004-09-20 19:55:01 +00:00
Brian Barrett
2dc55f12da * add more selection criteria to the pcm selection code
* remove the ns param switch - always use the ns at this point
* clean up some of the evil rms code that wasn't multi-pcm safe.  still
  have some work on this front

This commit was SVN r2779.
2004-09-20 18:25:00 +00:00
George Bosilca
efc09dfc94 increase timeout
This commit was SVN r2778.
2004-09-20 17:29:29 +00:00
Ralph Castain
0d4e6482cd Continuing the cleanup process. Few minor fixes here and there - mostly just NULLing pointers that were free'd. Console now can connect to any universe, regardless of scope.
This commit was SVN r2734.
2004-09-17 00:59:14 +00:00
Ralph Castain
f6dc129754 Allow mpirun2 and mpi_init to cleanly detect and join an existing universe. Will continue testing to quickly move away from a non-responsive existing universe.
This commit was SVN r2729.
2004-09-16 19:45:32 +00:00
Tim Woodall
3b3855fc23 move this out of rte_finalize - call explicitly from mpi_finalize
This commit was SVN r2706.
2004-09-16 09:25:27 +00:00
Ralph Castain
8f9b399b6d This is a checkpoint of some minor changes made in sequencing the startup - mainly to ensure that those helping me track a bug in mpirun2 are operating from an identical code base.
For everyone else, this is transparent.

This commit was SVN r2693.
2004-09-15 22:50:34 +00:00
Ralph Castain
70dae461e4 MPI_Init will now detect and join a persistent universe - hooray! Fixed the session_dir cleanup process so it is kinder to the universe-setup file (i.e., leaves it alone), thus allowing persistent universes to retain their contact info on the session_dir tree. Adjusted mpirun2, ompid, and ompiconsole accordingly.
Put some error protection in ompi_rte_monitor.

This commit was SVN r2678.
2004-09-15 16:33:36 +00:00
Ralph Castain
5f25433bd3 Hmmm...apparently this change is required by my Linux friends. Worked fine without it on the Mac, but what the heck?
This commit was SVN r2664.
2004-09-14 14:50:43 +00:00
Ralph Castain
069682e046 A bunch of minor changes, mostly adding diagnostics. Just wanted to checkpoint so I can start fresh since there now seem to be problems in the tree with mpirun2.
Fixed ompid so it reissues the non-blocking receive - should now be close to ready for primetime. Fixed some logic in the svc framework that wasn't checking properly for action flags. 

This commit was SVN r2660.
2004-09-14 14:21:04 +00:00
Ralph Castain
a14ee7eb48 Checkpoint the console and daemon.
Folks - there appears to be something unreliable about communication with the daemon at the moment. We are trying to track it down. Meantime, please be patient if experimenting with it.

This commit was SVN r2633.
2004-09-13 16:51:53 +00:00
Ralph Castain
57ceb5225e Workaround the mca_oob_ping problem by doing rapid multiple checks - works just fine.
We now have the ability to generate and join a persistent universe. You can create one in two ways:

(a) issue the "openmpi" command. This will fork/exec a seed daemon on your local host. You can specify a universe name or else it will just use the default.

(b) issue the "ompid -seed" command. Starts the seed up directly. Takes all the same options as openmpi.

I will be adjusting mpirun2 and mpi_init to allow connection to existing persistent universes, but they don't do it right now. The ompiconsole program simply issues an exit command to the persistent universe, so you can use it to shut the universe down if you like (or a kill -9  - works too).

This commit was SVN r2629.
2004-09-13 14:14:00 +00:00
Ralph Castain
e2a5ba5783 Just save this info for now - will use it later. File not included in the makefile.
This commit was SVN r2622.
2004-09-13 01:59:29 +00:00
Ralph Castain
55a2576f01 Update some basic functions, mostly with diagnostics.
This commit was SVN r2620.
2004-09-13 01:25:25 +00:00
Ralph Castain
c6cbe33d50 Some of these didn't really change - I was just in/out of them for diagnostics while chasing a bug. Got caught by my good buddy Tim again :) on his parse_contact_info function, which requires that the space for the answer be allocated in advance. Sigh. Anyway, mpirun2 now works again. My apologies if you tried it in the last few hours and found it didn't.
Also removed the mpirun3 directory since we are basically dragging mpirun2 along with us - no need to create a new version after all.

Made a few changes to the universe info structure, eliminating the "webserver" and "socket" fields since we will do those contacts through the oob channel. Also changed the "silent_mode" field to "console" since silent mode is the default - the flag needs to tell you to turn the console on, not off.

Parse environ function now gets the ns and gpr replica contact info and loads it in the proper places to hand it off to the respective components, thus allowing me to check connection to them as part of determining if the named universe already exists. Changed the local_universe_exists function accordingly and gave it a new name (since the replicas may not be local). This name will shortly be changed to "ompi_rte_join_universe" as I complete the logic for doing that function.

Please let me know if you see any problems. I successfully ran some trivial multi-process functions in both mpirun2 and singleton modes, and ran the seed daemon as well, so I think it should all be okay.

This commit was SVN r2611.
2004-09-11 12:56:52 +00:00
Ralph Castain
0071f032fe Update to the rte that ensures the defaults are properly set - in some cases, they weren't being set, leading to unexpected behavior when certain environmental variables weren't set. Added some more diagnostic messages to the registry.
Succeeded in contacting an existing persistent universe, and connecting!!! Thanks to Tim for the "ping" function.

First cut at a console - all it does right now is tell the universe to "die", but at least comm is being established.

This commit was SVN r2607.
2004-09-11 02:51:32 +00:00