Commit graph

179 Commits

Author SHA1 Message Date
Josh Hursey
78da530fd2 Fix a bug that Tim highlighted in which orted coredumps when an orterun is
CTRL-C'd. 
We were calling orte_finalize recursively which caused a segv when it tried to 
use a freed framework (orte_rmgr in this case).

I added a status flag to orte_universe_info to indicate where we are in the code.
This was needed to determine if we should call orte_abort or not when shutting
down in the tcp oob.

This commit was SVN r7160.
2005-09-02 21:07:21 +00:00
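The recursion fix described in 78da530fd2 can be illustrated with a small state flag. This is only a sketch of the general technique: the type and function names below (`rt_state_t`, `rt_info_t`, `rt_abort`, `rt_finalize`) are hypothetical and are not the actual field added to orte_universe_info or the real ORTE functions.

```c
/* Hypothetical lifecycle states; the real flag in orte_universe_info
 * is not shown in the commit log. */
typedef enum {
    RT_RUNNING,
    RT_FINALIZING,
    RT_FINALIZED
} rt_state_t;

typedef struct {
    rt_state_t state;
    int teardown_count;   /* how many times frameworks were torn down */
} rt_info_t;

static void rt_finalize(rt_info_t *info);

/* Error path, e.g. taken by a transport layer when a peer disappears. */
static void rt_abort(rt_info_t *info)
{
    if (info->state != RT_RUNNING) {
        /* Shutdown is already in progress: do not finalize again. */
        return;
    }
    rt_finalize(info);
}

static void rt_finalize(rt_info_t *info)
{
    if (info->state != RT_RUNNING) {
        return;
    }
    info->state = RT_FINALIZING;
    info->teardown_count++;   /* stands in for closing frameworks */
    rt_abort(info);           /* simulated error raised during shutdown */
    info->state = RT_FINALIZED;
}
```

Without the state check in `rt_abort`, the simulated error during shutdown would re-enter `rt_finalize` and tear down already-freed resources a second time.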
Ralph Castain
0f797fd40b Few more cleanups
This commit was SVN r7159.
2005-09-02 21:01:59 +00:00
Ralph Castain
4ed7752681 Continue cleaning up memory leaks during launch
This commit was SVN r7158.
2005-09-02 20:41:24 +00:00
Ralph Castain
f352890732 Cleaning up memory leaks for proxy operations.
This commit was SVN r7157.
2005-09-02 19:26:21 +00:00
Ralph Castain
4bd25e0292 Few minor memory leak cleanups
This commit was SVN r7156.
2005-09-02 18:50:01 +00:00
Jeff Squyres
a7fbb0f95e Put in comments about why these assignments exist
This commit was SVN r7146.
2005-09-02 10:27:23 +00:00
Jeff Squyres
7e4f696501 Fix silly compiler warnings
This commit was SVN r7145.
2005-09-02 10:26:41 +00:00
Ralph Castain
66a215eae1 More memory cleanup...
1. Valgrind is good for something - chasing down memory leaks in registry led me to re-visit the dictionary functions and discover that I wasn't keeping track of the number of dictionary entries on each segment! Resulted in wasted time searching blank entries as well as leaked memory. This has now been fixed.

2. Fixed the orte_bitmap test. The init function for that class has been eliminated and the constructor adjusted to provide that functionality.

This commit was SVN r7136.
2005-09-02 00:26:58 +00:00
Ralph Castain
76e622a552 Clean up a few memory leaks - more to go...
This commit was SVN r7134.
2005-09-01 17:38:04 +00:00
Ralph Castain
cb128ab87b Be a little more friendly and tell the user we couldn't reach their specified universe... :-)
This commit was SVN r7132.
2005-09-01 15:52:37 +00:00
Ralph Castain
d0f7dafc47 Revise the universe connection logic. Two cases are now handled:
1. user does NOT specify the universe name. For the default universe case, if we detect an existing default universe and cannot connect to it, we quietly create an alternative default name by adding the pid to the orte_default_universe name and move on - we no longer provide a warning message for this case.

2. user specified a universe name. If we detect an existing universe of that name and cannot connect to it, we consider this an error condition and abort.

This commit was SVN r7131.
2005-09-01 15:50:38 +00:00
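The two cases in d0f7dafc47 can be sketched as a small decision function. Everything here is an assumption made for illustration: the enum, the function names, the `"name-pid"` format, and the behaviour when no universe exists or when it is reachable are not taken from the ORTE sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum {
    UNIVERSE_CONNECT,           /* existing universe answered */
    UNIVERSE_CREATE,            /* nothing there yet: start one (assumed) */
    UNIVERSE_CREATE_ALTERNATE,  /* default name taken but unreachable */
    UNIVERSE_ABORT              /* user-named universe unreachable */
} universe_action_t;

/* Decide what to do given whether the user named the universe, whether a
 * universe of that name exists, and whether we could connect to it. */
static universe_action_t choose_universe_action(bool user_named,
                                                bool exists,
                                                bool reachable)
{
    if (!exists) {
        return UNIVERSE_CREATE;
    }
    if (reachable) {
        return UNIVERSE_CONNECT;
    }
    return user_named ? UNIVERSE_ABORT : UNIVERSE_CREATE_ALTERNATE;
}

/* Build an alternative default name by appending the pid.
 * Returns the name length, or -1 if the buffer is too small. */
static int make_alternate_name(char *out, size_t out_len,
                               const char *default_name, long pid)
{
    int n = snprintf(out, out_len, "%s-%ld", default_name, pid);
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

The asymmetry is the point: a silent fallback is acceptable only when the user did not ask for a specific universe.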
Ralph Castain
03e45e6723 Two quick additions:
1. Added OMPI_PROC_ARCH as a defined registry key and added the code so that the architecture info gets properly transmitted across all processes using the startup message.

2. Added an OMPI_MODEX_KEY definition and removed the hard-coded "modex" key from pml_modex_exchange

This commit was SVN r7129.
2005-09-01 15:05:03 +00:00
Jeff Squyres
3962c53e2e - Add to AM_CPPFLAGS $(OPAL_LTDL_CPPFLAGS) where necessary in order to
add a -I to find the included ltdl.h (vs. a system-installed ltdl.h)
- Clean up cruft in a bunch of Makefile.am's to remove now-unnecessary
  AM_CPPFLAGS settings to get static-components.h for each framework
- Move the component_repository API functions out of opal/mca/base/base.h
  and into opal/mca/base/mca_base_component_repository.h in order to
  decrease unnecessary dependencies (e.g., before this, almost
  everything in the tree depended on ltdl.h, which is unnecessary --
  only a small number of files really need ltdl.h)

This commit was SVN r7127.
2005-09-01 12:16:36 +00:00
Ralph Castain
96f4bb7a63 Hey, sports fans!! Guess what??
Here's the huge registry check-in you've all been waiting for with bated breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.

In addition, the commit contains the following:

1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe

2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well.

In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).

This commit was SVN r7118.
2005-09-01 01:07:30 +00:00
Tim Woodall
d7ff284888 correct selection logic
This commit was SVN r7116.
2005-08-31 21:51:52 +00:00
Tim Woodall
35f96af472 non-destructive read of buffer
This commit was SVN r7114.
2005-08-31 21:21:54 +00:00
Rainer Keller
27f1174d0e - Only return the nodes actually allocated to the job.
(necessary when orted handles several jobs simultaneously).

This commit was SVN r7105.
2005-08-31 07:09:47 +00:00
George Bosilca
53ccf0e58c POE is working. It can spawn jobs, redirect the output, and kill the job (with or without CTRL-C).
This commit was SVN r7093.
2005-08-30 16:13:55 +00:00
Rainer Keller
d7901c97a5 - Del whitespaces, to make coming patch smaller.
This commit was SVN r7089.
2005-08-30 06:58:37 +00:00
Brian Barrett
bf8a3632bb * bunch more memory leak / block in use fixes
This commit was SVN r7085.
2005-08-29 21:35:01 +00:00
Jeff Squyres
7d895a4f08 Add missing header file
This commit was SVN r7071.
2005-08-28 11:50:43 +00:00
Brian Barrett
fc71fd5744 * fix place where Jeff changed an exit to a return and we really wanted
it to be an exit.
* Put the srun process (or what is about to become the srun process) in
  its own process group so that group-wide signals (such as the
  SIGINT sent by hitting cntl-c in a shell) are not sent to the srun
  process. 

This commit was SVN r7068.
2005-08-27 17:08:48 +00:00
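The process-group change in fc71fd5744 relies on standard POSIX calls. The sketch below shows the general pattern of placing a forked child in its own process group before it execs; the helper names `spawn_in_own_group` and `stop_child` are hypothetical, and this is not the actual SLURM pls code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, move it into a new process group, then exec argv.
 * Terminal-generated signals such as SIGINT go to the foreground
 * process group, so the child no longer receives them directly. */
static pid_t spawn_in_own_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: become leader of a new process group before exec. */
        setpgid(0, 0);
        execvp(argv[0], argv);
        _exit(127);  /* reached only if exec fails */
    }
    /* Parent: make the same call to close the race with the child.
     * It may fail with EACCES once the child has exec'd, which is
     * harmless because the child already changed its own group. */
    (void)setpgid(pid, pid);
    return pid;
}

/* Terminate and reap a child started with spawn_in_own_group(). */
static int stop_child(pid_t pid)
{
    int status;
    if (kill(pid, SIGTERM) != 0) {
        return -1;
    }
    return waitpid(pid, &status, 0) == pid ? 0 : -1;
}
```

Calling `setpgid` in both parent and child is the usual way to guarantee the group is set before either side proceeds.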
George Bosilca
5b59ffbe4f Handle multiple IP addresses for the OOB TCP module. We check the addresses in order, and we give up if
and only if all of them failed.

This commit was SVN r7067.
2005-08-27 17:03:19 +00:00
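The "try each address in order, fail only if all fail" rule from 5b59ffbe4f can be sketched as a simple loop. The callback type and function names are hypothetical, and `fake_connect` is a stand-in for a real TCP connect attempt; none of this is the OOB TCP module's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Connection attempt callback: returns 0 on success, nonzero on failure. */
typedef int (*connect_fn)(const char *addr);

/* Try each address in order. Returns the index of the first address
 * that connects, or -1 only if every attempt failed. */
static int connect_first_reachable(const char *const addrs[], size_t count,
                                   connect_fn try_connect)
{
    for (size_t i = 0; i < count; i++) {
        if (try_connect(addrs[i]) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Stand-in connector for demonstration: only 10.0.0.2 "answers". */
static int fake_connect(const char *addr)
{
    return strcmp(addr, "10.0.0.2") == 0 ? 0 : -1;
}
```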
Jeff Squyres
27554c19d7 Add missing .h file
This commit was SVN r7062.
2005-08-27 11:01:44 +00:00
Jeff Squyres
c9cdb36b0b Finally get this right: move orte_sys_info.[ch] back into the orte
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
  already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
  and util/os_create_dirpath.c (they only used path_sep, anyway --
  easily changed to #defines)

This commit was SVN r7059.
2005-08-26 21:03:41 +00:00
Jeff Squyres
b3bd549331 - Change a few calls from exit() to orte_abort() so that we get
session directory cleanup (among other things)
- When we get an abnormal exit in orterun (i.e., timeout expires and
  we haven't gotten termination notices from all processes), print a
  better message and exit in a better way (which includes session
  directory cleanup)
- Fix tm and poe pls's to not exit() but rather propagate the error up
  the stack (where relevant)

This commit was SVN r7058.
2005-08-26 20:36:11 +00:00
Josh Hursey
4eefb33182 Some param changes:
- Change orte_base_infrastructre to orte_infrastructre to conform with 
  ompi_info's needs
- Move MCA Param registration in ORTE to a centralized function that is 
  called first in orte_init_stage1
- Set the infrastructure flag as an argument to orte_init
- Adjust initialization functions to properly pass down the infrastructure
  flag.

This commit was SVN r7053.
2005-08-26 20:13:35 +00:00
Jeff Squyres
b306adf349 The SLURM components are now open for business!
This commit was SVN r7046.
2005-08-26 14:43:18 +00:00
Brian Barrett
17c1bb355e * more memory leak fixes - mainly string params not being freed at end of
time
* Added code to free dps structures at shutdown

This commit was SVN r7043.
2005-08-26 02:08:23 +00:00
Brian Barrett
3e8740e740 * mostly working SLURM component. Had to add a sds for the daemons so that
we could vector launch the daemons and still have the nodenames fixed 
  up in the end

This commit was SVN r7041.
2005-08-25 22:29:23 +00:00
Josh Hursey
7bf744a624 Convert to use new param_reg interface.
Also check whether the infrastructure flag was previously set before assuming it
to be false. This was causing orterun to operate incorrectly in the presence
of a persistent daemon.

This commit was SVN r7039.
2005-08-25 19:13:22 +00:00
Jeff Squyres
524ded4896 A little cleanup and progress:
- build a proper srun argv
- launch the srun
- still have several "JMS" comments that need to be addressed

This commit was SVN r7036.
2005-08-25 16:38:42 +00:00
Jeff Squyres
d5909421a9 Register the priority param in open so that ompi_info can see it
This commit was SVN r7034.
2005-08-25 16:37:24 +00:00
Jeff Squyres
1649c7e855 Find out from SLURM how many slots per node we have
This commit was SVN r7031.
2005-08-25 15:51:58 +00:00
Rainer Keller
f52784bad3 - Just changes to comments, deletion of spaces to make diff smaller
This commit was SVN r7030.
2005-08-25 15:42:41 +00:00
Jeff Squyres
d0e847d1ed Allow oversubscription
This commit was SVN r7027.
2005-08-25 11:02:49 +00:00
Jeff Squyres
a6dd3537f1 Minor fixes.
This commit was SVN r7026.
2005-08-25 02:59:55 +00:00
Jeff Squyres
4d49340421 - Update header file convention
- Use new pls base function for adding orted debug argv (or not)

This commit was SVN r7020.
2005-08-24 22:20:51 +00:00
Jeff Squyres
f20bd3205d Add a utility function that is common to several pls's.
This commit was SVN r7019.
2005-08-24 22:20:05 +00:00
Jeff Squyres
9755a7f7fa First cut -- not working yet -- checkpointing to move to another
machine.

This commit was SVN r7018.
2005-08-24 22:19:48 +00:00
Jeff Squyres
072a59cc02 Properly register the MCA param during the open call
This commit was SVN r7014.
2005-08-24 20:50:26 +00:00
Brian Barrett
918f48ce52 * remove outdated comment
This commit was SVN r7010.
2005-08-24 20:19:58 +00:00
Jeff Squyres
28f716542e First cut of the SLURM ras. Seems to be working! Now need to write
SLURM pls... 

This commit was SVN r7008.
2005-08-24 19:15:11 +00:00
Jeff Squyres
018504480a - Update svn:ignore
- Update to new MCA param API
- Update to new #include format

This commit was SVN r7007.
2005-08-24 18:37:28 +00:00
Jeff Squyres
72d2abe72e Remove some outdated comments
This commit was SVN r7006.
2005-08-24 18:30:09 +00:00
Jeff Squyres
9ee4c6de17 Remove silly compiler warning.
This commit was SVN r6998.
2005-08-24 10:25:53 +00:00
Ralph Castain
5d7e5b17e0 Add these two functions so I don't have to keep adding them when I transfer diff's around.
NOTE: These have NOT been added to the Makefile.am in the repository. Please do NOT add them at this time - I will do so later.

This commit was SVN r6979.
2005-08-23 03:23:53 +00:00
Rainer Keller
f0c2f78dd4 - Another one, just missed.
This commit was SVN r6976.
2005-08-22 18:12:05 +00:00
Rainer Keller
1ac8c75965 - Nothing of interest: Fixed comments, indentation...
To get a clear view on the next patch.

This commit was SVN r6975.
2005-08-22 18:02:10 +00:00
Brian Barrett
f48968d8f4 clean up the error code situation - ensure that OMPI_ERROR == ORTE_ERROR ==
OPAL_ERROR, same for all the other error codes.  Also, make sure that there
are never conflicts between OPAL and ORTE error codes (for example).
Finally, provide opal_perror(), opal_strerror(), and opal_strerror_r() to
give stringified error messages for the different error codes

This commit was SVN r6969.
2005-08-22 03:05:39 +00:00
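The error-code alignment described in f48968d8f4 can be sketched with layered enums, where the upper layer reuses the lower layer's values and reserves a separate range for its own codes. The names (`BASE_*`, `RTE_*`, `demo_strerror`) and the numeric values are illustrative assumptions, not the real OPAL/ORTE/OMPI definitions or the real opal_strerror() implementation.

```c
#include <stddef.h>

/* Lowest layer defines the shared codes. */
enum {
    BASE_SUCCESS = 0,
    BASE_ERROR = -1,
    BASE_ERR_OUT_OF_RESOURCE = -2
};

/* Upper layer aliases the shared codes and keeps its own codes in a
 * separate range so the two layers can never collide. */
enum {
    RTE_SUCCESS = BASE_SUCCESS,
    RTE_ERROR = BASE_ERROR,
    RTE_ERR_OUT_OF_RESOURCE = BASE_ERR_OUT_OF_RESOURCE,
    RTE_ERR_LAYER_SPECIFIC = -100
};

/* Compile-time check that the aliases really match. */
_Static_assert(RTE_ERROR == BASE_ERROR, "error codes must agree across layers");

/* Map a code from either layer to a message; NULL if unknown. */
static const char *demo_strerror(int code)
{
    switch (code) {
    case BASE_SUCCESS:             return "Success";
    case BASE_ERROR:               return "Error";
    case BASE_ERR_OUT_OF_RESOURCE: return "Out of resource";
    case RTE_ERR_LAYER_SPECIFIC:   return "Layer-specific error";
    default:                       return NULL;
    }
}
```

Because the shared codes have identical values, a code returned by a lower layer can be passed up unchanged and still be stringified correctly.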