Commit Graph

220 Commits

Author SHA1 Message Date
Jeff Squyres
4c59058053 - Add some logic to configure to make a version of CFLAGS that doesn't
include any optimization flags
- Use these flags to always compile ompi/debuggers/* and orterun so
  that parallel debuggers (such as Totalview) can always see the
  debugging symbols (see comments in ompi/debuggers/Makefile.am and
  orte/tools/orterun/Makefile.am)
- Remove some obsolete LAM-named variables from configure.ac

This commit was SVN r7125.
2005-09-01 10:37:20 +00:00
Ralph Castain
96f4bb7a63 Hey, sports fans!! Guess what??
Here's the huge registry check-in you've all been waiting for with bated breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.

In addition, the commit contains the following:

1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe

2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated puts as well.

In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).

This commit was SVN r7118.
2005-09-01 01:07:30 +00:00
Tim Woodall
d7ff284888 correct selection logic
This commit was SVN r7116.
2005-08-31 21:51:52 +00:00
Tim Woodall
33eabd500c close/re-init soh
This commit was SVN r7115.
2005-08-31 21:51:10 +00:00
Tim Woodall
35f96af472 non-destructive read of buffer
This commit was SVN r7114.
2005-08-31 21:21:54 +00:00
David Daniel
c6054662d5 Forgot to add new header to sources
This commit was SVN r7109.
2005-08-31 16:21:58 +00:00
David Daniel
a5eff8fc78 A little more clean-up. TotalView now works with --enable-debug build.
Tested with:
pls = rsh
totalview.6.6.0-2
Linux cadillac82.ccstar.lanl.gov 2.4.24 #1 SMP Thu Jul 1 15:28:04 MDT
2004 i686 i686 i386 GNU/Linux

This commit was SVN r7108.
2005-08-31 16:15:59 +00:00
Jeff Squyres
284328afe3 Add missing .h file so that it is included in the tarball
This commit was SVN r7107.
2005-08-31 11:01:28 +00:00
Rainer Keller
27f1174d0e - Only return the nodes actually allocated to the job.
(necessary when orted handles several jobs simultaneously).

This commit was SVN r7105.
2005-08-31 07:09:47 +00:00
George Bosilca
d64a702a5b There is a missing header. --enable-picky helps to track down this kind of error.
This commit was SVN r7102.
2005-08-31 00:47:52 +00:00
David Daniel
995641c1e6 Don't initialize proctable more than once (since the stage gate 1 trigger
seems to get fired at least twice).

This commit was SVN r7101.
2005-08-31 00:21:55 +00:00
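The guard described in r7101 can be sketched as a one-time initialization flag. This is a minimal illustration, not the actual debugger code: `ex_proctable_init`, `ex_proctable_ready`, and `ex_init_calls` are hypothetical names, and the counter merely stands in for the real work of building the process table.

```c
#include <stdbool.h>

/* Hypothetical state: set once the table has been built. */
static bool ex_proctable_ready = false;

/* Hypothetical counter standing in for the real table construction. */
static int ex_init_calls = 0;

/*
 * Returns 1 if this call built the table, 0 if it was already built.
 * A trigger that fires twice therefore does the work only once.
 */
int ex_proctable_init(void)
{
    if (ex_proctable_ready) {
        return 0; /* later firings of the trigger are no-ops */
    }
    ex_init_calls++; /* placeholder for the real initialization */
    ex_proctable_ready = true;
    return 1;
}
```

The flag is not protected by a lock, so this sketch assumes the trigger callbacks run on a single thread.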
David Daniel
ced11250e4 Basic totalview support for orterun. Close to working, but need to
check hostnames are obtained correctly.

This commit was SVN r7096.
2005-08-30 17:29:43 +00:00
George Bosilca
53ccf0e58c POE is working. It can spawn jobs, redirect the output and is able to kill the job (with or without CTRL_C).
This commit was SVN r7093.
2005-08-30 16:13:55 +00:00
David Daniel
6cb97e6ade Reverting totalview support to *not* use the as yet unimplemented
orte_jobgrp_t.  Now just need to work out where to call it...

This commit was SVN r7092.
2005-08-30 12:59:04 +00:00
Rainer Keller
d7901c97a5 - Delete whitespace to make the coming patch smaller.
This commit was SVN r7089.
2005-08-30 06:58:37 +00:00
Brian Barrett
77ebdf1c6f * Add some debugging output Ralph asked for when an unknown error code is
passed to opal_error

This commit was SVN r7087.
2005-08-29 23:36:53 +00:00
Brian Barrett
d8e5d80892 * add a reasonable first whack at a suppressions file for Valgrind to ignore
some stuff that we can't do anything about
* fix some more memory leaks in session_dir code

This commit was SVN r7086.
2005-08-29 23:05:52 +00:00
Brian Barrett
bf8a3632bb * bunch more memory leak / block in use fixes
This commit was SVN r7085.
2005-08-29 21:35:01 +00:00
Jeff Squyres
7d895a4f08 Add missing header file
This commit was SVN r7071.
2005-08-28 11:50:43 +00:00
Brian Barrett
fc71fd5744 * fix place where Jeff changed an exit to a return and we really wanted
it to be an exit.
* Put the srun process (or what is about to become the srun process) in
  its own process group so that group-wide signals (such as the
  SIGINT sent by hitting cntl-c in a shell) are not sent to the srun
  process.

This commit was SVN r7068.
2005-08-27 17:08:48 +00:00
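The process-group change in r7068 follows a common POSIX pattern: after `fork()`, the child calls `setpgid(0, 0)` before `exec`, so a terminal-generated SIGINT aimed at the foreground group no longer reaches it. The sketch below is an illustration under that assumption and is not the SLURM launcher's code; `ex_run_in_own_group` is a hypothetical helper that also waits for the child so the example is easy to try.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] in a new process group and return
 * its exit status, or -1 on failure or abnormal termination.
 */
int ex_run_in_own_group(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: become the leader of a new process group before exec. */
        if (setpgid(0, 0) != 0) {
            _exit(126);
        }
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: repeat the call to close the race with the child.
     * It may fail harmlessly if the child has already exec'd. */
    (void)setpgid(pid, pid);

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `setpgid` in both parent and child is the usual way to make sure the group exists no matter which process runs first.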
George Bosilca
5b59ffbe4f Handle multiple IP addresses for the OOB TCP module. We check the addresses in order, and we give up if
and only if all of them failed.

This commit was SVN r7067.
2005-08-27 17:03:19 +00:00
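The "try each address in order, fail only if all fail" behavior from r7067 can be illustrated with the standard `getaddrinfo()` loop. This is a generic sketch rather than the OOB TCP module's implementation; `ex_connect_any` is a hypothetical name, and the real module gets its address list from its own data structures, not from name resolution.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a connected socket descriptor, or -1
 * if resolution failed or every candidate address refused us.
 */
int ex_connect_any(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0) {
        return -1;
    }

    /* Walk the candidates in order; stop at the first that connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            continue;
        }
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            break; /* success: keep this descriptor */
        }
        close(fd); /* this address failed; try the next one */
        fd = -1;
    }

    freeaddrinfo(res);
    return fd; /* -1 only when all addresses failed */
}
```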
Jeff Squyres
774f879a41 Oops -- add second string in there because we added a second %s to the
help message.

This commit was SVN r7064.
2005-08-27 13:32:25 +00:00
Jeff Squyres
27554c19d7 Add missing .h file
This commit was SVN r7062.
2005-08-27 11:01:44 +00:00
Brian Barrett
2143ed4c81 * move error -> string converter registration from orte_init to
orte_init_stage1(), since not all ORTE processes call orte_init().
* Expand opal_error test case to make sure ORTE error codes print
  properly
* Make project error codes start at easy values (OPAL is -1 to -100,
  ORTE is -101 to -200, OMPI is less than -201) to make it easier
  to figure out what an error code as an integer means.  Also has
  the nice property of not changing the values of error codes every
  time a new error code is added.

This commit was SVN r7061.
2005-08-26 23:36:57 +00:00
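The range layout described in r7061 can be sketched with per-project base offsets. Only the ranges come from the commit message; every identifier below (`EX_OPAL_ERR_BASE`, `ex_err_project`, and so on) is hypothetical, and treating everything below -200 as OMPI is an assumption about where that range begins.

```c
/* All names are hypothetical; only the ranges come from the commit. */
enum ex_project { EX_NONE, EX_OPAL, EX_ORTE, EX_OMPI };

#define EX_OPAL_ERR_BASE 0      /* OPAL codes run from -1 to -100 */
#define EX_ORTE_ERR_BASE (-100) /* ORTE codes run from -101 to -200 */
#define EX_OMPI_ERR_BASE (-200) /* assumed: OMPI codes lie below this */

/*
 * Each code is an offset from its project's base, so adding a new
 * OPAL code never shifts the numeric value of an ORTE or OMPI code.
 */
#define EX_OPAL_ERR_EXAMPLE (EX_OPAL_ERR_BASE - 1)
#define EX_ORTE_ERR_EXAMPLE (EX_ORTE_ERR_BASE - 1)

/* Map a raw integer code back to the project that owns its range. */
enum ex_project ex_err_project(int code)
{
    if (code >= EX_OPAL_ERR_BASE) {
        return EX_NONE; /* zero and positive values are not errors here */
    }
    if (code >= EX_ORTE_ERR_BASE) {
        return EX_OPAL;
    }
    if (code >= EX_OMPI_ERR_BASE) {
        return EX_ORTE;
    }
    return EX_OMPI;
}
```

With fixed bases, a bare integer seen in a log can be attributed to a project by inspection.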
Jeff Squyres
c9cdb36b0b Finally get this right: move orte_sys_info.[ch] back into the orte
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
  already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
  and util/os_create_dirpath.c (they only used path_sep, anyway --
  easily changed to #defines)

This commit was SVN r7059.
2005-08-26 21:03:41 +00:00
Jeff Squyres
b3bd549331 - Change a few calls from exit() to orte_abort() so that we get
session directory cleanup (among other things)
- When we get an abnormal exit in orterun (i.e., timeout expires and
  we haven't gotten termination notices from all processes), print a
  better message and exit in a better way (which includes session
  directory cleanup)
- Fix tm and poe pls's to not exit() but rather propagate the error up
  the stack (where relevant)

This commit was SVN r7058.
2005-08-26 20:36:11 +00:00
Josh Hursey
4eefb33182 Some param changes:
- Change orte_base_infrastructre to orte_infrastructre to conform with 
  ompi_info's needs
- Move MCA Param registration in ORTE to a centralized function that is 
  called first in orte_init_stage1
- Set the infrastructre flag as an argument to orte_init
- Adjust initialization functions to properly pass down the infrastructre
  flag.

This commit was SVN r7053.
2005-08-26 20:13:35 +00:00
Josh Hursey
3e18fa4555 Insert some signal handlers so orted properly cleans up after itself
when given a kill signal.

This commit was SVN r7050.
2005-08-26 18:56:08 +00:00
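A common shape for the cleanup-on-signal handling in r7050 is a handler that only records the signal, with the main loop doing the actual cleanup. The sketch below assumes that pattern and is not orted's code; `ex_install_term_handlers` and `ex_termination_requested` are hypothetical names. SIGKILL itself cannot be caught, so the sketch covers SIGTERM and SIGINT.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Last termination signal seen; 0 means none so far. */
static volatile sig_atomic_t ex_pending_signal = 0;

/* Handler: record the signal and return. Cleanup happens elsewhere,
 * because most functions are not safe to call from a handler. */
static void ex_on_term(int signo)
{
    ex_pending_signal = signo;
}

/* Install the handler for SIGTERM and SIGINT. Returns 0 on success. */
int ex_install_term_handlers(void)
{
    struct sigaction sa;

    sa.sa_handler = ex_on_term;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGTERM, &sa, NULL) != 0) {
        return -1;
    }
    if (sigaction(SIGINT, &sa, NULL) != 0) {
        return -1;
    }
    return 0;
}

/* Polled by the main loop; nonzero means "clean up and exit now". */
int ex_termination_requested(void)
{
    return (int)ex_pending_signal;
}
```

A daemon's event loop would check `ex_termination_requested()` on each pass and remove its session directory before exiting.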
Jeff Squyres
b306adf349 The SLURM components are now open for business!
This commit was SVN r7046.
2005-08-26 14:43:18 +00:00
Brian Barrett
17c1bb355e * more memory leak fixes - mainly string params not being freed at end of
time
* Added code to free dps structures at shutdown

This commit was SVN r7043.
2005-08-26 02:08:23 +00:00
Brian Barrett
3e8740e740 * mostly working SLURM component. Had to add an sds for the daemons so that
we could vector launch the daemons and still have the nodenames fixed
  up in the end

This commit was SVN r7041.
2005-08-25 22:29:23 +00:00
Josh Hursey
7bf744a624 Convert to use new param_reg interface.
Also check to see if infrastructre flag was previously set before assuming it
to be false. This was causing orterun to operate incorrectly in the presence
of a persistent daemon.

This commit was SVN r7039.
2005-08-25 19:13:22 +00:00
Jeff Squyres
524ded4896 A little cleanup and progress:
- build a proper srun argv
- launch the srun
- still have several "JMS" comments that need to be addressed

This commit was SVN r7036.
2005-08-25 16:38:42 +00:00
Jeff Squyres
d5909421a9 Register the priority param in open so that ompi_info can see it
This commit was SVN r7034.
2005-08-25 16:37:24 +00:00
Jeff Squyres
1649c7e855 Find out from SLURM how many slots per node we have
This commit was SVN r7031.
2005-08-25 15:51:58 +00:00
Rainer Keller
f52784bad3 - Just changes to comments, deletion of spaces to make diff smaller
This commit was SVN r7030.
2005-08-25 15:42:41 +00:00
Jeff Squyres
d0e847d1ed Allow oversubscription
This commit was SVN r7027.
2005-08-25 11:02:49 +00:00
Jeff Squyres
a6dd3537f1 Minor fixes.
This commit was SVN r7026.
2005-08-25 02:59:55 +00:00
Jeff Squyres
4d49340421 - Update header file convention
- Use new pls base function for adding orted debug argv (or not)

This commit was SVN r7020.
2005-08-24 22:20:51 +00:00
Jeff Squyres
f20bd3205d Add a utility function that is common to several pls's.
This commit was SVN r7019.
2005-08-24 22:20:05 +00:00
Jeff Squyres
9755a7f7fa First cut -- not working yet -- checkpointing to move to another
machine.

This commit was SVN r7018.
2005-08-24 22:19:48 +00:00
Jeff Squyres
072a59cc02 Properly register the MCA param during the open call
This commit was SVN r7014.
2005-08-24 20:50:26 +00:00
Brian Barrett
918f48ce52 * remove outdated comment
This commit was SVN r7010.
2005-08-24 20:19:58 +00:00
Jeff Squyres
28f716542e First cut of the SLURM ras. Seems to be working! Now need to write
SLURM pls... 

This commit was SVN r7008.
2005-08-24 19:15:11 +00:00
Jeff Squyres
018504480a - Update svn:ignore
- Update to new MCA param API
- Update to new #include format

This commit was SVN r7007.
2005-08-24 18:37:28 +00:00
Jeff Squyres
72d2abe72e Remove some outdated comments
This commit was SVN r7006.
2005-08-24 18:30:09 +00:00
Jeff Squyres
9ee4c6de17 Remove silly compiler warning.
This commit was SVN r6998.
2005-08-24 10:25:53 +00:00
Ralph Castain
5d7e5b17e0 Add these two functions so I don't have to keep adding them when I transfer diffs around.
NOTE: These have NOT been added to the Makefile.am in the repository. Please do NOT add them at this time - I will do so later.

This commit was SVN r6979.
2005-08-23 03:23:53 +00:00
Rainer Keller
f0c2f78dd4 - Another one, just missed.
This commit was SVN r6976.
2005-08-22 18:12:05 +00:00
Rainer Keller
1ac8c75965 - Nothing of interest: Fixed comments, indentation...
To get a clear view on the next patch.

This commit was SVN r6975.
2005-08-22 18:02:10 +00:00