1
1
Граф коммитов

5867 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
88cd561198 * bunch of fixes for Red Storm - missing header files and the like
This commit was SVN r7325.
2005-09-12 21:45:58 +00:00
Josh Hursey
7d403cf1b4 whoops forgot a bit of debug. ;(
This commit was SVN r7313.
2005-09-12 16:58:10 +00:00
Josh Hursey
e1abd9a6e4 Added a couple of missing initalizations that caused orteprobe to
segv when in opal_finalize

This commit was SVN r7311.
2005-09-12 16:52:58 +00:00
George Bosilca
ca100f7c5d Finally ... the command was a uint16_t but was packed as an int8_t. That's always ZERO on a big endian
machine.

Anyway, strange enough the modification of this file trigger a recompiliation of nearly everything
in Open MPI !!!

This commit was SVN r7307.
2005-09-12 09:43:34 +00:00
George Bosilca
361ff6640f Correct the progress thread function name from opal_progress_thread to opal_event_progress_thread.
This commit was SVN r7300.
2005-09-11 20:04:40 +00:00
Jeff Squyres
a107ab3897 Add missing header file
This commit was SVN r7294.
2005-09-11 10:14:29 +00:00
Brian Barrett
1fcf18c211 non-persistent signal behavior isn't quite right, so use the proper SIGNAL
macros and deregister at the appropriate time.

This commit was SVN r7293.
2005-09-10 23:22:37 +00:00
Jeff Squyres
31ea80cd0b Simplify the poe configure.m4 -- if we're on AIX, always build it. If
we're not on AIX, don't build it.

This commit was SVN r7291.
2005-09-10 10:40:26 +00:00
Jeff Squyres
7e4c23d88c Add missing header file
This commit was SVN r7290.
2005-09-10 10:39:15 +00:00
Rainer Keller
3c639efa38 - Silly cleanup
This commit was SVN r7289.
2005-09-10 08:01:47 +00:00
Rainer Keller
5fed46e072 - Allow usernames to be specified in the hostfile.
The following formats are parsed:
    user@IPv4
    user@fqdn
    IPv4 or fqdn [username|user-name|user_name]=user
- Try a better error-detection when parsing (recognize wrong
  IPs, fqdns...)

This commit was SVN r7288.
2005-09-10 07:57:50 +00:00
Ralph Castain
76ccec0cee Upgrade the new opal trace system to utilize verbosity. Begin building the trace command into the ORTE system.
This commit was SVN r7267.
2005-09-09 18:27:17 +00:00
Jeff Squyres
0b255830e0 Slight change to the defaults -- make the prefix "output-pid<pid>-" so
that multiple processes don't overwrite each other.  Change that
default in orte_init_stage1() to just "output-" (because the file will
be in a process-unique directory at that point; the pid is no longer
necessary).

This commit was SVN r7256.
2005-09-09 12:37:39 +00:00
Jeff Squyres
7eadfc4bdc Add a new hook function into the opal_output system:
opal_output_set_output_file_info().  This allows getting and setting
the default directory where output stream files will be opened (for
all *new* streams).  Before this function is not invoked, the default
location is $TMPDIR or $HOME (if $TMPDIR is not defined).  

Added a call into orte_init_stage1() to call this function
immediately after the session directory is created and set the default
location of stream files to be the process' session directory.

This commit was SVN r7254.
2005-09-09 12:18:39 +00:00
Jeff Squyres
d19e5b4af8 Remove unused variable
This commit was SVN r7250.
2005-09-09 10:11:46 +00:00
Ralph Castain
7fbe575edd Make sure rc is initialized.
This commit was SVN r7233.
2005-09-08 13:20:38 +00:00
Thara Angskun
e3feddfdd4 Move from stone aged configure.stub to configure.m4
This commit was SVN r7231.
2005-09-08 09:49:21 +00:00
Ralph Castain
2c6e47e38c Add a trace utility that provides info on progress through functions. This is not enabled yet - need Jeff or Brian to add it to the configure/build system.
This commit was SVN r7222.
2005-09-07 18:52:28 +00:00
George Bosilca
f13690f16e The prototype of ompi_help has been changed.
This commit was SVN r7218.
2005-09-07 17:15:00 +00:00
Brian Barrett
ed56e743b7 * update configure.ac to use the modern version of AC_INIT and
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
  number to be set at autoconf time (instead of at configure time, as
  it was before).  Set the version number, minus the subversion r number,
  at autoconf time.  Override the internal variables to include the r
  number (if needed) at configure time.  Basically, the right thing
  should always happen.  The only place it might not is the version
  reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
  them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
  in the directory containing source files, even if the Makefile.am is
  in another directory.  This should start making it feasible to
  reduce the number of Makefile.am files we have in the tree, which
  will greatly reduce the time to run autogen and configure.

This commit was SVN r7211.
2005-09-07 05:54:53 +00:00
Josh Hursey
a5e5924217 Added a custom arguments MCA param for Slurm PLS.
This allows the user to specify certain options to srun when an application
is launched with this PLS.

A useful example is the need to set the time to wait from when the first
process completes and when slurm kills remaining processes:

  pls_slurm_args=--wait=1200

This commit was SVN r7206.
2005-09-06 21:52:28 +00:00
Jeff Squyres
c5dc8762a2 Remove useless SUBDIRS line
This commit was SVN r7203.
2005-09-06 21:31:50 +00:00
Jeff Squyres
383d9f58e7 Be [slightly] more descriptive. :-)
This commit was SVN r7198.
2005-09-06 16:57:11 +00:00
Ralph Castain
47bf2574e1 Ensure that subscriptions for a specific requestor/subscription return id only get registered once. It appears that sometimes the system registers a subscription for the same return location multiple times. This prevents getting multiple callbacks when that happens. Still need to track down why it is happening at all.
This commit was SVN r7197.
2005-09-06 16:33:41 +00:00
Rainer Keller
a13d513bbe - Rename of unused define, sitting in tree.
This commit was SVN r7196.
2005-09-06 16:17:08 +00:00
Rainer Keller
a36347d728 - Support -prefix specification on mpirun/orterun cmd-line per
app_context:
  mpirun -np 2 -prefix /path/to/ompi/on/machineA ./exec1 : \
         -np 2 -prefix /path/to/ompi/on/machineB ./exec2

- Allow with -mca pls_rsh_assume_same_shell 0, the checking for the
  SHELL-variable on the actual node (currently 1st node).
  Sets the prefix, PATH and LD_LIBRARY_PATH for bash/ksh and 
  csh/tcsh.

This commit was SVN r7195.
2005-09-06 16:10:05 +00:00
Ralph Castain
7fc67f57a5 Little logic cleanup and handle thread locking correctly.
This commit was SVN r7192.
2005-09-06 14:04:43 +00:00
George Bosilca
648ef2ae5c One of the latest gcc version bark about a variable being use uninitialized. It was kind of right, because the
variable was protected by another one ... But with few modifications I get rid of this warning.

This commit was SVN r7189.
2005-09-06 03:13:03 +00:00
Brian Barrett
4e20c83204 * fix memory leak in ORTE startup
This commit was SVN r7186.
2005-09-05 21:03:02 +00:00
Jeff Squyres
3645816aef - Add copyrights
- Minor style fixes

This commit was SVN r7184.
2005-09-05 18:51:59 +00:00
Rainer Keller
588a62cb90 - Missed file in last commit
This commit was SVN r7179.
2005-09-04 20:55:27 +00:00
Rainer Keller
192625d2a1 - Once again: uninteresting cleanup to get diff smaller.
This commit was SVN r7178.
2005-09-04 20:54:19 +00:00
Ralph Castain
12daecb826 More cleanup
This commit was SVN r7167.
2005-09-03 01:22:11 +00:00
Ralph Castain
4b5b3b4164 Properly handle the argv array and clean it up when done.
This commit was SVN r7166.
2005-09-03 00:15:21 +00:00
Josh Hursey
78da530fd2 Fix a bug that Tim highlighted in which orted coredumps when an orterun is
CTRL-C'd. 
We were calling orte_finalize recursively which caused a segv when it tried to 
use a freed framework (orte_rmgr in this case).

I added a status flag to orte_universe_info to indicate where we are in the code.
This was needed to determine if we should call orte_abort or not when shutting
down in the tcp oob.

This commit was SVN r7160.
2005-09-02 21:07:21 +00:00
Ralph Castain
0f797fd40b Few more cleanups
This commit was SVN r7159.
2005-09-02 21:01:59 +00:00
Ralph Castain
4ed7752681 Continue cleaning up memory leaks during launch
This commit was SVN r7158.
2005-09-02 20:41:24 +00:00
Ralph Castain
f352890732 Cleaning up memory leaks for proxy operations.
This commit was SVN r7157.
2005-09-02 19:26:21 +00:00
Ralph Castain
4bd25e0292 Few minor memory leak cleanups
This commit was SVN r7156.
2005-09-02 18:50:01 +00:00
Jeff Squyres
a7fbb0f95e Put in comments about why these assignments exist
This commit was SVN r7146.
2005-09-02 10:27:23 +00:00
Jeff Squyres
7e4f696501 Fix silly compiler warnings
This commit was SVN r7145.
2005-09-02 10:26:41 +00:00
Ralph Castain
66a215eae1 More memory cleanup...
1. Valgrind is good for something - chasing down memory leaks in registry led me to re-visit the dictionary functions and discover that I wasn't keeping track of the number of dictionary entries on each segment! Resulted in wasted time searching blank entries as well as leaked memory. This has now been fixed.

2. Fixed the orte_bitmap test. The init function for that class has been eliminated and the constructor adjusted to provide that functionality.

This commit was SVN r7136.
2005-09-02 00:26:58 +00:00
Ralph Castain
76e622a552 Clean up a few memory leaks - more to go...
This commit was SVN r7134.
2005-09-01 17:38:04 +00:00
Ralph Castain
cb128ab87b Be a little more friendly and tell the user we couldn't reach their specified universe... :-)
This commit was SVN r7132.
2005-09-01 15:52:37 +00:00
Ralph Castain
d0f7dafc47 Revise the universe connection logic. Two cases are now handled:
1. user does NOT specify the universe name. For the default universe case, if we detect an existing default universe and cannot connect to it, we quietly create an alternative default name by adding the pid to the orte_default_universe name and move on - we no longer provide a warning message for this case.

2. user specified a universe name. If we detect an existing universe of that name and cannot connect to it, we consider this an error condition and abort.

This commit was SVN r7131.
2005-09-01 15:50:38 +00:00
Ralph Castain
03e45e6723 Two quick additions:
1. Added OMPI_PROC_ARCH as a defined registry key and added the code so that the architecture info gets properly transmitted across all processes using the startup message.

2. Added an OMPI_MODEX_KEY definition and removed the hard-coded "modex" key from pml_modex_exchange

This commit was SVN r7129.
2005-09-01 15:05:03 +00:00
Jeff Squyres
3962c53e2e - Add to AM_CPPFLAGS $(OPAL_LTDL_CPPFLAGS) where necessary in order to
add a -I to find the included ltdl.h (vs. a system-installed ltdl.h)
- Clean up kruft in a bunch of Makefile.am's to remove now-unnecessary
  AM_CPPFLAGS settings to get static-components.h for each framework
- Move the component_repository API functions out of opal/mca/base/base.h
  and into opal/mca/base/mca_base_component_repository.h in order to
  decrease unnecessary dependencies (e.g., before this, almost
  everything in the tree depended on ltdl.h, which is unnecessary --
  only a small number of files really need ltdl.h)

This commit was SVN r7127.
2005-09-01 12:16:36 +00:00
Jeff Squyres
4c59058053 - Add some logic to configure to make a version of CFLAGS that doesn't
include any optimization flags
- Use these flags to always compile ompi/debuggers/* and orterun so
  that parallel debuggers (such as Totalview) can always see the
  debugging symbols (see comments in ompi/debuggers/Makefile.am and
  orte/tools/orterun/Makefile.am)
- Remove some obsolete LAM-named variables from configure.ac

This commit was SVN r7125.
2005-09-01 10:37:20 +00:00
Ralph Castain
96f4bb7a63 Hey, sports fans!! Guess what??
Here's the huge registry check-in you've all been waiting for with baited breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.

In addition, the commit contains the following:

1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe

2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well.

In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).

This commit was SVN r7118.
2005-09-01 01:07:30 +00:00
Tim Woodall
d7ff284888 correct selection logic
This commit was SVN r7116.
2005-08-31 21:51:52 +00:00
Tim Woodall
33eabd500c close/re-init soh
This commit was SVN r7115.
2005-08-31 21:51:10 +00:00
Tim Woodall
35f96af472 non-destructive read of buffer
This commit was SVN r7114.
2005-08-31 21:21:54 +00:00
David Daniel
c6054662d5 Forgot to add new header to sources
This commit was SVN r7109.
2005-08-31 16:21:58 +00:00
David Daniel
a5eff8fc78 A little more clean-up. TotalView now works with --enable-debug build.
Tested with:
pls = rsh
totalview.6.6.0-2
Linux cadillac82.ccstar.lanl.gov 2.4.24 #1 SMP Thu Jul 1 15:28:04 MDT
2004 i686 i686 i386 GNU/Linux

This commit was SVN r7108.
2005-08-31 16:15:59 +00:00
Jeff Squyres
284328afe3 Add missing .h file so that it is included in the tarball
This commit was SVN r7107.
2005-08-31 11:01:28 +00:00
Rainer Keller
27f1174d0e - Only return the nodes actually allocated to the job.
(necessary when orted handles several jobs simultaneously).

This commit was SVN r7105.
2005-08-31 07:09:47 +00:00
George Bosilca
d64a702a5b There is a missing header. --enable-picky help to track down such kind of errors.
This commit was SVN r7102.
2005-08-31 00:47:52 +00:00
David Daniel
995641c1e6 Don't initialize proctable more than once (since the stage gate 1 trigger
seems to get fired at least twice).

This commit was SVN r7101.
2005-08-31 00:21:55 +00:00
David Daniel
ced11250e4 Basic totalview support for orterun. Close to working, but need to
check hostnames are obtained correctly.

This commit was SVN r7096.
2005-08-30 17:29:43 +00:00
George Bosilca
53ccf0e58c POE is working. It can spawn jobs, redirect the output and is able to kill the job (with or without CTRL_C).
This commit was SVN r7093.
2005-08-30 16:13:55 +00:00
David Daniel
6cb97e6ade Reverting totalview support to *not* use the as yet unimplemented
orte_jobgrp_t.  Now just need to work out where to call it...

This commit was SVN r7092.
2005-08-30 12:59:04 +00:00
Rainer Keller
d7901c97a5 - Del whitespaces, to make coming patch smaller.
This commit was SVN r7089.
2005-08-30 06:58:37 +00:00
Brian Barrett
77ebdf1c6f * Add some debugging output Ralph asked for when an unknown error code is
passed to opal_error

This commit was SVN r7087.
2005-08-29 23:36:53 +00:00
Brian Barrett
d8e5d80892 * add a reasonable first wack at a suppressions file for Valgrind to ignore
some stuff that we can't do anything about
* fix some more memory leaks in session_dir code

This commit was SVN r7086.
2005-08-29 23:05:52 +00:00
Brian Barrett
bf8a3632bb * bunch more memory leak / block in use fixes
This commit was SVN r7085.
2005-08-29 21:35:01 +00:00
Jeff Squyres
7d895a4f08 Add missing header file
This commit was SVN r7071.
2005-08-28 11:50:43 +00:00
Brian Barrett
fc71fd5744 * fix place where Jeff changed an exit to a return and we really wanted
it to be an exit.
* Put the srun process (or what is about to become the srun process) in
  it's own process group so that group-wide signals (such as the 
  SIGINT sent by hitting cntl-c in a shell) are not sent to the srun
  process. 

This commit was SVN r7068.
2005-08-27 17:08:48 +00:00
George Bosilca
5b59ffbe4f Handle multiple IP addresses for the OOB TCP module. We check the addresses in order, and we give up if
and only if all of them failed.

This commit was SVN r7067.
2005-08-27 17:03:19 +00:00
Jeff Squyres
774f879a41 Oops -- add second string in there because we added a second %s to the
help message.

This commit was SVN r7064.
2005-08-27 13:32:25 +00:00
Jeff Squyres
27554c19d7 Add missing .h file
This commit was SVN r7062.
2005-08-27 11:01:44 +00:00
Brian Barrett
2143ed4c81 * move error -> string converter registration from orte_init to
orte_init_stage1(), since not all ORTE processes call orte_init().
* Expad opal_error test case to make sure ORTE error codes print
  properly
* Make project error codes start at easy values (OPAL is -1 to -100,
  ORTE is -101 to -200, OMPI is less than -201) to make it easier
  to figure out what an error code as an integer means.  Also has
  the nice property of not changing the values of error codes ever
  time a new error code is added.

This commit was SVN r7061.
2005-08-26 23:36:57 +00:00
Jeff Squyres
c9cdb36b0b Finally get this right: move orte_sys_info.[ch] back into the orte
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
  already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
  and util/os_create_dirpath.c (they only used path_sep, anyway --
  easily changed to #defines)

This commit was SVN r7059.
2005-08-26 21:03:41 +00:00
Jeff Squyres
b3bd549331 - Change a few calls from exit() to orte_abort() so that we get
session directory cleanup (among other things)
- When we get an abnormal exit in orterun (i.e., timeout expires and
  we haven't gotten termination notices from all processes), print a
  better message an exit in a better way (which includes session
  directory cleanup)
- Fix tm and poe pls's to not exit() but rather propagate the error up
  the stack (where relevant)

This commit was SVN r7058.
2005-08-26 20:36:11 +00:00
Josh Hursey
4eefb33182 Some param changes:
- Change orte_base_infrastructre to orte_infrastructre to conform with 
  ompi_info's needs
- Move MCA Param registration in ORTE to a centralized function that is 
  called first in orte_init_stage1
- Set the infrastructre flag as an argument to orte_init
- Adjust initalization functions to properly pass down the infrastructre
  flag.

This commit was SVN r7053.
2005-08-26 20:13:35 +00:00
Josh Hursey
3e18fa4555 Insert some signal handlers so orted properly cleans up after it self
when given a kill signal.

This commit was SVN r7050.
2005-08-26 18:56:08 +00:00
Jeff Squyres
b306adf349 The SLURM components are now open for business!
This commit was SVN r7046.
2005-08-26 14:43:18 +00:00
Brian Barrett
17c1bb355e * more memory leak fixes - mainly string params not being freed at end of
time
* Added code to free dps structures at shutdown

This commit was SVN r7043.
2005-08-26 02:08:23 +00:00
Brian Barrett
3e8740e740 * mostly working SLURM component. Had to add a sds for the daemons so that
we could vector launch the daemons and still have the nodenames fixed 
  up in the end

This commit was SVN r7041.
2005-08-25 22:29:23 +00:00
Josh Hursey
7bf744a624 Convert to use new param_reg interface.
Also check to see if infrastructre flag was previously set before assuming it
to be false. This was causing orterun to operate incorrectly in the presence
of a persistant daemon.

This commit was SVN r7039.
2005-08-25 19:13:22 +00:00
Jeff Squyres
524ded4896 A little cleanup and progress:
- build a proper srun argv
- launch the srun
- still have several "JMS" comments that need to be addressed

This commit was SVN r7036.
2005-08-25 16:38:42 +00:00
Jeff Squyres
d5909421a9 Register the priority param in open so that ompi_info can see it
This commit was SVN r7034.
2005-08-25 16:37:24 +00:00
Jeff Squyres
1649c7e855 Find out from SLURM how many slots per node we have
This commit was SVN r7031.
2005-08-25 15:51:58 +00:00
Rainer Keller
f52784bad3 - Just changes to comments, deletion of spaces to make diff smaller
This commit was SVN r7030.
2005-08-25 15:42:41 +00:00
Jeff Squyres
d0e847d1ed Allow oversubscription
This commit was SVN r7027.
2005-08-25 11:02:49 +00:00
Jeff Squyres
a6dd3537f1 Minor fixes.
This commit was SVN r7026.
2005-08-25 02:59:55 +00:00
Jeff Squyres
4d49340421 - Update header file convention
- Use new pls base function for adding orted debug argv (or not)

This commit was SVN r7020.
2005-08-24 22:20:51 +00:00
Jeff Squyres
f20bd3205d Add a utility function that is common to several pls's.
This commit was SVN r7019.
2005-08-24 22:20:05 +00:00
Jeff Squyres
9755a7f7fa First cut -- not working yet -- checkpointing to move to another
machine.

This commit was SVN r7018.
2005-08-24 22:19:48 +00:00
Jeff Squyres
072a59cc02 Properly register the MCA param during the open call
This commit was SVN r7014.
2005-08-24 20:50:26 +00:00
Brian Barrett
918f48ce52 * remove out dated comment
This commit was SVN r7010.
2005-08-24 20:19:58 +00:00
Jeff Squyres
28f716542e First cut of the SLURM ras. Seems to be working! Now need to write
SLURM pls... 

This commit was SVN r7008.
2005-08-24 19:15:11 +00:00
Jeff Squyres
018504480a - Update svn:ignore
- Update to new MCA param API
- Update to new #include format

This commit was SVN r7007.
2005-08-24 18:37:28 +00:00
Jeff Squyres
72d2abe72e Remove some outdated comments
This commit was SVN r7006.
2005-08-24 18:30:09 +00:00
Jeff Squyres
9ee4c6de17 Remove silly compiler warning.
This commit was SVN r6998.
2005-08-24 10:25:53 +00:00
Ralph Castain
5d7e5b17e0 Add these two functions so I don't have to keep adding them when I transfer diff's around.
NOTE: These have NOT been added to the Makefile.am in the repository. Please do NOT add them at this time - I will do so later.

This commit was SVN r6979.
2005-08-23 03:23:53 +00:00
Rainer Keller
f0c2f78dd4 - Another one, just missed.
This commit was SVN r6976.
2005-08-22 18:12:05 +00:00
Rainer Keller
1ac8c75965 - Nothing of interest: Fixed comments, indentation...
To get a clear view on the next patch.

This commit was SVN r6975.
2005-08-22 18:02:10 +00:00
Brian Barrett
f48968d8f4 clean up the error code situation - ensure that OMPI_ERROR == ORTE_ERROR ==
OPAL_ERROR, same for all the other error codes.  Also, make sure that there
are never conflicts between OPAL anr ORTE error codes (for example).
Finally, provide opal_perror(), opal_strerror(), and opal_strerror_r() to
give stringified error messages for the different error codes

This commit was SVN r6969.
2005-08-22 03:05:39 +00:00
Brian Barrett
acd652a7ac * have rsh setup opal_progress so that call_yield is only called if the nodes
are oversubscribed (based on information from ras and current data in gpr)

This commit was SVN r6941.
2005-08-19 18:56:44 +00:00
Brian Barrett
0a07341c40 * disconnect if an error occurs after we connected
This commit was SVN r6940.
2005-08-19 18:10:37 +00:00
Brian Barrett
e737bba753 * version of the tm pls that uses the proxy orteds, avoiding all the nasty
multi-client issues the old version had.  Also, ignore the NULL iof
   component, since we shouldn't use it when using the proxy orteds

This commit was SVN r6939.
2005-08-19 16:49:59 +00:00
Brian Barrett
80f27b5d87 * fix some bit rot in tm pls/ras
* remove src/ directory for tm pls/ras

This commit was SVN r6937.
2005-08-19 14:46:11 +00:00
Jeff Squyres
1f89200c67 Properly cast and remove compiler warning.
This commit was SVN r6935.
2005-08-19 12:20:24 +00:00
Brian Barrett
4476212aa7 * check for no components found case and return selection error in that
situation

This commit was SVN r6931.
2005-08-18 21:46:15 +00:00
Tim Woodall
ab4aac2c14 stubs
This commit was SVN r6930.
2005-08-18 19:46:42 +00:00
Brian Barrett
331a971b13 * fix mis-named constant
This commit was SVN r6923.
2005-08-18 15:19:17 +00:00
Jeff Squyres
5e5fd5a8f2 The fork pls now checks the total number of processes to be launched
against the total number of processors.  If not oversubscribing, emit
the MCA environment variable mpi_paffinity_processor with the
processor number to bind the process to.  This parameter is picked up
during MPI_Init (i.e., ompi_mpi_init()) and used to bind the process,
but currently iif the MCA param mpi_paffinity_alone is set to a
nonzero value (i.e., the user asks for it).

This commit was SVN r6906.
2005-08-16 16:23:20 +00:00
Brian Barrett
f6a64706ad Fix some poorly choosen constants in the XGrid PLS
This commit was SVN r6901.
2005-08-16 16:07:29 +00:00
Jeff Squyres
dfb6c854ab - Convert to new #include file scheme
- Ensure opal_output*() is properly declared

This commit was SVN r6896.
2005-08-16 10:50:09 +00:00
Jeff Squyres
cce0950df7 - change a bunch of OMPI_* constants or ORTE_* equivalents
- change the framework opens to [mostly] use the new MCA param API
- properly pass in framework debug output streams to the
  mca_base_component_open() function

This commit was SVN r6888.
2005-08-15 18:25:35 +00:00
Brian Barrett
8aca9ef966 * remove need to edit project/Makefile.am and project/{dynamic-,}mca/Makefile.am when adding a new component. Configure / autogen now do it for you.
* Add base to memory framework so that we can do something sane with
  ompi_info
* Updated ompi_info to print components for memory framework and
  show whether we have memory hooks active or not.

This commit was SVN r6861.
2005-08-13 20:19:24 +00:00
Josh Hursey
9bcb2989a5 Some general cleanup.
Added the FT-MPI interface functions. Many are hidden and not implemented,
but serve as place holders for future work.

This commit was SVN r6847.
2005-08-12 22:09:58 +00:00
Josh Hursey
e370141ae3 add memory manager init
This commit was SVN r6842.
2005-08-12 20:45:46 +00:00
Ralph Castain
67bd42d167 Fix a few minor compiler warnings
This commit was SVN r6816.
2005-08-12 04:07:47 +00:00
Tim Prins
311efa5bcc fix a small 64bit problem....
This commit was SVN r6814.
2005-08-11 20:36:07 +00:00
Josh Hursey
22c7f2b3e0 Quite a range of small changes.
ns_replica.c
 - Removed the error logging since I use this function in orte_init_stage1 to 
   check if we have created a cellid yet or not.

ras_types.h & rase_base_node.h
 - This was an empty file. moved the orte_ras_node_t from base/ras_base_node.h
   to this file.
 - Changed the name of orte_ras_base_node_t to orte_ras_node_t to match the 
   naming mechanisms in place.

ras.h
 - Exposed 2 functions:
   - node_insert:
     This takes a list of orte_ras_base_node_t's and places them in the Node 
     Segment of the GPR. This is to be used in orte_init_stage1 for singleton 
     processes, and the hostfile parsing (see rds_hostfile.c). This just puts 
     in the appropriate API interface to keep from calling the 
     orte_ras_base_node_insert function directly.
   - node_query:
     This is used in hostfile parsing. This just puts in the appropriate API 
     interface to keep from calling the orte_ras_base_node_query function 
     directly.
 - Touched all of the implemented components to add reference to these new 
   function pointers

ras_base_select.c & ras_base_open.c
 - Add and set the global module reference

rds.h
 - Exposed 1 function:
   - store_resource:
     This stores a list of rds_cell_desc_t's to the Resource Segment. 
     This is used in conjunction with the orte_ras.node_insert function in 
     both the orte_init_stage1 for singleton processes and rds_hostfile.c

rds_base_select.c & rds_base_open.c
 - Add and set the global module reference

rds_hostfile.c
 - Added functionality to create a new cellid for each hostfile, placing 
   each entry in the hostfile into the same cellid. Currently this is 
   commented out with the cellid hard coded to 0, with the intention of 
   taking this out once ORTE is able to handle multiple cellid's
 - Instead of just adding hosts to the Node Segment via a direct call to 
   the ras_base_node_insert() function. First add the hosts to the Resource 
   Segment of the GPR using the orte_rds.store_resource() function then use 
   the API version of orte_ras.node_insert() to store the hosts on the Node 
   Segment.
 - Add 1 new function pointer to module as required by the API.

rds_hostfile_component.c
 - Converted this to use the new MCA parameter registration

orte_init_stage1.c
 - It is possible that a cellid was not created yet for the current environment. 
   So I put in some logic to test if the cellid 0 existed. If it does then 
   continue, otherwise create the cellid so we can properly interact with the 
   GPR via the RDS.
 - For the singleton case we insert some 'dummy' data into the GPR. The RAS 
   matches this logic, so I took out the duplicate GPR put logic, and 
   replaced it with a call to the orte_ras.node_insert() function.
 - Further before calling orte_ras.node_insert() in the singleton case, 
   we also call orte_rds.store_resource() to add the singleton node to the 
   Resource Segment.

Console:
 - Added a bunch of new functions. Still experimenting with many aspects of the
   implementation. This is a checkpoint, and has very limited functionality.
 - Should not be considered stable at the moment.

This commit was SVN r6813.
2005-08-11 19:51:50 +00:00
Greg Watson
88c4505e6e Enable component by default.
This commit was SVN r6811.
2005-08-11 19:14:24 +00:00
Greg Watson
ad958e3f19 Fix build problem caused by these functions just "going away".
This commit was SVN r6810.
2005-08-11 19:13:09 +00:00
Josh Hursey
bf2789fc66 Added some error logging for unimplemented daemon commands.
This commit was SVN r6809.
2005-08-11 16:57:06 +00:00
Tim Prins
36435cbe4a a small change to make this configure test right.
This component is ompi_ignored so no need to autogen.....

This commit was SVN r6807.
2005-08-11 14:29:40 +00:00
Josh Hursey
c92c3effb6 Made a global base_module of function pointers for both frameworks.
This commit was SVN r6806.
2005-08-10 22:49:41 +00:00
Josh Hursey
abe16e3fa4 Convert to use new MCA param reg functions
This commit was SVN r6805.
2005-08-10 21:05:13 +00:00
Josh Hursey
52cf1e69e0 update makefile for addition of new header file
This commit was SVN r6803.
2005-08-10 20:03:20 +00:00
Josh Hursey
afe7e687cb A bit of cleanup and a couple of bug fixes for remote orted launching
using orteprobe.

Created a header file for orte_setup_hnp. [HNP = Head Node Process]

General cleanup and added a bit of documentation in orte_setup_hnp.c
Also fixed a cellid tokens issue (circa line 285)
Changed the launched scope from private to public

In orteprobe:
- added reference to orted.h to avoid duplicate header contents in orteprobe.h
- removed the version tag, and put in a verbose argument
- Fixed a buffer packing problem that was causing the parent from receiving the
  proper contact information for the new daemon.

This commit was SVN r6802.
2005-08-10 20:01:25 +00:00
Jeff Squyres
ef1abb71d3 Check to see if the MCA param already exists before registering it.
If you register a parameter a second time, it overwrites the default
value (this was causing a problem with mpirun not being marked as orte
infrastructure, and therefore thinking that it was a singleton, and
therefore always adding the localhost into the node list).

This commit was SVN r6789.
2005-08-09 21:03:33 +00:00
Tim Prins
24dc319237 - added more documentation
- converted some things to new MCA param API
 - renamed the pls_bproc_seed component struct so its name isn't the same as
   the pls_bproc component's struct
 - minor bugfixes

This commit was SVN r6774.
2005-08-08 22:17:22 +00:00
Jeff Squyres
f8fa8f4935 Fix a problem found by Tim Prins (patch also supplied by Tim P). From
his e-mail:

I ran into a small bug in rmaps_rr.c: map_app_by_slot which was
triggered by using multiple app contexts. Basically, if not all the
slots we allocated on a node were used by an app, we would
automatically move onto the next node. This caused a problem with
multiple app contexts when the first app takes a partial allocation of
a node, the second app would not be able to access these slots because
we had already move past the node, and the byslot routine does not
wrap back around the list.

This commit was SVN r6766.
2005-08-08 18:56:17 +00:00
Jeff Squyres
32e71e5c6c Fix a problem where orterun itself would not receive MCA parameters
that were set on the command line.  This was techinically exactly the
way the code was designed, but it certainly violated the Law of Least
Astonishment (even to its designer ;-) ).  So now if you execute
something like this:

   mpirun -mca pls_rsh_debug 1 -np 4 hello

You'll see debugging output from the rsh pls component, as you would
expect (this was not previously the case -- the MCA pls_rsh_debug
parame would be set to 1 in the 4 spawned hello processes, but *not*
in the orterun process).

More specifically, MCA parameters will be set in the orterun process
in the following cases:

- The new command line switch "--gmca" (or "-gmca") is used,
  indicating that the MCA parameter is "global".  --gmca also means
  that that MCA parameter will be applied to all context app's.  For
  example:

      mpirun -gmca foo bar -np 1 hello : -np 2 goodbye

  The foo MCA param will be set in both the hello and goodbye
  processes.

- If there is only one context app.  For example:

      mpirun -mca pls_rsh_debug 1 -np 4 hello

  will set pls_rsh_debug to 1 in both the orterun process and the 4
  spawned hello processes.

Also added a few more comments inside orterun to document a somewhat
confusing use of a state variable in a recursive case.

This commit was SVN r6764.
2005-08-08 16:42:28 +00:00
Ralph Castain
3c13d699f8 Remove an old file.
This commit was SVN r6762.
2005-08-08 13:41:53 +00:00
Ralph Castain
e583f6a97f Add a couple of new functions to the schema framework to check if a trigger is a "standard" trigger or not, and to extract a jobid from a standard trigger. Both functions will be used in a later commit.
Ensure that the seed set_my_name function sets all the right initial info in the name services' structures.

This commit was SVN r6760.
2005-08-07 13:26:49 +00:00
Ralph Castain
c530521a8e Add several new interface functions to the name services:
1. dump_xxx - analogous to the registry's dump commands, allows you to examine the contents of the name services' structures

2. get_job_peers - get an array of process names for all processes in the specified job

This commit was SVN r6759.
2005-08-07 13:21:52 +00:00
Ralph Castain
1438009dbd Properly set the MCA parameter to indicate these functions are infrastructure so that the singleton flag does not get set.
Somehow, in changing over to the new MCA interfaces, the "set" part of that logic got lost, so the singleton flag was always being set. This should repair some of the anomalous behavior seen recently where the local host was always being used for an application process.

This commit was SVN r6757.
2005-08-07 04:17:10 +00:00
Josh Hursey
3b187c4db3 Fix the 'delete container' logic in gpr to prevent recursive delete of all
containers when one is requested.

Fix a bug in gpr_replica_del_index_api which doesn't preset num_tokens and
num_keys, but assumes they are 0.

Fix orte_ras_base_node_delete() function to operate properly to delete the
appropriate container in the 'orte-node' segment when requested.

This commit was SVN r6756.
2005-08-05 23:37:39 +00:00
George Bosilca
14ffc85379 I want to have it compiled too.
This commit was SVN r6754.
2005-08-05 18:47:12 +00:00
Jeff Squyres
d0a0434172 Investigating an MCA param problem -- converted over orterun to new
MCA param API in the process.

This commit was SVN r6739.
2005-08-04 18:15:47 +00:00
Josh Hursey
12031db535 Added missing help file.
This commit was SVN r6737.
2005-08-04 17:40:22 +00:00
Jeff Squyres
aa9bdcfec5 Make some fixes and add some features to the rsh pls:
- convert MCA params to the new API
- some style and indenting fixes
- look at local shell, and if [new] MCA param
  pls_rsh_assume_same_shell is 1, then assume that the remote shell is
  the same as the local shell.  If pls_rsh_assume_same_shell is 0, do
  a probe to figure out what the remote shell is (NOT CURRENTLY
  IMPLEMENTED! you'll get a run-time warning if you set this MCA param
  to 0).
- if the remote shell is not csh and not bash, then prefix the remote
  command with "( ! [ -e ./.profile ] || . ./.profile;" (and suffix it
  with ")") so that we run the .profile on the remote side in order to
  set PATHs and the like.  See the LAM FAQ for details (will someday
  be on the Open MPI FAQ:
  http://www.lam-mpi.org/faq/category4.php3#question8)
- add a bunch of debugging output if the MCA param pls_rsh_debug is
  enabled (or the top-level debug MCA param is enabled)
- add more help messages (and corresponding calls to opal_show_help())
  in help-pls-rsh.txt

This commit was SVN r6731.
2005-08-04 15:09:02 +00:00
Tim Prins
2d707f34a0 - make persistent daemons work with bproc
- added documentation
 - code cleanups

This commit was SVN r6726.
2005-08-03 20:24:52 +00:00
Tim Prins
aa0525da98 Improvements in bproc support:
- we now properly support multiple application contexts
 - much improved error messages, using opal_show_help
 - fix some small bugs in the way the processes were discovering their names
 - better searching for orted
 - use the new mca parameter interface

These changes still need some testing, but they seem stable.

This commit was SVN r6719.
2005-08-02 22:22:55 +00:00
Ralph Castain
4e1837687b Finish simplified interfaces for put and subscribe - more details to come.
This commit was SVN r6713.
2005-08-02 19:43:29 +00:00
Jeff Squyres
ef9e06451c Ensure that --mca is listed in the --help message (thanks for pointing
this out Gleb!)

This commit was SVN r6712.
2005-08-02 18:52:12 +00:00
Brian Barrett
d7bb611a74 * make all the sds components follow the prefix rule. Shame on me.
This commit was SVN r6710.
2005-08-02 18:51:28 +00:00
Josh Hursey
c6a3e67f07 switch from if's to ifdef's to prevent compiler warnings
This commit was SVN r6705.
2005-08-02 17:07:21 +00:00
Ralph Castain
716f465ce3 Fix a typo
This commit was SVN r6704.
2005-08-02 15:54:36 +00:00
Josh Hursey
943a74e266 Checkpoint.
- Add functionality to parse multiple arguments provided in the console
- Cleaned up help function
- Added an option to hide commands from the help menu

Working on launching and reaping of daemons from within the console.

This commit was SVN r6699.
2005-08-01 22:52:48 +00:00
Josh Hursey
0c32f34cb7 Let's clean up the block comment parsing to be a bit more lex-like.
This commit was SVN r6695.
2005-08-01 22:25:47 +00:00
Josh Hursey
50545ef082 Additional Comment possibilities supported for both hostfile and
mca param file.
------
/*
 * Block Quote
 */

// Line Quote

# Shell Style Line Quote
-------

This commit was SVN r6694.
2005-08-01 22:11:26 +00:00
Josh Hursey
9447050dcf Added comments to lex parser. Now you can have a hostfile that looks like:
#
# this is my hostfile
#
frogger1
frogger2
#frogger3

and it will only use frogger 1&2.

This commit was SVN r6692.
2005-08-01 21:14:09 +00:00
Ralph Castain
8c6c78c47a Add a few new functions that were requested last week - not tested yet, so please don't use them! I will test them this afternoon on a different computer. For now, they won't cause any problems since they aren't being called.
This commit was SVN r6689.
2005-08-01 16:38:15 +00:00
Tim Prins
40bf905e8e - minor bug fixes
- better error message if the daemon dies

This commit was SVN r6687.
2005-07-29 20:02:56 +00:00
Josh Hursey
835dad20d5 forgot to add orteconsole.h to the distro in the makefile
This commit was SVN r6686.
2005-07-29 17:57:35 +00:00
Ralph Castain
4e79a51395 Add a job_info segment to the system that holds a container for each job. Within each container is a keyval indicating the job state (i.e., all procs at stage1, finalized, etc.). This provides a rough state-of-health for the job.
This required a little fiddling with a number of areas. Biggest problem was that it uncovered a potential for an infinite loop to be created in the registry. If a callback function modified the registry, the registry checked the triggers to see if anything had fired. Well, if the original callback was due to a trigger firing, that condition hadn't changed - so the trigger fired again....which caused the callback to be called, which modified the registry, which checked the triggers, etc. etc.

Triggers are now checked and then "flagged" as being "in process" so that the registry will NOT recheck that trigger until all callbacks have been processed. Tried doing this with subscriptions as well, but that caused a problem - when we release processes from a stagegate, they (at the moment) immediately place data on the registry that should cause a subscription to fire. Unfortunately, the system will just hang if that subscription doesn't get processed. So, I have left the subscription system alone - any callback function that modifies the registry in a fashion that will fire a subscription will indeed fire that subscription. We'll have to see if this causes problems - it shouldn't, but a careless user could lock things up if the callback generates a callback to itself.

Also fixed the code that placed a process' RML contact info on the registry to eliminate the leading '/' from the string.

This commit was SVN r6684.
2005-07-29 14:11:19 +00:00
Josh Hursey
9acbd4e21f forgot to take out initalizer when I removed the verbose stuff
This commit was SVN r6682.
2005-07-29 00:21:10 +00:00
Josh Hursey
e849f7ba07 Significant clean up of the orteconsole.
- Added user help messages.
 - Abstracted the internal commands, and the mechanism for
   parsing and executing them.
 - Cleaned up the command line parsing
 - Some other misc. cleanup items.

Still much more work to do here, but should provide a more
intuitive interface for extending functionality in the 
system.

This commit was SVN r6676.
2005-07-28 23:48:46 +00:00
Tim Prins
5a4f8a257d - enabled new bproc components
- added support for Scyld bproc and old LANL bproc

This commit was SVN r6674.
2005-07-28 22:28:38 +00:00
Josh Hursey
018c4aa44e remove unnecessary slashes
This commit was SVN r6673.
2005-07-28 21:33:33 +00:00
Josh Hursey
8b56769307 removed the version command line option. Added some more user help messages
This commit was SVN r6672.
2005-07-28 21:17:48 +00:00
Josh Hursey
5ad860fc47 forgot to take out a line in the help message.
This commit was SVN r6671.
2005-07-28 20:51:56 +00:00
Josh Hursey
8deed21e00 Replaced some stderr fprintfs with opal_show_help functions, with
more user friendly error messages.

Removed the "--version" command line option, since they should 
get this from ompi_info [later to be orte_info].

If we find an invalid command line option print out the help
screen before exiting.

This commit was SVN r6670.
2005-07-28 20:49:17 +00:00
George Bosilca
9fdfbd9934 correct the printf for 64 bits architectures.
This commit was SVN r6667.
2005-07-28 19:54:06 +00:00
Brian Barrett
747f23099e * fix some warnings
This commit was SVN r6661.
2005-07-28 19:25:47 +00:00
Josh Hursey
033b0be417 clean up help msg for orted
This commit was SVN r6657.
2005-07-28 18:38:37 +00:00
Brian Barrett
f8fb43d792 * don't recurse into badness - call the function we want to call
This commit was SVN r6656.
2005-07-28 18:33:55 +00:00
Josh Hursey
707fbb35ce added help message file to orted
This commit was SVN r6655.
2005-07-28 17:18:33 +00:00
Brian Barrett
b0b6ddd078 * add --enable-heterogeneous (default: enabled) to enable heterogeneous
support in OMPI.  Currently only enables/disables the architecture
  sharing modex in ob1 pml.
* Add sds framework to ompi_info
* Figure out table ids to use for Portals BTL at configure time, since
  we should use 30 & 31 on Red Storm, but the reference implementation
  only supports 0-8.
* Some bug fixes in Portals UTCP sds

This commit was SVN r6650.
2005-07-28 16:16:13 +00:00
Brian Barrett
a474dabab0 * don't assume select has been called during close
* expose sds component list for ompi_info
* forgot to add pipe put into the list of put functions

This commit was SVN r6645.
2005-07-28 15:14:46 +00:00
Jeff Squyres
bbf7da16ff Print a friendly message when the local exec can't find the orted.
This commit was SVN r6643.
2005-07-28 13:00:32 +00:00
Brian Barrett
2852772b32 * add a bunch of svn:ignored files
* Add Portals UTCP reference sds for when we are using the portals
  reference implementation without the ORTE starters (when we want to
  pretend like we're on Red Storm, only with a debugger and valgrind and
  possibly even a printf that actually works...)
* Add super-secret --with flag to cnos rml to enable the cnos rml but
  disable cnos_barrier (for use with portals utcp reference implementation)

This commit was SVN r6642.
2005-07-28 06:23:34 +00:00
Brian Barrett
93ddb4bf73 * some fixups for the cnos components
This commit was SVN r6637.
2005-07-28 00:11:09 +00:00
Brian Barrett
1ce2e26272 Move set_my_name (NDS) functionality from ns_base and universe contact
test from orte_init_stage1 into a new framework, Startup Discovery Service
(sds).  This allows us to have more flexibility with platforms like
Red Storm, which do not have a universe in the usual meaning and don't have
a seed daemon they can contact

This commit was SVN r6630.
2005-07-27 23:18:16 +00:00
Brian Barrett
6aa464b67e More changes from Red Storm port
- only call sched_yield if it exists
  - don't fail out if modex doens't work in ob1
  - bunch of fixes for Portals BTL
  - add cnos rml component
  - add NULL gpr component (should only be used if replica AND proxy
    fail to load)  

This commit was SVN r6629.
2005-07-27 23:07:14 +00:00
Tim Prins
b7ab5f1ec8 only compile the bproc soh component if we are on bproc 4
This commit was SVN r6625.
2005-07-27 22:13:21 +00:00
Tim Prins
384639c5cc - more build system updates for bproc
This commit was SVN r6609.
2005-07-26 22:12:03 +00:00
Tim Prins
dcc81eb598 - fix a bug which made compiles fail when '--with-bproc' is passed
- various bugfixes for bproc components

This commit was SVN r6603.
2005-07-25 22:21:40 +00:00
Tim Prins
6aceaf81b7 - properly kill off daemons
- code cleanup

This commit was SVN r6601.
2005-07-25 15:57:15 +00:00
Tim Prins
70587299f3 - respect configure options --without-bproc and --with-bproc=no
- check for a recent version of LANL bproc by looking for sys/bproc_common.h

This commit was SVN r6596.
2005-07-22 22:41:35 +00:00
Ralph Castain
13fdcff66b Fix a bug Greg was seeing on subscription returns - problem in pointer arithmetic
This commit was SVN r6594.
2005-07-22 20:46:07 +00:00
Thara Angskun
cbed508d9a This commit was SVN r6593. 2005-07-22 18:04:07 +00:00
Tim Prins
73171fb09d - added configure.m4 so we only compile the bproc soh component if we have bproc
- updated svn:ignore

This commit was SVN r6591.
2005-07-22 14:34:39 +00:00
Greg Watson
986f9e5a07 unignore this component
This commit was SVN r6589.
2005-07-21 22:33:04 +00:00
Greg Watson
4ab9a924aa Avoid multiple calls to update_registry(). Also added sanity check.
This commit was SVN r6588.
2005-07-21 22:31:55 +00:00
Brian Barrett
d4058f65e2 * make "no oobs found" output only occur if verbose is set. Needed in rare
times it's ok not to have an oob

This commit was SVN r6582.
2005-07-21 20:21:57 +00:00
Tim Woodall
eb0ed5f3d0 correct typo
This commit was SVN r6580.
2005-07-21 20:18:39 +00:00
Tim Prins
9aa319b082 for new bproc components,
- improved cleanup on slave nodes
- respect the configure option not to use ptys
- various code cleanups

This commit was SVN r6579.
2005-07-21 19:53:04 +00:00
Tim Woodall
f5ad856857 don't kill the seed daemon
This commit was SVN r6578.
2005-07-21 19:45:05 +00:00
Tim Woodall
7010548c1b correct byte order conversions for size_t == 8 bytes
This commit was SVN r6577.
2005-07-21 17:45:09 +00:00
Tim Prins
35041a0f01 - improved error handling
- code cleanups
- improved cleanup, but still needs work

This commit was SVN r6569.
2005-07-20 20:39:06 +00:00
Ralph Castain
f604fb72db Turn "on" the delete functionality for the registry. Should now be able to delete entries and segments, and get an index of the dictionary entries on the registry.
Haven't fully tested these yet (nobody is using them at the moment that I know of - good thing, since they haven't been working for a long time - though I know the MPI-2 stuff needs the functionality), but will do so shortly. For now, they compile.

This commit was SVN r6567.
2005-07-20 18:07:46 +00:00
Tim Woodall
b46023565f set the current directory before trying to exec/dump the binary
This commit was SVN r6562.
2005-07-20 14:26:09 +00:00
George Bosilca
96ff4b0b10 memset require string.h on Linux.
This commit was SVN r6559.
2005-07-20 06:45:00 +00:00
Tim Prins
acb9365793 - added an error message so we don't just segfault when the specified oob
interfaces do not have valid addresses.
- properly record the pids of launched processes in the new bproc component

This commit was SVN r6553.
2005-07-19 20:12:51 +00:00
Ralph Castain
5e437f9a09 Fix a potential "free" that shouldn't happen
This commit was SVN r6552.
2005-07-19 16:21:06 +00:00
Ralph Castain
9af1739d33 Correct an opal_hash_table_get/set_proc name to orte_hash_table_get/set_proc.
Remove a couple of unused variable complaints from registry dump.

This commit was SVN r6550.
2005-07-19 13:33:04 +00:00
Jeff Squyres
74744dd9df Fix a holdover mistake from the directory re-org:
- orte/class/ompi_proc_table.[ch] -> orte/class/orte_proc_table.[ch]
- opal_hash_table_[get|set|remove]_proc -> 
  orte_hash_table_[get|set|remove]_proc

This commit was SVN r6549.
2005-07-19 12:25:19 +00:00
Jeff Squyres
7e413d6c26 Remove mistaken return with a value in a void function.
This commit was SVN r6548.
2005-07-19 12:23:41 +00:00
Jeff Squyres
41f9cd8224 Add missing <sys/types.h> for size_t and friends (which is not
automatically included in optimized builds).

This commit was SVN r6547.
2005-07-19 12:23:07 +00:00
Ralph Castain
485e549f38 missing file
This commit was SVN r6545.
2005-07-18 21:18:26 +00:00
Ralph Castain
19d58ee17e First phase of the scalable RTE changes:
1. Modify the registry to eliminate redundant data copying for startup messages.

2. Revise the subscription/trigger system to avoid redundant storage of triggers and subscriptions. This dramatically reduces the search time when a registry action occurs - to illustrate the point, there are now only a handful of triggers on the system for each job. Before, there were a handful of triggers for each PROCESS in the job, all of which had to be checked every time something happened on the registry. This is much, much faster now.

3. Update all subscriptions to the new format. There are now "named" subscriptions - this allows you to "name" a subscription that all the processes will be using. The first one to hit the registry actually defines the subscription. From then on, any subsequent "subscribes" to the same name just cause that process to "attach" to the existing subscription. This keeps the number of subscriptions being tracked by the registry to a minimum, while ensuring that each process still gets notified.

4. Do the same for triggers.

Also fixed a duplicate subscription problem that was causing people to receive data equal to the number of processes times the data they should have received from a trigger/subscription. Sorry about that... :-( ...but it's all better now!

Uncovered a situation where the modex data seems to be getting entered on the registry a second time - the latter time coming after the compound command has been "fired", thereby causing all the subscriptions to fire. Asked Tim and Jeff to look into this.

Second phase of the changes will involve modifying the xcast system so that the same message gets sent to all processes. This will further reduce the message traffic, and - once we have a true "broadcast" version of xcast - really speed things up and improve scalability.

This commit was SVN r6542.
2005-07-18 18:49:00 +00:00
Tim Prins
75b0fa3c87 cleanup
This commit was SVN r6541.
2005-07-18 16:55:49 +00:00
Tim Prins
03907e12b2 this logic is done elsewhere
This commit was SVN r6540.
2005-07-18 16:31:58 +00:00
Ralph Castain
526217b9fc Two things here:
1. Fix the reigstry's overwrite logic. It was only overwriting the first keyval specified in a value - the rest were just added on regardless of whether or not the keyval already existed. This was the source of the multiple keyvals some people were seeing - should be fixed now.

2. Change the orted command parsing options so it reports options that aren't recognized - should help reduce confusion

This commit was SVN r6536.
2005-07-16 23:08:15 +00:00
Greg Watson
cbb62f4ba3 Tidy up.
This commit was SVN r6529.
2005-07-15 19:54:18 +00:00
Greg Watson
6df94466cf Moved to bproc component header.
This commit was SVN r6513.
2005-07-15 04:24:16 +00:00
Greg Watson
6f38d05a9f Install header.
This commit was SVN r6512.
2005-07-15 04:23:37 +00:00
Greg Watson
7232f648a6 Moved from component header.
This commit was SVN r6511.
2005-07-15 04:23:06 +00:00
Greg Watson
ed8f563a6f Default debug off.
This commit was SVN r6510.
2005-07-15 04:22:35 +00:00
Greg Watson
eaba912169 Only update registry key that actually changed.
This commit was SVN r6509.
2005-07-15 04:22:04 +00:00
Tim Prins
5a12889d4e make launching multiple apps work again and some code cleanups
This commit was SVN r6498.
2005-07-14 20:40:05 +00:00
Brian Barrett
14b89e0e50 Bunch more updates from operation Red Storm:
* Add ability to completely disable libltdl (the dlopen code to load
  dynamic shared objects) to configure: --disable-dlopen
* Added MCA param (component_disable_dlopen) to disable DSO loading
  at runtime
* Made the event library behave in some not-completely-erroneous way
  on platforms where it has absolutely no eventops support (ie, no
  select, poll, or epoll)
* Disabled orte_wait, opal_few, and opal_daemon_init code on
  platforms without fork, waitpid support.  All non-init functions
  will return OPMI_ERR_NOT_SUPPORTED
* Disable orteprobe tool when fork or pipe aren't supported

This commit was SVN r6490.
2005-07-14 18:05:30 +00:00
Tim Woodall
beba576af5 removed bogus error message
This commit was SVN r6489.
2005-07-14 15:41:33 +00:00
Tim Woodall
d52252065d - dont require NODES environment variable
- ignore existing nodes that aren't valid

This commit was SVN r6488.
2005-07-14 15:40:30 +00:00
Tim Woodall
b30540646a provide the node name when setting status as this may create the node
This commit was SVN r6487.
2005-07-14 15:39:44 +00:00
Tim Prins
3295975cea properly kill off the daemons.
This commit was SVN r6486.
2005-07-14 15:08:04 +00:00
Brian Barrett
dbf9820e6b * Add checks for the process management functions (fork, execve, waitpid)
* Add checks for fork() for fork and rsh plses so that they dont' activate
  on platforms without fork

This commit was SVN r6482.
2005-07-14 13:28:06 +00:00
Brian Barrett
52974d0553 * add missing header file when debugging is disabled
This commit was SVN r6479.
2005-07-14 05:02:53 +00:00
Ralph Castain
44ace2f64e Well, I think this will fix the bug Greg encountered when sending no triggers on a subscription. However, I can't test it since the trunk no longer runs on my Mac notebook - I get an error message "No ptl components available. This shouldn't happen." and the processes exit.
This commit was SVN r6476.
2005-07-14 01:32:36 +00:00
Greg Watson
de4b8b1a50 New bproc soh implementation.
This commit was SVN r6475.
2005-07-14 00:20:37 +00:00
Greg Watson
a0971116bd Copied from other compenents.
This commit was SVN r6474.
2005-07-14 00:19:41 +00:00
Greg Watson
f0a440a238 Bproc specific registry keys.
This commit was SVN r6473.
2005-07-14 00:18:36 +00:00
Greg Watson
935df416ab Updated to latest component model.
This commit was SVN r6472.
2005-07-14 00:18:00 +00:00
Tim Prins
66777a7bc7 Lots of changes to the new bproc components:
- it will now wait for the child procs to exit then kill off the daemons
- if orted is in your path it will automatically be found, or you can
  specify its location.
- your LD_LIBRARY_PATH is now forwarded to the backend to make it easier to use
  shared libraries in nonstandard places

Still need to work on cleanup on the backend nodes.

This commit was SVN r6462.
2005-07-13 19:46:55 +00:00
Brian Barrett
4d580fa706 * disable TCP ptl and oob components if there is no TCP support (look at
sockaddr_in - seems to be a good indicator)
* disable util/if code if no inet devices (again, no sockaddr_in)
* add enable/disable flag to disable stacktrace pretty-print code
  (defaults to enabled).  Seems there's something funky going on with
  the preprocessor on Red Storm that was causing problems - this was
  the easiest fix
* clean up a bunch of the configure.m4 files to remove bogus comments,
   properly comment them, fix the dumb logic for happy/unhappy
* Create a macro for testing both header and library for a package, 
  since we seem to do this kind of test quite often.  Handles the
  -I and -L search paths properly (including stripping out /usr and
  /usr/local if not needed)
* Converted mvapi components to configure.m4, using the nice new
  ompi_check_package macro (above)

This commit was SVN r6454.
2005-07-13 04:16:03 +00:00
Brian Barrett
586918853c * Turn thread support on by default, but disable both mpi and progress
threads (basically, same as before, but we now link the right thread
  libraries). 
* Add disable-io-romio flag to disable compiling ROMIO
* Migrathe mvapi btl from configure.stub to configure.m4

This commit was SVN r6453.
2005-07-13 01:07:31 +00:00
George Bosilca
e57b113a94 orte_abort was supposed to accept a variable number of arguments. But internally it didn't honor them. The second problem is that the opal_output does not accept a va_list as argument. So we have to create the string in the orte_abort and then print it out using opal_output.
This commit was SVN r6446.
2005-07-12 19:33:37 +00:00
Tim Prins
fe09e33f14 correct the handling of stdin for bproc
This commit was SVN r6442.
2005-07-12 18:36:41 +00:00
Ralph Castain
81af57707f Don't release the message buffer - the messaging function takes care of it.
This commit was SVN r6437.
2005-07-12 15:41:45 +00:00
Ralph Castain
49dbd29034 Only set singleton flag if not infrstructure when we get our name from a seed daemon.
This commit was SVN r6421.
2005-07-11 19:22:26 +00:00
Tim Prins
ba4d0fe5a1 change the new bproc components to use the new build system
This commit was SVN r6420.
2005-07-11 15:12:49 +00:00
Brian Barrett
6e4f33e48c * after careful consideration, there's really no reason to force config.m4
components to succeed with --enable-dist.  Instead, just add them to
  all_components and make dist will still work - we're going to stamp out
  the Makefiles no matter what
* Add missing header to ob1 pml for make dist
* Clean up the Portals BTL configure code

This commit was SVN r6413.
2005-07-10 01:09:31 +00:00
Brian Barrett
a991d883c1 * Rewrite ompi_mca.m4 to use m4_defined lists of projects (ompi, orte, etc.),
frameworks, and components without configure scripts instead of
  hard-coded shell variables (for projects and frameworks) and 
  shell variable building (for components).
* Add 3rd category of component configuration (in addition to configure
  scripts and no-configured components): configure.m4 components.  These
  components can only be built as part of OMPI (like no-configure), but
  can provide an m4 file that is run as part of the main configure
  script.  These macros can set whether the component should be built, 
  along with just about any other configuration wanted.  More care must
  be taken compared to configure components, as doing things like setting
  variables or calling AC_MSG_ERROR now affects the top-level configure
  script (so calling AC_MSG_ERROR if your component can't configure
  probably isn't what you want)
* Added support to autogen.sh for the configure.m4-style components,
  as well as building up the m4_define lists ompi_mca.m4 now expects
* Updated a number of macros to be more config.cache friendly (both
  so that config.cache can be used and so the test can be quickly
  run multiple times in the same configrue script):
    - ompi_config_asm
    - c_weak_symbols
    - c_get_alignment
* Added new macros to be shared when configuring components:
    - ompi_objc.m4 (this actually provides AC_PROG_OBJC - don't ask...)
    - ompi_check_xgrid
    - ompi_check_tm
    - ompi_check_bproc
* Updated a number of components to use configure.m4 instead of
  configure.stub
    - btl portals
    - io romio
    - tm ras and pls
    - bjs, lsf_bproc ras and bproc_seed pls
    - xgrid ras and pls
    - null iof (used by tm) 

This commit was SVN r6412.
2005-07-09 18:52:53 +00:00
Josh Hursey
de5e0d4f2c (Re-)Added two MCA Parameters that must have been lost in the merge way back when:
* mpi_show_mca_params
   If set to true, this turns on the dumping of all MCA parameters when MPI_INIT is called. 
   Only the 'rank 0' processes will print the parameters.

* mpi_show_mca_params_file
   (This value is only used if the first argument is set to true) If this value is non-NULL 
   it specifies the file to put the dump into. This file can then be used as input to mpirun 
   for debugging purposes. If this value is not set (and mpi_show_mca_params is set) then 
   the parameters are dumped to stdout.

This commit was SVN r6401.
2005-07-08 21:01:37 +00:00
Tim Prins
d4151fa9fd properly fix the usage of the app pointer array by checking for NULLs instead of forcing it to be the same size as the number of entries
This commit was SVN r6395.
2005-07-08 18:48:25 +00:00
Brian Barrett
0ae16f2ab7 * add local hook to remove static-components.h in distclean target. The
files are generated by configure, and not part of the tarball, so
  distclean would be the right place to remove them.

This commit was SVN r6390.
2005-07-08 13:54:12 +00:00
Brian Barrett
e4644c407c * as requested by ralph, include orted.h in the list of developer headers
This commit was SVN r6389.
2005-07-08 13:46:06 +00:00
Tim Prins
5cdf0803d4 make the app pointer array blocksize 1 so the the size of the pointer array is the same as the number of apps. This was causing a segfault when trying to launch multiple apps.
This commit was SVN r6368.
2005-07-07 18:01:26 +00:00
Ralph Castain
c72621c90e Add some finalization logic when universe isn't found and orte_init returns per MCA parameter.
This commit was SVN r6362.
2005-07-06 18:41:40 +00:00
Tim Woodall
c860b92011 don't allocate to nodes that aren't valid (e.g. front end) - ignore them
rather then giving an error message

This commit was SVN r6358.
2005-07-06 17:55:01 +00:00
Jeff Squyres
888f0c5afd Remove the EXTRA_DIST=VERSION stuff from all the Makefile.am's so that
"make dist" can succeed.  Duh.  :-\

This commit was SVN r6351.
2005-07-05 19:01:47 +00:00
George Bosilca
8619097919 Update the xgrid components in order to allow them to compile under the new tree. In other words change the include list to match the one explained in the Jeff email.
This commit was SVN r6345.
2005-07-04 21:19:35 +00:00
Jeff Squyres
ba99409628 Major simplifications to component versioning:
- After long discussions and ruminations on how we run components in
  LAM/MPI, made the decision that, by default, all components included
  in Open MPI will use the version number of their parent project
  (i.e., OMPI or ORTE).  They are certaint free to use a different
  number, but this simplification makes the common cases easy:
  - components are only released when the parent project is released
  - it is easy (trivial?) to distinguish which version component goes
    with with version of the parent project
- removed all autogen/configure code for templating the version .h
  file in components
- made all ORTE components use ORTE_*_VERSION for version numbers
- made all OMPI components use OMPI_*_VERSION for version numbers
- removed all VERSION files from components
- configure now displays OPAL, ORTE, and OMPI version numbers
- ditto for ompi_info
- right now, faking it -- OPAL and ORTE and OMPI will always have the
  same version number (i.e., they all come from the same top-level
  VERSION file).  But this paves the way for the Great Configure
  Reorganization, where, among other things, each project will have
  its own version number.

So all in all, we went from a boatload of version numbers to
[effectively] three.  That's pretty good.  :-)

This commit was SVN r6344.
2005-07-04 20:12:36 +00:00
Jeff Squyres
6a9c9953bc Remove a bunch of -I's that are no longer necessary with
properly-prefixed static-component.h files.

This commit was SVN r6342.
2005-07-04 18:24:58 +00:00
Brian Barrett
170ef8af1f * rename ompi_show_help to opal_show_help
* rename ompi_stacktrace to opal_stacktrace
* rename ompi_strncpy to opal_strncpy

This commit was SVN r6336.
2005-07-04 02:38:44 +00:00
Brian Barrett
ed81e51c3a * rename ompi_printf to opal_printf
* rename ompi pty code to opal pty code
* rename ompi_qsort to opal_qsort

This commit was SVN r6335.
2005-07-04 02:16:57 +00:00
Brian Barrett
46245aaac1 * rename orte_os_create_dirpath to opal_os_create_dirpath
* rename orte_os_path to opal_os_path
* rename ompi_path_find to opal_path_find
* rename ompi_pow2 to opal_pow2

This commit was SVN r6334.
2005-07-04 01:59:52 +00:00
Brian Barrett
e55f99d23a * rename ompi_if to opal_if
* rename ompi_malloc to opal_malloc
* rename ompi_numtostr to opal_numtostr
* start of rename of ompi_environ to opal_environ

This commit was SVN r6332.
2005-07-04 01:36:20 +00:00
Brian Barrett
9f44b80291 * rename ompi_argv to opal_argv
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few

This commit was SVN r6330.
2005-07-04 00:13:44 +00:00
Brian Barrett
a13166b500 * rename ompi_output to opal_output
This commit was SVN r6329.
2005-07-03 23:31:27 +00:00
Brian Barrett
23b687b0f4 * rename ompi_event to opal_event
This commit was SVN r6328.
2005-07-03 23:09:55 +00:00
Brian Barrett
39dbeeedfb * rename locking code from ompi to opal
This commit was SVN r6327.
2005-07-03 22:45:48 +00:00
Brian Barrett
ccd2624e3f * rename ompi_progress to opal_progress
This commit was SVN r6326.
2005-07-03 21:57:43 +00:00
Brian Barrett
9f0c969bb4 * rename ompi_hash_table opal_hash_table
This commit was SVN r6324.
2005-07-03 16:52:32 +00:00
Brian Barrett
761402f95f * rename ompi_list to opal_list
This commit was SVN r6322.
2005-07-03 16:22:16 +00:00
Brian Barrett
499e4de1e7 * rename ompi_object and ompi_class to opal_object and opal_class
This commit was SVN r6321.
2005-07-03 16:06:07 +00:00
Jeff Squyres
35c141aef6 While we're moving directories around, move ompi/mpi/runtime ->
ompi/runtime, for consistency and parallel-ness will orte/runtime.
Also remove a few useless #includes along the way.

This commit was SVN r6317.
2005-07-03 12:07:29 +00:00
Brian Barrett
f1c925475e * use the orte_pointer_array properly
This commit was SVN r6314.
2005-07-03 04:02:01 +00:00
Brian Barrett
8077da277b * move ompi_rb_tree from opal to ompi since it's only used in ompi, and should
have the ompi_free_list instead of the opal_free_list
* Change orte to use opal_free_list instead of ompi_free_list

This commit was SVN r6307.
2005-07-02 16:46:27 +00:00
Jeff Squyres
1b6326f76d Move module_exchange to pml/base
This commit was SVN r6305.
2005-07-02 16:12:04 +00:00
Jeff Squyres
36a5b9bd13 Minor fix
This commit was SVN r6296.
2005-07-02 15:43:35 +00:00
Jeff Squyres
282a8b5e8d More orte Makefile.am updates
This commit was SVN r6287.
2005-07-02 15:13:41 +00:00
Jeff Squyres
aa056f7bfd First cut of OMPI Makefile.am's, plus a few more catchup updates in orte
This commit was SVN r6286.
2005-07-02 15:06:47 +00:00
Jeff Squyres
677a385360 Fix same typo that I just fixed in opal :)
This commit was SVN r6281.
2005-07-02 14:37:19 +00:00
Jeff Squyres
4d192c2d10 First cut at Makefile.am's for orte
This commit was SVN r6280.
2005-07-02 14:36:36 +00:00
Jeff Squyres
a314578d94 Oops -- rmgr should be in orte, not ompi.
This commit was SVN r6274.
2005-07-02 14:14:42 +00:00
Jeff Squyres
3c99cf301a - Remove some empty directories (from before the directory re-org)
- Add zero-length Makefile.am's so that we can plug them into
  configure.ac now and not have to keep editing it

This commit was SVN r6273.
2005-07-02 14:13:35 +00:00
Jeff Squyres
cd497636ac Move modex out of opal MCA base into orte/util
This commit was SVN r6268.
2005-07-02 13:43:30 +00:00
Jeff Squyres
3a9179a0d7 Initial population of the opal tree
This commit was SVN r6267.
2005-07-02 13:43:20 +00:00
Jeff Squyres
1b18979f79 Initial population of orte tree
This commit was SVN r6266.
2005-07-02 13:42:54 +00:00