Commit graph

575 commits

Author SHA1 Message Date
Jeff Squyres
d741b7f37f We're adding some specific and complex functionality to orterun, so it
really needs to be documented (in part so that users stop asking us
how to do it!).  

This is a first cut at an orterun.1 man page.  It is 95% copied from
LAM's mpirun.1 man page -- I just edited the very top and am handing
this off to Josh to finish the first cut.  Then we'll add specific
docs about the behavior of some of the finer details.  This is not
listed in the Makefile.am yet because it's so incomplete/incorrect
(w.r.t. OMPI), so I don't want it included in the tarball or installed
[yet].

This commit was SVN r9058.
2006-02-16 13:29:37 +00:00
Jeff Squyres
018a4b98ff - Ensure that "context" is initialized to NULL
- Ensure that we don't free a NULL context
- Add a few {}'s

This commit was SVN r9055.
2006-02-16 04:09:29 +00:00
Tim Woodall
fc751171cd bproc cleanup from release branch
This commit was SVN r9054.
2006-02-16 00:16:22 +00:00
David Daniel
e82c470b32 - Change the exit status set by mpirun when an application process is
killed by a signal.  The exit status is now set to signo + 128, which
  conforms with the behavior of (almost) all shells.

This commit was SVN r9050.
2006-02-15 22:41:29 +00:00
David Daniel
ff7a2c7967 Fixes for BJS (broken since merge)
This commit was SVN r9043.
2006-02-15 01:14:50 +00:00
David Daniel
aa5c5772c2 Fixing a wayward OMPI_ERROR.
Fixing logic of a couple of error logging statements (compiler was complaining)

This commit was SVN r9042.
2006-02-15 00:09:33 +00:00
George Bosilca
8062277bae I'm confused ... Error string as well as the goto label had the same name ...
This commit was SVN r9036.
2006-02-14 17:49:14 +00:00
Ralph Castain
f5d17148c1 Clean up the references to num_env, which has been removed from app_context.
This commit was SVN r9014.
2006-02-13 21:08:35 +00:00
Ralph Castain
bc6a82839d Update these components to new dss
This commit was SVN r9004.
2006-02-13 15:28:29 +00:00
Brian Barrett
913890f534 * forgot to add new directories into DIST_SUBDIRS as well as SUBDIRS, so
tarballs were missing some directories.

This commit was SVN r8989.
2006-02-12 07:06:38 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Ralph Castain
1abe8ef368 Well, it certainly helps triggers to fire if the respective responsible routines adjust the counters!
The INIT counter is supposed to be adjusted when the processes are mapped - this is now done correctly.

The LAUNCHED counter is supposed to be adjusted when the pls sets the process pid info into the registry and changes the state to LAUNCHED. This could probably be changed to have that function use the set_proc_soh API, but this fixes the problem for now.

Thanks to Brian for finding that the triggers were not being fired.

This commit was SVN r8948.
2006-02-09 15:39:06 +00:00
George Bosilca
4b4b70cb0f Remove compilation warning.
This commit was SVN r8942.
2006-02-08 19:52:57 +00:00
Ralph Castain
892b396d70 Ensure that standard triggers are defined for all job/process states so that users can subscribe to those they want to use. Modify the way that is done to avoid over-burdening the standard launch sequence since it doesn't need alerts from all those triggers.
This commit was SVN r8938.
2006-02-08 17:40:11 +00:00
Ralph Castain
5c750cd8b9 Checkpoint a fix for Brian's observed failure to correctly unpack byte_objects. Will continue testing on another machine.
This commit was SVN r8921.
2006-02-07 15:43:43 +00:00
George Bosilca
3bb2eadfaa Do not leave them uninitialized.
This commit was SVN r8916.
2006-02-07 06:06:58 +00:00
George Bosilca
dda0e4182f Remove unused variables
Add required include files (stdio.h for NULL definition).
Make it compile on MAC OS 10.3.

This commit was SVN r8914.
2006-02-07 05:41:31 +00:00
Brian Barrett
72de49f0ad * make the xgrid component compile again. still need to test tomorrow...
This commit was SVN r8913.
2006-02-07 04:46:00 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00
George Bosilca
b7fa1f4664 Add signal.h to the include files to import SIGCONT.
This commit was SVN r8899.
2006-02-05 05:49:24 +00:00
Brian Barrett
03f6a8529c * Fix situation where we were unlocking a mutex we didn't own in an error
cleanup code in the signal part of the event library
* Only attempt to forward standard input if we have a controlling terminal
  (isatty() returns 1) and we are the foreground process OR we do not have
  a controlling terminal (isatty() returns 0).  If we have a controlling
  terminal, check at each SIGCONT if we should change our forwarding,
  since our foreground / background status may have changed.

  Unfortunately, there isn't a great way in the iof framework to know if
  we are capturing a starter's stdin.  Use the logic that if it's a source
  AND tagged as standard input, it's a starter's stdin.  This seems to
  work for all the common usages.

Both these need to go to the v1.0 branch.

This commit was SVN r8894.
2006-02-04 23:26:58 +00:00
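
The stdin-forwarding rule in the entry above can be sketched in C as follows. This is only an illustration, not the iof code: the function name `should_forward_stdin` is hypothetical, and comparing `tcgetpgrp()` with `getpgrp()` is an assumed way to test for foreground status on a POSIX system.

```c
#include <unistd.h>

/* Hypothetical helper (not taken from Open MPI): return nonzero if input
 * read from fd should be forwarded to the job. */
int should_forward_stdin(int fd)
{
    if (!isatty(fd)) {
        /* Not a terminal (file or pipe redirection): always forward. */
        return 1;
    }
    /* Terminal: forward only while our process group is the terminal's
     * foreground process group. */
    return tcgetpgrp(fd) == getpgrp();
}
```

Following the commit message, a process attached to a terminal would re-run this check in its SIGCONT handling path, since its foreground or background status may have changed while it was stopped.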
Brian Barrett
7c247eea01 * Add a finalize function for iof framework and add a finalize function for
the svc component so that it can disable the rml exception callback, fixing
  a race condition in the shutdown mechanism of orte.

  This should probably go to the v1.0 branch.

This commit was SVN r8893.
2006-02-03 21:01:11 +00:00
Brian Barrett
ddda56eb0d * Don't use ptys for stdin. When a pty has close() called on it, it
discards all of the data in the pty that hasn't been read.  This was
  leading to data being discarded when files were redirected into
  mpirun and read by rank 0 of the job.  This was very "not good".

  The decision to not use ptys for stdin was made based on what Tim said
  that LA-MPI was doing.

  This needs to go to the v1.0 branch...  Tim should probably review...

This commit was SVN r8892.
2006-02-03 20:43:20 +00:00
Jeff Squyres
abc67a257f This approach is cleaner than the previous one -- use a temporary
shell variable to avoid setting the OMPI $libpath twice in
$LD_LIBRARY_PATH.  Many thanks to Glenn Morris.

This commit was SVN r8883.
2006-02-02 11:58:40 +00:00
Jeff Squyres
cc1ee11eeb Fix issues with tcsh and LD_LIBRARY_PATH when using --prefix. See
lengthy comment inside for details.  Thanks to Glenn Morris for finding
this issue and suggesting the fix.

This commit was SVN r8880.
2006-02-02 06:26:55 +00:00
Jeff Squyres
f7097d34c8 Remove some \n typos. Thanks to Glenn Morris for finding these.
This commit was SVN r8878.
2006-02-02 05:50:15 +00:00
Rainer Keller
dd13b098e1 - Simple locking fix.
This commit was SVN r8822.
2006-01-26 13:20:53 +00:00
Jeff Squyres
ed0fa9720d Incorporate fix suggested by Chris Gottbratch.
This commit was SVN r8750.
2006-01-19 15:21:53 +00:00
George Bosilca
e6e28460f1 Remove all windows code as fork is not available on windows. Instead a shiny new pls
will join the fun (handling process creation on windows).

This commit was SVN r8745.
2006-01-19 07:01:51 +00:00
Jeff Squyres
e6bd80b424 Per the commit message of r8514, change the search order to be "ssh :
rsh", *not* "rsh : ssh".

This commit was SVN r8736.

The following SVN revision numbers were found above:
  r8514 --> open-mpi/ompi@9c25bdc5ac
2006-01-18 22:00:34 +00:00
George Bosilca
992daf7522 Remove all unused defines from the Makefile.
This commit was SVN r8734.
2006-01-18 21:21:29 +00:00
Brian Barrett
c96f870674 * Merge of wrapper compiler updates from the bwb-wrapper-fix branch (r8690 -
r8698), with changes below:

  - Split wrapper flags into those required for each of the three projects,
    and cleaned up some cruft (including the LIBMPI_EXTRA_*FLAGS)
    throughout the build system
  - Added opal_init_util and opal_finalize_util to allow init / cleanup
    of all the opal code that doesn't require the MCA system
  - Create standalone key=value file parser, based on the one that used
    to be in the mca param parser, so that it can be shared in multiple
    places
  - Add wrapper datafiles for opal, orte, and ompi wrappers, and add
    wrapper compiler with support for all the old features

This commit was SVN r8699.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r8690
  r8698
2006-01-16 01:48:03 +00:00
George Bosilca
bf266c6109 Roll back the 8682 commit until we figure out the correct way to do it. It breaks several things
inside (like MPI_Wait* functions).

This commit was SVN r8686.
2006-01-13 22:02:40 +00:00
Rainer Keller
95f886b6ab - Protect callers of opal/ompi_condition_wait from spurious wakeups,
possible when building with pthreads.
   Compiled on Linux ia32 with and without
   --enable-progress-threads

This commit was SVN r8682.
2006-01-12 17:13:08 +00:00
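
The usual protection against spurious wakeups is to re-check a predicate in a loop around the condition wait. The sketch below shows that pattern with plain pthreads; it does not use the opal/ompi condition wrappers, and the `struct gate` type and its functions are hypothetical names introduced for illustration.

```c
#include <pthread.h>

/* Hypothetical one-shot gate: waiters block until gate_open() is called. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             ready;   /* the predicate protected by lock */
};

void gate_init(struct gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->ready = 0;
}

void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = 1;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

void gate_wait(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    /* Loop, not "if": pthread_cond_wait() may return without a signal,
     * so the predicate must be re-tested after every wakeup. */
    while (!g->ready) {
        pthread_cond_wait(&g->cond, &g->lock);
    }
    pthread_mutex_unlock(&g->lock);
}
```

With an `if` in place of the `while`, a caller could proceed before `ready` is set, which is the kind of failure the commit guards against.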
David Daniel
c2ee847184 Missing header file.
This commit was SVN r8670.
2006-01-10 21:58:21 +00:00
Jeff Squyres
b2de55d72e Back out some debugging stuff from a careless r8643 commit (only
intended to include the OMPI_DEBUG_ZERO call).

These debugging statements should not have affected correctness
because the value of 78 will be overridden in the read() and the
assert()/abort() stuff will only be triggered on an error which should
never happen (i.e., the error should have been handled by the prior if
conditional).  But still, this code should not be there.

This commit was SVN r8649.

The following SVN revision numbers were found above:
  r8643 --> open-mpi/ompi@a6b869ed68
2006-01-05 14:44:10 +00:00
Jeff Squyres
a6b869ed68 Avoid a false positive in bcheck
This commit was SVN r8643.
2006-01-04 22:29:09 +00:00
David Daniel
d272e02338 Need to include fcntl.h on linux -- protected for windows.
This commit was SVN r8630.
2006-01-04 00:54:16 +00:00
George Bosilca
7a88e72c1b Add more protections around the headers.
This commit was SVN r8617.
2005-12-31 12:35:24 +00:00
George Bosilca
d91650ea85 Do not use explicitly "ln -s" as on some systems it does not work properly ...
(windows). Instead use the LN_S variable exported by the Makefile (set to
"ln -s" on all Unixes and to "cp -p" on windows).

When we remove an executable use the correct extension for its name
(add $(EXEEXT) to the name).

This commit was SVN r8616.
2005-12-31 12:33:44 +00:00
George Bosilca
3c95dd0801 No discrimination !
This commit was SVN r8613.
2005-12-31 12:20:32 +00:00
Jeff Squyres
1e93c78e2e - Rename rsh component members: argv->agent_argv, argc->agent_argc,
and path->agent_path so that it's totally clear what these are for
- make a new rsh component param for agent_param (the value from the
  MCA param)
- delay the path check for the agent until the component init -- don't
  make it fail during open, because the MCA base will print a warning
  if a component fails open() (e.g., on clusters without rsh/ssh (!),
  this component was failing noisily even though it was
  normal/expected)

This commit was SVN r8596.
2005-12-22 14:37:19 +00:00
Jeff Squyres
8fb5e506aa Arrgh -- should have included this in last commit: need to set a
variable before we += it in a Makefile.am.

This commit was SVN r8595.
2005-12-22 14:30:27 +00:00
Jeff Squyres
93b4d12d14 Add a friendly help message if no pls components are found to be
available.

This commit was SVN r8594.
2005-12-22 14:29:45 +00:00
Jeff Squyres
5a03f86818 Fix a case where it's valid to get no responses back -- return early
before invoking malloc(0).

This commit was SVN r8577.
2005-12-21 13:45:06 +00:00
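
The early return before `malloc(0)` mentioned above might look like the following C sketch. The function `collect_responses` and its signature are hypothetical, not the code touched by the commit; the point is only that a zero count is treated as a valid, empty result rather than being passed to the allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate storage for `count` integer responses.
 * Returns 0 on success (including the valid "no responses" case) and
 * -1 if allocation fails. */
int collect_responses(size_t count, int **out)
{
    size_t i;

    *out = NULL;
    if (count == 0) {
        /* Zero responses is legal: return early and never call
         * malloc(0), whose result is implementation-defined. */
        return 0;
    }

    *out = malloc(count * sizeof **out);
    if (*out == NULL) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        (*out)[i] = 0;   /* placeholder for real response data */
    }
    return 0;
}
```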
Rainer Keller
b06d79d4fe - Seems with change r7664, the mapping has slightly changed.
In case of checking for Shell with --mca pls_rsh_assume_same_shell 0
   have the node point to sensible values.

This commit was SVN r8563.

The following SVN revision numbers were found above:
  r7664 --> open-mpi/ompi@0629cdc2d7
2005-12-20 15:59:17 +00:00
Brian Barrett
a5af07cd6b fixes suggested by Ralf for supporting both Libtool 1 and 2 in Open MPI...
This commit was SVN r8538.
2005-12-19 03:10:23 +00:00
George Bosilca
f9b07f1912 Protect the includes.
This commit was SVN r8532.
2005-12-17 22:05:10 +00:00
Brian Barrett
456ba1c11f * need to declare environ on OS X
this should go to the 1.0 branch

This commit was SVN r8527.
2005-12-16 19:20:33 +00:00
Jeff Squyres
fa097c9874 Remove two components that were templated out quite a while ago and
aren't currently in use (i.e., they were never finished).  If needed,
they can be pulled out of SVN history.

This commit was SVN r8524.
2005-12-16 17:40:51 +00:00
Jeff Squyres
25b2730a34 Only allow the fork component to run when we're in an orted.
This commit was SVN r8515.
2005-12-15 21:05:26 +00:00
Jeff Squyres
9c25bdc5ac Change to the rsh pls component to have the pls_rsh_agent MCA param
now take a colon-delimited list of agents (and associated argv).  Also
change the default value to "ssh : rsh".  Hence, if we run on a
cluster that does not have ssh, we'll fall back to rsh.  If we can't
find rsh, then the rsh component will disqualify itself from
selection.

This commit was SVN r8514.
2005-12-15 20:54:24 +00:00
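
A colon-delimited agent list such as "ssh : rsh" can be walked entry by entry, trying each agent in order. The C sketch below only shows the splitting and trimming step; `agent_at` is a hypothetical function, not the rsh component's parser, and the check for whether an agent is actually executable is left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the idx-th entry of a colon-delimited list
 * into out, with surrounding whitespace removed. Returns 0 on success,
 * -1 if there is no such entry or it does not fit in out. */
int agent_at(const char *list, size_t idx, char *out, size_t outlen)
{
    const char *start = list;

    for (;;) {
        const char *end = strchr(start, ':');
        if (end == NULL) {
            end = start + strlen(start);   /* last entry runs to the NUL */
        }
        if (idx == 0) {
            size_t len;
            while (start < end && isspace((unsigned char)*start)) {
                start++;
            }
            while (end > start && isspace((unsigned char)end[-1])) {
                end--;
            }
            len = (size_t)(end - start);
            if (len + 1 > outlen) {
                return -1;
            }
            memcpy(out, start, len);
            out[len] = '\0';
            return 0;
        }
        if (*end == '\0') {
            return -1;                     /* ran out of entries */
        }
        start = end + 1;
        idx--;
    }
}
```

A caller would request index 0, 1, 2, and so on until one of the returned agents is found to be usable, or until the function returns -1. Arguments inside an entry (the "associated argv") are kept as part of the trimmed string.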
Jeff Squyres
e184fd6801 Make sure that what we find is executable
This commit was SVN r8513.
2005-12-15 20:31:20 +00:00
George Bosilca
505d830b3f I missed the requirement for the mca_base_component_repository.h header.
This commit was SVN r8465.
2005-12-12 21:10:30 +00:00
George Bosilca
7d8d516a4a A bunch of fixes for Windows support.
- protection with __WINDOWS__ and not WIN32 or _WIN32
 - protect all the headers

This commit was SVN r8463.
2005-12-12 20:04:00 +00:00
George Bosilca
32cecc5798 Change ERROR to subscribe_error because ERROR is predefined on Windows. I didn't spend
too much time tracking that down, I just know that cl.exe will replace it with the
"constant" string ...

This commit was SVN r8449.
2005-12-11 06:23:07 +00:00
Jeff Squyres
31336e4773 Add some missing headers / correct one installation directory
This commit was SVN r8408.
2005-12-08 04:00:52 +00:00
Jeff Squyres
6fbd321442 Fix a bunch of install locations for header files
This commit was SVN r8406.
2005-12-08 00:54:44 +00:00
Jeff Squyres
e781f55d16 Add proper prefixes into the #include statements
This commit was SVN r8404.
2005-12-08 00:05:26 +00:00
Jeff Squyres
3f27e61de6 Fix location of installed header files
This commit was SVN r8403.
2005-12-08 00:04:19 +00:00
Jeff Squyres
bd0b5acf0b Oops -- there's a second instance of OCRNL that needed to be
protected.

This commit was SVN r8374.
2005-12-02 18:24:59 +00:00
Jeff Squyres
0c9420e204 OS X 10.3 does not have OCRNL #define'd, so we need to protect its
usage 

This commit was SVN r8371.
2005-12-02 16:57:37 +00:00
Brian Barrett
bc4d3d6fff IRIX compile fixes:
- Need to make sure that SIZE_MAX exists as a constant if stdint.h
    doesn't exist
  - struct timeval is defined in unistd.h on IRIX, so need to include
    that header file wherever struct timeval is used.

This commit was SVN r8361.
2005-12-01 18:28:20 +00:00
Tim Woodall
20e6f41fe2 allow node number as hostname for bproc
This commit was SVN r8357.
2005-12-01 17:44:08 +00:00
Brian Barrett
389e378054 * use opal_init / opal_finalize in orteprobe so that ordering doesn't get out of
sync with opal....

This commit was SVN r8341.
2005-11-30 21:40:11 +00:00
Brian Barrett
79bf8843d2 * update memory hooks interface to allow for callbacks on both allocations
and deallocations, per request from Galen and Tim

This commit was SVN r8303.
2005-11-29 04:46:14 +00:00
Tim Woodall
cf53d3e48f missing include
This commit was SVN r8295.
2005-11-28 23:13:36 +00:00
Galen Shipman
6e64e8a144 bproc fixes, these exist in the release 1.0 branch.
This commit was SVN r8292.
2005-11-28 21:10:02 +00:00
Tim Woodall
943e6f0cd5 corrections for stdin
- when eof is reached at orterun, send a 0 byte message to peer indicating eof
- on receipt of zero byte message - close corresponding file descriptor associated with the endpoint
- require setup ptys for stdin and stdout so that stdin can be closed independently of stdout

This commit was SVN r8264.
2005-11-28 14:58:53 +00:00
Tim Woodall
eb7cfe3ecd implement unsubscribe
This commit was SVN r8214.
2005-11-21 19:46:47 +00:00
Jeff Squyres
443d833ee9 fx2 is the serial debugger; fxp is the parallel debugger.
This commit was SVN r8211.
2005-11-21 17:00:36 +00:00
Brian Barrett
fee6409708 fix compiler warning and compiler error in totalview code...
This commit was SVN r8207.
2005-11-20 18:41:45 +00:00
Jeff Squyres
8d96c21311 Good weekend brainless activity -- implement the orterun command line
debugger scheme described in
http://www.open-mpi.org/community/lists/users/2005/10/0214.php.  This
makes our user-level debugger scheme much more vendor-independent
(although the "-tv" option will still work for backwards compatibility
-- it'll just be a synonym of "--debug").

This commit was SVN r8206.
2005-11-20 16:06:53 +00:00
Brian Barrett
20cea60b82 * fix "make distclean" error in PML
* turns out (duh!) that there was a reason that the <projectdir>dir
  variable was set in the AM conditional.  If not, stupid directories
  are created and not needed...  duh.

This commit was SVN r8205.
2005-11-20 07:41:09 +00:00
Brian Barrett
8faa1884f0 * The last of the build system optimizations. Combine the component and
component/base Makefile.am files, reducing the time configure spends
  stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed

This commit was SVN r8200.
2005-11-20 01:03:01 +00:00
Tim Woodall
d579e048f7 reset node name to be node number only to match
value set by allocation/mapper

This commit was SVN r8186.
2005-11-17 22:02:28 +00:00
Jeff Squyres
23ca7e1311 Ensure to return a value.
This commit was SVN r8182.
2005-11-17 14:31:42 +00:00
Brian Barrett
3e3ba49cdb should have removed the line of code, rather than #if 0'ing it out
This commit was SVN r8172.
2005-11-17 05:22:19 +00:00
Brian Barrett
f464bbbcc0 fix a couple of double-lock issues in the iof code that have crept in recently.
This should go to the v1.0 branch.

This commit was SVN r8171.
2005-11-17 01:26:00 +00:00
Tim Woodall
142b7cc682 merge from release branch
This commit was SVN r8167.
2005-11-16 17:10:49 +00:00
Tim Woodall
59d8c791d9 return fragments to free list
This commit was SVN r8121.
2005-11-11 17:48:56 +00:00
George Bosilca
c802d54696 The return type is an int. Casting it to a size_t before checking if it's bigger than zero leads to a true condition ... always ...
This commit was SVN r8114.
2005-11-11 06:34:14 +00:00
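
The pitfall described above is easy to reproduce in isolation. In the C sketch below, both function names are hypothetical and the code is not the code changed by the commit; it only shows why a negative `int` error code passes a "greater than zero" test once it has been cast to `size_t`.

```c
#include <stddef.h>

/* Buggy form: a negative int converts to a huge unsigned value, so every
 * nonzero return code, including errors, satisfies "> 0". */
int buggy_has_data(int rc)
{
    return (size_t)rc > 0;
}

/* Fixed form: compare as a signed int so negative error codes fail. */
int fixed_has_data(int rc)
{
    return rc > 0;
}
```

With `rc == -1`, the buggy check reports success because `(size_t)-1` is the largest `size_t` value; only `rc == 0` makes it false.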
Brian Barrett
878676218e Rename opal/memory to opal/memoryhooks because XLC++ on Mac OS X is broken.
When compiling C++ code that includes something that looks for the C++
header file "memory" (stupid C++ headers not having .h extensions), it
goes through the header file search path, which includes $(topsrcdir)/opal,
so it finds the directory $(topsrcdir)/opal/memory/ and tries to load
that as the memory header file and all goes downhill.

This commit was SVN r8111.
2005-11-11 00:26:27 +00:00
Josh Hursey
5fa34df9ce Fix for orted / MPI_Abort problem reported from testers. They were seeing orteds
spinning in orte_iof_base_flush() when running
  intel_tests/src/MPI_Errhandler_fatal_c

When we close an endpoint by taking it out of the event handler, we need to make
sure that it fits the criteria to pass through orte_iof_base_flush(), specifically
make sure we clean out the ep_frags list.
Note: This is more of a sanity check, since the endpoint should already be
      in this state at the point of closure.

Secondly in orte_iof_base_endpoint_read_handler(), if we determine that it is 
necessary to close the endpoint we have to "return" after doing so, otherwise
we add another frag to the endpoint which will cause it to hang in 
orte_iof_base_flush().

Bug go squish!

This commit was SVN r8109.
2005-11-11 00:09:07 +00:00
Tim Woodall
7f20198d49 Filter the set of data returned to the daemons during
startup using the new get_conditional command to improve
scalability during launch

This commit was SVN r8097.
2005-11-10 16:44:51 +00:00
Tim Woodall
d62ea1835d correct typo
This commit was SVN r8090.
2005-11-10 15:29:52 +00:00
Brian Barrett
86e2adc43a * it appears that including event.h without calling opal_init annoys XLC on
OS X (you get an undefined symbol opal_event_lock).  Since the code is
  all #if 0'ed out, #if 0 out the header for now as well.

  I believe console and openmpi are to be removed from OMPI before 1.0
  release, so this doesn't need to go to the 1.0 branch

This commit was SVN r8089.
2005-11-10 15:24:57 +00:00
Tim Woodall
3556757726 init callback from proxy
This commit was SVN r8085.
2005-11-10 05:27:11 +00:00
Tim Woodall
0b0d7f56c1 added support for callback on receipt of I/O
This commit was SVN r8084.
2005-11-10 04:49:51 +00:00
Tim Woodall
3699c924bd callback for init prior to launch - allow app to hookup stdout/stderr
prior to launch

This commit was SVN r8083.
2005-11-10 04:47:41 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Josh Hursey
e7d5ecf016 Comment out the C/N notation parsing. Interior comment has more details.
This commit was SVN r7980.
2005-11-03 18:15:47 +00:00
Jeff Squyres
1b691f8089 Pull NULL checks around releasing of resources to ensure we don't
segv.

This commit was SVN r7971.
2005-11-03 11:27:19 +00:00
Jeff Squyres
653f43cc2b Update to latest prototype
This commit was SVN r7970.
2005-11-03 11:23:23 +00:00
Jeff Squyres
60b0330bc1 Initialize "conditions" to ensure we don't segv
This commit was SVN r7961.
2005-11-01 17:13:18 +00:00
Ralph Castain
399e41d113 Fix a potential memory leak...
This commit was SVN r7960.
2005-11-01 15:17:11 +00:00
Jeff Squyres
0379b27969 Add missing DESTRUCT
This commit was SVN r7948.
2005-11-01 13:35:44 +00:00
Jeff Squyres
a2e507c629 Fix potential segv through uninitialized variable
This commit was SVN r7946.
2005-11-01 13:09:00 +00:00
Tim Woodall
e27dfb180d yet another fix
This commit was SVN r7941.
2005-10-31 21:59:14 +00:00
Tim Woodall
aa5b61e4f1 corrections for multiple app contexts
This commit was SVN r7939.
2005-10-31 20:37:44 +00:00
Tim Woodall
cf5c27c1e3 start all of the sends in parallel (from the same buffer) - wait for
all to complete

This commit was SVN r7935.
2005-10-31 16:21:51 +00:00
Tim Woodall
a891db81e9 set socket options to improve oob performance
This commit was SVN r7934.
2005-10-31 16:21:11 +00:00
Jeff Squyres
8503fce61b Remove debugging message
This commit was SVN r7924.
2005-10-28 18:53:20 +00:00
Jeff Squyres
ce78b76598 Quick fix from Ralph -- this escaped committing last night.
This commit was SVN r7917.
2005-10-28 14:03:26 +00:00
Ralph Castain
afeeacd76d Complete hookup of the registry proxy for the get_conditional command.
This commit was SVN r7915.
2005-10-28 05:35:07 +00:00
Ralph Castain
ad9de4ca3b Restore the pointer arrays to the registry dictionaries. Revise the system so that the itag is equivalent to the index into the pointer array. It already was, but it wasn't obvious before (several functions relied upon it, but others "hid" the relationship) - now, make it explicitly clear. Set things up so lookups occur at max speed by just indexing into the dictionary array.
This commit was SVN r7912.
2005-10-28 04:56:06 +00:00
Ralph Castain
eebda71a0b Add a new API to the registry for conditional data retrievals. The new API allows you to retrieve data from registry containers that have key-value pairs where the value matches the specified one. The requested keys are then retrieved from that container.
This commit was SVN r7907.
2005-10-28 00:30:58 +00:00
Tim Woodall
3fd351117a removed debug
This commit was SVN r7902.
2005-10-27 21:07:49 +00:00
Tim Woodall
793836da57 removed debug
This commit was SVN r7897.
2005-10-27 17:10:49 +00:00
Tim Woodall
7300112564 removed debug
This commit was SVN r7896.
2005-10-27 17:08:30 +00:00
Tim Woodall
60754acae8 - modified rmaps data structures to point directly to ras node
- modified rsh to NOT query for each nodes mapping, as all data is
  already available in the rmaps structures

This commit was SVN r7894.
2005-10-27 17:04:10 +00:00
Tim Woodall
c0124fecdd changed segment dictionary to hash table to improve
search time for reverse lookup

This commit was SVN r7893.
2005-10-27 17:00:47 +00:00
Tim Woodall
b60bea9ada don't allow callbacks to be processed recursively - appear to be blowing away the stack
This commit was SVN r7862.
2005-10-25 13:48:08 +00:00
Tim Woodall
4eca6e22bd use persistent non-blocking receives
This commit was SVN r7861.
2005-10-25 13:38:13 +00:00
Andrew Friedley
f92185c43b Complete George's fix - this is a problem I caused :(
This takes care of Troy's first segfault problem, and compile errors that will likely happen as soon as Ken applies George's patch and runs make again.

This commit was SVN r7833.
2005-10-21 21:06:20 +00:00
Tim Woodall
88c7fd9f8d add support for a "persistent" non-blocking receive
doesn't require a re-registration on every receive

This commit was SVN r7822.
2005-10-20 22:06:11 +00:00
Tim Woodall
cea599a274 back out prior change - investigate an alternate approach
This commit was SVN r7821.
2005-10-20 17:49:13 +00:00
Tim Woodall
56983d3e7f Don't invoke non-blocking recv callbacks when recv is posted. Otherwise,
this can result in recursive callbacks and extremely long call chains

This commit was SVN r7817.
2005-10-20 15:07:06 +00:00
Tim Woodall
d0cd752e33 - don't track the sequence number when the endpoint is a data sink,
it's not needed and there could be multiple sources each w/ their
  own sequence.
- if a write doesn't complete, need to check for non-blocking case.. 

This commit was SVN r7795.
2005-10-18 14:26:12 +00:00
Jeff Squyres
89931ac05f - Correct typo in comment
- Add DIST_SUBDIRS to ompi/tools/Makefile.am

This commit was SVN r7780.
2005-10-17 11:55:55 +00:00
Brian Barrett
1302cb4072 The next in a long line of crazed build system changes from Brian. This was
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install).  Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am).  Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed.  This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)

This commit was SVN r7777.
2005-10-17 00:21:10 +00:00
Jeff Squyres
9a25554559 Patch from Brooks Davis for some BSD compatibility issues.
This commit was SVN r7751.
2005-10-13 15:41:25 +00:00
Thara Angskun
8b59de0f37 Import RAS for POE
This commit was SVN r7748.
2005-10-13 14:08:17 +00:00
Thara Angskun
73fff4ea2c - change from mca_base_param_register_* to mca_base_param_reg_*
- update include files / fix minor bugs

This commit was SVN r7746.
2005-10-13 12:58:31 +00:00
Josh Hursey
92429dc90f Fix for a problem Edgar and Jeff identified WRT PLS determining if we are
oversubscribed on a node. And thus whether to call sched_yield or not.

The value of node->node_slots_inuse does not currently represent the number of
slots actually in use, at the moment. This is actually a bug in the RAS/RMAPS
base components, but the fix for that specific bug is bigger than we want to 
address at the moment (but will certainly do so in the near future).

Since we cannot trust this value, use the total number of mapped processes
(which was properly set by the RMAPS component upon mapping -- Just not 
properly propagated back to the registry's node segment) from the process 
mapping.

In addition to this change I cleaned up a couple of the debug messages. It
seems that TM and RSH are the only two directly affected by this. SLURM
would be if that section of code wasn't currently inactive, but put the fix
in for posterity.

This commit was SVN r7743.
2005-10-13 03:26:48 +00:00
Josh Hursey
0f08e87a1f Fixed a max_slots off by one problem that Brian highlighted.
Also cleaned up the error message when allocating over the number of
slots available.

This commit was SVN r7715.
2005-10-12 02:09:56 +00:00
Ralph Castain
70779fa2ab Cleanup some old logic - nothing major.
This commit was SVN r7712.
2005-10-12 01:12:27 +00:00
Brian Barrett
128389758f * fix compile error in XGrid PLS that got introduced sometime in the not
too distant past
* work around apparently broken handling of max_slots somewhere along
  the line by just setting it to 0

Both changes should go to the trunk.

This commit was SVN r7710.
2005-10-12 00:41:14 +00:00
Josh Hursey
af9ccdf04a need to use get_first instead of get_begin since we don't want to execute
this loop if "nodes" is an empty list. get_first, in this loop context, 
allows us to do just that, while get_begin doesn't.

This fixes a --host problem that appeared on the Linux PPC64 build.

This commit was SVN r7703.
2005-10-11 21:33:04 +00:00
Josh Hursey
8ba2900341 fixed a typo, added comments for future work
This commit was SVN r7700.
2005-10-11 20:59:31 +00:00
Ralph Castain
e1244fc160 Fix a few thread-lock things discovered by Josh. The thread locks in the registry's local notify delivery system had not been updated to reflect the design change whereby the xcast uses the notify delivery system. This has now been fixed.
Also revised the callbacks to store and utilize local variables to avoid problems where threads modify the global structures. Not sure this totally fixes the problem, but it's a shot - suggested by Josh (and Jeff, I believe).

This commit was SVN r7694.
2005-10-11 19:35:04 +00:00
Ralph Castain
a47655b3fd Add unlock/lock around the delivery of a local callback to remove thread-lock condition if the callback function attempts to re-enter the registry.
This commit was SVN r7678.
2005-10-10 02:45:50 +00:00
Ralph Castain
6c839048cf Fix a typo that caused valgrind to bark on 64-bit machines. Actually was a potential source of error, so the barking was legit.
This commit was SVN r7677.
2005-10-10 02:34:26 +00:00
Josh Hursey
d5ebb5c46a fix a compiler warning
This commit was SVN r7674.
2005-10-08 17:03:12 +00:00
Jeff Squyres
0629cdc2d7 Bring back the changes from /tmp/jjhursey-rmaps. Specific merge
command:

svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps .

(where "." is a trunk checkout)

The logs from this branch are much more descriptive than I will put
here (including a *really* long description from last night).  Here's
the short version:

- fixed some broken implementations in ras and rmaps
- "orterun --host ..." now works and has clearly defined semantics
  (this was the impetus for the branch and all these fixes -- LANL had
  a requirement for --host to work for 1.0)
- there is still a little bit of cleanup left to do post-1.0 (we got
  correct functionality for 1.0 -- we did not fix bad implementations
  that still "work")
  - rds/hostfile and ras/hostfile handshaking
  - singleton node segment assignments in stage1
  - remove the default hostfile (no need for it anymore with the
    localhost ras component)
  - clean up pls components to avoid duplicate ras mapping queries
  - [possible] -bynode/-byslot being specific to a single app context 

This commit was SVN r7664.
2005-10-07 22:24:52 +00:00
Tim Woodall
3c900a7aa2 - fix a deadlock on threaded build
- update sequence number after a partial write completes

This commit was SVN r7654.
2005-10-06 21:50:58 +00:00
Tim Woodall
a79e07390a remove debug
This commit was SVN r7653.
2005-10-06 21:29:47 +00:00
Tim Woodall
2ea71064ad close all file descriptors w/ the exception of stdin/stdout/stderr
otherwise, parent's file descriptors are inherited and held open by
the child even if the parent dies

This commit was SVN r7652.
2005-10-06 21:22:36 +00:00
Tim Woodall
797922fbab - cleanup on loss of connection to peer
- generate ack if no one to forward msg to

This commit was SVN r7651.
2005-10-06 21:21:26 +00:00
Tim Woodall
3280f6e655 add facility to receive callback on disconnection from peer
This commit was SVN r7650.
2005-10-06 19:39:20 +00:00
Andrew Friedley
37123ed430 Implement an opal_show_help() (like is done in ompi_mpi_init) for error handling in opal_init and both stages of orte_init.
Some of the functions in opal_init are void or return a bool (opal_output_init, but always returns true.. eh?), so I don't check them.

This commit was SVN r7638.
2005-10-05 13:56:35 +00:00
Jeff Squyres
65f1adfedc Add "-tv" option to orterun:
orterun -tv -np 4 foo

which will turn around and re-exec:

      totalview orterun -a -np 4 foo

This commit was SVN r7636.
2005-10-05 10:24:34 +00:00
Jeff Squyres
65698bc6be Remove compiler warning
This commit was SVN r7635.
2005-10-05 10:23:02 +00:00
Jeff Squyres
0f100d8577 - Don't overwrite rc with the return value from pls_tm_disconnect --
it's always ORTE_SUCCESS and sometimes masks real !=ORTE_SUCCESS rc
  values. 
- Add MCA param pls_tm_want_path_check.  If nonzero (the default),
  check for the orted in the PATH before each tm_spawn()'ing (doing a
  little caching so that we don't hammer on the filesystem -- remember
  all the PATH's where we successfully found the orted so that we
  don't have to query the filesystem multiple times for a PATH where
  we previously found the orted)
- Be sure to opal_argv_split() the pls_tm_orted MCA param

This commit was SVN r7625.
2005-10-04 19:38:51 +00:00
Jeff Squyres
b79c46dbf6 Downgrade the default priority to 75, just to give leeway (same as the
slurm pls).

This commit was SVN r7624.
2005-10-04 19:18:52 +00:00
Jeff Squyres
eb24fe4fd8 If the job fails to launch properly, set its state to ABORTED, which
will fire some subscriptions that will eventually result in invoking
terminate_job (i.e., terminate anything that may have been
successfully started by launch).

This commit was SVN r7622.
2005-10-04 17:19:23 +00:00
Jeff Squyres
80399aff17 Add some README's to describe what these components are for.
This commit was SVN r7618.
2005-10-04 15:14:23 +00:00
Jeff Squyres
3df0828921 Restore this PLS -- LANL needs this for some of its older clusters.
This commit was SVN r7617.
2005-10-04 15:09:38 +00:00
Jeff Squyres
7645a0fa23 This is the old bproc launcher that is ok to remove.
This commit was SVN r7583.
2005-10-02 14:58:52 +00:00
Jeff Squyres
a9f24c27bd Restore bproc -- this was *not* the old one (didn't read Tim Prins'
mail carefully -- doh!)

This commit was SVN r7582.
2005-10-02 14:57:44 +00:00