Commit graph

400 Commits

Author SHA1 Message Date
Rainer Keller
95f886b6ab - Protect callers of opal/ompi_condition_wait from spurious wakeups,
possible when building with pthreads.
   Compiled on Linux ia32 with and without
   --enable-progress-threads

This commit was SVN r8682.
2006-01-12 17:13:08 +00:00
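The spurious-wakeup protection described in 95f886b6ab usually amounts to re-checking a predicate in a loop around the condition wait. The following is a minimal sketch using plain pthreads, not the opal/ompi_condition_wait code itself; `gate_t`, `gate_init`, `gate_wait`, and `gate_open` are hypothetical names chosen for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot gate: waiters block until ready becomes true. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
} gate_t;

void gate_init(gate_t *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->ready = false;
}

void gate_wait(gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    /* Loop, not "if": pthread_cond_wait may return without a signal,
     * so the predicate is re-checked after every wakeup. */
    while (!g->ready) {
        pthread_cond_wait(&g->cond, &g->lock);
    }
    pthread_mutex_unlock(&g->lock);
}

void gate_open(gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = true;                    /* set the predicate under the lock */
    pthread_cond_broadcast(&g->cond);   /* wake every waiter */
    pthread_mutex_unlock(&g->lock);
}
```

A caller that waits with a bare `if (!ready)` would proceed after a spurious wakeup even though nothing has changed; the `while` loop sends it back to sleep instead.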
David Daniel
c2ee847184 Missing header file.
This commit was SVN r8670.
2006-01-10 21:58:21 +00:00
Jeff Squyres
b2de55d72e Back out some debugging stuff from a careless r8643 commit (only
intended to include the OMPI_DEBUG_ZERO call).

These debugging statements should not have affected correctness
because the value of 78 will be overridden in the read() and the
assert()/abort() stuff will only be triggered on an error which should
never happen (i.e., the error should have been handled by the prior if
conditional).  But still, this code should not be there.

This commit was SVN r8649.

The following SVN revision numbers were found above:
  r8643 --> open-mpi/ompi@a6b869ed68
2006-01-05 14:44:10 +00:00
Jeff Squyres
a6b869ed68 Avoid a false positive in bcheck
This commit was SVN r8643.
2006-01-04 22:29:09 +00:00
David Daniel
d272e02338 Need to include fcntl.h on linux -- protected for windows.
This commit was SVN r8630.
2006-01-04 00:54:16 +00:00
George Bosilca
7a88e72c1b Add more protections around the headers.
This commit was SVN r8617.
2005-12-31 12:35:24 +00:00
George Bosilca
3c95dd0801 No discrimination !
This commit was SVN r8613.
2005-12-31 12:20:32 +00:00
Jeff Squyres
1e93c78e2e - Rename rsh component members: argv->agent_argv, argc->agent_argc,
and path->agent_path so that it's totally clear what these are for
- make a new rsh component param for agent_param (the value from the
  MCA param)
- delay the path check for the agent until the component init -- don't
  make it fail during open, because the MCA base will print a warning
  if a component fails open() (e.g., on clusters without rsh/ssh (!),
  this component was failing noisily even though it was
  normal/expected)

This commit was SVN r8596.
2005-12-22 14:37:19 +00:00
Jeff Squyres
8fb5e506aa Arrgh -- should have included this in last commit: need to set a
variable before we += it in a Makefile.am.

This commit was SVN r8595.
2005-12-22 14:30:27 +00:00
Jeff Squyres
93b4d12d14 Add a friendly help message if no pls components are found to be
available.

This commit was SVN r8594.
2005-12-22 14:29:45 +00:00
Jeff Squyres
5a03f86818 Fix a case where it's valid to get no responses back -- return early
before invoking malloc(0).

This commit was SVN r8577.
2005-12-21 13:45:06 +00:00
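The early return mentioned in 5a03f86818 can be pictured roughly as follows. This is only a sketch under assumed names: `copy_responses` is a hypothetical helper, not a function from the Open MPI tree. The point is that an empty result is treated as success before any allocation is attempted, because `malloc(0)` may return either NULL or a unique pointer depending on the implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `count` response codes into a new array.
 * Returns 0 on success and -1 on allocation failure. */
int copy_responses(const int *src, size_t count, int **out)
{
    *out = NULL;
    if (count == 0) {
        return 0;   /* valid case: nothing came back, so skip malloc(0) */
    }

    *out = malloc(count * sizeof(**out));
    if (*out == NULL) {
        return -1;
    }
    memcpy(*out, src, count * sizeof(**out));
    return 0;
}
```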
Rainer Keller
b06d79d4fe - Seems with change r7664, the mapping has slightly changed.
In case of checking for Shell with --mca pls_rsh_assume_same_shell 0
   have the node point to sensible values.

This commit was SVN r8563.

The following SVN revision numbers were found above:
  r7664 --> open-mpi/ompi@0629cdc2d7
2005-12-20 15:59:17 +00:00
Brian Barrett
a5af07cd6b fixes suggested by Ralf for supporting both Libtool 1 and 2 in Open MPI...
This commit was SVN r8538.
2005-12-19 03:10:23 +00:00
Brian Barrett
456ba1c11f * need to declare environ on OS X
this should go to the 1.0 branch

This commit was SVN r8527.
2005-12-16 19:20:33 +00:00
Jeff Squyres
fa097c9874 Remove two components that were templated out quite a while ago and
aren't currently in use (i.e., they were never finished).  If needed,
they can be pulled out of SVN history.

This commit was SVN r8524.
2005-12-16 17:40:51 +00:00
Jeff Squyres
25b2730a34 Only allow the fork component to run when we're in an orted.
This commit was SVN r8515.
2005-12-15 21:05:26 +00:00
Jeff Squyres
9c25bdc5ac Change to the rsh pls component to have the pls_rsh_agent MCA param
now take a colon-delimited list of agents (and associated argv).  Also
change the default value to "ssh : rsh".  Hence, if we run on a
cluster that does not have ssh, we'll fall back to rsh.  If we can't
find rsh, then the rsh component will disqualify itself from
selection.

This commit was SVN r8514.
2005-12-15 20:54:24 +00:00
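To make the colon-delimited agent list from 9c25bdc5ac concrete, here is a small sketch of how a value such as "ssh : rsh" could be split into trimmed entries. It is not the rsh component's actual parser; `split_agents` and `AGENT_MAX_LEN` are hypothetical names, and the fixed-size output buffers are a simplification.

```c
#include <ctype.h>
#include <string.h>

#define AGENT_MAX_LEN 64   /* hypothetical per-entry size limit */

/* Splits a colon-delimited list into whitespace-trimmed entries.
 * Returns the number of entries stored in out (at most max_out). */
int split_agents(const char *list, char out[][AGENT_MAX_LEN], int max_out)
{
    int count = 0;
    const char *p = list;

    while (*p != '\0' && count < max_out) {
        const char *end = strchr(p, ':');
        if (end == NULL) {
            end = p + strlen(p);          /* last entry runs to the end */
        }

        /* Trim surrounding whitespace; inner spaces (argv) are kept. */
        const char *first = p;
        const char *last = end;
        while (first < last && isspace((unsigned char) *first)) first++;
        while (last > first && isspace((unsigned char) last[-1])) last--;

        size_t len = (size_t) (last - first);
        if (len > 0 && len < AGENT_MAX_LEN) {
            memcpy(out[count], first, len);
            out[count][len] = '\0';
            count++;
        }
        p = (*end == ':') ? end + 1 : end;
    }
    return count;
}
```

With the default value described above, the entries come out in preference order ("ssh" first, then "rsh"), so a caller could try each in turn and fall back when one is not found.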
George Bosilca
505d830b3f I miss the requirement for the mca_base_component_repository.h header.
This commit was SVN r8465.
2005-12-12 21:10:30 +00:00
George Bosilca
7d8d516a4a A bunch of fixes for Windows support.
- protection with __WINDOWS__ and not WIN32 or _WIN32
 - protect all the headers

This commit was SVN r8463.
2005-12-12 20:04:00 +00:00
George Bosilca
32cecc5798 Change ERROR to subscribe_error because ERROR is predefined on Windows. I didn't spend
too much time tracking that down, I just know that cl.exe will replace it with the
"constant" string ...

This commit was SVN r8449.
2005-12-11 06:23:07 +00:00
Jeff Squyres
31336e4773 Add some missing headers / correct one installation directory
This commit was SVN r8408.
2005-12-08 04:00:52 +00:00
Jeff Squyres
6fbd321442 Fix a bunch of install locations for header files
This commit was SVN r8406.
2005-12-08 00:54:44 +00:00
Jeff Squyres
e781f55d16 Add proper prefixes into the #include statements
This commit was SVN r8404.
2005-12-08 00:05:26 +00:00
Jeff Squyres
3f27e61de6 Fix location of installed header files
This commit was SVN r8403.
2005-12-08 00:04:19 +00:00
Jeff Squyres
bd0b5acf0b Oops -- there's a second instance of OCRNL that needed to be
protected.

This commit was SVN r8374.
2005-12-02 18:24:59 +00:00
Jeff Squyres
0c9420e204 OS X 10.3 does not have OCRNL #define'd, so we need to protect its
usage 

This commit was SVN r8371.
2005-12-02 16:57:37 +00:00
Brian Barrett
bc4d3d6fff IRIX compile fixes:
- Need to make sure that SIZE_MAX exists as a constant if stdint.h
    doesn't exist
  - struct timeval is defined in unistd.h on IRIX, so need to include
    that header file wherever struct timeval is used.

This commit was SVN r8361.
2005-12-01 18:28:20 +00:00
Tim Woodall
20e6f41fe2 allow node number as hostname for bproc
This commit was SVN r8357.
2005-12-01 17:44:08 +00:00
Tim Woodall
cf53d3e48f missing include
This commit was SVN r8295.
2005-11-28 23:13:36 +00:00
Galen Shipman
6e64e8a144 bproc fixes, these exist in the release 1.0 branch.
This commit was SVN r8292.
2005-11-28 21:10:02 +00:00
Tim Woodall
943e6f0cd5 corrections for stdin
- when eof is reached at orterun, send a 0 byte message to peer indicating eof
- on receipt of zero byte message - close corresponding file descriptor associated with the endpoint
- require setup ptys for stdin and stdout so that stdin can be closed independently of stdout

This commit was SVN r8264.
2005-11-28 14:58:53 +00:00
Tim Woodall
eb7cfe3ecd implement unsubscribe
This commit was SVN r8214.
2005-11-21 19:46:47 +00:00
Brian Barrett
20cea60b82 * fix "make distclean" error in PML
* turns out (duh!) that there was a reason that the <projectdir>dir
  variable was set in the AM conditional.  If not, stupid directories
  are created and not needed...  duh.

This commit was SVN r8205.
2005-11-20 07:41:09 +00:00
Brian Barrett
8faa1884f0 * The last of the build system optimizations. Combine the component and
component/base Makefile.am files, reducing the time configure spends
  stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed

This commit was SVN r8200.
2005-11-20 01:03:01 +00:00
Tim Woodall
d579e048f7 reset node name to be node number only to match
value set by allocation/mapper

This commit was SVN r8186.
2005-11-17 22:02:28 +00:00
Jeff Squyres
23ca7e1311 Ensure to return a value.
This commit was SVN r8182.
2005-11-17 14:31:42 +00:00
Brian Barrett
3e3ba49cdb should have removed the line of code, rather than #if 0'ing it out
This commit was SVN r8172.
2005-11-17 05:22:19 +00:00
Brian Barrett
f464bbbcc0 fix a couple of double-lock issues in the iof code that have crept in recently.
This should go to the v1.0 branch.

This commit was SVN r8171.
2005-11-17 01:26:00 +00:00
Tim Woodall
142b7cc682 merge from release branch
This commit was SVN r8167.
2005-11-16 17:10:49 +00:00
Tim Woodall
59d8c791d9 return fragments to free list
This commit was SVN r8121.
2005-11-11 17:48:56 +00:00
George Bosilca
c802d54696 The return type is an int. Casting it to a size_t before checking if it's bigger than zero leads to a true condition ... always ...
This commit was SVN r8114.
2005-11-11 06:34:14 +00:00
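The pitfall behind c802d54696 is easy to reproduce in isolation. In the sketch below, `buggy_is_positive` and `fixed_is_positive` are hypothetical stand-ins for the check in question; the original code is not shown in this log. A negative `int` error code converted to the unsigned `size_t` wraps around to a very large value, so the "greater than zero" test passes for every nonzero input.

```c
#include <stddef.h>

/* Buggy form: the cast turns negative error codes into huge values. */
int buggy_is_positive(int rc)
{
    return (size_t) rc > 0;
}

/* Fixed form: compare while the value is still a signed int. */
int fixed_is_positive(int rc)
{
    return rc > 0;
}
```

Only an exact zero fails the buggy comparison, which is why an error return such as -1 was treated as a successful, positive result.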
Josh Hursey
5fa34df9ce Fix for orted / MPI_Abort problem reported from testers. They were seeing orteds
spinning in orte_iof_base_flush() when running
  intel_tests/src/MPI_Errhandler_fatal_c

When we close an endpoint by taking it out of the event handler, we need to make
sure that it fits the criteria to pass through orte_iof_base_flush(), specifically
make sure we clean out the ep_frags list.
Note: This is more of a sanity check, since the endpoint should already be
      in this state at the point of closure.

Secondly in orte_iof_base_endpoint_read_handler(), if we determine that it is 
necessary to close the endpoint we have to "return" after doing so, otherwise
we add another frag to the endpoint which will cause it to hang in 
orte_iof_base_flush().

Bug go squish!

This commit was SVN r8109.
2005-11-11 00:09:07 +00:00
Tim Woodall
7f20198d49 Filter the set of data returned to the daemons during
startup using the new get_conditional command to improve
scalability during launch

This commit was SVN r8097.
2005-11-10 16:44:51 +00:00
Tim Woodall
d62ea1835d correct typo
This commit was SVN r8090.
2005-11-10 15:29:52 +00:00
Tim Woodall
3556757726 init callback from proxy
This commit was SVN r8085.
2005-11-10 05:27:11 +00:00
Tim Woodall
0b0d7f56c1 added support for callback on receipt of I/O
This commit was SVN r8084.
2005-11-10 04:49:51 +00:00
Tim Woodall
3699c924bd callback for init prior to launch - allow app to hookup stdout/stderr
prior to launch

This commit was SVN r8083.
2005-11-10 04:47:41 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Jeff Squyres
1b691f8089 Pull NULL checks around releasing of resources to ensure we don't
segv.

This commit was SVN r7971.
2005-11-03 11:27:19 +00:00
Jeff Squyres
653f43cc2b Update to latest prototype
This commit was SVN r7970.
2005-11-03 11:23:23 +00:00
Jeff Squyres
60b0330bc1 Initialize "conditions" to ensure we don't segv
This commit was SVN r7961.
2005-11-01 17:13:18 +00:00
Ralph Castain
399e41d113 Fix a potential memory leak...
This commit was SVN r7960.
2005-11-01 15:17:11 +00:00
Jeff Squyres
0379b27969 Add missing DESTRUCT
This commit was SVN r7948.
2005-11-01 13:35:44 +00:00
Jeff Squyres
a2e507c629 Fix potential segv through uninitialized variable
This commit was SVN r7946.
2005-11-01 13:09:00 +00:00
Tim Woodall
e27dfb180d yet another fix
This commit was SVN r7941.
2005-10-31 21:59:14 +00:00
Tim Woodall
aa5b61e4f1 corrections for multiple app contexts
This commit was SVN r7939.
2005-10-31 20:37:44 +00:00
Tim Woodall
cf5c27c1e3 start all of the sends in parallel (from the same buffer) - wait for
all to complete

This commit was SVN r7935.
2005-10-31 16:21:51 +00:00
Tim Woodall
a891db81e9 set socket options to improve oob performance
This commit was SVN r7934.
2005-10-31 16:21:11 +00:00
Jeff Squyres
8503fce61b Remove debugging message
This commit was SVN r7924.
2005-10-28 18:53:20 +00:00
Jeff Squyres
ce78b76598 Quick fix from Ralph -- this escaped committing last night.
This commit was SVN r7917.
2005-10-28 14:03:26 +00:00
Ralph Castain
afeeacd76d Complete hookup of the registry proxy for the get_conditional command.
This commit was SVN r7915.
2005-10-28 05:35:07 +00:00
Ralph Castain
ad9de4ca3b Restore the pointer arrays to the registry dictionaries. Revise the system so that the itag is equivalent to the index into the pointer array. It already was, but it wasn't obvious before (several functions relied upon it, but others "hid" the relationship) - now, make it explicitly clear. Set things up so lookups occur at max speed by just indexing into the dictionary array.
This commit was SVN r7912.
2005-10-28 04:56:06 +00:00
Ralph Castain
eebda71a0b Add a new API to the registry for conditional data retrievals. The new API allows you to retrieve data from registry containers that have key-value pairs where the value matches the specified one. The requested keys are then retrieved from that container.
This commit was SVN r7907.
2005-10-28 00:30:58 +00:00
Tim Woodall
3fd351117a removed debug
This commit was SVN r7902.
2005-10-27 21:07:49 +00:00
Tim Woodall
793836da57 removed debug
This commit was SVN r7897.
2005-10-27 17:10:49 +00:00
Tim Woodall
7300112564 removed debug
This commit was SVN r7896.
2005-10-27 17:08:30 +00:00
Tim Woodall
60754acae8 - modified rmaps data structures to point directly to ras node
- modified rsh to NOT query for each node's mapping, as all data is
  already available in the rmaps structures

This commit was SVN r7894.
2005-10-27 17:04:10 +00:00
Tim Woodall
c0124fecdd changed segment dictionary to hash table to improve
search time for reverse lookup

This commit was SVN r7893.
2005-10-27 17:00:47 +00:00
Tim Woodall
b60bea9ada don't allow callbacks to be processed recursively - appear to be blowing away the stack
This commit was SVN r7862.
2005-10-25 13:48:08 +00:00
Tim Woodall
4eca6e22bd use persistent non-blocking receives
This commit was SVN r7861.
2005-10-25 13:38:13 +00:00
Tim Woodall
88c7fd9f8d add support for a "persistent" non-blocking receive
doesn't require a re-registration on every receive

This commit was SVN r7822.
2005-10-20 22:06:11 +00:00
Tim Woodall
cea599a274 back out prior change - investigate an alternate approach
This commit was SVN r7821.
2005-10-20 17:49:13 +00:00
Tim Woodall
56983d3e7f Don't invoke non-blocking recv callbacks when recv is posted. Otherwise,
this can result in recursive callbacks and extremely long call chains

This commit was SVN r7817.
2005-10-20 15:07:06 +00:00
Tim Woodall
d0cd752e33 - don't track the sequence number when the endpoint is a data sink,
it's not needed and there could be multiple sources each w/ their
  own sequence.
- if a write doesn't complete, need to check for non-blocking case.. 

This commit was SVN r7795.
2005-10-18 14:26:12 +00:00
Brian Barrett
1302cb4072 The next in a long line of crazed build system changes from Brian. This was
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install).  Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am).  Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed.  This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)

This commit was SVN r7777.
2005-10-17 00:21:10 +00:00
Jeff Squyres
9a25554559 Patch from Brooks Davis for some BSD compatibility issues.
This commit was SVN r7751.
2005-10-13 15:41:25 +00:00
Thara Angskun
8b59de0f37 Import RAS for POE
This commit was SVN r7748.
2005-10-13 14:08:17 +00:00
Thara Angskun
73fff4ea2c - change from mca_base_param_register_* to mca_base_param_reg_*
- update include files / fix minor bugs

This commit was SVN r7746.
2005-10-13 12:58:31 +00:00
Josh Hursey
92429dc90f Fix for a problem Edgar and Jeff identified WRT PLS determining if we are
oversubscribed on a node. And thus whether to call sched_yield or not.

The value of node->node_slots_inuse does not currently represent the number of
slots actually in use, at the moment. This is actually a bug in the RAS/RMAPS
base components, but the fix for that specific bug is bigger than we want to 
address at the moment (but will certainly do so in the near future).

Since we cannot trust this value, use the total number of mapped processes
(which was properly set by the RMAPS component upon mapping -- Just not 
properly propagated back to the registry's node segment) from the process 
mapping.

In addition to this change I cleaned up a couple of the debug messages. It
seems that TM and RSH are the only two directly affected by this. SLURM
would be if that section of code wasn't currently inactive, but put the fix
in for posterity.

This commit was SVN r7743.
2005-10-13 03:26:48 +00:00
Josh Hursey
0f08e87a1f Fixed a max_slots off by one problem that Brian highlighted.
Also cleaned up the error message when allocating over the number of
slots available.

This commit was SVN r7715.
2005-10-12 02:09:56 +00:00
Ralph Castain
70779fa2ab Cleanup some old logic - nothing major.
This commit was SVN r7712.
2005-10-12 01:12:27 +00:00
Brian Barrett
128389758f * fix compile error in XGrid PLS that got introduced sometime in the not
too distant past
* work around apparently broken handling of max_slots somewhere along
  the line by just setting it to 0

Both changes should go to the trunk.

This commit was SVN r7710.
2005-10-12 00:41:14 +00:00
Josh Hursey
af9ccdf04a need to use get_first instead of get_begin since we don't want to execute
this loop if "nodes" is an empty list. get_first, in this loop context, 
allows us to do just that, while get_begin doesn't.

This fixes a --host problem that appeared on the Linux PPC64 build.

This commit was SVN r7703.
2005-10-11 21:33:04 +00:00
Josh Hursey
8ba2900341 fixed a typo, added comments for future work
This commit was SVN r7700.
2005-10-11 20:59:31 +00:00
Ralph Castain
e1244fc160 Fix a few thread-lock things discovered by Josh. The thread locks in the registry's local notify delivery system had not been updated to reflect the design change whereby the xcast uses the notify delivery system. This has now been fixed.
Also revised the callbacks to store and utilize local variables to avoid problems where threads modify the global structures. Not sure this totally fixes the problem, but it's a shot - suggested by Josh (and Jeff, I believe).

This commit was SVN r7694.
2005-10-11 19:35:04 +00:00
Ralph Castain
a47655b3fd Add unlock/lock around the delivery of a local callback to remove thread-lock condition if the callback function attempts to re-enter the registry.
This commit was SVN r7678.
2005-10-10 02:45:50 +00:00
Ralph Castain
6c839048cf Fix a typo that caused valgrind to bark on 64-bit machines. Actually was a potential source of error, so the barking was legit.
This commit was SVN r7677.
2005-10-10 02:34:26 +00:00
Josh Hursey
d5ebb5c46a fix a compiler warning
This commit was SVN r7674.
2005-10-08 17:03:12 +00:00
Jeff Squyres
0629cdc2d7 Bring back the changes from /tmp/jjhursey-rmaps. Specific merge
command:

svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps .

(where "." is a trunk checkout)

The logs from this branch are much more descriptive than I will put
here (including a *really* long description from last night).  Here's
the short version:

- fixed some broken implementations in ras and rmaps
- "orterun --host ..." now works and has clearly defined semantics
  (this was the impetus for the branch and all these fixes -- LANL had
  a requirement for --host to work for 1.0)
- there is still a little bit of cleanup left to do post-1.0 (we got
  correct functionality for 1.0 -- we did not fix bad implementations
  that still "work")
  - rds/hostfile and ras/hostfile handshaking
  - singleton node segment assignments in stage1
  - remove the default hostfile (no need for it anymore with the
    localhost ras component)
  - clean up pls components to avoid duplicate ras mapping queries
  - [possible] -bynode/-byslot being specific to a single app context 

This commit was SVN r7664.
2005-10-07 22:24:52 +00:00
Tim Woodall
3c900a7aa2 - fix a deadlock on threaded build
- update sequence number after a partial write completes

This commit was SVN r7654.
2005-10-06 21:50:58 +00:00
Tim Woodall
a79e07390a remove debug
This commit was SVN r7653.
2005-10-06 21:29:47 +00:00
Tim Woodall
2ea71064ad close all file descriptors w/ the exception of stdin/stdout/stderr
otherwise, parent's file descriptors are inherited and held open by
the child even if the parent dies

This commit was SVN r7652.
2005-10-06 21:22:36 +00:00
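The descriptor cleanup described in 2ea71064ad can be sketched as a loop over everything above stderr. This is an illustration under assumptions rather than the committed code: `close_inherited_fds` and `fd_is_open` are hypothetical names, and the upper bound is passed in by the caller.

```c
#include <fcntl.h>
#include <unistd.h>

/* Closes every descriptor above stderr and below limit, so that
 * stdin, stdout, and stderr are the only ones left open. */
void close_inherited_fds(int limit)
{
    for (int fd = STDERR_FILENO + 1; fd < limit; fd++) {
        (void) close(fd);   /* EBADF on unused slots is harmless here */
    }
}

/* Returns 1 if fd refers to an open descriptor, 0 otherwise. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

In a launcher this would typically run in the child between fork() and exec(), with a limit such as the value reported by `sysconf(_SC_OPEN_MAX)`. Without it, the child keeps the parent's pipes and sockets open even after the parent dies.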
Tim Woodall
797922fbab - cleanup on loss of connection to peer
- generate ack if no one to forward msg to

This commit was SVN r7651.
2005-10-06 21:21:26 +00:00
Tim Woodall
3280f6e655 add facility to receive callback on disconnection from peer
This commit was SVN r7650.
2005-10-06 19:39:20 +00:00
Jeff Squyres
65698bc6be Remove compiler warning
This commit was SVN r7635.
2005-10-05 10:23:02 +00:00
Jeff Squyres
0f100d8577 - Don't overwrite rc with the return value from pls_tm_disconnect --
it's always ORTE_SUCCESS and sometimes masks real !=ORTE_SUCCESS rc
  values. 
- Add MCA param pls_tm_want_path_check.  If nonzero (the default),
  check for the orted in the PATH before each tm_spawn()'ing (doing a
  little caching so that we don't hammer on the filesystem -- remember
  all the PATH's where we successfully found the orted so that we
  don't have to query the filesystem multiple times for a PATH where
  we previously found the orted)
- Be sure to opal_argv_split() the pls_tm_orted MCA param

This commit was SVN r7625.
2005-10-04 19:38:51 +00:00
Jeff Squyres
b79c46dbf6 Downgrade the default priority to 75, just to give leeway (same as the
slurm pls).

This commit was SVN r7624.
2005-10-04 19:18:52 +00:00
Jeff Squyres
eb24fe4fd8 If the job fails to launch properly, set its state to ABORTED, which
will fire some subscriptions that will eventually result in invoking
terminate_job (i.e., terminate anything that may have been
successfully started by launch).

This commit was SVN r7622.
2005-10-04 17:19:23 +00:00
Jeff Squyres
80399aff17 Add some README's to describe what these components are for.
This commit was SVN r7618.
2005-10-04 15:14:23 +00:00
Jeff Squyres
3df0828921 Restore this PLS -- LANL needs this for some of its older clusters.
This commit was SVN r7617.
2005-10-04 15:09:38 +00:00