Commit graph

730 commits

Author SHA1 Message Date
Brian Barrett
3c814fdd23 fixes trac:391
Fix for double mutex free that would cause an abort condition in the orted
whenever threads were enabled.

This commit was SVN r11759.

The following Trac tickets were found above:
  Ticket 391 --> https://svn.open-mpi.org/trac/ompi/ticket/391
2006-09-22 19:24:42 +00:00
Andrew Friedley
798c19d395 Blah.. we should always return after try_connect() here, not just when we have an error.
Another fix for ticket #362.

This commit was SVN r11756.
2006-09-22 15:51:11 +00:00
Tim Prins
567676f3c1 - Formatting and minor cleanup
- made it so we now set the architecture of each node we discover
- remove debugging output

This commit was SVN r11751.
2006-09-22 13:24:32 +00:00
Tim Prins
83a7f6e4de Fix for bug #369.
LoadLeveler only sets LOADL_PROCESSOR_LIST when there are 128 or fewer tasks allocated to a job. The POE RAS relied on this variable, so I created a new RAS which uses the LoadLeveler API instead of relying on the environment variable. This still needs some testing, so for now we use the POE RAS whenever LOADL_PROCESSOR_LIST is set; otherwise we fall back on this component.

Unfortunately, this will require an autogen...

This commit was SVN r11732.
2006-09-21 00:08:49 +00:00
Galen Shipman
04e9483aab Fall back to rand() if /dev/random doesn't exist or the read from /dev/random
would block.

This commit was SVN r11725.
2006-09-20 16:59:44 +00:00
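
The fallback described in 04e9483aab can be sketched roughly as follows. This is a minimal Python illustration, not the Open MPI C code: the helper name `seed_bytes`, its parameters, and the use of Python's `random` module in place of C's rand() are all assumptions made for the example.

```python
import os
import random


def seed_bytes(n=8, device="/dev/random"):
    """Return (data, source): n bytes from `device`, or a PRNG fallback.

    Hypothetical helper. The fallback is used when the device cannot be
    opened or when a non-blocking read does not yield n bytes.
    """
    # O_NONBLOCK makes a read that would block fail instead of waiting.
    flags = os.O_RDONLY | getattr(os, "O_NONBLOCK", 0)
    data = b""
    try:
        fd = os.open(device, flags)
    except OSError:
        fd = None  # device missing or unreadable
    if fd is not None:
        try:
            data = os.read(fd, n)
        except OSError:
            data = b""  # includes BlockingIOError: the read would block
        finally:
            os.close(fd)
    if len(data) == n:
        return data, "device"
    # Fallback: a non-cryptographic PRNG, standing in for C's rand().
    return bytes(random.getrandbits(8) for _ in range(n)), "fallback"
```

The second return value only exists so a caller (or a test) can see which path was taken; a fallback of this kind is not suitable where cryptographic randomness is required.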
Andrew Friedley
8895bf7369 Fix the fix (r11718) for bug #362.
We were still waiting the entire duration of the timeout before we figured out that a connect() was successful.  Re-introduce adding the peer_send_event so that we detect immediately when a connect() completes.

Also make sure to delete the timeout event in complete_connect().

Fixed a struct timeval initialization warning reported by Jeff.

Remove an erroneous opal_output().

This commit was SVN r11724.

The following SVN revision numbers were found above:
  r11718 --> open-mpi/ompi@1b6231a9b5
2006-09-20 14:29:37 +00:00
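
As an illustration of the point in 8895bf7369 (notice a completed connect() right away instead of sleeping for the whole timeout), here is a minimal Python sketch. It is not the OOB code from the commit: `connect_with_timeout` and its parameters are hypothetical, and it relies on the usual POSIX behavior that a socket with a pending non-blocking connect() becomes writable once the connect finishes.

```python
import errno
import os
import select
import socket


def connect_with_timeout(addr, timeout_s):
    """Connect to addr=(host, port); return the socket as soon as it is ready.

    Hypothetical helper. Raises TimeoutError if the connection is not
    established within timeout_s seconds, OSError on any other failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    rc = sock.connect_ex(addr)
    if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(rc, os.strerror(rc))
    if rc != 0:
        # The socket becomes writable when the pending connect() finishes,
        # so select() returns at that moment rather than after timeout_s.
        _, writable, _ = select.select([], [sock], [], timeout_s)
        if not writable:
            sock.close()
            raise TimeoutError("connect timed out")
        # Writable does not mean success: check the deferred error code.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            sock.close()
            raise OSError(err, os.strerror(err))
    return sock
```

The timeout here is only an upper bound; a successful connection returns immediately, which is the behavior the commit message says was missing.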
Andrew Friedley
1b6231a9b5 Fix for running jobs that span multiple 's' partitions on IU BigRed.
Each 's' partition has its own TCP network.  It's fine to use this network for jobs that fit inside the partition, but the TCP OOB errors when trying to connect across two partitions, because there are two disjoint networks.  Each node also has another TCP network connecting ALL nodes together.

So the solution is to actually try all the available TCP interfaces on a node, instead of erroring when the first one fails.

Also, the default TCP connect() timeout is way too long (5 minutes) - use our own timeout mechanism, with the timeout value expressed as an MCA parameter.

This commit was SVN r11718.
2006-09-19 19:33:49 +00:00
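
The approach in 1b6231a9b5 (try every available address for a peer, each with a short timeout, and fail only when all of them fail) might look like this in outline. A minimal Python sketch under stated assumptions: `connect_any`, the candidate list, and the `timeout_s` argument are hypothetical; in the commit the timeout is an MCA parameter, whose name is not given here.

```python
import socket


def connect_any(candidates, timeout_s=10.0):
    """Try each (host, port) in order; return (socket, address) on first success.

    Hypothetical helper. Raises OSError listing every failure, but only
    after all candidates have been tried.
    """
    errors = []
    for addr in candidates:
        try:
            # The per-attempt timeout replaces the much longer OS default.
            return socket.create_connection(addr, timeout=timeout_s), addr
        except OSError as exc:
            errors.append(f"{addr}: {exc}")
    raise OSError("all candidate addresses failed: " + "; ".join(errors))
```

Collecting the individual errors before raising keeps the final message useful when a peer is unreachable on every network.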
Tim Prins
c4db5654fa Fix for bug #370
The POE ras did not correctly enter the number of slots per node. This fixes that.

This commit was SVN r11716.
2006-09-19 16:27:15 +00:00
Ralph Castain
977e3c5ca1 Let's see if Cyrador understands this version a little better...
This commit was SVN r11709.
2006-09-19 13:05:40 +00:00
Ralph Castain
0ad0d84afd Add two new API functions to the RMGR, and modify the "spawn" API to support the enhanced MPI-2 functionality.
No implementation backs these new APIs - just placeholders for now.

This commit was SVN r11699.
2006-09-19 01:45:05 +00:00
George Bosilca
f8de894efe This one wasn't supposed to get into the repository.
This commit was SVN r11697.
2006-09-18 21:28:55 +00:00
George Bosilca
7ad23ff97b Be 100% TotalView friendly. Let tv find out the real name of our
executable and export all functions as they should be.

This commit was SVN r11694.
2006-09-18 17:55:14 +00:00
Ralph Castain
d7e61e40fc Quiet a few warnings from Cyrador
This commit was SVN r11686.
2006-09-18 12:40:42 +00:00
Ralph Castain
8a291afda6 Ensure the rds_private.h file gets included in the distribution
This commit was SVN r11682.
2006-09-16 11:45:02 +00:00
Ralph Castain
f906af983a Forgot to change the silly Makefile.am names - sorry Cyrador!
This commit was SVN r11670.
2006-09-15 04:52:20 +00:00
Jeff Squyres
8226dab86c Fixes trac:377
Add --enable-orterun-prefix-by-default (and a synonym:
--enable-mpirun-prefix-by-default) to make orterun always behave as if
"--prefix $prefix" was given on the command line (where $prefix is the
value given to the --prefix option to configure).  This prevents many
rsh/ssh users from needing to modify their shell startup files to set
the LD_LIBRARY_PATH for Open MPI (they will still need to set PATH or
otherwise find the OMPI executables to mpicc/mpirun/etc. their MPI
applications).

Also added --noprefix option to orterun to disable this behavior.
Finally, note that even if --enable-orterun-prefix-by-default is
specified, if the user specifies --prefix or /path/to/mpirun, these
options will override the default value of the prefix ($prefix).

This commit was SVN r11669.

The following Trac tickets were found above:
  Ticket 377 --> https://svn.open-mpi.org/trac/ompi/ticket/377
2006-09-15 02:52:08 +00:00
Jeff Squyres
3e239f4532 Add a missing .ompi_ignore
This commit was SVN r11666.
2006-09-15 02:36:22 +00:00
George Bosilca
4fe39a4e7d The old PLS is now called an ODLS. However, the real name is not windows but process. This
change will follow shortly...

This commit was SVN r11663.
2006-09-14 22:22:34 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
George Bosilca
17afe7dc9f Do it the correct way, as this is normally compiled as a module.
This commit was SVN r11660.
2006-09-14 21:22:41 +00:00
George Bosilca
01c5a115b2 Don't export the POE module. Only the component has to be exported (visible).
This commit was SVN r11659.
2006-09-14 21:20:31 +00:00
Galen Shipman
b02185374f Push a generated "key" out to all the processes. This is necessary for some
interconnect wireup in which all processes must agree on a "key" to initialize
the interconnect with. 

This commit was SVN r11653.
2006-09-14 15:27:17 +00:00
Josh Hursey
908f31fe9f Fix a code clarity issue in the POE PLS.
Allow the POE RAS to be compiled for Linux as well as AIX.
The POE RAS is really a LoadLeveler RAS, and IU now has
a cluster that uses LoadLeveler in a Linux environment (BigRed).

This seems to be the only thing we need to do so far to run 
Open MPI on BigRed. Yay :)

This commit was SVN r11600.
2006-09-09 05:13:15 +00:00
Josh Hursey
160120b4c5 Fix a cut-n-paste error that causes the 'num_concurrent' to be
set to 1 or 0 instead of the user defined number or default (128).

This caused the PLS to deadlock when using '--debug-daemons' with
more than 2 processes. :(

svn blame says that it was broken in r11347

It is *not* a problem on v1.1 or v1.2 branches.

Bug spotted by Tim Mattox and myself.

This commit was SVN r11575.

The following SVN revision numbers were found above:
  r11347 --> open-mpi/ompi@f52c10d18e
2006-09-08 15:17:17 +00:00
Jeff Squyres
0f11584a6c * Update svn:ignore
* Remove svn:executable from non-executable files

This commit was SVN r11555.
2006-09-07 17:17:40 +00:00
George Bosilca
e33c35112b Correct the conversion between int and bool. Apply it on all files except
the one that will be modified by Ralph for the ORTE 2.0. The missing ones
are in the rsh PLS.

This commit was SVN r11476.
2006-08-28 18:59:16 +00:00
Ralph Castain
9e6e9b8619 Fix a couple of variable declarations
This commit was SVN r11467.
2006-08-28 13:28:10 +00:00
George Bosilca
07b8d3c72c On Windows we can now deliver Open MPI in several flavors:
- everything statically built (dynamically opened).
- OPAL, ORTE and OMPI static libraries and all the components
  as dynamic files(DLL).
- everything as dynamic files (DLL).

This commit was SVN r11461.
2006-08-28 04:19:42 +00:00
George Bosilca
c2311f6e42 Don't define the yywrap function.
This commit was SVN r11459.
2006-08-28 04:11:25 +00:00
George Bosilca
693c835137 No need to cast as the returned value is already in the
expected type.

This commit was SVN r11458.
2006-08-28 04:10:43 +00:00
George Bosilca
ba1514f2e7 A slightly more Windows friendly version. Unfortunately there
is no support for SGE on Windows.

This commit was SVN r11436.
2006-08-27 04:46:43 +00:00
George Bosilca
7e7bae335e Protect the environ variable on Windows.
This commit was SVN r11435.
2006-08-27 04:44:17 +00:00
Pak Lui
131f0eff04 fix the verbose value.
This commit was SVN r11418.
2006-08-24 21:30:08 +00:00
Pak Lui
65a524dd0d - need to provide option for showing the grid engine's JOB_ID in case the grid engine job needs to be killed
- clean up the orted_path and debug message

This commit was SVN r11413.
2006-08-24 20:27:19 +00:00
Pak Lui
4f75dfd353 - missed the opal_os_path() for LD_LIBRARY_PATH
This commit was SVN r11410.
2006-08-24 18:58:50 +00:00
George Bosilca
9110ea2b80 Add the Windows fork component. As fork is not available on Windows, I
create a process component which uses CreateProcess to spawn the child.
Special care should be taken in order to correctly redirect the stdin,
stdout and stderr of the child process.

This commit was SVN r11405.
2006-08-24 17:51:20 +00:00
George Bosilca
0d607c1346 Use opal_os_path and OPAL_PATH_SEP to build the file path. I don't have any
machine to test, so I hope I get it right.

This commit was SVN r11398.
2006-08-24 16:20:32 +00:00
George Bosilca
e04032ca2f Correct a comment and protect the usage of the environ variable against Windows.
This commit was SVN r11397.
2006-08-24 16:18:42 +00:00
Pak Lui
5220c1ca42 - converted some tabs into spaces
This commit was SVN r11384.
2006-08-23 23:21:08 +00:00
Pak Lui
9dda057f05 - Do the changes as in r11347 for gridengine to use opal_os_path().
- Remove extra NULL argument from rsh module.

This commit was SVN r11377.

The following SVN revision numbers were found above:
  r11347 --> open-mpi/ompi@f52c10d18e
2006-08-23 20:40:01 +00:00
Jeff Squyres
715bae369c Remove extra argument - now obsoleted by the use of opal_os_path().
This commit was SVN r11366.
2006-08-23 14:32:06 +00:00
Brian Barrett
e39f0096a0 * add header file to sources list so make dist works
This commit was SVN r11357.
2006-08-23 13:31:56 +00:00
George Bosilca
fdfae70dbe Use environ.
This commit was SVN r11353.
2006-08-23 06:19:47 +00:00
George Bosilca
75fa0317da Keep environ as the preferred storage for the environment variables.
This commit was SVN r11351.
2006-08-23 06:14:24 +00:00
George Bosilca
c03ef692c1 And the missing header.
This commit was SVN r11348.
2006-08-23 03:33:35 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unnamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
George Bosilca
aecdfc80eb Don't forget to release the object if we detect an error.
This commit was SVN r11346.
2006-08-23 02:43:05 +00:00
George Bosilca
b4732f557a Now it's time to update ORTE. Cleanup most of the ORTE tools. Force them
to use opal_basename and opal_dirname. Don't create the path manually. Use
the specialized opal functions instead.

This commit was SVN r11345.
2006-08-23 02:35:00 +00:00
George Bosilca
0417d27f46 orte_std_cntr_t vs. size_t round 2. Advantage for size_t ...
This commit was SVN r11317.
2006-08-22 14:58:31 +00:00
Ralph Castain
c3ba1c1cc1 Fix a pack/unpack mismatch
This commit was SVN r11315.
2006-08-22 13:50:59 +00:00