Commit graph

12308 commits

Author SHA1 Message Date
Ralph Castain
4e0f34a062 When we hit an error prior to actually launching daemons, it would be nice if orterun didn't bark about daemons failing to launch, mpirun detecting a job failed, etc.
Add a new job state to indicate that we never attempted to launch. Flag such a scenario and avoid hitting all the other error messages.

This commit was SVN r19366.
2008-08-19 15:19:30 +00:00
Ralph Castain
9447334749 Some comments relating to relative indexing
This commit was SVN r19365.
2008-08-19 15:17:40 +00:00
Ralph Castain
6d82efba21 Add relative indexing capabilities for hostfile and -host - we can now reference hosts using a relative syntax.
See the orte_hosts manpage for an explanation

This commit was SVN r19364.
2008-08-19 15:16:27 +00:00
Edgar Gabriel
ef2bb46e45 no need to create and free the groups. We just want to translate the ranks and we can use the internal group structures right away for that operation. Fixes an issue with groups that have not been freed previously, due to the fact that ompi_group_free was not visible here (I know, this could have been solved also by setting OMPI_DECLSPEC on ompi_group_free, but this solution should be faster.)
This commit was SVN r19362.
2008-08-19 13:59:58 +00:00
Edgar Gabriel
149ecb8d7d 1. debug the four new algorithms
2. fix a bug in the initial communicator creation of llcomm
3. fix a bug which showed up as the result of fixing issue number 2: we have
   to check now whether llcomm has really been created before freeing the
   corresponding llcomm in hierarch_destruct.

This commit was SVN r19361.
2008-08-18 21:54:35 +00:00
Edgar Gabriel
7cbc4a4077 adding four different algorithms for a hierarchical bcast which try to
generate an overlap between the different layers. Why four versions? Because
there is right now always the trade-off between using non-blocking operations
on a layer with a trivial, linear algorithm and using the more sophisticated
algorithms in a blocking manner.

- bcast_intra_seg: uses the bcast of lcomm and llcomm, similarly
  to the original algorithm in hierarch. However, it can segment
  the message, so that we might get an overlap between the two
  layers. This overlap is based on the assumption that a process
  might finish a bcast early and can start the next one.
- bcast_intra_seg1: replaces llcomm->bcast with isend/irecvs
  to increase the overlap, but keeps lcomm->bcast
- bcast_intra_seg2: replaces lcomm->bcast with isend/irecvs
  to increase the overlap, but keeps llcomm->bcast
- bcast_intra_seg3: replaces both lcomm->bcast and llcomm->bcast
  with isend/irecvs

The code is lightly tested; more testing will follow shortly.

This commit was SVN r19358.
2008-08-18 16:05:44 +00:00
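The segmentation step that the bcast_intra_seg variants in r19358 depend on can be sketched without MPI. This is a minimal illustration, not the code from the commit: `seg_count` and `seg_bounds` are hypothetical helper names, and the segment size is assumed to be a caller-supplied byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Number of segments needed to cover `total` bytes when each segment
 * holds at most `segsize` bytes. (Hypothetical helper.) */
static size_t seg_count(size_t total, size_t segsize)
{
    assert(segsize > 0);
    return (total + segsize - 1) / segsize;
}

/* Offset and length of segment `idx` (0-based). Only the last segment
 * may be shorter than `segsize`. (Hypothetical helper.) */
static void seg_bounds(size_t total, size_t segsize, size_t idx,
                       size_t *offset, size_t *len)
{
    assert(offset != NULL && len != NULL);
    assert(segsize > 0 && idx < seg_count(total, segsize));

    *offset = idx * segsize;
    *len = (total - *offset < segsize) ? (total - *offset) : segsize;
}
```

In a segmented hierarchical broadcast, a loop over these segments would let one layer forward segment i while the other layer is already handling segment i+1, which is where the overlap described in the commit message would come from.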
Matthias Jurenz
19514f4df6 Fixed Coverity warnings
CIDs: 865,866,896,897,974,975,976

This commit was SVN r19356.
2008-08-18 14:24:41 +00:00
Matthias Jurenz
0f9693d8af Fixed Coverity warnings
CIDs: 714,781,782,825,872,917

This commit was SVN r19354.
2008-08-18 14:24:06 +00:00
Matthias Jurenz
c65e8a6882 Fixed Coverity warnings
CIDs: 794,882,908,934,962

This commit was SVN r19352.
2008-08-18 14:23:25 +00:00
Matthias Jurenz
63d84e2626 Fixed Coverity warnings
CIDs: 977

This commit was SVN r19350.
2008-08-18 14:22:48 +00:00
Matthias Jurenz
aa05fba21f Fixed Coverity warnings
CIDs: 823,824,905,906

This commit was SVN r19348.
2008-08-18 14:22:15 +00:00
Matthias Jurenz
406a61f599 Fixed Coverity warnings
CIDs: 821,822,903,904

This commit was SVN r19346.
2008-08-18 14:21:37 +00:00
Matthias Jurenz
7364bc9691 Fixed Coverity warnings
CIDs: 957,1109

This commit was SVN r19344.
2008-08-18 14:20:50 +00:00
Matthias Jurenz
33434d49fb Fixed Coverity warnings
CIDs: 1105

This commit was SVN r19342.
2008-08-18 14:19:30 +00:00
Matthias Jurenz
daa8119dd0 Corrected previous checkin (Coverity warning CID: 875)
This commit was SVN r19340.
2008-08-18 14:18:48 +00:00
Matthias Jurenz
9fc72120b7 Fixed Coverity warnings
CIDs: 1104

This commit was SVN r19338.
2008-08-18 14:17:39 +00:00
Matthias Jurenz
a4491e1b4c Fixed Coverity warnings
CIDs: 918,919

This commit was SVN r19336.
2008-08-18 14:16:23 +00:00
Matthias Jurenz
1555a8843f Renamed 'args' to 'argv' to avoid Coverity warnings (TAINTED_STRING)
CIDs: 1106,1107,1108

This commit was SVN r19334.
2008-08-18 14:15:39 +00:00
Matthias Jurenz
5081fd0da6 Fixed Coverity warnings
CIDs: 727

This commit was SVN r19332.
2008-08-18 14:13:45 +00:00
George Bosilca
6982e8ecbc Don't forget to release the temporary arrays used for converting
the datatypes from Fortran to C.

This commit was SVN r19314.
2008-08-17 21:57:59 +00:00
George Bosilca
2499112d1c Fix indentation.
This commit was SVN r19313.
2008-08-17 20:10:54 +00:00
George Bosilca
a6e3a47102 Fix typo.
This commit was SVN r19312.
2008-08-17 20:08:38 +00:00
George Bosilca
a41dcbbc44 Play nicely with the PML. If the data is in the pending send queue, then
tell the PML we're still in charge of sending it.

This commit was SVN r19311.
2008-08-17 20:07:53 +00:00
George Bosilca
f3568a271a Replace tabs with spaces.
This commit was SVN r19310.
2008-08-17 20:06:48 +00:00
George Bosilca
10612bef8a Bring back the SM pending queue, to avoid deadlocks.
This commit fixes trac:1378.

This commit was SVN r19309.

The following Trac tickets were found above:
  Ticket 1378 --> https://svn.open-mpi.org/trac/ompi/ticket/1378
2008-08-17 19:00:50 +00:00
Ethan Mallove
cb927614c7 Fixes trac:1448 (''Need configure workaround for Sun Studio compiles on Sparc Solaris'')
This commit was SVN r19307.

The following Trac tickets were found above:
  Ticket 1448 --> https://svn.open-mpi.org/trac/ompi/ticket/1448
2008-08-16 00:47:33 +00:00
Matthias Jurenz
c7ac98dd62 Fixed several Coverity-Warnings
This commit was SVN r19304.
2008-08-15 15:15:21 +00:00
Jeff Squyres
c1abc108af Fix the make_dist_tarball script: check for the successful completion
of the actual command, not the $? from tee (which will always be 0).

This commit was SVN r19300.
2008-08-14 21:23:25 +00:00
Jeff Squyres
80d11dba8f Bring in PLPA v1.2b4
This commit was SVN r19299.
2008-08-14 21:04:28 +00:00
Ralph Castain
a81dfa0aea When using a platform file, allow the system to automatically pick up an associated default MCA param file for that platform. This change will first look for a file named "platform.conf" (where platform = the name of the platform you specified) in the directory where the platform file itself resides. If that isn't found, it then looks for our default mca param file name in that same location. If neither is found, we just use the good old standby default param file that ships with the openmpi code.
I tested this with both conventional and VPATH builds without problem. Please let me know if you hit an issue.

This commit was SVN r19296.
2008-08-14 20:26:17 +00:00
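The lookup order described in r19296 can be sketched as a three-step fallback. This is an illustration under stated assumptions, not the configure logic from the commit: `pick_param_file` and `file_exists` are hypothetical names, and `openmpi-mca-params.conf` is assumed to be the default param file name the message refers to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed default MCA param file name. */
#define DEFAULT_PARAM_FILE "openmpi-mca-params.conf"

/* Returns 1 if `path` can be opened for reading, 0 otherwise. */
static int file_exists(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return 0;
    }
    fclose(f);
    return 1;
}

/* Choose the param file for a platform (hypothetical helper):
 *   1. <platform_dir>/<platform>.conf
 *   2. <platform_dir>/<default param file name>
 *   3. the default file that ships with the code
 * Writes the chosen path to `out`; returns 0 on success, -1 if a
 * candidate path does not fit into `out`. */
static int pick_param_file(const char *platform_dir, const char *platform,
                           const char *shipped_default,
                           char *out, size_t outlen)
{
    int n;

    assert(platform_dir != NULL && platform != NULL);
    assert(shipped_default != NULL && out != NULL);

    n = snprintf(out, outlen, "%s/%s.conf", platform_dir, platform);
    if (n < 0 || (size_t)n >= outlen) return -1;
    if (file_exists(out)) return 0;

    n = snprintf(out, outlen, "%s/%s", platform_dir, DEFAULT_PARAM_FILE);
    if (n < 0 || (size_t)n >= outlen) return -1;
    if (file_exists(out)) return 0;

    n = snprintf(out, outlen, "%s", shipped_default);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Because the first two candidates are checked in the platform file's own directory, a platform-specific file always wins over a generic one placed next to it.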
Ralph Castain
49745c5f40 Provide a new option that allows a user to leave an ssh session open without getting deluged by ORTE debug output. The new option is --leave-session-attached, with a corresponding MCA param of orte_leave_session_attached.
Theoretically, any PLM could use this - but in reality, all of them except rsh/ssh already leave the session attached anyway.

This fixes trac:656 - a REALLY old ticket

This commit was SVN r19294.

The following Trac tickets were found above:
  Ticket 656 --> https://svn.open-mpi.org/trac/ompi/ticket/656
2008-08-14 18:59:01 +00:00
Jeff Squyres
bb585922fd This is fixed a different way now; no need to be different than stock
PLPA.

This commit was SVN r19293.
2008-08-14 18:54:34 +00:00
Jeff Squyres
a6e0589f01 Update to PLPA v1.2b3. Sorry again for the mid-day configure change...
This commit was SVN r19292.
2008-08-14 14:26:26 +00:00
Jeff Squyres
5946d84023 Add some "canary" code to the debugger DLL so that we'll hopefully get
a compiler error if OMPI data structs that are used in the DLL are
changed in the main code base.

This commit was SVN r19289.
2008-08-14 12:57:44 +00:00
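One way to build the kind of "canary" that r19289 describes is with compile-time checks. The sketch below is an assumption about the general technique, not the code added in the commit: `struct demo_request`, its fields, and `canary_check` are all hypothetical, and C11 `static_assert` is assumed to be available.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, NULL */

/* Hypothetical structure owned by the main code base. */
struct demo_request {
    int   req_type;
    int   req_state;
    void *req_payload;
};

/* Layout that the (hypothetical) debugger DLL was written against. */
struct demo_request_expected {
    int   req_type;
    int   req_state;
    void *req_payload;
};

/* Compilation fails here if the size or field offsets drift apart. */
static_assert(sizeof(struct demo_request) ==
              sizeof(struct demo_request_expected),
              "demo_request size changed; update the debugger DLL");
static_assert(offsetof(struct demo_request, req_state) ==
              offsetof(struct demo_request_expected, req_state),
              "req_state moved; update the debugger DLL");
static_assert(offsetof(struct demo_request, req_payload) ==
              offsetof(struct demo_request_expected, req_payload),
              "req_payload moved; update the debugger DLL");

/* Second style of canary: typed pointers to the fields. Renaming a
 * field breaks the build, and changing its type draws a diagnostic. */
static int canary_check(void)
{
    struct demo_request r = { 0, 0, NULL };
    int   *type_p    = &r.req_type;
    int   *state_p   = &r.req_state;
    void **payload_p = &r.req_payload;

    return type_p != NULL && state_p != NULL && payload_p != NULL;
}
```

The checks cost nothing at run time; their only job is to make a silent layout change visible when the code is built.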
Matthias Jurenz
a62f421d2b Bugfix (Ticket #1447): Removed included system headers inside 'extern "C" {}'
This commit was SVN r19287.
2008-08-14 12:17:24 +00:00
Brian Barrett
4a2cf9b9b9 OS X 10.5 provides a libutil in /usr/lib that is not the normal libutil.
Don't include it in LIBS if it doesn't actually contain the symbols we're
looking for.  Darwin's won't, Linux's will, so things will work out right.

This commit was SVN r19283.
2008-08-13 20:56:06 +00:00
Jeff Squyres
a19cf02c2b Refs trac:1435
Bring in a new version of PLPA (v1.2b2) with some new capabilities for
offline processors and mapping of the Nth processor/socket/core to its
corresponding Linux processor/socket/core ID.

(Sorry for the configure change in the middle of the day, folks -- I
need it to be able to continue to integrate paffinity changes for
#1435...)

This commit was SVN r19282.

The following Trac tickets were found above:
  Ticket 1435 --> https://svn.open-mpi.org/trac/ompi/ticket/1435
2008-08-13 20:18:37 +00:00
Ralph Castain
dd16e4e4a6 Update process component of odls.
This commit was SVN r19281.
2008-08-13 20:07:59 +00:00
Ralph Castain
913cf04633 Only co-locate debugger daemon if the orted has local children - prevents mpirun from co-locating a daemon when it has no local procs
This commit was SVN r19280.
2008-08-13 20:06:28 +00:00
Jeff Squyres
c9f3f2c682 From Ralph Campbell at QLogic:
Roland noticed that the QLogic HCA driver was using the PCIe vendor ID
for the ibv_query_device so the IEEE OUI value is now used. This means
the config file should recognize the vendor ID value 0x1175 too.

This commit was SVN r19277.
2008-08-13 18:35:37 +00:00
Ralph Castain
3e2a3db887 Add a missing ntoh conversion when pushing a message back onto the RML progress queue.
If a message cannot be routed because the addressee isn't yet known, then the message is held on a queue in the RML for a period of time (currently set to 500 millisec). At the end of that time, we pop the message from the list and attempt to send it again. This action requires that we convert the header back to network-byte-order before calling the OOB.

If the message still cannot be routed, we put the message back on the list and reset the timer. However, since we are going to convert the header when it comes off of the list, we have to ntoh it before putting it back on the list so it all comes out right. This step was missing.

Thus, the problem only showed up relatively rarely because a message would have to be pushed onto the queue at least twice for the problem to surface.

This should fix a specific ticket (1389), but we will wait to see the results of MTT runs to verify. Note that we really don't know why a message is rattling around in the RML for so long, especially since this all seems to be happening during finalize, so this could cause mpirun to hang. Or it could simply trash the message and exit cleanly. Shall be interesting to see!

This commit was SVN r19276.
2008-08-13 17:54:15 +00:00
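The requeue problem fixed in r19276 can be illustrated with a small sketch. It assumes a hypothetical three-field header (`msg_hdr_t`) and a hypothetical `retry_send` helper; the real RML/OOB header and retry code differ. Queued headers are kept in host byte order, converted to network order when popped, and must be converted back if the message is requeued.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message header; not the real RML header layout. */
typedef struct {
    uint32_t src;
    uint32_t dst;
    uint32_t tag;
} msg_hdr_t;

/* Convert a header from host to network byte order in place. */
static void hdr_hton(msg_hdr_t *h)
{
    h->src = htonl(h->src);
    h->dst = htonl(h->dst);
    h->tag = htonl(h->tag);
}

/* Convert a header from network to host byte order in place. */
static void hdr_ntoh(msg_hdr_t *h)
{
    h->src = ntohl(h->src);
    h->dst = ntohl(h->dst);
    h->tag = ntohl(h->tag);
}

/* One retry attempt for a queued message (hypothetical helper).
 * Returns 1 if the message can be sent, 0 if it was requeued. */
static int retry_send(msg_hdr_t *queued, int route_known)
{
    assert(queued != NULL);

    hdr_hton(queued);      /* popped from the queue: prepare for the wire */
    if (route_known) {
        return 1;          /* header stays in network order for the send */
    }
    hdr_ntoh(queued);      /* undo the conversion before requeueing */
    return 0;
}
```

Without the `hdr_ntoh` call, a message that is requeued would be converted again on its next pop, so on a little-endian host the header would only be corrupted once the message had gone through the queue at least twice, which matches the rarity noted in the commit message.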
Ralph Castain
30f37f762d Enable co-location of debugger daemons during initial launch and when debugging a running job.
Provide support for four MPIR extensions that allow specification of debugger daemon executable, argv for the debugger daemon, whether or not to forward debugger daemon IO, and whether or not debugger daemon will piggy-back on ORTE OOB network. Last is not yet implemented.

No change in behavior or operation occurs unless (a) the debugger specifically utilizes the extensions and (b), for co-location while running, the user specifically enables the capability via an MCA param. Two of the MPIR extensions supported here are used in a widely-used debugger for a large-scale installation. The other two extensions are new and being utilized in prototype work by several debuggers for possible future release.

This commit was SVN r19275.
2008-08-13 17:47:24 +00:00
Ralph Castain
89ec513524 Change the #if checks to allow configurations --without-threads to work
This commit was SVN r19274.
2008-08-13 17:39:27 +00:00
Rainer Keller
d57ef70149 - Store the result of the 1-byte read... and assert, in case
of error checking -- we don't return errors here anyway.
   Fixes Coverity CID 981

This commit was SVN r19259.
2008-08-12 18:00:38 +00:00
George Bosilca
5a885a9150 Safety net for the sscanf function. Without the \0 at the end of the
buffer, we can read outside the allocated memory.

This commit was SVN r19258.
2008-08-12 16:59:27 +00:00
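The safety net mentioned in r19258 follows a common pattern: make sure the buffer is '\0'-terminated before handing it to sscanf. The sketch below is a generic illustration, not the code from the commit; `parse_int_from_buffer` is a hypothetical helper, and copying into a temporary buffer is only one way to guarantee the terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a decimal integer from `len` bytes that may lack a trailing
 * '\0'. Returns 0 on success, -1 on failure. (Hypothetical helper.) */
static int parse_int_from_buffer(const char *buf, size_t len, int *out)
{
    char *tmp;
    int rc;

    assert(buf != NULL && out != NULL);

    tmp = malloc(len + 1);
    if (tmp == NULL) {
        return -1;
    }
    memcpy(tmp, buf, len);
    tmp[len] = '\0';   /* the safety net: sscanf cannot read past this */

    rc = (sscanf(tmp, "%d", out) == 1) ? 0 : -1;
    free(tmp);
    return rc;
}
```

sscanf treats its first argument as a C string, so without the terminator it keeps scanning into whatever memory follows the buffer.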
George Bosilca
f030497072 Don't forget to set the node to some negative value or ... the memory will
be pinned to some random core.

This commit was SVN r19257.
2008-08-12 16:58:32 +00:00
Terry Dontje
d471f7d7eb Correctly cast the variable so that the DEBUG print does not cause a warning.
This commit was SVN r19256.
2008-08-12 16:41:35 +00:00
Jeff Squyres
d61cfd98cb Ignore the generated man page
This commit was SVN r19253.
2008-08-12 13:40:15 +00:00
Jeff Squyres
fc7e5f1c0d Fix CID 1095: ensure to initialize the File
This commit was SVN r19252.
2008-08-12 13:13:02 +00:00
Ralph Castain
f017c55bfa Close a minor memory leak - we can reuse timer events
This commit was SVN r19251.
2008-08-12 12:53:30 +00:00