Commit graph

12748 commits

Author SHA1 Message Date
George Bosilca
7cec018149 Update the Elan BTL to fulfill the requirements for #1713.
This commit was SVN r20149.
2008-12-17 22:14:45 +00:00
Josh Hursey
c954045989 Add a patch to address a deadlock in the CRCP BKMRK component.
The problem was that we doubly decremented the active count on blocking receives that we stall to complete. This moved the active count into the negative. With a negative count for 'active', a message that should have been accounted for would be overlooked. This then causes the bookmark exchange to post a drain for a message that was never posted, thus locking the protocol. By eliminating the decrement on the 'active' count when we attempt to post the drain message, we only decrement this counter when the outstanding blocking recv completes during the stall operation.

Refs trac:1619
Does not close this ticket since there is an outstanding potential problem with ANY_SOURCE and ANY_TAG, as referenced in the ticket.

This should be moved to v1.3

This commit was SVN r20147.

The following Trac tickets were found above:
  Ticket 1619 --> https://svn.open-mpi.org/trac/ompi/ticket/1619
2008-12-17 17:23:39 +00:00
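The counting problem described in c954045989 can be illustrated with a minimal sketch. Everything here is hypothetical: `peer_counts_t`, `recv_posted`, `recv_completed`, and `post_drains` are illustration names, not the CRCP BKMRK data structures, and the drain arithmetic is an assumed simplification. The point is only that the 'active' count is decremented in exactly one place, when the blocking receive completes.

```c
/* Hypothetical per-peer bookkeeping; not the real CRCP BKMRK structures. */
typedef struct {
    int active;   /* blocking receives posted but not yet completed */
    int drained;  /* drain requests posted during the bookmark exchange */
} peer_counts_t;

/* A blocking receive was posted: it is now outstanding. */
void recv_posted(peer_counts_t *p)
{
    p->active++;
}

/* The stalled blocking receive completed. In this sketch this is the
 * only place where 'active' is decremented. */
void recv_completed(peer_counts_t *p)
{
    p->active--;
}

/* Post drains only for messages that are neither received nor already
 * covered by an outstanding receive. Note that 'active' is read here
 * but never modified (the assumed shape of the fix). */
int post_drains(peer_counts_t *p, int expected, int received)
{
    int missing = expected - received - p->active;
    if (missing < 0) {
        missing = 0;
    }
    p->drained += missing;
    return missing;
}
```

With one receive outstanding, `post_drains(&p, 1, 0)` returns 0. If 'active' had been driven to -1 by a double decrement, `post_drains(&p, 1, 1)` would return 1, a drain for a message that was never sent, which is the kind of stall the commit message describes.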
Jeff Squyres
2f94a151e1 Fix a const cast.
This commit was SVN r20146.
2008-12-17 15:29:02 +00:00
Brian Barrett
f8537c0059 Following ticket #1725, when a free list item cannot be allocated, return
the error to the upper layer and let it deal with the problem.

This commit was SVN r20143.
2008-12-16 22:38:02 +00:00
Brian Barrett
f983bb2bd1 Don't build pml-v on Red Storm
This commit was SVN r20142.
2008-12-16 22:36:42 +00:00
Brian Barrett
64f7848a84 A number of small fixes to get the trunk to build again on Catamount
This commit was SVN r20141.
2008-12-16 20:09:56 +00:00
Brian Barrett
dbccb250f0 TYPE shouldn't be surrounded by parens because it causes issues for some versions of gcc when the construct a = ((int)) sizeof(b) comes up...
This commit was SVN r20140.
2008-12-16 19:49:54 +00:00
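The macro issue mentioned in dbccb250f0 can be sketched as follows. `SIZE_AS` and `size_of_double_as_int` are hypothetical names used only for illustration; the macro actually changed in r20140 is not shown in the log. The sketch assumes the general pattern: a type parameter used as a cast should not itself be wrapped in parentheses inside the macro body.

```c
#include <stddef.h>

/* Hypothetical macro, not the one touched by r20140.
 *
 * Problematic form (left commented out on purpose):
 *     #define SIZE_AS(TYPE, x) ((TYPE)) sizeof(x)
 * which expands to the construct from the commit message:
 *     a = ((int)) sizeof(b);
 *
 * Safer form: TYPE appears once, unparenthesized, as the cast, and the
 * whole expansion is parenthesized instead. */
#define SIZE_AS(TYPE, x) ((TYPE) sizeof(x))

/* Small wrapper so the macro can be exercised from a test. */
int size_of_double_as_int(void)
{
    double b = 0.0;
    (void) b;  /* sizeof does not evaluate its operand */
    return SIZE_AS(int, b);
}
```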
Jeff Squyres
b65796b906 Minor update to bullet in NEWS file.
This commit was SVN r20139.
2008-12-16 18:05:39 +00:00
Ethan Mallove
9003e4d722 Add missing #include <errno.h> line (for SunStudio Solaris).
This commit was SVN r20138.
2008-12-16 15:30:02 +00:00
Tim Mattox
7b25f1189b Resync the trunk NEWS file with the v1.2 branch NEWS file.
This commit was SVN r20136.
2008-12-16 15:24:52 +00:00
George Bosilca
f9a1700f55 Some 64-bit architectures support pointers aligned on 4 bytes
(where sizeof(long) is 8). On such architectures, don't
assert if the datatype representation is not aligned on 64 bits.

This commit was SVN r20134.
2008-12-16 09:21:21 +00:00
George Bosilca
db424282f7 Fix an issue where the datatype description introduces a buffer misalignment. Because some
architectures (read SPARC64) require aligned accesses, we increase the storage space
when we pack a datatype description to keep the fields aligned. This has to be done
on both sides in order to be consistent.

This commit was SVN r20133.
2008-12-16 09:06:27 +00:00
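The padding idea in db424282f7 can be sketched with a small helper. This is a sketch under assumptions: `align_up` and `packed_size` are hypothetical names, and the layout (a 32-bit count followed by 64-bit displacements) is invented for illustration rather than taken from the Open MPI datatype engine. It shows why packer and unpacker must apply the same rounding.

```c
#include <stddef.h>
#include <stdint.h>

/* Round 'offset' up to the next multiple of 'alignment'.
 * 'alignment' must be a power of two. */
size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Size of a hypothetical packed description: one int32_t count followed
 * by 'count' int64_t displacements. Padding is inserted after the count
 * so the 64-bit fields start on an 8-byte boundary. Both the packing and
 * the unpacking side have to use this same computation. */
size_t packed_size(int32_t count)
{
    size_t pos = sizeof(int32_t);
    pos = align_up(pos, sizeof(int64_t));
    pos += (size_t) count * sizeof(int64_t);
    return pos;
}
```

Without the `align_up` step the first displacement would sit at offset 4, which is the kind of unaligned 64-bit access that strict-alignment architectures reject.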
Avneesh Pant
c1e508750b Check the active port MTU against the MTU statically configured for the HCA. QLogic HCAs capable of MTU had an issue when connected to switches running at 2K.
This commit was SVN r20131.
2008-12-15 21:17:58 +00:00
Ralph Castain
e878ee4fa3 Revert r20128. Setting a default hostfile name breaks all the filtering code we added to the system. It would require multiple entries in several places to ensure that, should the default hostfile in fact not exist, the system will still work correctly.
Too much complexity - just put the name in the default mca param file iff you actually have a default hostfile.

This commit was SVN r20129.

The following SVN revision numbers were found above:
  r20128 --> open-mpi/ompi@ea01da0eee
2008-12-15 17:37:21 +00:00
Ralph Castain
ea01da0eee Set default name for "default-hostfile" param to "openmpi-default-hostfile" to retain backwards compatibility with OMPI 1.2
This commit was SVN r20128.
2008-12-15 17:08:59 +00:00
Tim Mattox
d578164d82 Woops, need to set "with_openib=yes" properly in the odin platform files.
This commit was SVN r20127.
2008-12-15 16:19:26 +00:00
Ralph Castain
9ae5e5d830 Fix a dumb syntax error in the LANL TLCC platform files
This commit was SVN r20125.
2008-12-15 14:30:10 +00:00
George Bosilca
fe87e28fee This is a temporary fix for the deadlock problem over MX. The real
problem seems to come from the free list, but due to lack of time to
understand it completely, I provide this fix. Basically, there is no
waiting in the MX BTL anymore, if we cannot allocate a fragment we
rely on the PML to take the corrective actions.

This commit was SVN r20124.
2008-12-15 03:45:34 +00:00
George Bosilca
aa4e9da26d Correct the disp array when creating a datatype based on the
MPI_COMBINER_INDEXED_BLOCK combiner.

This commit was SVN r20123.
2008-12-13 01:57:27 +00:00
George Bosilca
fec8692074 Get rid of all elan3 references.
This commit was SVN r20122.
2008-12-12 23:59:21 +00:00
Nysal Jan
ee8ec6f6b5 Remove dead/redundant code. Minimize number of calloc invocations
This commit was SVN r20121.
2008-12-12 10:55:50 +00:00
George Bosilca
7631eb8eed A fix for http://www.open-mpi.org/community/lists/users/2008/12/7502.php.
The solution is not to compute the OVERLAP flag, as the best we can do
is an approximate answer. Without this flag the unpack can lead to
unexpected answers if the datatype contains any overlapping regions.
As such datatypes are illegal in MPI, this becomes a user responsibility.

This commit was SVN r20120.
2008-12-12 00:25:40 +00:00
Jeff Squyres
c7917db672 Back out the NEWS change from r20096 and make it its own proper entry
in the v1.3 section (not the v1.2.5 section -- thanks for noticing,
Tim!).  Refs trac:1705 -- this commit should be considered part of that
CMR.

This commit was SVN r20115.

The following SVN revision numbers were found above:
  r20096 --> open-mpi/ompi@cad0f41391

The following Trac tickets were found above:
  Ticket 1705 --> https://svn.open-mpi.org/trac/ompi/ticket/1705
2008-12-11 15:51:48 +00:00
Josh Hursey
ce8d18bfda This commit changes the use of the deprecated cr_request_file() to use the cr_request_checkpoint() interface to BLCR. Additional configure checks are added to use the best checkpointing interface available for the BLCR installed on the system (default: cr_request_checkpoint()).
This commit fixes trac:1691

Thanks to Matthias Hovestadt for identifying this issue.

This commit was SVN r20114.

The following Trac tickets were found above:
  Ticket 1691 --> https://svn.open-mpi.org/trac/ompi/ticket/1691
2008-12-11 00:08:34 +00:00
Nysal Jan
6a5454b76a Fixes crash in openib BTL on a heterogeneous cluster. Refs trac:1700
This commit was SVN r20113.

The following Trac tickets were found above:
  Ticket 1700 --> https://svn.open-mpi.org/trac/ompi/ticket/1700
2008-12-10 22:07:48 +00:00
Tim Mattox
4fa13a1a4d Fix two typos inside of comments.
This commit was SVN r20112.
2008-12-10 21:18:13 +00:00
Shiqing Fan
5ae5f0e173 - 4/4 commit for Windows Visual Studio and CCP support:
unnecessary clean up to non windows related files (within ifdef __WINDOWS__).

This commit was SVN r20111.
2008-12-10 21:13:27 +00:00
Shiqing Fan
20cea164db - 3/4 commit for Windows Visual Studio and CCP support:
corrections to non-windows files (but within ifdef __WINDOWS__)
  type casts, event library for windows use win32. 
  in orte runtime, add windows sockets handling and object construction.

This commit was SVN r20110.
2008-12-10 21:13:10 +00:00
Shiqing Fan
8673f19f50 - 2/4 commit for Windows Visual Studio and CCP support:
changes to the already existing ccp components
  event/win32.c: merge old FD handling into new
  opal_installdirs_windows.c:fix the registry handling

This commit was SVN r20109.
2008-12-10 21:01:54 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
Ralph Castain
728a24c8ec After considerable patience and help with debugging/testing from Tim M and Jeff S, return a completed and pretty well tested patch of the IOF to the trunk. This commit includes the previously reverted r20074, r20068, and r20064, as well as changes to fix those commits.
Basically, the remaining problems turned out to be:

1. closing stdout/stderr during orte_finalize of mpirun

2. inadvertently setting up a write event on fd = -1

3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once

This passed prelim MTT testing by Jeff and Tim, but should soak for a while before migrating to 1.3.

This commit was SVN r20106.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-10 20:40:47 +00:00
Ralph Castain
9d7cb82bba Modify the daemon cmd processor to relay and then process the cmd locally. We couldn't do this before due to the daemon's needing to update contact info prior to doing the relay. However, the new routed system plus the inclusion of the nidmap in the launch message now makes this possible.
It is a small launch performance improvement as now we relay the launch cmd across to the next daemon before taking the time to launch our own local procs. Still, it does allow more parallel operations during the launch procedure.

This commit was SVN r20104.
2008-12-10 19:18:36 +00:00
Josh Hursey
67ae66326c remove unused variable
This commit was SVN r20103.
2008-12-10 18:08:46 +00:00
Ralph Castain
7e3ddb09d3 As requested by Aurelien at the July design meeting - long time coming, but finally got around to it.
Enable one mpirun to act as the server for another mpirun when doing MPI_Publish_name and its associated operations. The user is responsible, of course, for ensuring that the mpirun acting as a server outlives any mpiruns using it in that capacity.

Add a cmd line option to mpirun --report-pid that prints out mpirun's pid. Allow the --ompi-server option to now take pid:# (or PID:#) of the mpirun to be used as the server, and then look that pid up by searching the local mpirun contact infos for it.

This commit was SVN r20102.
2008-12-10 17:10:39 +00:00
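The `pid:#` (or `PID:#`) argument form that 7e3ddb09d3 adds to `--ompi-server` could be recognized along these lines. This is a hedged sketch: `parse_server_pid` is a hypothetical helper written for illustration, not the function mpirun uses, and it covers only the prefix and number parsing, not the lookup of the pid in the local mpirun contact infos.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the pid encoded in "pid:#" or "PID:#",
 * or -1 if 'arg' is not of that form. */
long parse_server_pid(const char *arg)
{
    char *end = NULL;
    long pid;

    if (arg == NULL) {
        return -1;
    }
    /* Accept exactly the two spellings named in the commit message. */
    if (strncmp(arg, "pid:", 4) != 0 && strncmp(arg, "PID:", 4) != 0) {
        return -1;
    }
    arg += 4;

    /* Require at least one digit and reject signs or leading spaces. */
    if (!isdigit((unsigned char) arg[0])) {
        return -1;
    }

    errno = 0;
    pid = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || pid <= 0) {
        return -1;
    }
    return pid;
}
```

A caller would pass the value printed by `mpirun --report-pid`, prefixed with `pid:`, and treat a return of -1 as "not a pid specification".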
Josh Hursey
df75abd6b2 Fix a warning. Thanks to Jeff for noticing.
This should be moved to v1.3 as well.

This commit was SVN r20101.
2008-12-10 15:38:12 +00:00
Ralph Castain
1ace83c470 Enable modex-less launch. Consists of:
1. minor modification to include two new opal MCA params:
   (a) opal_profile: outputs what components were selected by each framework
       currently enabled for most, but not all, frameworks
   (b) opal_profile_file: name of file that contains profile info required
       for modex

2. introduction of two new tools:
   (a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
       opal_profile set. Also reports back the rml IP address for all
       interfaces on the node
   (b) ompi-profiler: uses ompi-probe to create the profile_file, also
       reports out a summary of what framework components are actually
       being used to help with configuration options

3. modification of the grpcomm basic component to utilize the
   profile file in place of the modex where possible

4. modification of orterun so it properly sees opal mca params and
   handles opal_profile correctly to ensure we don't get its profile

5. similar mod to orted as for orterun

6. addition of new test that calls orte_init followed by calls to
   grpcomm.barrier

This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. A minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTLs for modex-less operation should let me know and I'll give you the magic details.

This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.

This commit was SVN r20098.
2008-12-09 23:49:02 +00:00
Jeff Squyres
ba359623e0 Fix a few places where we didn't properly escape []; consolidate all debug/optimization flag checking to use AC quadrigraphs properly
This commit was SVN r20097.
2008-12-09 23:42:28 +00:00
Jeff Squyres
cad0f41391 Also strip out -g[0-9] (in addition to -g) from CCASFLAGS on Leopard. Fixes trac:1701. Thanks to Barry Smith for reporting the problem.
This commit was SVN r20096.

The following Trac tickets were found above:
  Ticket 1701 --> https://svn.open-mpi.org/trac/ompi/ticket/1701
2008-12-09 23:42:16 +00:00
Jeff Squyres
3950796fa8 Add scripty-foo to set perms right on hg trees to share (at least on
milliways / www.open-mpi.org).

This commit was SVN r20095.
2008-12-09 20:26:29 +00:00
Jeff Squyres
affbebb15b Add a "Known issues" section, and the connectx XRC + message
coalescing bug.

This commit was SVN r20094.
2008-12-09 20:18:16 +00:00
Pavel Shamis
068054132a Temporary workaround for #1693 (osu_bibw segfault in xrc + coalescing mode)
This commit was SVN r20093.
2008-12-09 19:21:54 +00:00
Ralph Castain
e28210d0dc Revert r20074, r20068, and r20064: remove the IOF proc completion code pending further off-trunk work.
This commit was SVN r20089.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-09 17:11:59 +00:00
Ralph Castain
6141401331 Update the LANL platform files for 1.3
This commit was SVN r20088.
2008-12-09 16:35:51 +00:00
Ralph Castain
61c21d787d Add missing param in tm launcher
This commit was SVN r20087.
2008-12-09 13:31:33 +00:00
Ralph Castain
c230a49140 Set ignores
This commit was SVN r20086.
2008-12-09 01:18:49 +00:00
Ralph Castain
6e050bc78c Update the route when it comes from a different job family.
This fixes ticket #1699

This commit was SVN r20085.
2008-12-09 01:16:18 +00:00
Ralph Castain
ce4018efeb Take a step back on the slurm and tm launchers. Problems were occurring in the MTT runs, although not under non-MTT scenarios. Preserve the modified plm versions in new components that are ompi_ignored until we can resolve the problems.
This will allow for better MTT coverage until the problem can be better understood.

This commit was SVN r20083.
2008-12-09 00:32:04 +00:00
Ralph Castain
89792bbc72 May as well have the other "clean" outputs use the same channel
This commit was SVN r20082.
2008-12-08 19:37:22 +00:00
Ralph Castain
51789c9049 Clean up the output for nodename resolve reporting
This commit was SVN r20081.
2008-12-08 19:00:36 +00:00
Ralph Castain
c2b18b363d Initialize a variable before use
This commit was SVN r20080.
2008-12-08 16:16:40 +00:00