Commit graph

12737 commits

Author SHA1 Message Date
George Bosilca
db424282f7 Fix an issue where the datatype description introduces a buffer misalignment. Because some
architectures (read SPARC64) require aligned accesses, we increase the storage space
when we pack a datatype description to keep the fields aligned. This has to be done
on both sides in order to be consistent.

This commit was SVN r20133.
2008-12-16 09:06:27 +00:00
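The padding idea in db424282f7 can be sketched as follows. This is a minimal Python illustration, not the Open MPI code: the helper names `align_up` and `layout` and the example field sizes are assumptions made for the example. Each field is placed at the next offset that is a multiple of its alignment, and the total is rounded up so that whatever follows also starts aligned. The packing and unpacking sides would both have to apply the same rule.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(fields):
    """Compute aligned offsets for a sequence of (size, alignment) pairs.

    Returns (offsets, total_size). total_size is padded to the largest
    alignment seen, so a following packed description starts aligned too.
    """
    offsets = []
    offset = 0
    max_align = 1
    for size, alignment in fields:
        offset = align_up(offset, alignment)  # insert padding if needed
        offsets.append(offset)
        offset += size
        max_align = max(max_align, alignment)
    return offsets, align_up(offset, max_align)
```

For hypothetical fields of 4, 8, and 2 bytes with matching alignments, `layout([(4, 4), (8, 8), (2, 2)])` places the second field at offset 8 rather than 4, which is the extra storage the commit message refers to.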
Avneesh Pant
c1e508750b Check the active port MTU against the MTU statically configured for the HCA. QLogic HCAs capable of MTU had an issue when connected to switches running at 2K.
This commit was SVN r20131.
2008-12-15 21:17:58 +00:00
Ralph Castain
e878ee4fa3 Revert r20128. Setting a default hostfile name breaks all the filtering code we added to the system. It would require multiple entries in several places to ensure that, should the default hostfile in fact not exist, the system will still work correctly.
Too much complexity - just put the name in the default mca param file iff you actually have a default hostfile.

This commit was SVN r20129.

The following SVN revision numbers were found above:
  r20128 --> open-mpi/ompi@ea01da0eee
2008-12-15 17:37:21 +00:00
Ralph Castain
ea01da0eee Set default name for "default-hostfile" param to "openmpi-default-hostfile" to retain backwards compatibility with OMPI 1.2
This commit was SVN r20128.
2008-12-15 17:08:59 +00:00
Tim Mattox
d578164d82 Woops, need to set "with_openib=yes" properly in the odin platform files.
This commit was SVN r20127.
2008-12-15 16:19:26 +00:00
Ralph Castain
9ae5e5d830 Fix a dumb syntax error in the LANL TLCC platform files
This commit was SVN r20125.
2008-12-15 14:30:10 +00:00
George Bosilca
fe87e28fee This is a temporary fix for the deadlock problem over MX. The real
problem seems to come from the free list, but due to lack of time to
understand it completely, I provide this fix. Basically, there is no
waiting in the MX BTL anymore, if we cannot allocate a fragment we
rely on the PML to take the corrective actions.

This commit was SVN r20124.
2008-12-15 03:45:34 +00:00
George Bosilca
aa4e9da26d Correct the disp array when creating a datatype based on the
MPI_COMBINER_INDEXED_BLOCK combiner.

This commit was SVN r20123.
2008-12-13 01:57:27 +00:00
George Bosilca
fec8692074 Get rid of all elan3 references.
This commit was SVN r20122.
2008-12-12 23:59:21 +00:00
Nysal Jan
ee8ec6f6b5 Remove dead/redundant code. Minimize number of calloc invocations
This commit was SVN r20121.
2008-12-12 10:55:50 +00:00
George Bosilca
7631eb8eed A fix for http://www.open-mpi.org/community/lists/users/2008/12/7502.php.
The solution is not to compute the OVERLAP flag, as the best we can do
is an approximate answer. Without this flag the unpack can lead to
unexpected answers if the datatype contains any overlapping regions.
As such datatypes are illegal in MPI, this becomes the user's responsibility.

This commit was SVN r20120.
2008-12-12 00:25:40 +00:00
Jeff Squyres
c7917db672 Back out the NEWS change from r20096 and make it its own proper entry
in the v1.3 section (not the v1.2.5 section -- thanks for noticing,
Tim!).  Refs trac:1705 -- this commit should be considered part of that
CMR.

This commit was SVN r20115.

The following SVN revision numbers were found above:
  r20096 --> open-mpi/ompi@cad0f41391

The following Trac tickets were found above:
  Ticket 1705 --> https://svn.open-mpi.org/trac/ompi/ticket/1705
2008-12-11 15:51:48 +00:00
Josh Hursey
ce8d18bfda This commit changes the use of the deprecated cr_request_file() to use the cr_request_checkpoint() interface to BLCR. Additional configure checks are added to use the best available checkpointing interface available for the BLCR installed on the system (default: cr_request_checkpoint()).
This commit fixes trac:1691

Thanks to Matthias Hovestadt for identifying this issue.

This commit was SVN r20114.

The following Trac tickets were found above:
  Ticket 1691 --> https://svn.open-mpi.org/trac/ompi/ticket/1691
2008-12-11 00:08:34 +00:00
Nysal Jan
6a5454b76a Fixes crash in openib BTL on a heterogeneous cluster. Refs trac:1700
This commit was SVN r20113.

The following Trac tickets were found above:
  Ticket 1700 --> https://svn.open-mpi.org/trac/ompi/ticket/1700
2008-12-10 22:07:48 +00:00
Tim Mattox
4fa13a1a4d Fix two typos inside of comments.
This commit was SVN r20112.
2008-12-10 21:18:13 +00:00
Shiqing Fan
5ae5f0e173 - 4/4 commit for Windows Visual Studio and CCP support:
unnecessary clean up to non windows related files (within ifdef __WINDOWS__).

This commit was SVN r20111.
2008-12-10 21:13:27 +00:00
Shiqing Fan
20cea164db - 3/4 commit for Windows Visual Studio and CCP support:
corrections to non-windows files (but within ifdef __WINDOWS__)
  type casts, event library for windows use win32. 
  in orte runtime, add windows sockets handling and object construction.

This commit was SVN r20110.
2008-12-10 21:13:10 +00:00
Shiqing Fan
8673f19f50 - 2/4 commit for Windows Visual Studio and CCP support:
changes to the already existing ccp components
  event/win32.c: merge old FD handling into new
  opal_installdirs_windows.c:fix the registry handling

This commit was SVN r20109.
2008-12-10 21:01:54 +00:00
Shiqing Fan
a5281f0434 - 1/4 commit for Windows Visual Studio and CCP support:
CMakeLists and .windows files.
  In contribs preconfigured and precompiled parts.

This commit was SVN r20108.
2008-12-10 20:59:20 +00:00
Ralph Castain
728a24c8ec After considerable patience and help with debugging/testing from Tim M and Jeff S, return a completed and pretty well tested patch of the IOF to the trunk. This commit includes the previously reverted r20074, r20068, and r20064, as well as changes to fix those commits.
Basically, the remaining problem turned out to be:

1. closing stdout/stderr during orte_finalize of mpirun

2. inadvertently setting up a write event on fd = -1

3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once

This passed prelim MTT testing by Jeff and Tim, but should soak for awhile before migrating to 1.3.

This commit was SVN r20106.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-10 20:40:47 +00:00
Ralph Castain
9d7cb82bba Modify the daemon cmd processor to relay and then process the cmd locally. We couldn't do this before due to the daemon's needing to update contact info prior to doing the relay. However, the new routed system plus the inclusion of the nidmap in the launch message now makes this possible.
It is a small launch performance improvement as now we relay the launch cmd across to the next daemon before taking the time to launch our own local procs. Still, it does allow more parallel operations during the launch procedure.

This commit was SVN r20104.
2008-12-10 19:18:36 +00:00
Josh Hursey
67ae66326c remove unused variable
This commit was SVN r20103.
2008-12-10 18:08:46 +00:00
Ralph Castain
7e3ddb09d3 As requested by Aurelien at the July design meeting - long time coming, but finally got around to it.
Enable one mpirun to act as the server for another mpirun when doing MPI_Publish_name and its associated operations. The user is responsible, of course, for ensuring that the mpirun acting as a server outlives any mpiruns using it in that capacity.

Add a cmd line option to mpirun --report-pid that prints out mpirun's pid. Allow the --ompi-server option to now take pid:# (or PID:#) of the mpirun to be used as the server, and then look that pid up by searching the local mpirun contact infos for it.

This commit was SVN r20102.
2008-12-10 17:10:39 +00:00
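A hedged sketch of how a `pid:#` or `PID:#` argument to `--ompi-server`, as described in 7e3ddb09d3, might be recognized. The function name `parse_server_spec` and the returned tuples are hypothetical, and the subsequent lookup of the pid in the local mpirun contact infos is not modeled here.

```python
def parse_server_spec(spec: str):
    """Classify an --ompi-server argument.

    Returns ("pid", <int>) for "pid:#" or "PID:#", and ("other", spec)
    for anything else, which is left for other handling.
    """
    prefix, sep, rest = spec.partition(":")
    if sep and prefix in ("pid", "PID"):
        # Only plain ASCII digits are accepted as a process id.
        if not (rest.isascii() and rest.isdigit()):
            raise ValueError(f"invalid pid in server spec: {spec!r}")
        return ("pid", int(rest))
    return ("other", spec)
```

In this sketch the pid itself would come from the output of `mpirun --report-pid` on the server side.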
Josh Hursey
df75abd6b2 Fix a warning. Thanks to Jeff for noticing.
This should be moved to v1.3 as well.

This commit was SVN r20101.
2008-12-10 15:38:12 +00:00
Ralph Castain
1ace83c470 Enable modex-less launch. Consists of:
1. minor modification to include two new opal MCA params:
   (a) opal_profile: outputs what components were selected by each framework
       currently enabled for most, but not all, frameworks
   (b) opal_profile_file: name of file that contains profile info required
       for modex

2. introduction of two new tools:
   (a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
       opal_profile set. Also reports back the rml IP address for all
       interfaces on the node
   (b) ompi-profiler: uses ompi-probe to create the profile_file, also
       reports out a summary of what framework components are actually
       being used to help with configuration options

3. modification of the grpcomm basic component to utilize the
   profile file in place of the modex where possible

4. modification of orterun so it properly sees opal mca params and
   handles opal_profile correctly to ensure we don't get its profile

5. similar mod to orted as for orterun

6. addition of new test that calls orte_init followed by calls to
   grpcomm.barrier

This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. Minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTL's for modex-less operation should let me know and I'll give you the magic details.

This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.

This commit was SVN r20098.
2008-12-09 23:49:02 +00:00
Jeff Squyres
ba359623e0 Fix a few places where we didn't properly escape []; consolidate all debug/optimization flag checking to use AC quadrigraphs properly
This commit was SVN r20097.
2008-12-09 23:42:28 +00:00
Jeff Squyres
cad0f41391 Also strip out -g[0-9] (in addition to -g) from CCASFLAGS on Leopard. Fixes trac:1701. Thanks to Barry Smith for reporting the problem.
This commit was SVN r20096.

The following Trac tickets were found above:
  Ticket 1701 --> https://svn.open-mpi.org/trac/ompi/ticket/1701
2008-12-09 23:42:16 +00:00
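The flag filtering described in cad0f41391 can be illustrated in a few lines. The real change lives in the configure scripts; the Python below is only a sketch of the matching rule (`-g` optionally followed by one digit), and the function name `strip_debug_flags` is an assumption for the example.

```python
import re

# Matches exactly "-g" or "-g" followed by a single digit, e.g. "-g3".
_DEBUG_FLAG = re.compile(r"-g[0-9]?")


def strip_debug_flags(flags: str) -> str:
    """Remove -g and -g[0-9] tokens from a whitespace-separated flag string."""
    kept = [tok for tok in flags.split() if not _DEBUG_FLAG.fullmatch(tok)]
    return " ".join(kept)
```

Because the sketch uses a full-token match, flags that merely start with `-g` (such as `-ggdb`) are left in place.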
Jeff Squyres
3950796fa8 Add scripty-foo to set perms right on hg trees to share (at least on
milliways / www.open-mpi.org).

This commit was SVN r20095.
2008-12-09 20:26:29 +00:00
Jeff Squyres
affbebb15b Add a "Known issues" section, and the connectx XRC + message
coalescing bug.

This commit was SVN r20094.
2008-12-09 20:18:16 +00:00
Pavel Shamis
068054132a Temporary work around for #1693 (osu_bibw segfault in xrc + coalescing mode)
This commit was SVN r20093.
2008-12-09 19:21:54 +00:00
Ralph Castain
e28210d0dc Revert r20074, r20068, and r20064: remove the IOF proc completion code pending further off-trunk work.
This commit was SVN r20089.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-09 17:11:59 +00:00
Ralph Castain
6141401331 Update the LANL platform files for 1.3
This commit was SVN r20088.
2008-12-09 16:35:51 +00:00
Ralph Castain
61c21d787d Add missing param in tm launcher
This commit was SVN r20087.
2008-12-09 13:31:33 +00:00
Ralph Castain
c230a49140 Set ignores
This commit was SVN r20086.
2008-12-09 01:18:49 +00:00
Ralph Castain
6e050bc78c Update the route when it comes from a different job family.
This fixes ticket #1699

This commit was SVN r20085.
2008-12-09 01:16:18 +00:00
Ralph Castain
ce4018efeb Take a step back on the slurm and tm launchers. Problems were occurring in the MTT runs, although not under non-MTT scenarios. Preserve the modified plm versions in new components that are ompi_ignored until we can resolve the problems.
This will allow for better MTT coverage until the problem can be better understood.

This commit was SVN r20083.
2008-12-09 00:32:04 +00:00
Ralph Castain
89792bbc72 May as well have the other "clean" outputs use the same channel
This commit was SVN r20082.
2008-12-08 19:37:22 +00:00
Ralph Castain
51789c9049 Cleanup the output for nodename resolve reporting
This commit was SVN r20081.
2008-12-08 19:00:36 +00:00
Ralph Castain
c2b18b363d Initialize a variable before use
This commit was SVN r20080.
2008-12-08 16:16:40 +00:00
George Bosilca
df13c2810d Undo the last commit related to the Fortran profiling. After spending a few hours
pondering this problem, we came to the conclusion that the best approach
is to keep what we had before (i.e. the original approach).

The main reason for this is being nice to tool developers. In the current
incarnation, they can either catch the Fortran calls or the C calls. If they
provide both, then they will have to figure out how to cope with the double
calls (as your example highlights).

Here is the behavior Open MPI will stick to:
Fortran MPI  -> C MPI
Fortran PMPI -> C MPI

However, there is another possible approach. This might avoid the double calls
while remaining friendly to tool writers. This possible approach will do:
   Fortran MPI  -> C MPI
   Fortran PMPI -> C PMPI
                     ^
Unfortunately, we will have to heavily modify all files in the Fortran
interface layer in order to support this approach.

This commit was SVN r20079.
2008-12-06 00:35:32 +00:00
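The double-call issue discussed in df13c2810d can be shown with a toy model. This is a sketch under stated assumptions, not Open MPI code: it assumes a tool that wraps both the Fortran `MPI_SEND` and the C `MPI_Send` entry points, and the function `trace_fortran_send` and its event strings are invented for the illustration.

```python
def trace_fortran_send(pmpi_targets_c_pmpi: bool) -> list:
    """Trace one Fortran MPI_SEND call through a tool that wraps both layers.

    pmpi_targets_c_pmpi=False models "Fortran PMPI -> C MPI" (behavior kept).
    pmpi_targets_c_pmpi=True models "Fortran PMPI -> C PMPI" (alternative).
    """
    events = []

    def c_pmpi_send():
        events.append("C PMPI_Send")  # the real implementation

    def c_mpi_send():
        events.append("tool: C MPI_Send")  # tool's C wrapper
        c_pmpi_send()

    def fortran_pmpi_send():
        # The Fortran binding forwards to one of the two C entry points.
        if pmpi_targets_c_pmpi:
            c_pmpi_send()
        else:
            c_mpi_send()

    def fortran_mpi_send():
        events.append("tool: Fortran MPI_SEND")  # tool's Fortran wrapper
        fortran_pmpi_send()

    fortran_mpi_send()
    return events
```

With `False`, the tool records the same send twice; with `True`, it records it once, at the cost the commit message mentions of reworking the Fortran interface layer.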
George Bosilca
54d9df317f Fix the Fortran profiling layer to ensure that we call the C PMPI_ functions instead of
their MPI_ counterparts. This allows the profiling layer to catch each MPI function only once,
from C and Fortran.

This commit was SVN r20076.
2008-12-05 16:52:25 +00:00
Jeff Squyres
01feac443e Add a missing header file; we won't find mca_topo_base_comm_1_0_0_t
in optimized builds without this.

This commit was SVN r20075.
2008-12-05 14:32:50 +00:00
Ralph Castain
2940309613 Attempt to solve a race condition showing up in some MTT runs. There were three entry points for proc termination info into the ODLS:
1. a direct callback from waitpid - this set the waitpid_fired flag

2. a notify event callback from the IOF - this set the iof complete flag

3. a message via the daemon cmd processor from the proc "de-registering" the sync, thus indicating it was going through MPI_Finalize.

The problem is that these could overlap, with the first two allowing the orted to declare the proc complete before the daemon had responded to #3.

This change forces all three events to flow through the daemon cmd processor, thus ensuring an ordered handling. I'm not certain this will solve the problem, but will await further MTT reports to see. Unfortunately, the problem doesn't show up on any manual or script-based tests I have been able to run, even when I duplicate the exact cmd that fails under MTT.

This commit was SVN r20074.
2008-12-05 04:20:00 +00:00
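The serialization described in 2940309613 can be sketched as a single queue with one consumer. The class name `CmdProcessor`, the event names, and the rule that a proc is declared complete only after all three events have been handled are assumptions for this illustration; the commit message only states that the three events now flow through one ordered path.

```python
from collections import deque

# Hypothetical labels for the three termination entry points.
REQUIRED = frozenset({"waitpid_fired", "iof_complete", "sync_deregistered"})


class CmdProcessor:
    """Single consumer: events are handled strictly in arrival order."""

    def __init__(self):
        self._pending = deque()
        self._seen = {}        # proc name -> set of events handled so far
        self.completed = []    # procs declared complete, in order

    def post(self, proc: str, event: str) -> None:
        """Queue an event instead of acting on it in the caller's context."""
        if event not in REQUIRED:
            raise ValueError(f"unknown event: {event}")
        self._pending.append((proc, event))

    def drain(self) -> None:
        """Process queued events one at a time, oldest first."""
        while self._pending:
            proc, event = self._pending.popleft()
            seen = self._seen.setdefault(proc, set())
            seen.add(event)
            if seen == REQUIRED and proc not in self.completed:
                self.completed.append(proc)
```

Since every source only calls `post`, no callback can declare a proc complete on its own, which is the overlap the commit is trying to rule out.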
Tim Mattox
a16de4ba54 The OFED install on odin is no longer in /usr/local/ofed
and can now be found automatically.

This commit was SVN r20073.
2008-12-05 01:02:09 +00:00
Brian Barrett
8a8cf96b6c Provide configure parameter to allow the disabling of reading parameters
and components from the home directory for platforms that are bad at
reading in files from home directory at scale (like Red Storm)

This commit was SVN r20069.
2008-12-04 01:51:44 +00:00
Ralph Castain
ec930d14a9 Ensure IOF tags are properly assigned to sinks and read events
This commit was SVN r20068.
2008-12-04 01:09:20 +00:00
Jeff Squyres
06097db928 Fixes trac:1667. Ensure to fill in the source_file if it was requested.
This commit was SVN r20067.

The following Trac tickets were found above:
  Ticket 1667 --> https://svn.open-mpi.org/trac/ompi/ticket/1667
2008-12-03 22:17:50 +00:00
Ralph Castain
a07660aea8 Bring over the IOF completion changes. This commit fixes the long-occurring problem whereby application procs could, under some circumstances, lose their final prints to stdout/err. The commit includes:
1. coordination of job completion notification to include a requirement for both waitpid detection AND notification that all iof pipes have been closed by the app

2. change of all IOF read and write events to be non-persistent so they can properly be shut down and restarted only when required

3. addition of a delay (currently set to 10ms) before restarting the stdin read event. This was required to ensure that the stdout, stderr, and stddiag read events had an opportunity to be serviced in scenarios where large files are attached to stdin.

This commit was SVN r20064.
2008-12-03 17:45:42 +00:00
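Item 1 of a07660aea8 (completion requires both waitpid detection and closure of all IOF pipes) reduces to a two-flag check per process. The sketch below is illustrative only; `ChildState`, its field names, and `job_complete` are hypothetical and do not correspond to Open MPI data structures.

```python
from dataclasses import dataclass


@dataclass
class ChildState:
    waitpid_fired: bool = False   # the OS reported that the process exited
    iof_closed: bool = False      # all of the proc's IOF pipes were closed

    def is_complete(self) -> bool:
        # Both conditions are required, so output still in the pipes
        # is not lost when the process exits first.
        return self.waitpid_fired and self.iof_closed


def job_complete(children) -> bool:
    """A job is complete only when every child satisfies both conditions."""
    return all(child.is_complete() for child in children)
```

A process whose exit was seen but whose pipes are still open keeps the job alive in this model, which is how the final prints to stdout/stderr get a chance to be delivered.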
Shiqing Fan
d06604c258 Get rid of the compiler warning message when --enable-picky is used.
Do the checks according to inter/intracommunicator flags.

This commit was SVN r20063.
2008-12-03 17:44:21 +00:00
Brad Benton
0b83ab39e5 Added the part number for IBM's 2nd rev of the eHCA to the eHCA param stanza
This commit was SVN r20057.
2008-12-03 14:35:43 +00:00