1
1
Commit Graph

13352 Commits

Author SHA1 Message Date
Rainer Keller
916eb1fb1e - As proposed in RFC and telcon, warn the user about deprecated
functionality (per MPI-2.1). This warning can be toggled using
   --enable-mpi-interface-warning (default OFF), but can be
   selectively turned on passing
       mpicc -DOMPI_WANT_MPI_INTERFACE_WARNING

   Using icc, gcc < 4.5, warnings (such as in mpi2basic_tests) show:
     type_vector.c:83: warning: ‘MPI_Type_hvector’ is deprecated
     (declared at /home/../usr/include/mpi.h:1379)

   Using gcc-4.5 (gcc-svn) these show up as:
     type_vector.c:83: warning: ‘MPI_Type_hvector’ is deprecated
     (declared at /home/../usr/include/mpi.h:1379):
     MPI_Type_hvector is superseded by MPI_Type_create_hvector in MPI-2.0


   Jeff and I propose to turn such warnings on with Open MPI-1.7 by default.


 - Detection of user-level compiler is handled using the preprocessor
   checks of GASnet's other/portable_platform.h (thanks to Paul Hargrove
   and Dan Bonachea) adapted into ompi/include/mpi_portable_platform.h
   (see comments).

   The OMPI-build time detection is output (Familyname and Version)
   with ompi_info.

   This functionality (actually any upcoming __attribute__) are turned
   off, if a different compiler (and version) is being detected.


 - Note, that any warnings regarding (user-compiler!=build-compiler)
   as discussed in the RFC are _not_ included for now.


 - Tested on Linux with --enable-mpi-interface-warning on
   Linux, gcc-4.5 (deprecated w/ specific msg)
   Linux, gcc-4.3 (deprecated w/o specific msg)
   Linux, pathscale 3.1 (deprecated w/o specific msg)
   Linux, icc-11.0 (deprecated w/o specific msg)

   Linux, PGI-8.0.6 accepts __deprecated__ but does not issue a warning,
   further investigation needed...

This commit was SVN r21262.
2009-05-22 04:39:43 +00:00
Rainer Keller
225a1d6d8e - For memcpy and memset need string.h
This commit was SVN r21259.
2009-05-21 22:36:06 +00:00
Rainer Keller
98bdba786a - Change logic a bit to make it easier for us
to deselect parts.

This commit was SVN r21258.
2009-05-21 22:15:51 +00:00
Ralph Castain
cc7620c210 Fix orte-ps so it properly ignores/reports stale HNPs, but continues to provide output on running ones. Add a timeout on the send side of the comm so we don't hang while trying to send the info request to the non-existent HNP.
This commit was SVN r21257.
2009-05-21 02:42:21 +00:00
George Bosilca
26342508de It is supposed not to be set until we detect our local rank ...
This commit was SVN r21256.
2009-05-20 17:41:13 +00:00
Jeff Squyres
154dc846b5 Remove extra ";", which caused a bazillion warnings in MTT.
This commit was SVN r21255.
2009-05-20 13:16:31 +00:00
Rainer Keller
5c80033aa2 - Eliminate icc warning w/ regard to __attribute__((__format__)) on
function pointers... Needed checking in opal_check_attributes.m4

This commit was SVN r21254.
2009-05-20 00:39:22 +00:00
George Bosilca
7bd97ac17b Try to deal with the ticket #1831. I think we might reach a case where the
shm_fifos values are only partially updated, and this leads to wrong values
for the offset. Moving the write barrier at the right place, plus forcing
some read barriers might help.

In addition I get rid of the sm_offset array which is completely useless.

This commit was SVN r21253.
2009-05-19 22:50:44 +00:00
Abhishek Kulkarni
b48f13cda4 Minor fix in the man page for MPI_Comm_remote_size
This commit was SVN r21252.
2009-05-19 15:23:44 +00:00
Jeff Squyres
30c126f45b Add note about Fortran EXTRA_STATE stuff
This commit was SVN r21251.
2009-05-18 15:27:37 +00:00
Jeff Squyres
8dc83b0865 Use the newer ORTE_NAME_PRINT instead of ORTE_NAME_ARGS
This commit was SVN r21250.
2009-05-18 14:38:13 +00:00
Ralph Castain
fc88f04bdd Initialize var before use - thanks to Ashley for pointing out the problem
This commit was SVN r21249.
2009-05-18 14:21:29 +00:00
Ralph Castain
f139cfd28a Fully enable the use of static ports to minimize connections on mpirun. When static ports are provided, daemons will automatically use routes defined by the selected routed module to callback to mpirun during startup, thus elimating the dedicated daemon-to-mpirun connection. Therefore, the total number of connections on mpirun will equal the fanout of the routed module (instead of #nodes in job).
Add a new tm ess module that exploits this capability.

Update the various plm modules to enable it - just a minor change reflecting an added param to a plm base function.

Additional fixes included:

1. remove an erroneous cleanup of session directories in the tool finalize procedure - tools don't create session directories to begin with!

2. fix a duplicate free when attempting to execute a non-existent app

3. cleanup an typo in the comm utilities 

4. fix comm_spawn - was perturbed by the changes in pack/unpack of orte_job_t to properly support orte-ps

Been tested on slurm and tm machines, using all tests in orte/test/mpi. May run into issue with command line length on large jobs due to inclusion of node info to support static ports - will fix this next with addition of regexp generator to compress that info.

This commit was SVN r21248.
2009-05-16 04:15:55 +00:00
Jeff Squyres
c6bfb780e6 Remove an outdated/incorrect comment.
This commit was SVN r21247.
2009-05-16 00:51:13 +00:00
Ralph Castain
484a6f58f2 Repair orte-ps by updating some of the interface code. Add ability to recover from attempting to contact non-responsive HNPs due to stale session directories. Implement the -j option. Turn "off" the -p option as it doesn't work and will take a little while to actually implement it (if anyone really cares).
This commit was SVN r21245.
2009-05-15 13:21:18 +00:00
Jeff Squyres
17761f60b9 Clean up a few compiler warnings
This commit was SVN r21243.
2009-05-14 15:40:38 +00:00
Jeff Squyres
efd229b56b Clean up bunches of compiler warnings
This commit was SVN r21242.
2009-05-14 15:39:53 +00:00
Rainer Keller
fc65875542 - As in r21238, do not use printf %z for size_t...
This commit was SVN r21239.

The following SVN revision numbers were found above:
  r21238 --> open-mpi/ompi@b2f8095ba7
2009-05-14 14:11:31 +00:00
Rainer Keller
b2f8095ba7 - Update to fix in r21234: as discussed on devel@,
for printing size_t use "%lu" and cast to (unsigned long).

This commit was SVN r21238.

The following SVN revision numbers were found above:
  r21234 --> open-mpi/ompi@22b6177fb9
2009-05-14 14:10:22 +00:00
Rainer Keller
d5cb1b4f97 - Shown by __opal_attribute_format__, probably macro
was converted from opal_output_verbose

This commit was SVN r21237.
2009-05-14 13:57:37 +00:00
Shiqing Fan
a006c9ae97 This should also go into the tarball.
This commit was SVN r21236.
2009-05-14 13:34:17 +00:00
Pavel Shamis
14eb11c18a Removing unused and duplicated XRC code.
I just found that we have 2 place where we call for XRC domain
creation. First one in init_one_device() and second one prepare_device_for_use().
They have absolutely identical code, but the call in init_one_device() is useless
because on this stage we don't know about QP configuration and we don't know if we need 
XRC at all. So I removing the duplicated code from init_one_device().

This commit was SVN r21235.
2009-05-14 13:15:00 +00:00
Rainer Keller
22b6177fb9 - Use the "z" length modifier for size_t arguments for printf.
This commit was SVN r21234.
2009-05-14 00:52:20 +00:00
Rainer Keller
b98a095d22 - Similar to r21229, check for return code from
orte_rml_base_update_contact_info

This commit was SVN r21233.

The following SVN revision numbers were found above:
  r21229 --> open-mpi/ompi@9ad9b20847
2009-05-14 00:36:51 +00:00
Rainer Keller
9b7c4c2354 - Update comment
This commit was SVN r21232.
2009-05-14 00:32:43 +00:00
Rainer Keller
7950e37eed - Fix Coverity CID 890;
Use snprintf instead sprintf

This commit was SVN r21231.
2009-05-14 00:30:32 +00:00
Rainer Keller
b3daf7184a - Fix Coverity CID 1272:
Need to release src in case of error

 - Fix Coverity CID 1271:
   basename is allocated by opal_basename and later overwritten

This commit was SVN r21230.
2009-05-14 00:28:35 +00:00
Rainer Keller
9ad9b20847 - Fix Coverity CID #15.
This commit was SVN r21229.
2009-05-14 00:25:02 +00:00
Rainer Keller
36ee105d6a - Fix Coverity CID #1207:
set the tmp_str to NULL, so we don't have any double-free...
   Additionally, we should check for malloc returning NULL...

This commit was SVN r21228.
2009-05-14 00:21:15 +00:00
Rainer Keller
3f7f2b6f0f - Multiple functions, that allocate and return new
strings, aka should have __opal_attribute_malloc__
   update comment of opal_path_access -- new string returned.

This commit was SVN r21227.
2009-05-14 00:20:07 +00:00
Rainer Keller
73fd329cbd - Add the proper __opal_attribute_format__(__printf__...) to
declarations.

This commit was SVN r21226.
2009-05-14 00:10:59 +00:00
Iain Bason
a282cf062c Roll back Jeff's roll back (r21028) of Iain's changes (r20926, r20941, r20950).
It turns out that the "Intel" interpretation of the MPI spec violates
the Fortran standard (section 14.6.1.1 of the Fortran 95 standard).

This commit was SVN r21225.

The following SVN revision numbers were found above:
  r20926 --> open-mpi/ompi@0a24eadaad
  r20941 --> open-mpi/ompi@045b0e8871
  r20950 --> open-mpi/ompi@73af921c22
  r21028 --> open-mpi/ompi@25249c9843
2009-05-13 18:32:13 +00:00
Shiqing Fan
3137001772 Read from the correct registry entry on Windows Vista and Server 2008.
This commit was SVN r21224.
2009-05-13 15:56:37 +00:00
Ralph Castain
fd5dd9c4cb Ensure we correctly cycle through map_by_slot when mapping leftover procs in rankfile mapper
This commit was SVN r21219.
2009-05-12 15:41:55 +00:00
Ralph Castain
aa25a51c92 Do not mark the mpi_paffinity_alone param as deprecated so we don't scare Jeff...er...users.
This commit was SVN r21218.
2009-05-12 15:41:11 +00:00
Josh Hursey
76ecb7d2fe Make sure to initialize orte_process_info.proc_type properly on restart. Otherwise the application will have type 'NONE' instead of 'APP'.
This commit was SVN r21217.
2009-05-12 14:14:05 +00:00
Josh Hursey
ec6c5bf5e9 Make sure that when we destruct the pointer array that we set the address to NULL and size to 0. This will help to flag accidental usage of a destructed pointer array object.
This commit was SVN r21216.
2009-05-12 14:13:07 +00:00
Jeff Squyres
424d9ed0c8 Fix "reorder" parameter to be logical, per MPI spec.
This commit was SVN r21215.
2009-05-12 14:08:14 +00:00
Jeff Squyres
05d87ee7b4 Because this error comes up over and over and over and over and ...
Libltdl erroneously returns an error string of "file not found" for
lots of reasons, even if the file really *is* there, but just failed
to dlopen() for some reason.  So if lt_dlerror() returns "file not
found", do some simple hueristics and if we *do* find a file, print a
slightly better error message.

This commit was SVN r21214.
2009-05-12 12:41:42 +00:00
Shiqing Fan
9f624cd9a2 A small fix, the right one to use in orte.
This commit was SVN r21213.
2009-05-12 09:53:34 +00:00
Shiqing Fan
56866e68e9 A few typecasts.
This commit was SVN r21212.
2009-05-12 09:46:52 +00:00
Shiqing Fan
88b600c24d set executable property for windows flex.
This commit was SVN r21211.
2009-05-12 09:06:29 +00:00
Ralph Castain
1d1e29d482 Adjust platform files to use new opal_paffinity_alone param
This commit was SVN r21210.
2009-05-12 02:22:58 +00:00
Ralph Castain
d396f0a6fc Per the discussion on the devel list, move the binding of processes to processors from MPI_Init to process start. This involves:
1. replacing mpi_paffinity_alone with opal_paffinity_alone - for back-compatibility, I have aliased mpi_paffinity_alone to the new param name. This caus
es a mild abstraction break in the opal/mca/paffinity framework - per the devel discussion...live with it. :-) I also moved the ompi_xxx global variable
 that tracked maffinity setup so it could be properly closed in MPI_Finalize to the opal/mca/maffinity framework to avoid an abstraction break.

2. Added code to the odls/default module to perform paffinity binding and maffinity init between process fork and exec. This has been tested on IU's odi
n cluster and works for both MPI and non-MPI apps.

3. Revise MPI_Init to detect if affinity has already been set, and to attempt to set it if not already done. I have *not* tested this as I haven't yet f
igured out a way to do so - I couldn't get slurm to perform cpu bindings, even though it supposedly does do so.

This has only been lightly tested and would definitely benefit from a wider range of evaluation...

This commit was SVN r21209.
2009-05-12 02:18:35 +00:00
Ralph Castain
fa839f4a30 Fix a bug in the rankfile mapper when nooversubscribe is set
This commit was SVN r21208.
2009-05-11 23:44:59 +00:00
Ralph Castain
388292aed5 Fix my old "friend" singleton comm_spawn
This commit was SVN r21207.
2009-05-11 14:53:02 +00:00
Ralph Castain
8edc9d8a7d Update LANL platform files to no-build carto
This commit was SVN r21204.
2009-05-11 14:14:14 +00:00
Ralph Castain
c45ff0d59f Take the next step towards fully utilizing static ports for the daemons to eliminate the initial "phone home" to mpirun by modifying the orted termination procedure to eliminate the need for a full barrier-like operation. Instead, we add a "onesided" barrier to the grpcomm framework API that releases the orted once it has completed its own contribution to the barrier - i.e., the orteds now exit as the "ack" message rolls up towards mpirun instead of sending the "ack" directly to mpirun.
This causes the orteds in the routing tree to remain alive until all termination "acks" from orteds below them have passed through. Thus, if we use static ports, we no longer require a direct orted-to-mpirun connection.

Also modify the binomial routed module so it conforms to what all the other routed modules do and have all messages pass along the routing tree instead of short-circuiting between orteds. This further reduces the number of ports being opened on backend nodes.

This commit was SVN r21203.
2009-05-11 14:11:44 +00:00
Ralph Castain
c6f0499720 Some cleanups required from last night's commits to resolve some race conditions and ensure we cleanup properly. Also, remove some debug output that was unintentionally left "on" by default.
This commit was SVN r21202.
2009-05-11 14:03:07 +00:00
Ralph Castain
10a694ea43 The current errmgr.register_callback API takes a jobid as one of its argument. The intent was to have the errmgr check the jobid of the job being reported to it and, if it matches the jobid that was registered, call the specified callback function.
Unfortunately, we assign the jobid during the plm.spawn procedure - which means it happens -after- control of the job has passed out of the range of mpirun (or whatever program is spawning the job), so it is too late for that main program to register a callback function. If the main program registers tha callback -after- we return from plm.spawn, then it (a) cannot get a callback for failed-to-start, and (b) will miss the callback if a proc aborts in the time between job launch and the call to errmgr.register_callback.

This commit fixes the problem by adding callback-related fields to the orte_job_t object. Thus, the main program can specify what job states should initiate a callback, what function is to be called, and what data is to be passed back by simply filling in the orte_job_t fields prior to calling plm.spawn.

Also, fully implement the "copy" function for the orte_job_t object.

NOTE: as a result of this change, the errmgr.register_callback API may no longer be of any value.

This commit was SVN r21200.
2009-05-11 03:38:15 +00:00