1
1
Граф коммитов

365 Коммитов

Автор SHA1 Сообщение Дата
Nysal Jan
0538b1a948 Adding GPFS to the list of file systems checked
This commit was SVN r22612.
2010-02-12 14:15:39 +00:00
Rainer Keller
ea4de16561 - Check whether file is opened on network file-system.
If file does not exist, check the directory it lives in...
   Maybe used by caller, trying to open mmap() on NFS, Lustre or
   Panasas (thanks Sam).
   For now, this is used to warn about the usage of mmap on such FS.

   Please note, that Ralph mentioned the orte_no_session_dir parameter.
   The help message includes a reference to this.

   Tested on NFS and Lustre on Linux on
     smoky: mpirun --mca orte_tmpdir_base $HOME/tmp -np 2 ./mpi_stub
     jaguar: mpirun ... --mca orte_tmpdir_base /tmp/work/$USER ...

   Fixes trac:1354

   This should   cmr:v1.5   once it has soaked and is shown to work on
   Solaris

This commit was SVN r22604.

The following Trac tickets were found above:
  Ticket 1354 --> https://svn.open-mpi.org/trac/ompi/ticket/1354
2010-02-10 23:18:29 +00:00
Brian Barrett
86d8356b13 Updates to allow OMPI to build on Cray XT platforms running Catamount
This commit was SVN r22381.
2010-01-07 18:14:03 +00:00
George Bosilca
e55f89dda7 Add a format to stop the complaining.
This commit was SVN r22276.
2009-12-08 17:26:42 +00:00
Rainer Keller
4ce710a147 - The internal function may fail make_opt (e.g. OPAL_ERR_OUT_OF_RESOURCE),
pass that on to callers of opal_cmd_line_make_opt_mca().
   Thanks to Thomas Naughton III.

 - Additionally, cmd-line parameters passed in table to opal_cmd_line_create()
   may be wrong (e.g. OPAL_ERR_BAD_PARAM), which may be missed in the
   loop.

This commit was SVN r22153.
2009-10-28 15:14:31 +00:00
Ralph Castain
c991d155f4 Fix a minor omission in opal/util/path. If someone provides a relative path to the current working directory, without starting it with a
'.', we should still find the executable - it is in a directory beneath us.

In other words, if someone gives us "foo/bar" instead of "./foo/bar", we should still be able to find bar

This commit was SVN r22110.
2009-10-20 04:05:16 +00:00
Ralph Castain
c58a30ea10 Add two new functions:
1. check for loopback interface

2. convert tuple addresses to ip addrs + mask

This commit was SVN r22080.
2009-10-09 15:24:41 +00:00
George Bosilca
b04a42ba3b Add the format to the opal_output call.
This commit was SVN r22041.
2009-09-30 23:33:12 +00:00
Ralph Castain
84a45fea0a Add a convenience macro for assembling network addresses
This commit was SVN r22036.
2009-09-30 14:38:52 +00:00
Ralph Castain
176fdd3a83 Add a new API to the show_help system that allows external users (e.g., libraries built upon OMPI) to define their own locations for show_help files. This allows such users to exploit the rather nice features of the OPAL show_help system -without- interfering with the ability of the ORTE and OMPI layers to use show_help themselves.
Reviewed by Jeff to protect toes...and to get some good comments :-)

This commit was SVN r22026.
2009-09-29 02:07:46 +00:00
Ralph Castain
7765c71428 Add a macro for formatting IP addresses for printing
This commit was SVN r21985.
2009-09-22 00:53:54 +00:00
Rainer Keller
8e1b23779f - Replace combinations of
#if defined (c_plusplus)
          defined (__cplusplus)
   followed by
      extern "C" {
   and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.

   Notable exceptions are:
    - opal/include/opal_config_bottom.h:
      This is our generated code, that itself defines BEGIN_C_DECL and
      END_C_DECL
    - ompi/mpi/cxx/mpicxx.h:
      Here we do not include opal_config_bottom.h:                                 
    - Belongs to external code:                                                    
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c        
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h        
    - opal/include/opal/prefetch.h:
      Has C++ specific macros that are protected:                                  

    - Had #if ... } #endif  _and_ END_C_DECLS (aka end up with 2x
      END_C_DECLS)
      ompi/mca/btl/openib/btl_openib.h
    - opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
    - opal/win32/ompi_process.h: had extern "C"\n {...
      opal/win32/ompi_process.h: dito
    - ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
      ompi/mpi/f90/test/align_c.c: dito
    - ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
    - ompi/mpi/f90/xml/common-C.xsl: Amend

   Tested on linux using --with-openib and --with-mx

   The following do not contain either opal_config.h, orte_config.h or
   ompi_config.h
   (but possibly other header files, that include one of the above):
      ompi/mca/bml/r2/bml_r2_ft.h
      ompi/mca/btl/gm/btl_gm_endpoint.h
      ompi/mca/btl/gm/btl_gm_proc.h
      ompi/mca/btl/mx/btl_mx_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_frag.h
      ompi/mca/btl/ofud/btl_ofud_proc.h
      ompi/mca/btl/openib/btl_openib_mca.h
      ompi/mca/btl/portals/btl_portals_endpoint.h
      ompi/mca/btl/portals/btl_portals_frag.h
      ompi/mca/btl/sctp/btl_sctp_endpoint.h
      ompi/mca/btl/sctp/btl_sctp_proc.h
      ompi/mca/btl/tcp/btl_tcp_endpoint.h
      ompi/mca/btl/tcp/btl_tcp_ft.h
      ompi/mca/btl/tcp/btl_tcp_proc.h
      ompi/mca/btl/template/btl_template_endpoint.h
      ompi/mca/btl/template/btl_template_proc.h
      ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
      ompi/mca/btl/udapl/btl_udapl_endpoint.h
      ompi/mca/btl/udapl/btl_udapl_mca.h
      ompi/mca/btl/udapl/btl_udapl_proc.h
      ompi/mca/mtl/mx/mtl_mx_endpoint.h
      ompi/mca/mtl/mx/mtl_mx.h
      ompi/mca/mtl/psm/mtl_psm_endpoint.h
      ompi/mca/mtl/psm/mtl_psm.h
      ompi/mca/pml/cm/pml_cm_component.h
      ompi/mca/pml/csum/pml_csum_comm.h
      ompi/mca/pml/dr/pml_dr_comm.h
      ompi/mca/pml/dr/pml_dr_component.h
      ompi/mca/pml/dr/pml_dr_endpoint.h
      ompi/mca/pml/dr/pml_dr_recvfrag.h
      ompi/mca/pml/example/pml_example.h
      ompi/mca/pml/ob1/pml_ob1_comm.h
      ompi/mca/pml/ob1/pml_ob1_component.h
      ompi/mca/pml/ob1/pml_ob1_endpoint.h
      ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
      ompi/mca/pml/ob1/pml_ob1_recvfrag.h
      ompi/mca/pml/v/pml_v_output.h
      opal/include/opal/prefetch.h
      opal/mca/timer/aix/timer_aix.h
      opal/util/qsort.h
      test/support/components.h

This commit was SVN r21855.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2009-08-20 11:42:18 +00:00
Rainer Keller
c3226b36b7 - It's BUS_ADRERR, not BUSADRERR...
- Fix typos.

This commit was SVN r21808.
2009-08-12 13:07:04 +00:00
George Bosilca
81c16ac317 Add missing header.
This commit was SVN r21653.
2009-07-13 19:40:31 +00:00
Jeff Squyres
d7d07e0720 Improve the help messages from r20706.
This commit was SVN r21616.

The following SVN revision numbers were found above:
  r20706 --> open-mpi/ompi@248bbb8a2f
2009-07-09 11:58:31 +00:00
George Bosilca
2570a15651 Add a TODO bullet for later processing ...
This commit was SVN r21611.
2009-07-07 17:27:47 +00:00
Shiqing Fan
0e09cb650e The kernel index of the network interface wasn't set on Windows, it really caused a lot of problems.
This commit was SVN r21587.
2009-07-02 14:44:41 +00:00
Rainer Keller
b572dc3591 - As discussed revert r21330, Fortran-configure info should
not end up in OPAL
 - Will post an updated patch for the OMPI_ALIGNMENT_ parts (within C).

This commit was SVN r21342.

The following SVN revision numbers were found above:
  r21330 --> open-mpi/ompi@95596d1814
2009-06-01 19:02:34 +00:00
Rainer Keller
95596d1814 - Move alignment and size output generated by configure-tests
into the OPAL namespace, eliminating cases like opal/util/arch.c
   testing for ompi_fortran_logical_t.
   As this is processor- and compiler-related information
   (e.g. does the compiler/architecture support REAL*16)
   this should have been on the OPAL layer.
 - Unifies f77 code using MPI_Flogical instead of opal_fortran_logical_t

 - Tested locally (Linux/x86-64) with mpich and intel testsuite
   but would like to get this week-ends MTT output


 - PLEASE NOTE: configure-internal macro-names and
   ompi_cv_ variables have not been changed, so that
   external platform (not in contrib/) files still work.

This commit was SVN r21330.
2009-05-30 15:54:29 +00:00
Jeff Squyres
510ec093b6 Remove some dead code
This commit was SVN r21275.
2009-05-26 21:17:00 +00:00
Iain Bason
e7ff2368d6 This fixes trac:1930.
Emit a more informative error message when the file descriptor limit is
reached during an accept() call.  Also, abort when the accept fails to
avoid an infinite loop.

Emit a more informative error message when the help file can't be opened.

This commit was SVN r21271.

The following Trac tickets were found above:
  Ticket 1930 --> https://svn.open-mpi.org/trac/ompi/ticket/1930
2009-05-26 20:03:21 +00:00
Rainer Keller
3f7f2b6f0f - Multiple functions, that allocate and return new
strings, aka should have __opal_attribute_malloc__
   update comment of opal_path_access -- new string returned.

This commit was SVN r21227.
2009-05-14 00:20:07 +00:00
Rainer Keller
73fd329cbd - Add the proper __opal_attribute_format__(__printf__...) to
declarations.

This commit was SVN r21226.
2009-05-14 00:10:59 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Rainer Keller
b8bb7865bc - Fix Coverity CID 9
Check for error in fcntl, as we depend on close-on-exec,
   F_SETFD will result in -1 in case of error (stored in errno).
   To not have a follow-up warning about not freeing filename, move up.

This commit was SVN r21171.
2009-05-05 19:04:58 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Rainer Keller
6c1cce8761 - For the upcoming header cleanup commit,
several header files (previously included by header-files)
   now have to be moved "upward".
   This is mainly system headers such as string.h, stdio.h and for
   networking, but also some orte headers.

This commit was SVN r21095.
2009-04-29 00:49:23 +00:00
Jeff Squyres
0bd9ef0bb9 Some valgrind-clean fixes. Thanks to "Number Cruncher" on the devel
list for pointing these out.

This commit was SVN r21060.
2009-04-23 18:50:46 +00:00
Jeff Squyres
9b5e194d9b Fix opal_basename. Wow; how long has that been broken?
This commit was SVN r21057.
2009-04-22 20:48:24 +00:00
Ralph Castain
9c39a3edd7 Enable the passing of MCA params to dynamically spawned jobs. This creates a new info_key "ompi_param" that allows a user to specify MCA params for a dynamically spawned job.
We currently apply all of the MCA params in the parent job to the child. This commit allows a user to specify additional params for the child job, and to override any pre-existing params with the new value so they can better control behavior of the child job.

This commit was SVN r20989.
2009-04-14 14:15:49 +00:00
Jeff Squyres
8fac195a3a Fixes trac:1871. Take a slightly different approach than before:
1. Probe the signal number that we want
 1. If a handler is already installed there:
    1. if the opal_signal MCA param entry for this signal is suffixed
       with ":complain", then output a show_help message
    1. do not install our signal handler
 1. otherwise, install our signal handler

Hence, we've shifted to a policy of only complaining if the user asks
us to complain.

This commit was SVN r20969.

The following Trac tickets were found above:
  Ticket 1871 --> https://svn.open-mpi.org/trac/ompi/ticket/1871
2009-04-10 15:32:33 +00:00
Jeff Squyres
500750b542 Oops; the MCA param name is "opal_signal", not "opal_signals". Thanks
for noticing, Ralph!

This commit was SVN r20968.
2009-04-09 16:13:07 +00:00
Nysal Jan
2500d88380 Optimize the computation of 16-bit checksum
This commit was SVN r20965.
2009-04-09 11:04:38 +00:00
Nysal Jan
ab18a3629f Change the return type to handle the case where an invalid interface name is passed to this function.
This commit was SVN r20933.
2009-04-02 18:35:09 +00:00
George Bosilca
b7c1ae4f76 Nothing important, just an identation.
This commit was SVN r20919.
2009-04-01 15:27:16 +00:00
George Bosilca
6ca6cfaafc Mark the address with the correct type.
This commit was SVN r20916.
2009-04-01 15:18:08 +00:00
Ralph Castain
17f51a0389 Add a new PML module that acts as a "mini-dr" - when requested, it performs a dr-like checksum on messages for BTL's that require it, as specified by MCA params.
Add two new configure options that specify:

1. when to add padding to the openib control header - this *only* happens when the configure option is specified

2. when to use the dr-like checksum as opposed to the memcpy checksum. Not selectable at runtime - to eliminate performance impacts, this is a configure-only option

Also removed an unused checksum version from opal/util/crc.h.

The new component still needs a little cleanup and some sync with recent ob1 bug fixes. It was created as a separate module to avoid performance hits in ob1 itself, though most of the code is duplicative. The component is only selectable by either specifying it directly, or configuring with the dr-like checksum -and- setting -mca pml_csum_enable_checksum 1.

Modify the LANL platform files to take advantage of the new module.

This commit was SVN r20846.
2009-03-23 23:52:05 +00:00
Ralph Castain
8d313b55ef Correct a couple of minor typos
This commit was SVN r20843.
2009-03-23 18:05:34 +00:00
No Author
4711d6111f Per a comment on the users list, don't try to install our own signal
handlers if there are already non-default handlers installed.  Print a
warning if that situation arises.

'''NOTE:''' This is a definite target for OPAL_SOS conversion -- as it
is right now, this message will be displayed for ''every'' MPI
process.  We want this to be OPAL_SOS'ed when that becomes available
so that the error message can be aggregated nicely.

This commit was SVN r20831.
2009-03-20 01:05:30 +00:00
Jeff Squyres
2815cb88b4 Fixes trac:1836: no reason to constrain the latter numbers to 2 hex
digits.  They likely shouldn't be more than 2 digits anyway, but let's
be social just in case they are (e.g.,
https://bugs.openfabrics.org/show_bug.cgi?id=1544).

This commit was SVN r20824.

The following Trac tickets were found above:
  Ticket 1836 --> https://svn.open-mpi.org/trac/ompi/ticket/1836
2009-03-18 14:43:00 +00:00
Rainer Keller
6a72c0f4d1 - As long as a header declares _DECLSPEC functionality
it should include the corresponding _config.h header file.

   Tested on Linux/x86-64

This commit was SVN r20795.
2009-03-17 01:45:19 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Rainer Keller
fd28b392bf - An intrusive commit yet again (sorry): with the separation we
get bitten by header depending on having already included
   the corresponding [opal|orte|ompi]_config.h header.
   When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
   are missed.

   Script to add the corresponding header in front of all following
   (taking care of possible #ifdef HAVE_...)

 - Including some minor cleanups to
   - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
   - ompi/mca/btl/btl.h -- nclude _after_ #ifndef MCA_BTL_H
   - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
     orte/util/output.h
   - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
   - ompi/mca/btl/btl.h -- reorder to fit
   - ompi/mca/bml/bml.h -- reorder to fit
   - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
   - ompi/request/request.h -- additionally need ompi/constants.h

 - Tested on linux/x86-64

This commit was SVN r20720.
2009-03-04 15:35:54 +00:00
George Bosilca
5f6896ce5b No memory leaks (this is an improvement for r20706).
This commit was SVN r20707.

The following SVN revision numbers were found above:
  r20706 --> open-mpi/ompi@248bbb8a2f
2009-03-03 22:14:05 +00:00
George Bosilca
248bbb8a2f Give a small chance to those with an "IP guru" admin-sys to define what exactly is
a private IPv4 address. By deafult we obide to the RFC1918 and RFC3330, but we have
the opportunity to change them.

Based on a patch from Camille Coti.

This commit was SVN r20706.
2009-03-03 22:06:09 +00:00
Jeff Squyres
cfcca7d80e Fix some typos in comments submitted by Bert Wesarg.
This commit was SVN r20695.
2009-03-03 12:50:46 +00:00
Eugene Loh
463f11f993 Improve shared-memory allocation:
* compute mmap-file size more wisely and pass requested size to allocator
* change MCA parameters:
  - get rid of mpool_sm_per_peer_size
  - get rid of mpool_sm_max_size
  - set default mpool_sm_min_size to 0
* no longer pad sm allocations to page boundaries
* have sm_btl_first_time_init check return codes on free-list creations

Have mca_btl_sm_prepare_src() check to see if it can allocate an EAGER fragment
rather than a MAX fragment if the smaller size works.

Remove ompi/class/ompi_[circular_buffer_]fifo.h and references thereto.

Remove opal/util/pow2.[c|h] and references thereto.

This commit was SVN r20614.
2009-02-20 19:51:57 +00:00
Jeff Squyres
67a5374a61 Re CID 1180: Actually, it would be better to also print something in
the case of an error, too...

This commit was SVN r20443.
2009-02-05 15:26:44 +00:00
Jeff Squyres
598e530de9 Fix CID 1180: ensure to check the output from snprintf, since we pass
it to write().

This commit was SVN r20442.
2009-02-05 15:24:48 +00:00
Jeff Squyres
bb3d258562 Round up a few places where PATH_MAX was used instead of
OMPI_PATH_MAX.  Thanks to Andrea Iob for the bug report.

This commit was SVN r20360.
2009-01-27 22:57:50 +00:00
Ralph Castain
88a0af9726 Revise the way we output resolved hostnames to make life easier for the Eclipse folks. Store aliases for individual nodes (only when requested to show resolved hostnames) and then report them out as part of the display-map option.
This commit was SVN r20284.
2009-01-15 18:11:50 +00:00
Jeff Squyres
d1c6f3f89a * Fix a truckload of Cisco copyrights to be the same as the rest of
the code base.
 * Fix a few misspellings in other copyrights.

This commit was SVN r20241.
2009-01-11 02:30:00 +00:00
Jeff Squyres
ad7cfe63a3 Fix CID 1180: check for negative return from snprintf.
This commit was SVN r20192.
2009-01-03 15:33:54 +00:00
George Bosilca
80fd24c948 Small cleanups: remove an unused dependency to signal.h and include
output.h.

This commit was SVN r20155.
2008-12-18 22:39:49 +00:00
Tim Mattox
4fa13a1a4d Fix two typos inside of comments.
This commit was SVN r20112.
2008-12-10 21:18:13 +00:00
Ralph Castain
728a24c8ec After considerable patience and help with debugging/testing from Tim M and Jeff S, return a completed and pretty well tested patch of the IOF to the trunk. This commit includes the previously reverted r20074, r20068, and r20064, as well as changes to fix those commits.
Basically, the remaining problem turned out to be:

1. closing stdout/stderr during orte_finalize of mpirun

2. inadvertently setting up a write event on fd = -1

3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once

This passed prelim MTT testing by Jeff and Tim, but should soak for awhile before migrating to 1.3.

This commit was SVN r20106.

The following SVN revision numbers were found above:
  r20064 --> open-mpi/ompi@a07660aea8
  r20068 --> open-mpi/ompi@ec930d14a9
  r20074 --> open-mpi/ompi@2940309613
2008-12-10 20:40:47 +00:00
Jeff Squyres
d7f3dd2230 Add a comment explaining exactly what is returned by this function
because we wasted a good amount of time today assuming that it was
returning the actual netmask.  Specifically, we were confused why it
returned 0x18 instead of 0xffffff00 for a class C subnet (the
head-smacking moment wasn't until [much] later when we converted 0x18
to decimal, which is 24.  Then the Clue Light(tm) went on).

This commit was SVN r20002.
2008-11-14 22:59:41 +00:00
George Bosilca
6344b8dffe Force an explicit cast to keep the compilers quiet.
This commit was SVN r19975.
2008-11-11 14:58:53 +00:00
Rolf vandeVaart
cad49da72d Fix the tcp btl so it makes use of the btl_tcp_if_include and btl_tcp_if_exclude
parameters on the connecting side also.  Also move define of IF_NAMESIZE
into if.h file.  And lastly, add one verbose debug message which may be
useful if we run into other issues like this.

This commit fixes trac:1573.

This commit was SVN r19932.

The following Trac tickets were found above:
  Ticket 1573 --> https://svn.open-mpi.org/trac/ompi/ticket/1573
2008-11-05 18:45:42 +00:00
Shiqing Fan
a456c057d6 - Skip the loopback address on windows.
This commit was SVN r19862.
2008-10-31 17:02:41 +00:00
Matthias Jurenz
47bf6c213c Fixed memory leak in 'opal_vsnprintf()'
This commit was SVN r19843.
2008-10-29 12:59:11 +00:00
George Bosilca
579d70edad We should use #ifdef and not #if
This commit was SVN r19504.
2008-09-05 12:44:19 +00:00
Shiqing Fan
32243829d8 Add the BEGIN/END_C_DECLS declarations.
This commit was SVN r19445.
2008-08-28 13:06:14 +00:00
Ralph Castain
04fe1e1875 Avoid using the "access" function for security reasons as per its documentation. Also, check to ensure it is a file (and not something else) when we are looking for a file
Fixes trac:1272

This commit was SVN r19373.

The following Trac tickets were found above:
  Ticket 1272 --> https://svn.open-mpi.org/trac/ompi/ticket/1272
2008-08-20 15:30:25 +00:00
Rainer Keller
5eb4ebe8fa - Fix buffer size warning
Coverity CID 8

This commit was SVN r19198.
2008-08-06 14:53:43 +00:00
Rainer Keller
dbafe83999 - Update the warn_unused result from allocating functions
- Set __opal_attribute_nonnull__ where an argument *must* not be null
 - Mark unused functions

This commit was SVN r19107.
2008-07-31 15:46:09 +00:00
Ralph Castain
a62b2a0150 Per the July technical meeting:
Standardize the handling of the orte launch agent option across PLMs. This has been a consistent complaint I have received - each PLM would register its own MCA param to get input on the launch agent for remote nodes (in fact, one or two didn't, but most did). This would then get handled in various and contradictory ways.

Some PLMs would accept only a one-word input. Others accepted multi-word args such as "valgrind orted", but then some would error by putting any prefix specified on the cmd line in front of the incorrect argument.

For example, while using the rsh launcher, if you specified "valgrind orted" as your launch agent and had "--prefix foo" on you cmd line, you would attempt to execute "ssh foo/valgrind orted" - which obviously wouldn't work.

This was all -very- confusing to users, who had to know which PLM was being used so they could even set the right mca param in the first place! And since we don't warn about non-recognized or non-used mca params, half of the time they would wind up not doing what they thought they were telling us to do.

To solve this problem, we did the following:

1. removed all mca params from the individual plms for the launch agent

2. added a new mca param "orte_launch_agent" for this purpose. To further simplify for users, this comes with a new cmd line option "--launch-agent" that can take a multi-word string argument. The value of the param defaults to "orted".

3. added a PLM base function that processes the orte_launch_agent value and adds the contents to a provided argv array. This can subsequently be harvested at-will to handle multi-word values

4. modified the PLMs to use this new function. All the PLMs except for the rsh PLM required very minor change - just called the function and moved on. The rsh PLM required much larger changes as - because of the rsh/ssh cmd line limitations - we had to correctly prepend any provided prefix to the correct argv entry.

5. added a new opal_argv_join_range function that allows the caller to "join" argv entries between two specified indices

Please let me know of any problems. I tried to make this as clean as possible, but cannot compile all PLMs to ensure all is correct.

This commit was SVN r19097.
2008-07-30 18:26:24 +00:00
Ralph Castain
83e7c19d33 Remove deprecated function - this was incorporated into the paffinity framework a long time ago. Fortunately, nobody was actually using it!
This commit was SVN r18990.
2008-07-23 03:43:31 +00:00
Jeff Squyres
480c17c332 Fix in minore memory leak
This commit was SVN r18857.
2008-07-10 00:37:08 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Ralph Castain
ca91ec525b Add a suffix to the opal_output stream descriptor object - we can now output both a prefix and a suffix for a given stream. Default the suffix to NULL.
Remove lingering references to a filtering system as this will no longer be implemented.

This commit was SVN r18586.
2008-06-04 20:52:20 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so you if explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
Jeff Squyres
db2695ccab Make the symbols be visible.
This commit was SVN r18201.
2008-04-18 00:26:17 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Ralph Castain
e7487ad533 Implement the seq rmaps module that sequentially maps process ranks to a list hosts in a hostfile.
Restore the "do-not-launch" functionality so users can test a mapping without launching it.

Add a "do-not-resolve" cmd line flag to mpirun so the opal/util/if.c code does not attempt to resolve network addresses, thus enabling a user to test a hostfile mapping without hanging on network resolve requests.

Add a function to hostfile to generate an ordered list of host names from a hostfile

This commit was SVN r18190.
2008-04-17 13:50:59 +00:00
George Bosilca
b359d84661 Use the correct prefix.
This commit was SVN r18048.
2008-03-31 21:42:59 +00:00
George Bosilca
be2454e0c5 Default the temporary directory to /tmp if no special environment
variables are set.

This commit was SVN r18046.
2008-03-31 20:15:49 +00:00
George Bosilca
ee784b601e For consistency reasons always use opal_home_directory and
opal_tmp_directory.

This commit was SVN r18043.
2008-03-31 18:13:41 +00:00
George Bosilca
028c7391d3 Coverty fix: Replace strcpy by strncpy.
This commit was SVN r17961.
2008-03-25 22:39:24 +00:00
George Bosilca
3997639ec6 Hide what should be hidden, and expose the others. Plus some indentation.
This commit was SVN r17856.
2008-03-18 03:00:08 +00:00
George Bosilca
210631962c Add two convenience functions in order to make sure we get these
environment variables in a consistent manner. These functions
retrieve the user and the temporary directories (based on the
system).

This commit was SVN r17815.
2008-03-13 17:56:44 +00:00
Rainer Keller
32dcd9e551 - Adding #include <stdbool.h> with protection in r17488 and r17504
seemed to be the right thing(tm), but broke the Sun Studio C++
   compiler under Linux (ticket 747).

   This patch should allow inclusion into C and C++ from other header
   files without problems.

This commit was SVN r17792.

The following SVN revision numbers were found above:
  r17488 --> open-mpi/ompi@d53131f261
  r17504 --> open-mpi/ompi@b22e8e7567
2008-03-08 12:53:10 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Rainer Keller
d53131f261 - Need stdbool.h if included in userland; additionally protect stdbool / stdarg.h
This commit was SVN r17488.
2008-02-18 08:11:57 +00:00
Adrian Knoth
601fb4389d Cosmetics for r17150. Closes trac:1201
This commit was SVN r17151.

The following SVN revision numbers were found above:
  r17150 --> open-mpi/ompi@4b50f02126

The following Trac tickets were found above:
  Ticket 1201 --> https://svn.open-mpi.org/trac/ompi/ticket/1201
2008-01-17 12:29:12 +00:00
Adrian Knoth
4b50f02126 Only free res iff it's been allocated before. Re #1201
This patch fixes the segfault, so closing the ticket might be possible.
It's a very conservative patch. Perhaps the freeaddrinfo spec says that
it will never allocate res in case of errors, but for now, I neither
have the spec nor the will to rely on it.

This commit was SVN r17150.
2008-01-17 10:01:52 +00:00
Jeff Squyres
00131df353 Fix typo in incorrect variable name; only noticed now because someone
actually compiled on a system without syslog support (Brian B.).  :-)

This commit was SVN r16863.
2007-12-06 11:36:44 +00:00
Torsten Hoefler
e985812e1f fixing a comment to be more detailed about opal_output_open
functionality ...

This commit was SVN r16370.
2007-10-06 17:33:57 +00:00
Tim Prins
e25bb7f187 Some platforms (such as FreeBSD) need libutil.h included for openpty.
Thanks to Karol Mroz for pointing this out.

This commit was SVN r16163.
2007-09-19 21:59:22 +00:00
George Bosilca
d1364c53de Don't allocate the temporary buffer on the stack. It get way too much
space.

This commit was SVN r16127.
2007-09-14 02:09:38 +00:00
George Bosilca
2c8c75ef94 Coverty blame list:
- Remove memory leaks
 - uninitialized return

This commit was SVN r16126.
2007-09-14 02:08:37 +00:00
George Bosilca
921d79c2b8 Remove few memory leaks. Close the files where we're done with them.
This commit was SVN r16125.
2007-09-14 02:06:26 +00:00
Shiqing Fan
a389e61330 - Add some type casts, required by MS compiler.
This commit was SVN r16085.
2007-09-11 09:32:11 +00:00
Sven Stork
3a640603a4 - remove wrong va_end
This commit was SVN r15789.
2007-08-07 13:32:05 +00:00
Sven Stork
5e257fadbd - add missing va_end
This commit was SVN r15788.
2007-08-07 12:25:20 +00:00
George Bosilca
31dfa5592e Few clean-ups, few indentations. Nothing really important.
This commit was SVN r15767.
2007-08-04 00:44:23 +00:00
George Bosilca
e2f6d69669 Only use one va_list, as it seems that only one is allowed.
This commit was SVN r15765.
2007-08-04 00:41:26 +00:00
Josh Hursey
6248b2bb51 Whoops. Make sure to include the opal_output header.
This commit was SVN r15755.
2007-08-03 19:20:23 +00:00
Josh Hursey
dc9644a2c2 Add a bit of error output so the user can figure out what
went wrong when we cannot create a directory.

This commit was SVN r15754.
2007-08-03 19:08:48 +00:00
Brian Barrett
951755f9fb no need to call gethostname twice to determine if a process is local
This commit was SVN r15742.
2007-08-02 16:25:25 +00:00