1
1
Граф коммитов

63 Коммитов

Автор SHA1 Сообщение Дата
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
Ralph Castain
5965d3e620 Include the new error code in the error strings
This commit was SVN r23111.
2010-05-07 18:09:08 +00:00
Ralph Castain
d4f56cff61 More cleanup on paffinity....groan
It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.

Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.

This commit was SVN r23106.
2010-05-06 20:57:17 +00:00
Jeff Squyres
16b100219d A patch from UTK to allow orte_init(), opal_init(), and associated
friends also receive &argc and &argv (George asked Jeff to Ralph to
review before committing).  The thought is that passing argv and argc
to opal/orte_init be useful to other projects outside of OMPI that are
using OPAL and/or ORTE (especially in conjunction with some other
bootstrapping code where it is helpful to modify argv).  It's such a
small thing that it's easy to apply here to make others' lives a
little easier.

Ask George for more details; I'm just the messenger.  :-)

Judging by the copyrights on this patch, it's been around for a
while.  :-)

This commit was SVN r22260.
2009-12-04 00:51:15 +00:00
Ralph Castain
176fdd3a83 Add a new API to the show_help system that allows external users (e.g., libraries built upon OMPI) to define their own locations for show_help files. This allows such users to exploit the rather nice features of the OPAL show_help system -without- interfering with the ability of the ORTE and OMPI layers to use show_help themselves.
Reviewed by Jeff to protect toes...and to get some good comments :-)

This commit was SVN r22026.
2009-09-29 02:07:46 +00:00
Ralph Castain
7cc045f9c5 Check return codes when init'ing the paffinity framework to avoid segfaulting
This commit was SVN r21884.
2009-08-26 01:58:15 +00:00
George Bosilca
5155eaf945 The opal datatype engine should _ALWAYS_ be initialized. Therefore move the
call to opal_datatype_init in the opal_util_init.

This commit was SVN r21754.
2009-08-03 16:46:33 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI specific part in
OMPI
   and a language agnostic part in OPAL. The convertor is completely
   moved into OPAL.  This offers several benefits as described in RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar
    - Fixing naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely
   known at compile time and therefore constified
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced
   from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in convertor
   (upon send/recv) in one cacheline...
   This has implications to the convertor-datastructure and other parts
   of the code.
 - Several performance tests have been run, the netpipe latency does not
   change with
   this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
    ompi-ddt:
    a. running both trunk and ompi-ddt resulted in no differences
       (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
       correctly).
    b. with --enable-memchecker and running under valgrind (one buglet
       when run with static found in test-suite, commited)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests failed!! as trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on Scicortex.
 - Please note, that for the heterogeneous case, (-m32 compiled
   binaries/ompi), neither
   ompi-trunk, nor ompi-ddt branch would successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Ralph Castain
afe1950da5 Make the error message clearer - this error only is used when two buffer types don't match, thus preventing an operation from being executed
This commit was SVN r21033.
2009-04-16 16:23:28 +00:00
Jeff Squyres
d1c6f3f89a * Fix a truckload of Cisco copyrights to be the same as the rest of
the code base.
 * Fix a few misspellings in other copyrights.

This commit was SVN r20241.
2009-01-11 02:30:00 +00:00
Ralph Castain
1ace83c470 Enable modex-less launch. Consists of:
1. minor modification to include two new opal MCA params:
   (a) opal_profile: outputs what components were selected by each framework
       currently enabled for most, but not all, frameworks
   (b) opal_profile_file: name of file that contains profile info required
       for modex

2. introduction of two new tools:
   (a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
       opal_profile set. Also reports back the rml IP address for all
       interfaces on the node
   (b) ompi-profiler: uses ompi-probe to create the profile_file, also
       reports out a summary of what framework components are actually
       being used to help with configuration options

3. modification of the grpcomm basic component to utilize the
   profile file in place of the modex where possible

4. modification of orterun so it properly sees opal mca params and
   handles opal_profile correctly to ensure we don't get its profile

5. similar mod to orted as for orterun

6. addition of new test that calls orte_init followed by calls to
   grpcomm.barrier

This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. Minor mod to the TCP btl would be required to enable it as well, if people are interested. Similarly, anyone interested in enabling other BTL's for modex-less operation should let me know and I'll give you the magic details.

This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.

This commit was SVN r20098.
2008-12-09 23:49:02 +00:00
Ralph Castain
9927b2445c Remove the filter framework - the xml support will have to be provided in a different manner that will be implemented shortly
This commit was SVN r18572.
2008-06-04 09:04:51 +00:00
Terry Dontje
ef7ac86929 created opal_version_string and orte_version_string to match the ompi changes
made in r18345 for ompi_version_string.  This was done per request from Jeff 
Squyres to maintain consistency and to remove some warnings caused by the 
non-use of some static const char.

This commit was SVN r18461.

The following SVN revision numbers were found above:
  r18345 --> open-mpi/ompi@8dd0421015
2008-05-20 12:13:19 +00:00
Jeff Squyres
d12b21e21b Ensure that if an error occurs, we actually return that error rather
than an undefined value (which could be 0/OPAL_SUCCESS).

This commit was SVN r18452.
2008-05-19 11:57:44 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so you if explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
Sharon Melamed
4a8e2a2648 Renove status check from carto initiation.
This commit was SVN r17812.
2008-03-12 08:55:28 +00:00
Jeff Squyres
b2ed2b95aa Fix filename so that the help file can be found.
This commit was SVN r17759.
2008-03-06 14:44:47 +00:00
Ralph Castain
8d819cf3d3 Move carto open/close/finalize to opal layer so that ORTE can get access to topo info. This will be used to support a topo grpcomm that optimizes communications in non-uniform topologies like RR.
This commit was SVN r17652.
2008-02-28 21:04:30 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Rainer Keller
7621800477 - Fix and add comments -- output full name for pd
- Protect argument in macro...

This commit was SVN r17434.
2008-02-12 16:59:59 +00:00
Shiqing Fan
f5792bbda5 merging the memchecker into trunk.
This commit was SVN r17424.
2008-02-12 08:46:27 +00:00
Jeff Squyres
714b409595 Fix an uninitialized variable in the error case. Thanks to Ake
Sandgren for pointing out the mistake.

This commit was SVN r16682.
2007-11-07 01:52:23 +00:00
Ethan Mallove
005652c9d4 * Embed ident strings into the Open MPI libraries using one of the following
methods (in order of precedence):
  1. #pragma ident <ident string> (e.g., Intel and Sun)
  1. #ident <ident string> (e.g., GCC)
  1. static const char ident[] = <ident string> (all others)
By default, the ident string used is the standard Open MPI version string. Only
the following libraries will get the embedded version strings (e.g., DSOs will
not):
  * libmpi.so
  * libmpi_cxx.so
  * libmpi_f77.so
  * libopen-pal.so
  * libopen-rte.so
* Added two new configure options:
  * `--with-package-name="STRING"` (defaults to "Open MPI username@hostname
    Distribution"). `STRING` is displayed by `ompi_info` next to the "Package"
    heading.
  * `--with-ident-string="STRING"` (defaults to the standard Open MPI version
    string - e.g., X.Y.Zr######). `%VERSION%` will expand to the Open MPI
    version string if it is supplied to this configure option.

This commit was SVN r16644.
2007-11-03 02:40:22 +00:00
George Bosilca
31dfa5592e Few clean-ups, few indentations. Nothing really important.
This commit was SVN r15767.
2007-08-04 00:44:23 +00:00
Brian Barrett
8a7b6656b3 Reference count calls to the util access as well as the main initialized
code

This commit was SVN r15495.
2007-07-18 20:28:19 +00:00
Brian Barrett
916397f358 Use thread specific data and static buffers for the return type of
opal_net_get_hostname() rather than malloc, because no one was freeing
the buffer and the common use case was for printfs, where calling
free is a pain.

This commit was SVN r15494.
2007-07-18 20:25:01 +00:00
Brian Barrett
34fea87819 * Only need to to the opal_progress_event_users_increment() once between
OPAL and ORTE.  Since we now do opal_progress_init(), we do it
    there.  Fixes a performance issue introduced in r14773.
  * While trying to find the above, notived that we did the reference
    counting for the init in init_util and for finalize in fini.  That
    isn't right, so make them both in the non-util versions.

This commit was SVN r14830.

The following SVN revision numbers were found above:
  r14773 --> open-mpi/ompi@1e678c3f55
2007-06-01 02:43:46 +00:00
Josh Hursey
1e678c3f55 per conversation with Ralph and Jeff take out the opal_init_only logic.
This commit moves the initalization/finalization of opal_event and opal_progress
to opal_init/finalize. These were previously init/final in ORTE which is an
abstraction violation. After talking about it we concluded that there are no
ordering issues that require these to be init/final in ORTE instead of OPAL.

I ran the IBM test suite against this commit and it didn't turn up any new
failures so I think it is good to go.

Let us know if this causes problems.

This commit was SVN r14773.
2007-05-24 21:54:58 +00:00
Ralph Castain
1682a72d34 Add ability to read system limits on number of children, open files, and file size from the local OS - to be used in failed-to-start scenarios
This commit was SVN r14476.
2007-04-23 18:53:47 +00:00
Jeff Squyres
0ba47105ed Merge the /tmp/jms-installdirs-trunk branch into the trunk. This
finally brings in functionality that is already on the 1.2 branch, and
was developed and tested in the v1.2ofed branch (and other places).

Short version of new features:

 * Support for ibv_fork_init() 
 * Automatically fill in the openib BTL bandwidth value by 
   querying the HCA port 
 * Installdirs functionality 
 * Fixes to always use -I in the Fortran wrapper compilers (#924) 
 * Gleb's mpool updates 
 * Remove some kruft in btl/openib/configure.m4, therefore 
   fixing the harmless warnings noted in #665 
 * Bunches of updates to the Linux RPM spec file 

I.e., effectively the same thing that r14411 brought to the v1.2
branch.

Also effectively brought in r14432 and r14433 (some fixes on top of
the original r14411 commit to v1.2).  Still need to bring in the moral
equivalent of r14445 after this commit (fixes to installdirs).

This commit was SVN r14449.

The following SVN revision numbers were found above:
  r14411 --> open-mpi/ompi@83b31314ae
  r14432 --> open-mpi/ompi@a48f160595
  r14433 --> open-mpi/ompi@68f346d2bc
  r14445 --> open-mpi/ompi@13d366b827
2007-04-21 00:15:05 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Rainer Keller
e61dd8722e - Silence compiler on ORTE_TRANSPORT_KEY_FMT, it is fixed to llx
- No functional changes, just indentation and corrections to error
   output.

This commit was SVN r12734.
2006-12-03 13:59:23 +00:00
Brian Barrett
778bba2668 refs trac:405
* Make sure to AC_SUBST the backtrace CFLAGS so that the right flags
    are passed to the component (especially -m64)
  * Properly open / close the component.  This isn't strictly necessary
    to fix the bug, but was an oversight that should be fixed.

This commit was SVN r11806.

The following Trac tickets were found above:
  Ticket 405 --> https://svn.open-mpi.org/trac/ompi/ticket/405
2006-09-25 23:41:06 +00:00
George Bosilca
136c79908b Count how many times the opal library get initialized and require the same
numbers of finalizations before really destroying the internals.

This commit was SVN r11303.
2006-08-21 20:07:38 +00:00
George Bosilca
6e6698bec3 Open and close the memcpy component. Hopefully it is in the right place, as
the memcpy should be available as soon as possible after startup.

This commit was SVN r9533.
2006-04-05 05:57:51 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Brian Barrett
c96f870674 * Merge of wrapper compiler updates from the bwb-wrapper-fix branch (r8690 -
r8698), with changes below:

  - Split wrapper flags into those required for each of the three projects,
    and cleaned up some cruft (including the LIBMPI_EXTRA_*FLAGS) through-
    out the build system
  - Added opal_init_util and opal_finalize_util to allow init / cleanup
    of all the opal code that doesn't require the MCA system
  - Create standalone key=value file parser, based on the one that used
    to be in the mca param parser, so that it can be shared in multiple
    places
  - Add wrapper datafiles for opal, orte, and ompi wrappers, and add
    wrapper compiler with support for all the old features

This commit was SVN r8699.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r8690
  r8698
2006-01-16 01:48:03 +00:00
Brian Barrett
60ac1cb5f4 print stack traces (when available) for opal and orte processes, as well as
ompi processes.  Also add SIGABRT to the list of signals that are intercepted
to print out pretty messages.

This commit was SVN r8672.
2006-01-11 04:36:39 +00:00
Jeff Squyres
f1e8790bbe Add remaining OPAL error codes in opal_err2str()
This commit was SVN r8573.
2005-12-21 06:27:34 +00:00
Brian Barrett
79bf8843d2 * update memory hooks interface to allow for callbacks on both allocations
and dealllocations, per request from Galen and Tim

This commit was SVN r8303.
2005-11-29 04:46:14 +00:00
George Bosilca
16ca6e4c88 error seems to be a reserved keyword for some compilers ...
This commit was SVN r8262.
2005-11-26 21:18:47 +00:00
Brian Barrett
878676218e Rename opal/memory to opal/memoryhooks because XLC++ on Mac OS X is broken.
When compiling C++ code that includes something that looks for the C++
header file "memory" (stupid C++ headers not having .h extensions), it
goes through the header file search path, which includes $(topsrcdir)/opal,
so it finds the directory $(topsrcdir)/opal/memory/ and tries to load
that as the memory header file and all goes downhill.

This commit was SVN r8111.
2005-11-11 00:26:27 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
George Bosilca
0d4aaf6fa6 Provide the boool value as expected by the opal_show_help function.
This commit was SVN r7832.
2005-10-21 20:04:18 +00:00
Andrew Friedley
b1af69dfe7 Don't check for errors on the paffinity stuff, as per Brian's request.
This commit was SVN r7640.
2005-10-05 18:08:06 +00:00
Andrew Friedley
37123ed430 Implement an opal_show_help() (like is done in ompi_mpi_init) for error handling in opal_init and both stages of orte_init.
Some of the functions in opal_init are void or return a bool (opal_output_init, but always returns true.. eh?), so I don't check them.

This commit was SVN r7638.
2005-10-05 13:56:35 +00:00
Brian Barrett
1d9b663b62 * test for condition where we think we can intercept malloc/free/munmap but
really can't.  Test for munmap, since it's the most likely to cause problems,
  since it's always an interposed symbol.

  The condition that usually causes problems is if libmpi was brought in as
  the result of a library dependency, rather than as a -l on the link line.
  The linker in this case will find malloc/free/munmap/etc. in libc, rather
  than in libmpi.

This commit was SVN r7508.
2005-09-26 20:20:20 +00:00
Ralph Castain
2c6e47e38c Add a trace utility that provides info on progress through functions. This is not enabled yet - need Jeff or Brian to add it to the configure/build system.
This commit was SVN r7222.
2005-09-07 18:52:28 +00:00
Brian Barrett
77ebdf1c6f * Add some debugging output Ralph asked for when an unknown error code is
passed to opal_error

This commit was SVN r7087.
2005-08-29 23:36:53 +00:00