1
1
Граф коммитов

1003 Коммитов

Автор SHA1 Сообщение Дата
Josh Hursey
22f4c829ba cleanup BLCR configure so --without-blcr works correctly
This commit was SVN r18825.
2008-07-08 02:48:20 +00:00
Lenny Verkhovsky
1ed465326b Change of name conventions in carto
NODE -> EDGE
CONNECTION ->   BRANCH
SLOT -> SOCKET.

This commit was SVN r18799.
2008-07-03 14:19:16 +00:00
Lenny Verkhovsky
ba1fa73881 Selectign Maffinity only if Paffinity selected fix
This commit was SVN r18797.
2008-07-03 13:39:34 +00:00
Jeff Squyres
8efe67e08c Improvements to the MCA param system: allow querying to find out where
an MCA parameter's value came from.  Note that the actual value of the
parameter is irrelevant.  For example, if a value was specified in an
MCA parameter file that happened to have the same defaultvalue that
was specified when the parameter was registered, the returned location
will indicate that the value was set from the file.

Possible answers:

 * '''MCA_BASE_PARAM_SOURCE_DEFAULT:''' no user-specified values were
   found, so the default value was used
 * '''MCA_BASE_PARAM_SOURCE_ENV:''' the value came from the
   environment (which also means the mpirun/orterun command line!)
 * '''MCA_BASE_PARAM_SOURCE_FILE:''' the value came a file (or the
   Windows registry)
 * '''MCA_BASE_PARAM_SOURCE_KEYVAL:''' the value came from a keyval
   (can currently never happen)
 * '''MCA_BASE_PARAM_SOURCE_OVERRIDE:''' the value came from an MCA
   param API "set" function

This commit was SVN r18770.
2008-06-28 15:13:25 +00:00
Jeff Squyres
21c7d95109 Fixes trac:1365: if we're using !^ to negate module inclusion, then don't
bother to check to see whether they exist or not.  Specifically, this
will not cause an error:

{{{
shell$ mpirun --mca btl ^does_not_exist ...
}}}

but neither will this:

{{{
shell$ mpirun --mca btl ^sm ...
}}}

(where the sm BTL ''does'' exist)

This commit was SVN r18760.

The following Trac tickets were found above:
  Ticket 1365 --> https://svn.open-mpi.org/trac/ompi/ticket/1365
2008-06-27 19:42:08 +00:00
Ralph Castain
830ea9dfe6 Reconnect the opal dss debug envar with the debug output
This commit was SVN r18759.
2008-06-27 19:29:18 +00:00
Shiqing Fan
d129578694 Small fix for including unistd.h header file.
This commit was SVN r18758.
2008-06-27 16:25:31 +00:00
Josh Hursey
a003fa7a50 C/R fix for broken CRS component selection resulting from r18707.
Make sure that if we ask for the 'none' component (which is not a 'real' component, but a component in crs/base) then we do not fail out of the box when using tools. We check for the {{{OPAL_ERR_NOT_FOUND}}} error.

Also make sure that component_open() returns {{{OPAL_ERR_NOT_FOUND}}} when it cannot find a value instead of {{{OPAL_ERROR}}} which means something quite a bit different.

C/R is working but the tools still print the warning below everytime they are ran:
{{{
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      odin.cs.indiana.edu
Framework: crs
Component: none
--------------------------------------------------------------------------
}}}

I'll have to figure out a work around for this warning (maybe work on the {{{MCA_NULL}}} Ticket #1291).

This commit was SVN r18739.

The following SVN revision numbers were found above:
  r18707 --> open-mpi/ompi@bdaaf01d8a
2008-06-25 14:55:09 +00:00
George Bosilca
2bc52a87d2 Related to my previous commit. The Sicortex is a MIPS machine, so
allow the assembly to understand this.

This commit was SVN r18732.
2008-06-25 03:09:02 +00:00
George Bosilca
872d957550 Allow Open MPI to configure correctly on the Sicortex machine.
This commit was SVN r18731.
2008-06-25 03:07:53 +00:00
Brian Barrett
e9c50a29ba Some (rare) platforms only have a #define for htonl and friends,
but not anything in libc.  Which causes an incorrect answer for
AC_CHECK_FUNCS.  Work around that by also checking for the
#define.

This commit was SVN r18730.
2008-06-24 23:20:25 +00:00
Brian Barrett
e7a299d046 Add timer support for Catamount
This commit was SVN r18729.
2008-06-24 22:13:34 +00:00
Rolf vandeVaart
95cd9758e5 Fix broken build on Solaris.
This commit was SVN r18719.
2008-06-24 14:57:12 +00:00
Ralph Castain
f70b7e51ce Fix a missing header file and ensure we use a portable name for a system limit
This commit was SVN r18712.
2008-06-23 22:32:26 +00:00
Jeff Squyres
bdaaf01d8a Fixes trac:1338: Have the MCA base specifically check for all requested
components.  If they are not found / able to be opened, a warning will
be printed and the mca_base_component_find() will return
OPAL_ERR_NOT_FOUND.  It is the upper-layer's responsibility to handle
this error appropriately.

This commit was SVN r18707.

The following Trac tickets were found above:
  Ticket 1338 --> https://svn.open-mpi.org/trac/ompi/ticket/1338
2008-06-23 16:14:05 +00:00
Ralph Castain
ccbf194e8f Visibility fix
This commit was SVN r18687.
2008-06-19 19:08:08 +00:00
Ralph Castain
26c9ad5799 Clean-up the DSS API to remove two functions that are supposed to be used solely internally to the DSS. These were likely exposed because we need to call them when packing/unpacking declared types, but this means that developers may accidentally use the wrong functions, causing the DSS buffer to get confused. Instead, return the system to the way it used to work and hide those functions.
This commit was SVN r18684.
2008-06-19 18:46:25 +00:00
Pak Lui
188c8bce5d Fix the SEGV when module_get finds that no proc is binded. Also make no-intr available for processor binding.
This commit was SVN r18671.
2008-06-18 16:03:08 +00:00
George Bosilca
f97a728dc6 Dont cast the int32_t pointer into a long pointer. This doesn't work on
64 bits architectures.

This commit was SVN r18667.
2008-06-18 08:33:58 +00:00
Ralph Castain
0532d799d6 Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.

This commit was SVN r18664.
2008-06-18 03:15:56 +00:00
Jeff Squyres
16b2a50543 Slight clarification of help message.
This commit was SVN r18661.
2008-06-17 11:25:32 +00:00
Jeff Squyres
c1d1ffbc56 Fix compile problems on systems with older versions of libnuma (that
don't have MPOL_MF_MOVE).  I know that this is a configure change in
the middle of the US workday, but this compile problem is preventing
work on several kinds of systems (e.g., RHEL4).

This commit was SVN r18659.
2008-06-16 17:26:42 +00:00
Lenny Verkhovsky
dee2f1d175 Adding new functionality to Maffinity component to support NUMA awareness
This commit was SVN r18657.
2008-06-15 07:27:29 +00:00
Brian Barrett
7712b07ac4 Add perl based wrapper compilers for cross-compile environments. The default
is still to use the C based wrapper compilers (which have many more features
and are more well tested).  The Perl compilers are enabled with the option
--enable-script-wrapper-compilers, which also ignores the option
--disable-binaries (ie --enable-script-wrapper-compilers --disable-binaries
will result in perl-based wrapper compilers being installed, but no other
binaries being installed).

This commit was SVN r18655.
2008-06-13 22:52:25 +00:00
Brian Barrett
79ad6d983e - The ptmalloc2 memory manager component is now by default built as
a standalone library named libopenmpi-malloc.  Users wanting to
  use leave_pinned with ptmalloc2 will now need to link the library
  into their application explicitly.  All other users will use the
  libc-provided allocator instead of Open MPI's ptmalloc2.  This change
  may be overriden with the configure option enable-ptmalloc2-internal
- The leave_pinned options will now default to using mallopt on
  Linux in the cases where ptmalloc2 was not linked in.  mallopt
  will also only be available if munmap can be intercepted (the
  default whenever Open MPI is not compiled with --without-memory-
  manager.
- Open MPI will now complain and refuse to use leave_pinned if
  no memory intercept / mallopt option is available.

This commit was SVN r18654.
2008-06-13 22:32:49 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Ralph Castain
e1e224b81a Silence a couple of minor compiler warnings
This commit was SVN r18617.
2008-06-09 12:57:41 +00:00
Ralph Castain
7bee71aa59 Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNP's, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit.
Add a new function to opal_progress that tells us our recursion depth to support that solution.

Yes, I know this sounds picky, but good ol' Jeff managed to make it happen by driving his cluster near to death...

Also ensure that we declare "failed" for the daemon job when daemons fail instead of the application job. This is important so that orte knows that it cannot use xcast to tell daemons to "exit", nor should it expect all daemons to respond. Otherwise, it is possible to hang.

After lots of testing, decide to default (again) to slurm detecting failed orteds. This proved necessary to avoid rather annoying hangs that were difficult to recover from. There are conditions where slurm will fail to launch all daemons (slurm folks are working on it), and yet again, good ol' Jeff managed to find both of them.

Thanks you Jeff! :-/

This commit was SVN r18611.
2008-06-06 19:36:27 +00:00
George Bosilca
b2aa751c28 Remove a race condition in the threaded mode. As a callback is allowed
to modify the callback array (add or remove), make sure we don't call
the same callback twice if it get remove in another thread.

This commit was SVN r18608.
2008-06-06 15:54:40 +00:00
Josh Hursey
1de50b523c Fix some Coverity 'Event set_but_not_used' highlights.
Thanks to Jeff for bringing them to my attention.

This commit was SVN r18606.
2008-06-06 14:38:41 +00:00
Jeff Squyres
12a3fe57e1 As pointed out by Ralf
W. (http://www.open-mpi.org/community/lists/devel/2008/06/4095.php),
these dependencies don't need to be here.

This commit was SVN r18603.
2008-06-06 01:20:47 +00:00
Jeff Squyres
b123629e6a Fix CIDs 458, 716, 717: ensure that strings are long enough to always
be properly \0 terminated.

This commit was SVN r18602.
2008-06-06 00:59:08 +00:00
Jeff Squyres
e2b08aaca4 Fix bad free's found in CID 707 and CID 708.
This commit was SVN r18600.
2008-06-05 20:49:33 +00:00
Lenny Verkhovsky
a8b5dcb204 Added more output info about socket:core pair in paffinity / rankfile components
This commit was SVN r18589.
2008-06-05 10:28:44 +00:00
Ralph Castain
ca91ec525b Add a suffix to the opal_output stream descriptor object - we can now output both a prefix and a suffix for a given stream. Default the suffix to NULL.
Remove lingering references to a filtering system as this will no longer be implemented.

This commit was SVN r18586.
2008-06-04 20:52:20 +00:00
Josh Hursey
78f14b5255 Fix the none.checkpoint command.
orte-checkpoint/orte-restart seem to not seem to totally like orte_output so revert them to opal_output for now. Since we have no need for the additional complexity of orte_output we can drop it for now and revisit this if anyone needs it later.

It seems that if you set the verbose level on an output handle then try to call a normal orte_output() on it then the message will *not* be printed. This is the same for opal_output, and seems incorrect to me because it stops some error messages from being printed out if you do not directly specify opal_output(0, ...). Maybe someone should take a look a this.


orte-checkpoint would segv if passed an incorrect PID. Fixed the return code so it errors out properly.

Thanks to Eric Roman for bringing this to my attention.

This commit was SVN r18583.
2008-06-04 14:44:11 +00:00
Jeff Squyres
530a15baa4 Fix cross-compiling scenario with valgrind.m4.
This commit was SVN r18579.
2008-06-04 11:58:41 +00:00
Shiqing Fan
2dc812f720 Clean configure.m4 of memchecker/valgrind.
If Valgrind is requested but wrong version is supplied, print error messages and stop. 
Save the CPPFLAGS in opal_memchecker_valgrind_CPPFLAGS, which could be used in 
Makefile.am.

Many thanks to Jeff. 

This commit was SVN r18573.
2008-06-04 11:46:50 +00:00
Ralph Castain
9927b2445c Remove the filter framework - the xml support will have to be provided in a different manner that will be implemented shortly
This commit was SVN r18572.
2008-06-04 09:04:51 +00:00
Jeff Squyres
75a97ebbf0 Many thanks to Ralf W. for finding a subtle bug in these Makefile.am's
that can *sometimes* cause problems with "make -j [N>1] install".
Ensure to make the target directory before we copy stuff into it --
read the thread starting here for more details:

    http://www.open-mpi.org/community/lists/devel/2008/06/4080.php

This commit was SVN r18570.
2008-06-04 01:28:03 +00:00
Jeff Squyres
3b568d4b14 Remove an old attempt to understand the tradeoffs with using GNU libc's malloc_hooks functionality, which turned out to be totally unusable in practice. I think we just always forgot to remove them.
This commit was SVN r18547.
2008-05-30 00:11:12 +00:00
Shiqing Fan
b67a1244b6 Some small fixes.
This commit was SVN r18541.
2008-05-29 15:05:28 +00:00
Jeff Squyres
ed5bc2cd08 Per http://www.open-mpi.org/community/lists/devel/2008/05/4057.php, remove the darwin memory hooks component
This commit was SVN r18531.
2008-05-28 23:50:53 +00:00
Sharon Melamed
64fe554b8e Fix bug in carto component select. After the insertion of mca_base_select the carto file component was never selected.
This commit was SVN r18496.
2008-05-26 12:52:41 +00:00
Jeff Squyres
d45cb82ecc Fix two bugs in PLPA:
1. If we don't have the topology information, don't bother trying to
    create cross-referencing information
 1. Ensure to only check for valid processor ID's

This commit was SVN r18462.
2008-05-20 12:57:12 +00:00
Terry Dontje
ef7ac86929 created opal_version_string and orte_version_string to match the ompi changes
made in r18345 for ompi_version_string.  This was done per request from Jeff 
Squyres to maintain consistency and to remove some warnings caused by the 
non-use of some static const char.

This commit was SVN r18461.

The following SVN revision numbers were found above:
  r18345 --> open-mpi/ompi@8dd0421015
2008-05-20 12:13:19 +00:00
Jeff Squyres
ea1582856f Clarify some messages, move AC_ARG_WITH outside of the conditional
This commit was SVN r18459.
2008-05-19 23:13:31 +00:00
Jeff Squyres
d12b21e21b Ensure that if an error occurs, we actually return that error rather
than an undefined value (which could be 0/OPAL_SUCCESS).

This commit was SVN r18452.
2008-05-19 11:57:44 +00:00
Terry Dontje
517abf9b09 This commit fixes trac:1288.
This commit was SVN r18441.

The following Trac tickets were found above:
  Ticket 1288 --> https://svn.open-mpi.org/trac/ompi/ticket/1288
2008-05-15 17:40:08 +00:00
Jeff Squyres
fb17097de4 Make ompi_info correctly display "filter" components
This commit was SVN r18435.
2008-05-13 20:56:20 +00:00