416 commits

Author SHA1 Message Date
Jeff Squyres
aa35ef53d0 Fix CID 1079: don't use a value until it's been initialized (duh).
This commit was SVN r19173.
2008-08-06 11:44:22 +00:00
Ralph Castain
fdde3de903 Combination of some changes by both Jeff and me. A few minor cleanups to the code (e.g., allow options to show-mca-params to be either case), and an enhancement that allows the user to specify multiple options separated by commas (e.g., "env,api").
This commit was SVN r19124.
2008-08-02 00:43:27 +00:00
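The comma-separated, case-insensitive option handling described in the commit above can be sketched as follows. Only "env" and "api" appear in that commit. The other option names, the flag values, and `parse_show_options()` are hypothetical and are not Open MPI's actual code.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical flags for the value sources a user can ask to display. */
enum {
    SHOW_DEFAULT = 1 << 0,
    SHOW_ENV     = 1 << 1,
    SHOW_FILE    = 1 << 2,
    SHOW_API     = 1 << 3
};

/* Case-insensitive comparison of an n-byte token with a NUL-terminated name. */
static int token_equals(const char *tok, size_t n, const char *name)
{
    if (strlen(name) != n) {
        return 0;
    }
    for (size_t i = 0; i < n; ++i) {
        if (tolower((unsigned char)tok[i]) != tolower((unsigned char)name[i])) {
            return 0;
        }
    }
    return 1;
}

/* Parse a list such as "env,api" or "ENV,File" into a bit mask.
 * Returns -1 if any comma-separated token is unknown or empty. */
int parse_show_options(const char *arg)
{
    static const struct { const char *name; int flag; } table[] = {
        { "default", SHOW_DEFAULT }, { "env", SHOW_ENV },
        { "file", SHOW_FILE },       { "api", SHOW_API }
    };
    const size_t ntable = sizeof(table) / sizeof(table[0]);
    int flags = 0;
    const char *p = arg;

    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        size_t i;

        for (i = 0; i < ntable; ++i) {
            if (token_equals(p, len, table[i].name)) {
                flags |= table[i].flag;
                break;
            }
        }
        if (i == ntable) {
            return -1;          /* unknown option name */
        }
        if (comma == NULL) {
            return flags;       /* last token processed */
        }
        p = comma + 1;          /* continue after the comma */
    }
}
```

Walking the string with `strchr()` keeps the input `const`, so no copy is needed the way it would be with `strtok()`.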
Ralph Castain
21ba1b2ec0 Modify the configure system in the paffinity framework so that only one component is built. Clean out variable name conflicts that prevented building on some systems.
This commit was SVN r19122.
2008-08-01 22:54:24 +00:00
Dan Lacher
9175da1e02 Putback for all changes to automate man page updates to strings of
versions, dates and build names.

Fixes trac:1387

Big thanks to Jeff and Brian for help and oversight.

This commit was SVN r19120.

The following Trac tickets were found above:
  Ticket 1387 --> https://svn.open-mpi.org/trac/ompi/ticket/1387
2008-08-01 21:14:37 +00:00
Jeff Squyres
1a3045ff81 * Remove some extraneous AC_MSG_RESULT's
 * Make the results of the top-level configure.ac test for
   _SC_NPROCESSORS_ONLN be cached so that we can check for it
   elsewhere (e.g., opal/mca/paffinity/posix/configure.m4)
 * Update top-level configure.ac test for _SC_NPROCESSORS_ONLN: stamp
   out another AC_TRY_COMPILE
 * Ensure paffinity:posix doesn't even try to compile if we don't
   have _SC_NPROCESSORS_ONLN
 * Minor style updates

This commit was SVN r19118.
2008-08-01 11:41:08 +00:00
Ralph Castain
e1501f2c9c Add darwin paffinity component to handle the difference between Tiger and Leopard. Although both are POSIX compatible, Tiger is a tad different in this regard and requires a different interface to get the #processor data.
This commit was SVN r19117.
2008-08-01 00:15:10 +00:00
Jeff Squyres
9fda668edf Clarify the comment that the caller should not modify or free the
filename.

This commit was SVN r19114.
2008-07-31 21:53:59 +00:00
Ralph Castain
f7d1c2d229 Extend the mca param display capability to allow independent output of the params based on where they were last set (default, environment, file, or API), and to output the name of the file that set them if they were set by file. This is of great assistance to support personnel trying to understand why a user is having problems.

Coordinated with Jeff.

This commit was SVN r19111.
2008-07-31 20:00:45 +00:00
Lenny Verkhovsky
90a784dfca Making paffinity_base_slot_list invisible for the user
This commit was SVN r19096.
2008-07-30 14:52:45 +00:00
Terry Dontje
0ff11f7523 Added initialization and proper increment of the value of num_processors
pointer.  This commit fixes trac:1420.

This commit was SVN r19089.

The following Trac tickets were found above:
  Ticket 1420 --> https://svn.open-mpi.org/trac/ompi/ticket/1420
2008-07-30 10:29:05 +00:00
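The commit above does not show the original bug. A common way to get this wrong, assumed here purely for illustration, is writing `*num_processors++`, which advances the pointer instead of the value it points to. The hypothetical `count_processors()` below shows the initialized, correctly incremented form.

```c
/* Hypothetical helper: count the entries flagged online and report the
 * total through an out-parameter. */
void count_processors(const int *online, int n, int *num_processors)
{
    *num_processors = 0;            /* initialize before any increment */
    for (int i = 0; i < n; ++i) {
        if (online[i]) {
            /* (*p)++ increments the value; *p++ would parse as *(p++),
             * moving the pointer and leaving the count unchanged. */
            (*num_processors)++;
        }
    }
}
```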
Jeff Squyres
49d9f614d0 Remove errant debugging printf
This commit was SVN r19082.
2008-07-29 18:53:40 +00:00
Jeff Squyres
0af7ac53f2 Fixes trac:1392, #1400
 * add "register" function to mca_base_component_t
   * converted coll:basic and paffinity:linux and paffinity:solaris to
     use this function
   * we'll convert the rest over time (I'll file a ticket once all
     this is committed)
 * add 32 bytes of "reserved" space to the end of mca_base_component_t
   and mca_base_component_data_2_0_0_t to make future upgrades
   [slightly] easier
   * new mca_base_component_t size: 196 bytes
   * new mca_base_component_data_2_0_0_t size: 36 bytes
 * MCA base version bumped to v2.0
   * '''We now refuse to load components that are not MCA v2.0.x'''
 * all MCA frameworks versions bumped to v2.0
 * be a little more explicit about version numbers in the MCA base
   * add big comment in mca.h about versioning philosophy

This commit was SVN r19073.

The following Trac tickets were found above:
  Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
2008-07-28 22:40:57 +00:00
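A minimal sketch of the new version gate, assuming a heavily simplified component record. `component_sketch_t`, its fields, and `mca_version_acceptable()` are hypothetical stand-ins, not the real `mca_base_component_t` layout (which the commit gives as 196 bytes).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified component record. */
typedef struct {
    int mca_major;                 /* MCA base version the component was built against */
    int mca_minor;
    int mca_release;
    int (*register_params)(void);  /* stand-in for the new "register" hook; may be NULL */
    char reserved[32];             /* spare bytes that later fields can use without changing the size */
} component_sketch_t;

/* Load only components built against MCA v2.0.x: the major and minor
 * numbers must match exactly, while any release number is accepted. */
bool mca_version_acceptable(const component_sketch_t *c)
{
    return c->mca_major == 2 && c->mca_minor == 0;
}
```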
Jeff Squyres
4d034383d9 Apply patch from Ralf W. to remove a non-portable use of ==.
This commit was SVN r19046.
2008-07-26 12:36:24 +00:00
Jeff Squyres
92c10cd187 Remove some old cruft from Makefile.am's -- likely the result of
copying some old Makefile.am a long time ago.

This commit was SVN r19043.
2008-07-26 00:27:42 +00:00
Josh Hursey
ca43968418 Fix a deadlock scenario when registering deprecated MCA parameters. The internal loop uses the 'item' variable that is used by the outer loop as well. So when the outer loop checks the value of 'item' it will never equal the end of the list since it no longer references the same list.
Kinda found by MTT. MTT calls 'ompi_info --all --parsable' and it was livelocked and had to be killed by hand.

I'm going to push this one to Jeff to push to v1.3 since he did the original implementation and should check this code.

This commit was SVN r19014.
2008-07-24 15:51:54 +00:00
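The failure mode in the commit above depends on lists that end at a per-list sentinel. The sketch below uses a minimal sentinel-based circular list, loosely modeled on (but not the same as) `opal_list_t`; all type and function names are hypothetical. With one shared cursor, the inner loop would leave it at the inner list's sentinel, the outer step would wrap to the inner list's first item, and the outer end test could never succeed. Separate cursors avoid that.

```c
#include <stddef.h>

/* Hypothetical sentinel-based circular list: the sentinel is both the
 * start marker and the end marker of its own list. */
typedef struct item { int value; struct item *next; } item_t;
typedef struct { item_t sentinel; } list_t;

void list_init(list_t *l) { l->sentinel.next = &l->sentinel; }
item_t *list_begin(list_t *l) { return l->sentinel.next; }
item_t *list_end(list_t *l) { return &l->sentinel; }

/* Insert an item at the front of the list. */
void list_push(list_t *l, item_t *it)
{
    it->next = l->sentinel.next;
    l->sentinel.next = it;
}

/* Count pairs with equal values. Each loop owns its own cursor, so the
 * outer test always compares against the outer list's sentinel. */
int count_matches(list_t *outer_list, list_t *inner_list)
{
    int matches = 0;
    for (item_t *outer = list_begin(outer_list); outer != list_end(outer_list);
         outer = outer->next) {
        for (item_t *inner = list_begin(inner_list); inner != list_end(inner_list);
             inner = inner->next) {
            if (inner->value == outer->value) {
                ++matches;
            }
        }
    }
    return matches;
}
```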
Ralph Castain
fdb2408bf2 Rename the osx paffinity component the "posix" component since it really has nothing osx specific in it - it is just a generic posix call to determine #processors. Set the priority low so that both linux and solaris components override it if they build. It shouldn't build in Windows at all.
Modify the odls to remove a (size_t) typecast in front of the num_processors variable just in case it is returned negative. This usually is accompanied by an opal_error, so this shouldn't make any difference - but it is more technically correct.

This commit was SVN r19008.
2008-07-24 01:54:51 +00:00
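The commit above describes the posix component as a generic call to determine the number of processors, and the 2008-08-01 entry mentions a configure check for `_SC_NPROCESSORS_ONLN`. Assuming that call is `sysconf(_SC_NPROCESSORS_ONLN)`, a sketch of the lookup might look like this; `online_processors()` is hypothetical. Keeping the result signed illustrates why dropping the `(size_t)` cast matters: `sysconf()` reports failure as -1, which a cast to `size_t` would turn into a huge processor count.

```c
#include <unistd.h>

/* Return the number of online processors, or -1 if it cannot be determined. */
long online_processors(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);   /* -1 on error */
    return (n >= 1) ? n : -1;                 /* stay signed; never cast an error to size_t */
#else
    return -1;                                /* not every platform defines this constant */
#endif
}
```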
Jeff Squyres
1fd5b0402a Refs trac:1250
 * Fix linux paffinity component to make a "best" guess when PLPA
   can't find topology information in the Linux kernel.  That is, if
   PLPA can't tell us the max_processor_id, just assume that it's the
   same as the number of processors.  If you have a more complex
   system than that (e.g., you have holes in your available processor
   IDs), you'll likely be running a Linux kernel that supports the
   topology information, and this problem won't happen.
 * Make sure to convert the return codes from PLPA to OPAL_ERR* codes.

This commit was SVN r19001.

The following Trac tickets were found above:
  Ticket 1250 --> https://svn.open-mpi.org/trac/ompi/ticket/1250
2008-07-23 15:47:43 +00:00
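The fallback and the return-code conversion described in the commit above can be sketched as below. Both status enums and both functions are hypothetical. Real PLPA and OPAL codes have different names and values, and the sketch takes the topology query result as plain inputs instead of calling PLPA.

```c
/* Hypothetical status codes: one set for the topology library, one for the caller. */
typedef enum { TOPO_OK, TOPO_NOT_SUPPORTED, TOPO_FAILED } topo_status_t;
typedef enum { SK_SUCCESS = 0, SK_ERR_NOT_SUPPORTED = -1, SK_ERROR = -2 } sk_status_t;

/* Translate library codes into the caller's codes instead of leaking them. */
static sk_status_t convert_status(topo_status_t s)
{
    switch (s) {
    case TOPO_OK:            return SK_SUCCESS;
    case TOPO_NOT_SUPPORTED: return SK_ERR_NOT_SUPPORTED;
    default:                 return SK_ERROR;
    }
}

/* Report the maximum processor ID. If the kernel exposes no topology
 * information, make the "best guess" the commit describes: treat the
 * maximum ID as equal to the number of processors. */
sk_status_t get_max_processor_id(topo_status_t topo_status, int topo_max_id,
                                 int num_processors, int *max_id)
{
    if (topo_status == TOPO_OK) {
        *max_id = topo_max_id;
        return SK_SUCCESS;
    }
    if (topo_status == TOPO_NOT_SUPPORTED && num_processors > 0) {
        *max_id = num_processors;
        return SK_SUCCESS;
    }
    return convert_status(topo_status);
}
```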
Shiqing Fan
5f021e47a9 - Add support for get_processor_info in windows paffinity module.
This commit was SVN r18992.
2008-07-23 07:59:03 +00:00
Ralph Castain
f32e24ab86 Move the POSIX-specific code out of the paffinity base. Add support for OSX in its own component.
For now, hide the OSX component with .ompi_ignore so only I can see it until I can ensure that it doesn't inadvertently interfere with Linux and Solaris support.

This clears the conflict with Windows.

This commit was SVN r18989.
2008-07-23 03:29:43 +00:00
Ralph Castain
28ca14297c Add minimal support (#processors only) for OSX and other systems that don't have paffinity modules.
This commit was SVN r18959.
2008-07-21 16:54:14 +00:00
George Bosilca
4f9ea0155b Remove 2 compiler warnings.
This commit was SVN r18956.
2008-07-21 12:55:40 +00:00
Shiqing Fan
54e93ff9d3 - This fix replaces r18899, which actually was not correct.
- Revert the $2, which was correct.
- It fixes the problem, that memchecker valgrind component could be 
compiled and is required, but it is unable to be selected. 

This commit was SVN r18906.

The following SVN revision numbers were found above:
  r18899 --> open-mpi/ompi@0b1b96b598
2008-07-14 13:06:09 +00:00
Jeff Squyres
cb36782310 Make this parameter visible to users; it was a mistake/typo to make
it hidden.

This commit was SVN r18902.
2008-07-14 11:21:52 +00:00
Lenny Verkhovsky
a812324963 Fixing "paffinity_base_slot_list" environment
This commit was SVN r18900.
2008-07-14 07:10:50 +00:00
Shiqing Fan
0b1b96b598 Fix the bug in memchecker/valgrind/configure.m4, which wrongly reset the
CPPFLAG.

This commit was SVN r18899.
2008-07-13 18:03:02 +00:00
Jeff Squyres
583bf425c0 Fixes trac:1383:
Short version: remove opal_paffinity_alone and restore
mpi_paffinity_alone.  ORTE makes various information available for the
MPI layer to decide what it wants to do in terms of processor
affinity.

Details:

 * remove opal_paffinity_alone MCA param; restore mpi_paffinity_alone
   MCA param
 * move opal_paffinity_slot_list param registration to paffinity base
 * ompi_mpi_init() calls opal_paffinity_base_slot_list_set(); if that
   succeeds use that.  If no slot list was set, see if
   mpi_paffinity_alone was set.  If so, bind this process to its Node
   Local Rank (NLR).  The NLR is the ORTE-maintained slot ID; if you
   COMM_SPAWN to a host in this ORTE universe that already has procs
   on it, the NLR for the new job will start at N (not 0).  So this is
   slightly better than mpi_paffinity_alone in the v1.2 series.
 * If a slot list is specified *and* mpi_paffinity_alone is set, we
   display an error and abort.
 * Remove calls from rmaps/rank_file component to register and lookup
   opal_paffinity mca params. 
 * Remove code in orte/odls that set affinities - instead, have them
   just pass a slot_list if it exists. 
 * Cleanup the orte/odls code that determined
   oversubscribed/want_processor as these were just opposites of each
   other.

This commit was SVN r18874.

The following Trac tickets were found above:
  Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
2008-07-10 21:12:45 +00:00
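The binding decision that the commit above describes for ompi_mpi_init() reduces to a small truth table. The sketch below is hypothetical: the enum and `choose_binding()` are illustrative names, and the Node Local Rank computation itself is not modeled.

```c
#include <stdbool.h>

typedef enum {
    BIND_NONE,             /* neither option given */
    BIND_SLOT_LIST,        /* use the slot list */
    BIND_NODE_LOCAL_RANK,  /* mpi_paffinity_alone: bind to the NLR */
    BIND_CONFLICT          /* both given: caller prints an error and aborts */
} bind_choice_t;

/* Decide how a process should bind, in the order the commit describes. */
bind_choice_t choose_binding(bool slot_list_set, bool paffinity_alone)
{
    if (slot_list_set && paffinity_alone) {
        return BIND_CONFLICT;
    }
    if (slot_list_set) {
        return BIND_SLOT_LIST;
    }
    if (paffinity_alone) {
        return BIND_NODE_LOCAL_RANK;
    }
    return BIND_NONE;
}
```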
Jeff Squyres
7b2612696c Remove all the keyval stuff from the MCA parameter functionality. The
meat of it was commented out long ago, anyway (because of the way it
was written, it violates OPAL<->OMPI abstraction barriers); we never
ended up using the MPI keyval MCA parameter stuff.  So just delete it.

This commit was SVN r18860.
2008-07-10 01:52:51 +00:00
Jeff Squyres
49be4b1e45 Fixes trac:1383
Lenny and I went back and forth on whether we should simply register
another "mpi_paffinity_alone" MCA param and then try to figure out
which one was set in ompi_mpi_init, but there was difficulty in
figuring out what to do.  So it seemed like the Right Thing to do was
to implement what was committed in r18770; then we could tell where
MCA parameters were set from and you could do Better Things (this is
also useful in the openib BTL, where parameters can be set either via
MCA parameter or via an INI file).

But after that was done, it seemed only a few steps further to
actually implement two new features in the MCA params area:

 * Synonyms (where one MCA param name is a synonym for another)
 * Allow MCA params and/or their synonyms to be marked as "deprecated"
   (printing out warnings if they are used)

These features have actually long been discussed/desired, and I had
some time in airports and airplanes recently where I could work on
this stuff on a standalone laptop.  So I did it.  :-)

This commit introduces these two new features, and then uses them to
register mpi_paffinity_alone as a non-deprecated synonym for
opal_paffinity_alone.  A few other random points in this commit:

 * Add a few error checks for conditions that were not checked before
 * Correct some comments in mca_base_params.h
 * Add a few comments in strategic places
 * ompi_info now prints additional information:
   * for any MCA parameter that has synonyms, it lists all the
     synonyms
   * synonyms are also output as 1st-class MCA params, but with an
     additional attribute indicating that they have a "parent"
   * every MCA param name (both "real" and "synonym") will output an
     attribute indicating whether it is deprecated or not.  A synonym
     is deprecated if it itself is marked as deprecated (via the
     mca_base_param_regist_syn() or mca_base_param_register_syn_name()
     functions) or if its "parent" MCA parameter is deprecated

This commit was SVN r18859.

The following SVN revision numbers were found above:
  r18770 --> open-mpi/ompi@8efe67e08c

The following Trac tickets were found above:
  Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
2008-07-10 01:44:51 +00:00
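The deprecation rule for synonyms in the commit above can be expressed with a parent pointer. This is a sketch with a hypothetical, simplified `param_t`; it is not the real MCA parameter storage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified parameter record. */
typedef struct param {
    const char *name;
    bool deprecated;              /* this name was registered as deprecated */
    const struct param *parent;   /* the "real" parameter for a synonym, else NULL */
} param_t;

/* A name is deprecated if it is marked deprecated itself or if its
 * parent MCA parameter is deprecated. */
bool param_is_deprecated(const param_t *p)
{
    if (p->deprecated) {
        return true;
    }
    return p->parent != NULL && p->parent->deprecated;
}
```

For example, registering `mpi_paffinity_alone` as a non-deprecated synonym of a non-deprecated `opal_paffinity_alone` makes both report as not deprecated.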
Josh Hursey
c4035d848f This commit fixes runs when there is no available CRS component (BLCR is unavailable, and SELF is deactivated). Previously the run would fail out of MPI_INIT since the OPAL CRS framework could not select a component. This is because the framework did not recognize the 'none' component as a full component because it was part of crs/base.
I promoted the ''none'' component to a full component, and updated the other components to reflect this code movement. The ''none'' component is the default component unless the user requests '''-am ft-enable-cr''' to auto-select a component. There is an MCA parameter to show a warning if the application requested an FT enabled job, but the ''none'' component was selected ({{{crs_none_select_warning}}}).

This temporarily fixes the problem mentioned in r18739. The full fix will entail working on ticket #1291.

Thanks to Ethan from Sun for finding this bug.

This commit was SVN r18840.

The following SVN revision numbers were found above:
  r18739 --> open-mpi/ompi@a003fa7a50
2008-07-08 20:04:39 +00:00
Josh Hursey
22f4c829ba cleanup BLCR configure so --without-blcr works correctly
This commit was SVN r18825.
2008-07-08 02:48:20 +00:00
Lenny Verkhovsky
1ed465326b Change of name conventions in carto
NODE -> EDGE
CONNECTION ->   BRANCH
SLOT -> SOCKET.

This commit was SVN r18799.
2008-07-03 14:19:16 +00:00
Lenny Verkhovsky
ba1fa73881 Selecting Maffinity only if Paffinity is selected (fix)
This commit was SVN r18797.
2008-07-03 13:39:34 +00:00
Jeff Squyres
8efe67e08c Improvements to the MCA param system: allow querying to find out where
an MCA parameter's value came from.  Note that the actual value of the
parameter is irrelevant.  For example, if a value was specified in an
MCA parameter file that happened to have the same default value that
was specified when the parameter was registered, the returned location
will indicate that the value was set from the file.

Possible answers:

 * '''MCA_BASE_PARAM_SOURCE_DEFAULT:''' no user-specified values were
   found, so the default value was used
 * '''MCA_BASE_PARAM_SOURCE_ENV:''' the value came from the
   environment (which also means the mpirun/orterun command line!)
 * '''MCA_BASE_PARAM_SOURCE_FILE:''' the value came from a file (or the
   Windows registry)
 * '''MCA_BASE_PARAM_SOURCE_KEYVAL:''' the value came from a keyval
   (can currently never happen)
 * '''MCA_BASE_PARAM_SOURCE_OVERRIDE:''' the value came from an MCA
   param API "set" function

This commit was SVN r18770.
2008-06-28 15:13:25 +00:00
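A sketch of how a lookup could record the value's source, as the commit above describes. The `param_source_t` values are local stand-ins for the `MCA_BASE_PARAM_SOURCE_*` constants (KEYVAL is omitted since it can never happen), `lookup_param()` and `file_entry_t` are hypothetical, and the precedence order (API override, then environment, then file, then default) is an assumption of this sketch. As the commit notes, the source is recorded even when the value found equals the default.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { SRC_DEFAULT, SRC_ENV, SRC_FILE, SRC_OVERRIDE } param_source_t;

/* Hypothetical parsed parameter-file entry. */
typedef struct { const char *key; const char *value; } file_entry_t;

/* Return the parameter's value and store where it came from in *source. */
const char *lookup_param(const char *name, const char *env_name,
                         const char *override_value,
                         const file_entry_t *file, size_t nfile,
                         const char *default_value, param_source_t *source)
{
    if (override_value != NULL) {          /* set through an API "set" call */
        *source = SRC_OVERRIDE;
        return override_value;
    }
    const char *env = getenv(env_name);    /* environment, incl. the mpirun command line */
    if (env != NULL) {
        *source = SRC_ENV;
        return env;
    }
    for (size_t i = 0; i < nfile; ++i) {   /* parameter file */
        if (strcmp(file[i].key, name) == 0) {
            *source = SRC_FILE;
            return file[i].value;
        }
    }
    *source = SRC_DEFAULT;                 /* nothing user-specified was found */
    return default_value;
}
```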
Jeff Squyres
21c7d95109 Fixes trac:1365: if we're using ^ to negate module inclusion, then don't
bother to check to see whether they exist or not.  Specifically, this
will not cause an error:

{{{
shell$ mpirun --mca btl ^does_not_exist ...
}}}

but neither will this:

{{{
shell$ mpirun --mca btl ^sm ...
}}}

(where the sm BTL ''does'' exist)

This commit was SVN r18760.

The following Trac tickets were found above:
  Ticket 1365 --> https://svn.open-mpi.org/trac/ompi/ticket/1365
2008-06-27 19:42:08 +00:00
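The rule in the commit above, together with the earlier check that every requested component exists, can be sketched as a single validation function. `request_is_valid()` is hypothetical and only checks names against a given list; it does not open components or print the warning.

```c
#include <stdbool.h>
#include <string.h>

/* Check a request such as "tcp,sm" or "^tcp,sm" against the available
 * component names. Included names must exist; a ^-negated list is not
 * checked at all, so "^does_not_exist" is accepted just like "^sm". */
bool request_is_valid(const char *request, const char *const *available, size_t navail)
{
    if (request[0] == '^') {
        return true;                       /* exclusion list: skip existence checks */
    }

    const char *p = request;
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        bool found = false;

        for (size_t i = 0; i < navail && !found; ++i) {
            found = strlen(available[i]) == len && strncmp(available[i], p, len) == 0;
        }
        if (!found) {
            return false;                  /* a requested component is missing */
        }
        if (comma == NULL) {
            return true;
        }
        p = comma + 1;
    }
}
```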
Shiqing Fan
d129578694 Small fix for including unistd.h header file.
This commit was SVN r18758.
2008-06-27 16:25:31 +00:00
Josh Hursey
a003fa7a50 C/R fix for broken CRS component selection resulting from r18707.
Make sure that if we ask for the 'none' component (which is not a 'real' component, but a component in crs/base) then we do not fail out of the box when using tools. We check for the {{{OPAL_ERR_NOT_FOUND}}} error.

Also make sure that component_open() returns {{{OPAL_ERR_NOT_FOUND}}} when it cannot find a value instead of {{{OPAL_ERROR}}} which means something quite a bit different.

C/R is working but the tools still print the warning below every time they are run:
{{{
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      odin.cs.indiana.edu
Framework: crs
Component: none
--------------------------------------------------------------------------
}}}

I'll have to figure out a work around for this warning (maybe work on the {{{MCA_NULL}}} Ticket #1291).

This commit was SVN r18739.

The following SVN revision numbers were found above:
  r18707 --> open-mpi/ompi@bdaaf01d8a
2008-06-25 14:55:09 +00:00
Brian Barrett
e7a299d046 Add timer support for Catamount
This commit was SVN r18729.
2008-06-24 22:13:34 +00:00
Rolf vandeVaart
95cd9758e5 Fix broken build on Solaris.
This commit was SVN r18719.
2008-06-24 14:57:12 +00:00
Ralph Castain
f70b7e51ce Fix a missing header file and ensure we use a portable name for a system limit
This commit was SVN r18712.
2008-06-23 22:32:26 +00:00
Jeff Squyres
bdaaf01d8a Fixes trac:1338: Have the MCA base specifically check for all requested
components.  If they are not found / able to be opened, a warning will
be printed and the mca_base_component_find() will return
OPAL_ERR_NOT_FOUND.  It is the upper-layer's responsibility to handle
this error appropriately.

This commit was SVN r18707.

The following Trac tickets were found above:
  Ticket 1338 --> https://svn.open-mpi.org/trac/ompi/ticket/1338
2008-06-23 16:14:05 +00:00
Pak Lui
188c8bce5d Fix the SEGV when module_get finds that no proc is bound. Also make no-intr available for processor binding.
This commit was SVN r18671.
2008-06-18 16:03:08 +00:00
Ralph Castain
0532d799d6 Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.

This commit was SVN r18664.
2008-06-18 03:15:56 +00:00
Jeff Squyres
c1d1ffbc56 Fix compile problems on systems with older versions of libnuma (that
don't have MPOL_MF_MOVE).  I know that this is a configure change in
the middle of the US workday, but this compile problem is preventing
work on several kinds of systems (e.g., RHEL4).

This commit was SVN r18659.
2008-06-16 17:26:42 +00:00
Lenny Verkhovsky
dee2f1d175 Adding new functionality to Maffinity component to support NUMA awareness
This commit was SVN r18657.
2008-06-15 07:27:29 +00:00
Brian Barrett
79ad6d983e - The ptmalloc2 memory manager component is now by default built as
  a standalone library named libopenmpi-malloc.  Users wanting to
  use leave_pinned with ptmalloc2 will now need to link the library
  into their application explicitly.  All other users will use the
  libc-provided allocator instead of Open MPI's ptmalloc2.  This change
  may be overridden with the configure option enable-ptmalloc2-internal.
- The leave_pinned options will now default to using mallopt on
  Linux in the cases where ptmalloc2 was not linked in.  mallopt
  will also only be available if munmap can be intercepted (the
  default whenever Open MPI is not compiled with
  --without-memory-manager).
- Open MPI will now complain and refuse to use leave_pinned if
  no memory intercept / mallopt option is available.

This commit was SVN r18654.
2008-06-13 22:32:49 +00:00
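A sketch of what a mallopt-based fallback for leave_pinned could look like, assuming glibc. The idea is that the allocator should never trim the heap or serve requests with mmap(), so freed memory is not returned to the OS while it may still be registered with the network. The two settings and `disable_memory_release()` are assumptions of this sketch, not taken from the commit.

```c
#include <stdbool.h>
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h>
#endif

/* Returns true if both allocator settings were accepted; false where
 * mallopt() or the needed parameters are unavailable. */
bool disable_memory_release(void)
{
#if defined(__GLIBC__) && defined(M_TRIM_THRESHOLD) && defined(M_MMAP_MAX)
    /* glibc's mallopt() returns 1 on success. */
    return mallopt(M_TRIM_THRESHOLD, -1) == 1   /* never shrink the heap */
        && mallopt(M_MMAP_MAX, 0) == 1;         /* never use mmap() for allocations */
#else
    return false;
#endif
}
```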
Josh Hursey
1de50b523c Fix some Coverity 'Event set_but_not_used' highlights.
Thanks to Jeff for bringing them to my attention.

This commit was SVN r18606.
2008-06-06 14:38:41 +00:00
Jeff Squyres
12a3fe57e1 As pointed out by Ralf
W. (http://www.open-mpi.org/community/lists/devel/2008/06/4095.php),
these dependencies don't need to be here.

This commit was SVN r18603.
2008-06-06 01:20:47 +00:00
Jeff Squyres
b123629e6a Fix CIDs 458, 716, 717: ensure that strings are long enough to always
be properly \0 terminated.

This commit was SVN r18602.
2008-06-06 00:59:08 +00:00
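The commit above does not say how the strings were fixed. One common approach, assumed here, is to let `snprintf()` do the copy: it writes at most the buffer size including the terminator, whereas `strncpy()` leaves the buffer unterminated when the source is too long. `copy_terminated()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst (dstsize bytes), always NUL-terminating and truncating
 * if needed. Returns true only if the whole string fit. */
bool copy_terminated(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0) {
        return false;                     /* no room even for the terminator */
    }
    int n = snprintf(dst, dstsize, "%s", src);
    return n >= 0 && (size_t)n < dstsize;
}
```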
Lenny Verkhovsky
a8b5dcb204 Added more output info about socket:core pair in paffinity / rankfile components
This commit was SVN r18589.
2008-06-05 10:28:44 +00:00
Josh Hursey
78f14b5255 Fix the none.checkpoint command.
orte-checkpoint/orte-restart do not seem to work well with orte_output, so revert them to opal_output for now. Since we have no need for the additional complexity of orte_output we can drop it for now and revisit this if anyone needs it later.

It seems that if you set the verbose level on an output handle and then call a normal orte_output() on it, the message will *not* be printed. This is the same for opal_output, and seems incorrect to me because it stops some error messages from being printed out if you do not directly specify opal_output(0, ...). Maybe someone should take a look at this.


orte-checkpoint would segv if passed an incorrect PID. Fixed the return code so it errors out properly.

Thanks to Eric Roman for bringing this to my attention.

This commit was SVN r18583.
2008-06-04 14:44:11 +00:00