Commit graph

1132 commits

Author SHA1 Message Date
Jeff Squyres
a0874b61e6 Remove debugging message.
This commit was SVN r27795.
2013-01-12 01:33:54 +00:00
Jeff Squyres
569a60c2de In short: this commit removes a bunch of code by switching the opal
event framework to STOP_AT_FIRST, and then moves a bunch of
side-effect-inducing code in the libevent2019 configure.m4 up to
POST_CONFIG.

== More detail ==

Change the event framework from STOP_AT_FIRST_PRIORITY to
STOP_AT_FIRST.  This means that only one component can win (vs. all
STOP_AT_FIRST_PRIORITY, in which multiple components of the same
priority can all win).

You still need to ensure that there are no side-effects from the
winner, however, so check for winning during POST_CONFIG, and set
things like the base_include there.

This simplifies the configury quite a bit -- you don't have to assume
that multiple components can win: zero or one component will win.

Also change the libevent 2019 priority to 50 so that some other
(developer-specific/local) component could win, if it wanted to.

This commit was SVN r27794.
2013-01-12 01:28:37 +00:00
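The difference between the two selection policies described in commit 569a60c2de above can be sketched as follows. This is a minimal illustration in C rather than m4, based only on the commit text: `component_t`, `policy_t`, and `select_components` are hypothetical names, and the assumption that components are tried in descending priority order is mine, not taken from the Open MPI configury.

```c
#include <stddef.h>

/* Hypothetical component record; not a structure from the Open MPI tree. */
typedef struct {
    const char *name;
    int priority;
    int configures_ok;  /* nonzero if the component can configure */
    int won;            /* set by select_components() */
} component_t;

typedef enum { STOP_AT_FIRST, STOP_AT_FIRST_PRIORITY } policy_t;

/* Mark the winning components and return how many won.
 * Assumes comps[] is sorted by descending priority. */
static int select_components(component_t *comps, size_t n, policy_t policy)
{
    int winners = 0;
    int win_priority = 0;

    for (size_t i = 0; i < n; ++i) {
        comps[i].won = 0;
        /* Once something has won, STOP_AT_FIRST accepts nothing else, and
         * STOP_AT_FIRST_PRIORITY accepts only the same priority level. */
        if (winners > 0 &&
            (policy == STOP_AT_FIRST || comps[i].priority != win_priority)) {
            continue;
        }
        if (!comps[i].configures_ok) {
            continue;
        }
        comps[i].won = 1;
        if (0 == winners) {
            win_priority = comps[i].priority;
        }
        ++winners;
    }
    return winners;
}
```

With two configurable components at priority 50, STOP_AT_FIRST yields exactly one winner, while STOP_AT_FIRST_PRIORITY lets both win. A higher-priority local component that configures successfully would win instead under either policy.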
Jeff Squyres
b2d5d1e348 Along with the Automake 1.13.x changes in r27790, rename these third
party configure.in scripts to be configure.ac so that Automake stops
complaining about them.

This commit was SVN r27791.

The following SVN revision numbers were found above:
  r27790 --> open-mpi/ompi@675a2f5c48
2013-01-11 20:26:19 +00:00
Jeff Squyres
675a2f5c48 Updates for Automake 1.13.x. Without these changes, Automake 1.13.x
will error out, due to use of the
previously-deprecated-and-now-removed AM_CONFIG_HEADER macro.

This commit was SVN r27790.
2013-01-11 20:20:02 +00:00
Samuel Gutierrez
4c28c8cbd0 New sm BTL initialization take two. This approach is pretty simple. Instead of
using the modex or RML to share sm initialization information, have node rank 0
create a file containing initialization information in a well-known place. Then
during add_procs, the rest of the node processes requiring sm BTL initialization
will just read from that file to complete their initialization.

This commit was SVN r27789.
2013-01-11 16:24:56 +00:00
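The file-based handoff described in commit 4c28c8cbd0 above might look roughly like the sketch below: one process writes a small record to an agreed path, and the others read it back. The record layout (`sm_init_info_t`), the function names, and the path used by the caller are all hypothetical; the real sm BTL code and its synchronization between writer and readers are not shown in the commit message and are not modeled here.

```c
#include <stdio.h>

/* Hypothetical initialization record; the real contents are not given
 * in the commit message. */
typedef struct {
    unsigned long seg_size;
    int num_procs;
} sm_init_info_t;

/* Called by the writer (node rank 0 in the commit's description).
 * Returns 0 on success, -1 on failure. */
static int write_init_info(const char *path, const sm_init_info_t *info)
{
    FILE *fp = fopen(path, "wb");
    if (NULL == fp) {
        return -1;
    }
    size_t n = fwrite(info, sizeof(*info), 1, fp);
    /* fclose() is evaluated first, so the stream is always closed. */
    if (0 != fclose(fp) || 1 != n) {
        return -1;
    }
    return 0;
}

/* Called by the other processes on the node.
 * Returns 0 on success, -1 if the file is missing or truncated. */
static int read_init_info(const char *path, sm_init_info_t *info)
{
    FILE *fp = fopen(path, "rb");
    if (NULL == fp) {
        return -1;
    }
    size_t n = fread(info, sizeof(*info), 1, fp);
    fclose(fp);
    return (1 == n) ? 0 : -1;
}
```

Writing the struct as raw bytes is only reasonable here because writer and readers run the same binary on the same node. Readers would also need some way to know the file is complete before reading it, which this sketch leaves out.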
Jeff Squyres
e9ae2567f0 Based on a bug report and suggested fix from Darshan maintainer Phil
Carns, change to use access(.., F_OK) instead of stat() to check for
the presence of files.

Also remove redundant check for FAKEROOTKEY, and update all comments
to match.

This commit was SVN r27785.
2013-01-10 14:43:07 +00:00
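As an illustration of the change in commit e9ae2567f0 above, a presence check with `access(.., F_OK)` can be written as below. `file_exists` is a hypothetical helper name, not a function from the Open MPI tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Return true if `path` names an existing file system entry.
 * access(path, F_OK) tests only for existence and, unlike stat(),
 * does not fill in a struct stat. */
static bool file_exists(const char *path)
{
    return NULL != path && 0 == access(path, F_OK);
}
```

A call such as `file_exists("/etc/hosts")` returns true only if the path can be resolved. A missing file and an unsearchable parent directory both report false.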
Ralph Castain
4834fb7e6d Minor change to the way we record test data
This commit was SVN r27749.
2013-01-05 06:31:20 +00:00
Samuel Gutierrez
c4acd20eb9 Backout r27739.
This commit was SVN r27745.

The following SVN revision numbers were found above:
  r27739 --> open-mpi/ompi@a159bfaf25
2013-01-05 01:54:23 +00:00
Samuel Gutierrez
a159bfaf25 sm BTL initialization via modex, as discussed at last year's meeting.
This commit was SVN r27739.
2013-01-03 21:52:20 +00:00
Ralph Castain
cada035f38 Fix the segfault problem in the orteds - turns out it only occurred with progress threads enabled. Ensure the thread gets started at the right time (at the end of init), although the event base gets created earlier. Remove the finalize event as we can instead use the loopbreak call to exit the event loop.
This commit was SVN r27721.
2012-12-25 19:30:18 +00:00
Jeff Squyres
b29b852281 Consolidate all the opal/orte/ompi .m4 files back to the top-level
config/ directory.  We split them apart a while ago in the hopes that
it would simplify things, but it didn't really (e.g., because there
were still some ompi/opal .m4 files in the top-level config/
directory, resulting in developer confusion where any given m4 macro
was defined).

So this commit consolidates them back into the top-level directory for
simplicity.  

There's still (at least) two changes that would be nice to make:

 1. Split any generated .m4 file (e.g., autogen-generated .m4 files)
    into a separate directory somewhere so that a top-level -Iconfig/
    will only get our explicitly defined macros, not the autogen stuff
    (e.g., with libevent2019 needing to get the visibility macro, but
    NOT all the autogen-generated inclusion of component configure.m4
    files).
 1. Change configure to be of the form:
{{{
# ...a small amount of preamble/setup...
OPAL_SETUP
m4_ifdef([project_orte], [ORTE_SETUP])
m4_ifdef([project_ompi], [OMPI_SETUP])
# ...a small amount of finishing stuff...
}}}

I doubt we'll ever get anything as clean as that, but that would be
the goal to shoot for.

This commit was SVN r27704.
2012-12-19 00:00:36 +00:00
Nathan Hjelm
5449c45444 Per RFC: Make mca_base_param_deregister usable by changing its behavior to create a hole in the parameter array instead of deleting the parameter.
The old behavior of mca_base_param_deregister could cause the indices of other mca parameters to change. This could potentially cause problems if a mca user saves and later references an affected index.

This commit was SVN r27633.
2012-11-26 20:55:02 +00:00
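The index-stability argument in commit 5449c45444 above can be illustrated with a small table that marks deregistered slots as holes instead of compacting the array. This is a simplified sketch: `param_t`, `param_register`, `param_deregister`, and `param_name` are hypothetical names, and the real MCA parameter array is more involved.

```c
#include <stddef.h>

#define MAX_PARAMS 8

/* Hypothetical, simplified parameter entry. */
typedef struct {
    const char *name;   /* NULL marks a hole left by deregistration */
} param_t;

static param_t params[MAX_PARAMS];
static int num_params = 0;

/* Append a parameter and return its index, or -1 if the table is full. */
static int param_register(const char *name)
{
    if (num_params >= MAX_PARAMS) {
        return -1;
    }
    params[num_params].name = name;
    return num_params++;
}

/* Deregister by leaving a hole, so later entries keep their indices.
 * Returns 0 on success, -1 if the index is invalid or already a hole. */
static int param_deregister(int index)
{
    if (index < 0 || index >= num_params || NULL == params[index].name) {
        return -1;
    }
    params[index].name = NULL;
    return 0;
}

/* Return the name at `index`, or NULL for holes and invalid indices. */
static const char *param_name(int index)
{
    if (index < 0 || index >= num_params) {
        return NULL;
    }
    return params[index].name;
}
```

After registering three parameters and deregistering the middle one, the third parameter is still found at index 2, so a caller that saved that index earlier is unaffected.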
Nathan Hjelm
87e5f97400 add missing #include of opal/util/output.h
This commit was SVN r27599.
2012-11-13 07:14:41 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream, so it is a little unexpected that mca_base_components_close closes one. In addition, several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function, and others closed their output a second time. This change improves the semantics of mca_base_components_open/close, as they are now symmetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Nathan Hjelm
f3ce12e71a Per RFC fix several leaks in opal and ompi. Details below.
pml/v:
  - If vprotocol is not being used vprotocol_include_list is leaked. Assume vprotocol never takes ownership (see below) and always free the string.

coll/ml:
  - (patch verified) calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
  - Need to free mca_coll_ml_component.config_file_name on component close.

btl/openib:
  - calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.

vprotocol/base:
  - There was no way for pml/v to determine if vprotocol took ownership of vprotocol_include_list. Fix by never taking ownership (always use strdup).

mca/base:
  - param_lookup sets storage->stringval to a newly allocated string if the MCA parameter has a string value. Ensure this string is always freed.

cmr:v1.7

This commit was SVN r27569.
2012-11-06 18:57:46 +00:00
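The ownership rule adopted for vprotocol_include_list in commit f3ce12e71a above (the callee copies the string, the caller always frees its own) can be sketched as below. `consumer_t`, `consumer_init`, `consumer_fini`, and `dup_string` are hypothetical names; `dup_string` stands in for strdup so that the sketch relies only on standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical consumer that needs to keep a string after init returns. */
typedef struct {
    char *include_list;
} consumer_t;

/* Portable stand-in for strdup(): returns a malloc'ed copy or NULL. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (NULL != copy) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* The consumer never takes ownership of the caller's pointer; it keeps
 * a private copy. Returns 0 on success, -1 on allocation failure. */
static int consumer_init(consumer_t *c, const char *include_list)
{
    c->include_list = dup_string(include_list);
    return (NULL != c->include_list) ? 0 : -1;
}

/* Release the consumer's private copy. */
static void consumer_fini(consumer_t *c)
{
    free(c->include_list);
    c->include_list = NULL;
}
```

Because the consumer always copies, the caller can free its string unconditionally after `consumer_init` returns, which removes the "did the callee take ownership?" question that caused the leak.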
Nathan Hjelm
906e29ed96 Fix leaks in the opal if posix code. Error paths were not calling OBJ_RELEASE on an opal_if_t created with OBJ_NEW.
This affects both trunk and 1.7 and might affect 1.6.

cmr:v1.7

This commit was SVN r27562.
2012-11-05 20:51:10 +00:00
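The class of leak fixed in commit 906e29ed96 above, an object created early and not released on an error path, can be illustrated without the OPAL object system. In this sketch `iface_t`, `iface_new`, and `iface_release` are hypothetical stand-ins for opal_if_t, OBJ_NEW, and OBJ_RELEASE, and `live_objects` exists only to make leaks observable.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an interface object. */
typedef struct {
    char name[16];
    int index;
} iface_t;

static int live_objects = 0;   /* counts objects not yet released */

static iface_t *iface_new(void)
{
    iface_t *p = calloc(1, sizeof(*p));
    if (NULL != p) {
        ++live_objects;
    }
    return p;
}

static void iface_release(iface_t *p)
{
    if (NULL != p) {
        free(p);
        --live_objects;
    }
}

/* Return a filled-in object, or NULL on error. Every error path after
 * the allocation releases the object before returning. */
static iface_t *iface_create(const char *name, int index)
{
    iface_t *intf = iface_new();
    if (NULL == intf) {
        return NULL;
    }
    if (NULL == name || strlen(name) >= sizeof(intf->name)) {
        iface_release(intf);   /* the kind of release that was missing */
        return NULL;
    }
    if (index < 0) {
        iface_release(intf);
        return NULL;
    }
    strcpy(intf->name, name);  /* safe: length was checked above */
    intf->index = index;
    return intf;
}
```

After any failed `iface_create` call, `live_objects` is back at its previous value, which is the property the error paths in the original code lacked.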
Ralph Castain
bc54976f13 Silence warnings when threads are enabled
This commit was SVN r27550.
2012-11-01 03:34:51 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occurring before the command line parser changes, but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Ralph Castain
6aac54b02e Revert r27510, r27509, and r27508.
Not sure what happened here, but the resulting trunk wouldn't even configure. After spending time fixing that problem, I found it wouldn't compile due to multiple syntax errors that had been introduced in both the OPAL and OMPI layer. This raised questions as to the completeness of the work.

Given that the author is departing, I pinged Jeff about it and we agreed to revert this for now. Hopefully, it can either be fixed by the author prior to actual departure, or someone else can pick it up (now that it is in the history) and fix it.

This commit was SVN r27511.

The following SVN revision numbers were found above:
  r27508 --> open-mpi/ompi@12c3c743de
  r27509 --> open-mpi/ompi@79e4a8ca38
  r27510 --> open-mpi/ompi@1ad5ff625a
2012-10-27 16:43:45 +00:00
Shiqing Fan
12c3c743de Per the MemPin RFC, submit the component source files, and update the memchecker macros.
This commit was SVN r27508.
2012-10-27 02:48:20 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Samuel Gutierrez
1f24f1d305 Update the data types used in opaldf to minimize the chance of overflow when
determining the amount of available space. Thanks to Eugene for pointing out the
issue.

This commit was SVN r27436.
2012-10-11 16:11:23 +00:00
Samuel Gutierrez
21be553e21 Add Windows support to opaldf and shmem/windows -- thanks Shiqing. Next commit
will fix issues found by Eugene.

This commit was SVN r27435.
2012-10-11 14:49:41 +00:00
Samuel Gutierrez
dcd4493f54 Properly report the amount of free space on failure.
This commit was SVN r27434.
2012-10-10 19:49:52 +00:00
Samuel Gutierrez
0461826a4b Fix bus errors caused by an inadequate amount of space during
opal_shmem_segment_create by testing whether or not the target mount has enough
space to accommodate the shared-memory backing store. Fixes trac:2827. Will work
with Shiqing to add Windows support (if required).

This commit was SVN r27433.

The following Trac tickets were found above:
  Ticket 2827 --> https://svn.open-mpi.org/trac/ompi/ticket/2827
2012-10-09 20:48:04 +00:00
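The free-space test described in commit 0461826a4b above, together with the overflow concern addressed later in 1f24f1d305, can be sketched with POSIX `statvfs`. This is not the opaldf code: `bytes_available` and `enough_space` are hypothetical helpers, and using `f_bavail` (blocks available to unprivileged users) is my assumption about which figure is appropriate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Store in *avail the number of bytes available to unprivileged users on
 * the file system that holds `path`. Returns 0 on success, -1 on failure. */
static int bytes_available(const char *path, uint64_t *avail)
{
    struct statvfs buf;

    if (NULL == path || NULL == avail || 0 != statvfs(path, &buf)) {
        return -1;
    }
    /* Widen both operands before multiplying so that block count times
     * block size is computed in 64 bits. f_bavail is expressed in units
     * of f_frsize. */
    *avail = (uint64_t)buf.f_bavail * (uint64_t)buf.f_frsize;
    return 0;
}

/* Return true if the file system holding `path` can accommodate `needed`
 * bytes. A failed query is treated as "not enough space". */
static bool enough_space(const char *path, uint64_t needed)
{
    uint64_t avail = 0;

    if (0 != bytes_available(path, &avail)) {
        return false;
    }
    return avail >= needed;
}
```

A caller would run `enough_space(dir, segment_size)` before creating the backing file and report the available amount on failure. The check is inherently racy, since other processes can consume space between the check and the allocation.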
Jeff Squyres
6af6809dc2 * Fix some comments.
* Use the hwloc logical index, not the os_index.  Fixes problems with
   opal_hwloc_base_cset2str() output (e.g., --report-bindings output)
   on machines where the os_index is not tightly packed in the range
   ![0, n-1]

This commit was SVN r27394.
2012-10-03 09:33:40 +00:00
Ralph Castain
36679e19df Add a convenient macro for debugging process binding that shows the current binding pattern - helps when trying to figure out when a process got bound, and to where
This commit was SVN r27387.
2012-10-01 15:06:15 +00:00
Ralph Castain
5639d1617f Move missing piece to required visibility
This commit was SVN r27380.
2012-09-27 01:43:54 +00:00
Ralph Castain
54db4c35eb Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two components not to build if hwloc is disabled:
ompi/mca/sbgp/basesmsocket
orte/mca/rmaps/lama

Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files.

This commit was SVN r27377.
2012-09-26 23:24:27 +00:00
Pavel Shamis
4935b9d930 Fixing compilation error. Adding missing output.h.
This commit was SVN r27375.
2012-09-26 15:52:09 +00:00
Brian Barrett
cb6831830a Remove the TSD_HACKS macro. The TSD hack is only for non-glibc libraries
and we only build the linux memory component on glibc, so this shouldn't
be needed.

This commit was SVN r27371.
2012-09-26 07:42:43 +00:00
Ralph Castain
01504239e6 Enable some debugging in the if discovery code
This commit was SVN r27367.
2012-09-25 20:23:37 +00:00
Ralph Castain
662bc05aa6 Refs trac:3322
Cannot start the data clearing at the root object level as the root object has a different struct attached to userdata.

This commit was SVN r27357.

The following Trac tickets were found above:
  Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322
2012-09-20 23:30:32 +00:00
Ralph Castain
d95025f53a Ensure we clear the usage numbers when binding on multiple nodes so we don't "carry over" info from one node to the next. Use the same tracking mechanism for binding upwards and in-place to avoid doing a bunch of mallocs.
Refs trac:3322

This commit was SVN r27356.

The following Trac tickets were found above:
  Ticket 3322 --> https://svn.open-mpi.org/trac/ompi/ticket/3322
2012-09-20 15:16:06 +00:00
Ralph Castain
a3060cdd15 Fix the bind_downward code - it was incorrectly looking across the entire node instead of only looking below the locale to which the proc had been assigned. In other words, if the proc was mapped to a core, then the only hwthreads that should be considered for binding are those directly below that core. The binding algo was incorrectly looking at ALL hwthreads in that scenario, causing the proc to be bound to an HT outside of the mapped location.
This now results in the procs being bound within their assigned location. It also causes us to use only the 0th HT on a core unless --use-hwthread-cpus has been specified (in which case, we use all the HTs in a core). Bind to core binds you to all HTs regardless - the --use-hwthread-cpus only impacts the oversubscribed determination and when binding to HT.

cmr:v1.7

This commit was SVN r27342.
2012-09-14 22:01:19 +00:00
Jeff Squyres
fb2e543a57 Refs trac:3275.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't).  If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!".  Kaboom.

The only problem is that it didn't give you any indication of where
this value was being set.  Quite maddening, from a user perspective.

So we changed how ompi_info handles this case.  If any framework open
function returns OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions returned OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received an error, and then dump out the parameters
that it has received so far in the framework that had a problem.

At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).  

We updated ompi_info to check for O???_ERR_BAD_PARAM from each of
the framework opens.  Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior.  And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes.  Why?  I think we used them in exactly one
place in the code base (mca_base_components_open.c).  So we deleted
those and just used the normal OPAL_* codes instead.

While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).

This commit was SVN r27306.

The following Trac tickets were found above:
  Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
2012-09-11 20:47:24 +00:00
Jeff Squyres
8076cf8089 Abort configure if --enable-memchecker was specified, but then no
memchecker components were able to configure successfully.

This commit was SVN r27267.
2012-09-07 16:08:43 +00:00
Ralph Castain
67f34c3be6 Record the bind_level recvd by the daemon for each job so it can be correctly sent to the procs. Add test in get_relative_locality to avoid descending into an infinite loop if the level is NODE (==0).
This commit was SVN r27252.
2012-09-06 20:50:07 +00:00
Shiqing Fan
0326e88c51 As opal_hwloc_topo_data_t has to create a class instance in orte, its definition has to be exported. Otherwise, there will be an unresolved variable error on Windows.
This commit was SVN r27227.
2012-09-04 13:52:29 +00:00
Jeff Squyres
dd5bd99942 Clean up the error message names from the hwloc base, and add a
missing error message.

This commit was SVN r27180.
2012-08-29 16:40:46 +00:00
Ralph Castain
ad4cdd1a64 Sigh - add a continuation character so we don't lose required files
This commit was SVN r27004.
2012-08-11 16:19:29 +00:00
Ralph Castain
85af056090 GARRR...Remove the stupid dot <sigh>
This commit was SVN r27003.
2012-08-11 15:49:31 +00:00
Ralph Castain
acaaadb7a1 Correct file names for Windows events
This commit was SVN r27002.
2012-08-11 15:28:28 +00:00
Samuel Gutierrez
6188d97e1a Getting out of bed this morning was a bad idea... Reverting the sm update once more because it breaks direct launch. Will address this issue and commit the update once it has all been tested. Sorry everyone!
This commit was SVN r27001.
2012-08-10 22:20:38 +00:00
Jeff Squyres
65b4657159 Shiqing needs a few more files from libevent for the Windows build.
This commit was SVN r26998.
2012-08-10 21:22:03 +00:00
Samuel Gutierrez
159bd2e62e Let's try this again: sm BTL initialization via modex.
This commit was SVN r26989.
2012-08-10 20:12:36 +00:00
George Bosilca
f7528bb404 Remove unused variables.
This commit was SVN r26966.
2012-08-08 12:43:13 +00:00
George Bosilca
2303cd0bdb Remove initialized but unused variables.
This commit was SVN r26959.
2012-08-07 12:05:25 +00:00
Shiqing Fan
2f442799f8 fix several typecasts
This commit was SVN r26957.
2012-08-07 10:41:53 +00:00
Jeff Squyres
91ccba9643 Minor enhancements to the hwloc base:
* NULL's out the hwloc_obj_t->userdata in
   hwloc_base_util.c:free_object() and
   hwloc_base_util.c:opal_hwloc_base_free_topology() after it has been
   OBJ_RELEASE'd.
 * Adds a userdata field to opal_hwloc_topo_data_t.  This field will
   be used in an upcoming rmaps component ("lama") to cache some
   associated data during hardware tree traversals.

This commit was SVN r26938.
2012-08-02 16:29:44 +00:00
Jeff Squyres
46591b0b1a Clarify a configure warning: we're ''not'' adding to DYLD_LIBRARY_PATH.
This commit was SVN r26880.
2012-07-26 21:47:00 +00:00
Jeff Squyres
a8a5f26bc2 Fix typo in comment.
This commit was SVN r26874.
2012-07-26 18:09:33 +00:00
Jeff Squyres
89a4258dfc Shorten the help message, per
http://www.open-mpi.org/community/lists/devel/2012/07/11314.php.  

This commit was SVN r26853.
2012-07-24 12:48:12 +00:00
Jeff Squyres
11feeb61f3 Clarify the comment: we ''do'' apply the memory policy before main()
starts... unless you direct launch MPI applications, in which case the
policy isn't in effect until MPI_INIT completes.

This commit was SVN r26823.
2012-07-20 22:46:34 +00:00
Shiqing Fan
12d99a9ebb Update the hwloc build on Windows and related files.
This commit was SVN r26818.
2012-07-20 12:14:28 +00:00
Shiqing Fan
0f6184985d correct a few typecasts
This commit was SVN r26816.
2012-07-20 12:10:00 +00:00
Shiqing Fan
bd6cb5decd change "#ifndef WIN32" to "#ifdef HAVE_DIRENT_H"
This commit was SVN r26755.
2012-07-05 16:37:30 +00:00
Shiqing Fan
1244f1f93a Use HAVE_STRINGS_H instead of WIN32 in r26728.
This commit was SVN r26729.

The following SVN revision numbers were found above:
  r26728 --> open-mpi/ompi@c97f46bcc7
2012-07-03 14:45:24 +00:00
Shiqing Fan
c97f46bcc7 minor changes on hwloc source files to support windows build.
This commit was SVN r26728.
2012-07-03 12:57:39 +00:00
Ralph Castain
0dfe29b1a6 Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.

Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.

This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Nathan Hjelm
780a25945c do not compile unused libevent code
This commit was SVN r26657.
2012-06-25 22:24:51 +00:00
Ralph Castain
b990c65a53 Remove another antiquated dss function - the 'size' API isn't used anywhere since the GPR went away
This commit was SVN r26646.
2012-06-25 13:33:45 +00:00
Ralph Castain
abe7dd8274 Cleanup the dss by removing unused functions
This commit was SVN r26644.
2012-06-23 21:20:09 +00:00
Jeff Squyres
148ae6d6e3 This commit unifies the configury of some verbs-lovin' components.
* Add new configure command line options and deprecate some old ones:
   * --with-verbs replaces --with-openib
   * --with-verbs-libdir replaces --with-openib-libdir
 * If you specify --with-openib[-libdir] without
   --with-verbs[-libdir], you'll get a "these options have been
   deprecated!" warning, but then they'll act just like
   --with-verbs[-libdir].

  '''Sidenote:''' Note that we are not renaming any components at this
  time, nor are we renaming the top-level OMPI_CHECK_OPENIB m4 macro
  (which is pretty strongly tied to the openib BTL and is bastardized
  by the ofud BTL).  Note that there will likely be more changes in
  this area coming soon (next week?) when some long-standing changes
  move to the SVN trunk: some openib BTL infrastructure will move to
  ompi/mca/common, and its configury gets split up / refactored.

We extend the philosophy of our other --with-<foo> configure options
to --with-verbs, for ''all'' verbs-lovin' components:

 * If you specify --with-verbs, then all verbs-lovin' components must
   configure successfully (or abort).  This currently means: OOB ud,
   BTL ofud, BTL openib.
 * If you specify --with-verbs=DIR, then all verbs-lovin' components
   must configure successfully (or abort), and will use DIR to find
   verbs headers and libraries.
 * If you specify --without-verbs, then all verbs-lovin' components
   will be ignored.

This commit also fixes a problem where the --with-openib=DIR form
would not use DIR for ''all'' verbs-lovin' components (I think only
BTL openib and BTL ofud used that DIR).  Now all of them do, as does
hwloc (because hwloc has some !OpenFabrics helper functions that
require ibv types from verbs.h).

There's a little new m4 infrastructure worth mentioning:

 * If you create a new verbs-lovin' component (i.e., a component that
   needs verbs), your configure.m4 should
   AC_REQUIRE([OPAL_CHECK_VERBS_DIR]). 
 * You can then use three global shell variables: $opal_want_verbs,
   $opal_verbs_dir, $opal_verbs_libdir, which will be set as follows:
   * opal_want_verbs will be "yes" and opal_verbs_dir and
     opal_verbs_libdir will both be set to directory values, '''OR'''
   * opal_want_verbs will be "no" and opal_verbs_dir and
     opal_verbs_libdir will both be set empty

This commit was SVN r26640.
2012-06-22 19:53:56 +00:00
Jeff Squyres
06c4317dd4 Ensure to include external.h in the tarball.
This commit was SVN r26610.
2012-06-15 16:29:21 +00:00
Jeff Squyres
6760840ebb Fix builds with the external hwloc component when we use the
hwloc/openfabrics-verbs.h helper header file.

This commit was SVN r26603.
2012-06-14 19:00:57 +00:00
Terry Dontje
6d7cf4a0e5 corrected picl dependency checking to occur in the hwloc.m4 instead of Makefile.am
This commit was SVN r26595.
2012-06-12 14:47:05 +00:00
Ralph Castain
cee5a75d19 Revert the default configuration to no orte progress thread and no libevent thread support until we can get more of the kinks ironed out.
This commit was SVN r26593.
2012-06-11 20:52:28 +00:00
Nathan Hjelm
ceee4bcb0d libevent2019: libevent_pthreads.la is never built. don't include it
This commit was SVN r26570.
2012-06-07 19:22:45 +00:00
Jeff Squyres
ba040e3a42 Upgrade hwloc from 1.3.2+patches to 1.4.2+patches.
This commit was SVN r26566.
2012-06-07 16:24:46 +00:00
Ralph Castain
5876496f4c Enable orte progress threads and libevent thread support by default
This commit was SVN r26565.
2012-06-07 04:25:00 +00:00
Jeff Squyres
8d161af059 Move hwloc_cpuset_t prettyprint routines down into the hwloc base:
* opal_hwloc_base_cset2str(): Make a human-readable string of a
   hwloc_cpuset_t (e.g., socket 2[core 3[hwt 1]])
 * opal_hwloc_base_cset2mapstr(): Make a map-like string of a
   hwloc_cpuset_t (e.g., [B./..])

This commit was SVN r26532.
2012-06-01 16:02:18 +00:00
Jeff Squyres
05807ef19a Record the upstream SVN commit
This commit was SVN r26525.
2012-05-29 23:44:25 +00:00
Jeff Squyres
b3fbb0a2d5 Ensure to actually exit the non-void function, even in non-debug
builds (i.e., where assert() is preprocessed away).

This commit was SVN r26524.
2012-05-29 23:41:23 +00:00
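The problem addressed by commit b3fbb0a2d5 above arises when a non-void function relies on `assert()` to stop execution: with NDEBUG defined, the assertion expands to nothing and control falls through. A minimal sketch of the defensive pattern follows; `color_t` and `color_name` are hypothetical and unrelated to the function the commit touched.

```c
#include <assert.h>

/* Hypothetical enumeration used only for illustration. */
typedef enum { COLOR_RED, COLOR_GREEN } color_t;

static const char *color_name(int color)
{
    switch (color) {
    case COLOR_RED:
        return "red";
    case COLOR_GREEN:
        return "green";
    default:
        /* Aborts in debug builds. With NDEBUG defined, assert()
         * expands to nothing and execution continues below. */
        assert(0 && "unknown color");
        break;
    }
    /* Explicit return so the function yields a defined value even
     * when the assertion has been compiled out. */
    return "unknown";
}
```

Without the final `return`, falling off the end of the function and using its result would be undefined behavior in a non-debug build.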
Ralph Castain
9bedb25dda Cleanup some compiler warnings, some of which are actual logic errors
This commit was SVN r26519.
2012-05-29 20:11:51 +00:00
Jeff Squyres
551b53dd89 Keep the help string less than 509 characters so that compilers don't complain.
This commit was SVN r26514.
2012-05-29 18:43:04 +00:00
Jeff Squyres
96901d9503 Slightly change the wording in the help message for the
hwloc_base_mem_alloc_policy MCA parameter to be more explicit.

This commit was SVN r26512.
2012-05-29 18:08:39 +00:00
Jeff Squyres
5391e98d56 Remove some generated files.
This commit was SVN r26495.
2012-05-25 00:45:38 +00:00
Ralph Castain
32337d0d5c Well, we don't need the -levent any more
This commit was SVN r26487.
2012-05-24 01:54:37 +00:00
Ralph Castain
63a23f1f90 Get the event lib built correctly - missing the OMPI-specific pieces from Makefile.am!
Complete the symbol renaming to cover new symbols in the updated version.

This commit was SVN r26486.
2012-05-24 00:27:41 +00:00
Ralph Castain
1d41c6bb80 Some more libevent configure fixes. Ensure -levent gets added to the wrapper flags as the LANL machines can't seem to find it otherwise. Remove the duplicate evaluation of --visibility in the libevent area as it was already done by opal for everyone. Set some more ignores.
This commit was SVN r26485.
2012-05-23 21:26:54 +00:00
Ralph Castain
5821e63e27 Remove built files and update ignores
This commit was SVN r26482.
2012-05-23 18:17:29 +00:00
Ralph Castain
8a2ca3d96f Delete built files
This commit was SVN r26481.
2012-05-23 18:14:26 +00:00
Ralph Castain
11d5a31b1e Update ignores, remove build result file from repo
This commit was SVN r26460.
2012-05-21 00:59:04 +00:00
Ralph Castain
40317c0290 Remove the old event lib version - new one seems to be working just fine.
This commit was SVN r26459.
2012-05-21 00:57:47 +00:00
Ralph Castain
83d69b6c95 Enable the ORTE progress thread for apps (not needed in the tools as they already continuously loop in the event lib). This appears to be working, at least for MPI apps that only use shared memory (a simple "hello"). More testing is required to identify where problems will occur - this is only intended to allow further development.
In order to use the progress thread, you must configure with:

--enable-orte-progress-threads --enable-event-thread-support

This commit was SVN r26457.
2012-05-20 15:14:43 +00:00
Ralph Castain
1ce59d08b5 Continue cleaning up the libevent ignores
This commit was SVN r26455.
2012-05-19 16:11:24 +00:00
Ralph Castain
1826b513ee Add missing windows file
This commit was SVN r26433.
2012-05-12 02:05:25 +00:00
Ralph Castain
09f413025b As per the RFC, upgrade OMPI to libevent 2.0.19. Leave the 2.0.13 release in the system, but inactive, for now in case we discover a need to rollback.
This commit was SVN r26431.
2012-05-11 01:05:36 +00:00
Jeff Squyres
1d7fef001c Record the upstream hwloc commit that we've committed here in the OMPI
tree

This commit was SVN r26422.
2012-05-10 12:15:23 +00:00
Jeff Squyres
9c9d7e77df Commit a fix for hwloc -- still checking with upstream to see if this
will be the final solution.  But I'm committing it now so that
Oracle's Solaris Studio builds can resume.

The issue is that the C++ bindings are now (eventually) including
<hwloc.h>.  We use !__hwloc_inline__ and #define it to an appropriate
value at compile-time.  The issue is that when we're compiling C++
code, we should just set !__hwloc_inline__ to "inline", because that's
a keyword in the C++ language (as opposed to !__inline__, or
somesuch).

This commit was SVN r26418.
2012-05-09 21:03:45 +00:00
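The approach described in commit 9c9d7e77df above, choosing the inline spelling according to the language being compiled, can be sketched with a preprocessor conditional. `MY_INLINE` is a hypothetical stand-in for `__hwloc_inline__`, and the `__inline__` fallback assumes a GNU-compatible C compiler; the actual hwloc logic selects among more alternatives.

```c
/* Pick a spelling of "inline" that the current language accepts. */
#ifdef __cplusplus
#  define MY_INLINE inline       /* a keyword in C++ */
#else
#  define MY_INLINE __inline__   /* GNU C spelling; an assumption here */
#endif

/* Example of a header-style helper declared with the macro. */
static MY_INLINE int add_one(int x)
{
    return x + 1;
}
```

The same header can then be included from both C and C++ translation units without the C++ compiler seeing a C-specific extension keyword.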
Jeff Squyres
de4bbacd13 It turns out that we can't always include the hwloc OpenFabrics verbs
helper file, even if we find that the system has <infiniband/verbs.h>.
The reason is because there are some inline functions in that verbs
helper file that invoke ibv_* functions.  Some linkers (e.g., Solaris
Studio Compilers) will instantiate those static inline functions --
even if we don't use them -- and therefore we need to be able to
resolve the ibv_* symbols at link time.

But since -libverbs is only specified in places where we use other
ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that
linking random executables can/will fail (e.g., orterun).

So instead, introduce a new #define: OPAL_HWLOC_WANT_VERBS_HELPER.  If
this macro is set to 1 before including opal/mca/hwloc/hwloc.h, then
you'll also get the hwloc OpenFabrics verbs helper header file (*if*
hwloc found <infiniband/verbs.h> -- otherwise, it'll #error).

This commit was SVN r26417.
2012-05-09 20:18:31 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
* Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Jeff Squyres
aba398ce09 Per RFC
(http://www.open-mpi.org/community/lists/devel/2012/04/10905.php), set
opal_cache_line_size via hwloc data, if we have it.
opal_cache_line_size will be set to an hwloc-inspired value by the end
of orte_init(), but will always have a safe value to use (i.e., a
default value of 128) -- even before opal_init() has completed.

Default to the same value of 128 that Open MPI has used for several
years if a) we have no hwloc data, or b) we weren't able to find L2
objects in the hwloc data.

This commit was SVN r26322.
2012-04-24 17:31:06 +00:00
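The fallback rule described in aba398ce09 above can be sketched roughly as follows. This is a minimal illustration in Python, not the Open MPI implementation: `choose_cache_line_size` and its `l2_line_sizes` argument are hypothetical names, and treating a reported size of zero as "unknown" is an added assumption. Only the default of 128 and the two fallback conditions come from the commit message.

```python
DEFAULT_CACHE_LINE_SIZE = 128  # the long-standing default named in the commit message


def choose_cache_line_size(l2_line_sizes):
    """Pick a cache line size from hwloc-style data, or fall back to 128.

    `l2_line_sizes` is a hypothetical stand-in for the line sizes reported
    by the L2 objects in the topology. None models "no hwloc data" and an
    empty list models "no L2 objects found".
    """
    if not l2_line_sizes:
        return DEFAULT_CACHE_LINE_SIZE
    for size in l2_line_sizes:
        # Assumption: a zero or missing size means the value is unknown.
        if size and size > 0:
            return size
    return DEFAULT_CACHE_LINE_SIZE
```

Because the function always returns a usable value, a caller could read the setting at any point, which mirrors the "always have a safe value" property the message describes.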
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Mike Dubman
ff1c84c53f revert previous commit
This commit was SVN r26206.
2012-03-29 14:07:13 +00:00
Mike Dubman
43a5775e8a performance fix: set alignment for openib internal buffers
This commit was SVN r26205.
2012-03-29 14:00:08 +00:00
Jeff Squyres
028f471a20 Using the right env variable name helps!
This commit was SVN r26204.
2012-03-28 17:59:21 +00:00
Jeff Squyres
8a2df3311d Fixes trac:2812: check for env. markers indicating that we're in a
fakeroot.  If so, exit out of the pre-main hook immediately (without
calling functions such as stat, which will be replaced by fakeroot to
things that are not safe to call in a pre-main environment).

This commit was SVN r26203.

The following Trac tickets were found above:
  Ticket 2812 --> https://svn.open-mpi.org/trac/ompi/ticket/2812
2012-03-28 16:41:29 +00:00
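The early-exit check described in 8a2df3311d above might look something like the sketch below. The commit message does not name the environment markers, so `FAKEROOTKEY` and `FAKED_MODE` are assumptions here, and `in_fakeroot`, `pre_main_hook`, and `install_hooks` are hypothetical names used only for illustration. The point is that the check reads the environment alone and never calls functions such as stat that fakeroot intercepts.

```python
import os

# Assumed marker names; the commit message does not list them.
FAKEROOT_MARKERS = ("FAKEROOTKEY", "FAKED_MODE")


def in_fakeroot(environ=None):
    """Return True if any assumed fakeroot marker is present in the environment."""
    env = os.environ if environ is None else environ
    return any(name in env for name in FAKEROOT_MARKERS)


def pre_main_hook(environ=None, install_hooks=lambda: None):
    """Skip hook installation entirely when running under fakeroot.

    `install_hooks` is a hypothetical callback standing in for the real
    pre-main work. Returns True if the hooks were installed.
    """
    if in_fakeroot(environ):
        return False  # exit immediately, before touching any wrapped calls
    install_hooks()
    return True
```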
Pavel Shamis
39a55df333 Adding exported libevent global variables to the opal_rename file.
Otherwise the variables cause name conflicts.

This commit was SVN r26201.
2012-03-27 17:26:21 +00:00
Ralph Castain
811413e9bc Correctly handle multiple cpu-set ranges. Correctly support optional binding directives combined with cpu-set.
This commit was SVN r26187.
2012-03-23 14:50:41 +00:00
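To make "multiple cpu-set ranges" in 811413e9bc above concrete, here is a minimal parsing sketch. The comma-separated `lo-hi` syntax (for example `0-3,8-11`) is an assumption about what a cpu-set string looks like, and `parse_cpu_set` is a hypothetical helper, not a function from the Open MPI code base.

```python
def parse_cpu_set(spec):
    """Expand a cpu-set string such as "0-3,8-11" into a sorted list of CPU ids.

    Assumed syntax: comma-separated items, each either a single id or an
    inclusive "lo-hi" range.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
            if lo > hi:
                raise ValueError("descending range: " + part)
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

Collecting ids into a set means overlapping ranges collapse naturally, so every range in the string contributes rather than only the first one.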
Ralph Castain
ce0caf7567 Support -cpu-set by binding to the specified cpus in the absence of any other binding directive. Allows users to subdivide nodes for multiple parallel mpirun invocations.
This commit was SVN r26186.
2012-03-23 14:05:52 +00:00
Ralph Castain
6f6930eb66 Resolve infinite loop when -cpu-set is specified
This commit was SVN r26184.
2012-03-23 07:18:58 +00:00
Jeff Squyres
95148f3310 Don't force the use of libpci support in hwloc in the default case --
just let hwloc decide for itself.

This commit was SVN r26178.
2012-03-22 15:28:35 +00:00
Jeff Squyres
3bf038bb1c Per RFC from long ago:
http://www.open-mpi.org/community/lists/devel/2011/10/9784.php

Bring support for a DMTCP CRS module into the trunk.  See
http://dmtcp.sourceforge.net/ for a description of DMTCP.  Thanks to
the contribution from Alex Brick at Northeastern University, and all
the others up there who helped shepherd this into being ready to
submit.

This commit was SVN r26176.
2012-03-22 12:01:46 +00:00
Jeff Squyres
d30bbc2ef9 Fix an old issue: enable hwloc PCI detection except on SuSE 10 64 bit.
Worked with Oracle to verify that hwloc PCI detection is correctly
disabled on the SuSE 10 64 bit platform and is enabled by default on
all other platforms.  The --[en|dis]able-hwloc-pci switch is also
available for manual override of the configure decision about hwloc
PCI support.

This commit was SVN r26175.
2012-03-22 11:30:57 +00:00
Jeff Squyres
ab543fce58 We have no common components in opal any more, so we can remove this directory.
This commit was SVN r26169.
2012-03-20 21:21:49 +00:00
Jeff Squyres
0322db7cde Bring over r4402 from hwloc trunk.
This commit was SVN r26165.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4402
2012-03-19 16:39:54 +00:00
Jeff Squyres
aeca190744 Refs trac:3046: feedback from Brian -- don't set DYLD_LIBRARY_PATH.
This commit was SVN r26108.

The following Trac tickets were found above:
  Ticket 3046 --> https://svn.open-mpi.org/trac/ompi/ticket/3046
2012-03-07 13:12:22 +00:00
Ralph Castain
366f9d1518 Add some missing localities to the hwloc pretty-print, fix pmi modex
This commit was SVN r26105.
2012-03-06 06:21:10 +00:00
Jeff Squyres
f84c16bb65 Fixes trac:3043. Looks like some of the improvements to the hwloc132
hwloc component weren't reverse applied to the external hwloc
component.  Additionally, if we add stuff to LDFLAGS/LIBS, we also may
need to append (DY)LD_LIBRARY_PATH (here in this configure process
only), otherwise future configure tests may fail because they can't
find libhwloc.so (e.g., if you --with-hwloc=/some/path, we need to add
/some/path/lib to (DY)LD_LIBRARY_PATH).

This commit was SVN r26082.

The following Trac tickets were found above:
  Ticket 3043 --> https://svn.open-mpi.org/trac/ompi/ticket/3043
2012-03-02 20:15:07 +00:00
Jeff Squyres
e77653511b Bring in upstream hwloc v1.3 branch SVN commit r4345
This commit was SVN r26048.

The following SVN revision numbers were found above:
  r4345 --> open-mpi/ompi@b6c2a5b602
2012-02-24 13:57:18 +00:00
Jeff Squyres
f8f7f6b3ef Bring over upstream hwloc fix r4340
This commit was SVN r26037.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4340
2012-02-23 20:44:21 +00:00
Jeff Squyres
d0df08c953 Bring in upstream hwloc SVN r4319.
This commit was SVN r25987.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4319
2012-02-21 15:39:21 +00:00
Jeff Squyres
9f7b1d76cd Apply upstream hwloc fix; hwloc SVN r4314
This commit was SVN r25986.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4314
2012-02-21 15:10:40 +00:00
Brian Barrett
2d4bbfb083 Need to make sure that only the winning component sets the include file.
Easiest solution is to set the include in a POST_CONFIG macro based on
whether the configure system says the component was selected or not.

This commit was SVN r25968.
2012-02-20 16:45:54 +00:00
Brian Barrett
628aa0d84d Altix check needs to occur before linux check.
This commit was SVN r25967.
2012-02-20 16:07:41 +00:00
Ralph Castain
534d70025f Cleanup the detection of process binding during mpi_init. There are several cases that need to be checked:
1. no binding support - indicated by a negative return code from get_cpubind

2. binding supported, but not bound - the bitset returned by get_cpubind is the same as the available cpuset

3. binding supported and bound - bitset from get_cpubind is a subset of available cpuset

4. only one cpu is available - in this case, get_cpubind matches the available cpuset, but we are effectively bound

This commit was SVN r25957.
2012-02-17 21:18:53 +00:00
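The four cases listed in 534d70025f above can be sketched as a small classifier. This is an illustration only: `classify_binding` is a hypothetical name, Python sets stand in for hwloc bitmaps, and `rc` models the return code of get_cpubind. The final "unknown" result for a bitset that is neither equal to nor a subset of the available cpuset is an added assumption, since the commit message does not cover it.

```python
def classify_binding(rc, bound, available):
    """Classify process binding following the four cases in the commit message.

    rc        -- return code from get_cpubind (negative means no support)
    bound     -- CPU ids returned by get_cpubind
    available -- CPU ids in the available cpuset
    """
    if rc < 0:
        return "unsupported"          # case 1: no binding support
    bound, available = frozenset(bound), frozenset(available)
    if len(available) == 1 and bound == available:
        return "bound"                # case 4: one cpu, effectively bound
    if bound == available:
        return "not bound"            # case 2: supported, but not bound
    if bound < available:
        return "bound"                # case 3: proper subset of available
    return "unknown"                  # not covered by the message (assumption)
```

The single-cpu test has to come before the equality test, because in case 4 the two bitsets also match and would otherwise be reported as "not bound".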
Jeff Squyres
72e44cfefe Fixes trac:2951: make .../hwloc/include/autogen/config.h not be included
in the tarball.  Thanks to Paul Hargrove for the fix.

This commit was SVN r25952.

The following Trac tickets were found above:
  Ticket 2951 --> https://svn.open-mpi.org/trac/ompi/ticket/2951
2012-02-17 14:27:27 +00:00
Jeff Squyres
eb47a97025 After a bunch more back-n-forth with Paul Hargrove, hopefully this
visibility stuff will now be fixed!

This commit was SVN r25944.
2012-02-17 00:09:32 +00:00
Jeff Squyres
a055c5662c This is already ompi_ignore'd -- let's remove it.
This commit was SVN r25943.
2012-02-16 22:58:58 +00:00
Ralph Castain
7fd3ee6662 Ensure we see the correct config.h, and silence the warnings caused by duplicate defines
This commit was SVN r25938.
2012-02-16 02:50:57 +00:00
Jeff Squyres
14457accd7 Add hwloc 1.3.2 and ompi_ignore hwloc 1.3.1 (with the intent of
removing 1.3.1 in the near future).

This commit was SVN r25927.
2012-02-14 21:01:36 +00:00
Jeff Squyres
63a96e92b5 In a recent v1.5 branch issue, it took a while to figure out that
paffinity hwloc was returning "NOT_SUPPORTED" when the real problem
was that the underlying hwloc simply hadn't been initialized yet.  So
let's clearly delineate this case: return OPAL_ERR_NOT_INITIALIZED if
the underlying hwloc is not initialized.

This commit was SVN r25902.
2012-02-10 18:29:52 +00:00
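The distinction drawn in 63a96e92b5 above amounts to checking for initialization before checking for support. A minimal sketch, assuming a hypothetical `query_binding` function whose `topology` argument is None until hwloc has been set up; the enum members carry the Open MPI constant names as labels only, not their real numeric values.

```python
from enum import Enum


class Status(Enum):
    # Labels only; the actual integer values in Open MPI are not reproduced here.
    SUCCESS = "OPAL_SUCCESS"
    NOT_SUPPORTED = "OPAL_ERR_NOT_SUPPORTED"
    NOT_INITIALIZED = "OPAL_ERR_NOT_INITIALIZED"


def query_binding(topology, supports_cpubind):
    """Report why a binding query cannot proceed, if it cannot.

    `topology` is a hypothetical handle that stays None until the
    underlying hwloc has been initialized.
    """
    if topology is None:
        # Checked first so a missing init is never reported as "not supported".
        return Status.NOT_INITIALIZED
    if not supports_cpubind:
        return Status.NOT_SUPPORTED
    return Status.SUCCESS
```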
Jeff Squyres
8d0bc199df hwloc131_module.c isn't necessary -- there's no module.
This commit was SVN r25901.
2012-02-10 18:09:19 +00:00
Jeff Squyres
6557d74e01 Make sure we get the entire hwloc tree, including IO devices.
This commit was SVN r25887.
2012-02-09 16:59:38 +00:00
Jeff Squyres
6dde3b6d86 Remove the old hwloc component; we bumped up to 1.3.1 a long time ago.
This commit was SVN r25885.
2012-02-09 12:27:00 +00:00
Ralph Castain
a3ab70c53f Correctly parse socket:core syntax in rankfile
This commit was SVN r25848.
2012-02-01 01:50:05 +00:00
Jeff Squyres
ba1b02dea0 Don't install this extra libevent file.
This commit was SVN r25808.
2012-01-27 20:38:00 +00:00
Jeff Squyres
9e9b06d9f7 Fixes trac:2844: be sure to take the value of --with(out)-memory-manager
into account when configuring the components of the framework.  If
--without-memory-manager was given, then we really don't want any
memory managers to be used.

This commit was SVN r25807.

The following Trac tickets were found above:
  Ticket 2844 --> https://svn.open-mpi.org/trac/ompi/ticket/2844
2012-01-27 18:05:48 +00:00
Ralph Castain
3f31feee6f Handle the case where a user's rankfile specifies only cpus, and not socket:cpu pairs.
This commit was SVN r25803.
2012-01-27 12:21:45 +00:00
Shiqing Fan
debe91aefa Change the syntax to be compatible with C++ compiler, as this has to be compiled as C++ on Windows. Thanks Ralph.
This commit was SVN r25785.
2012-01-26 14:53:45 +00:00
Jeff Squyres
e162945090 This script is generated and should not be in SVN.
This commit was SVN r25778.
2012-01-25 16:38:25 +00:00
Jeff Squyres
40e23e3979 Refs trac:2952: temporarily turn off hwloc PCI support because it causes a
problem on SuSE 10 (which might be related to Oracle's dual-bitness
builds, but we aren't completely sure yet).

So just turn it off for now, and bring this over to v1.5.  Find a
proper fix (that enables pci support properly) for trunk/v1.7 later.

This commit was SVN r25769.

The following Trac tickets were found above:
  Ticket 2952 --> https://svn.open-mpi.org/trac/ompi/ticket/2952
2012-01-24 15:07:41 +00:00
Jeff Squyres
6c6d19f5f5 We always want to add HWLOC_EMBEDDED_LIBS to the wrapper flags. It'll
either be empty or have meaningful stuff in it.

This commit was SVN r25761.
2012-01-21 02:57:17 +00:00
Jeff Squyres
6cad1f34e0 Bring r4182 from the hwloc v1.3 branch: fix static linking issues with
libhwloc_embedded.la.

This commit was SVN r25760.

The following SVN revision numbers were found above:
  r4182 --> open-mpi/ompi@b240395d9a
2012-01-21 02:56:42 +00:00
Jeff Squyres
878a0365be Bring over r4094 from the hwloc v1.3 branch: add missing HWLOC_PCI_LIBS
This commit was SVN r25759.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4094
2012-01-21 02:17:07 +00:00
Jeff Squyres
1d15d39fb8 Remove the libnuma component; the hwloc maffinity component does everything that this component used to do. Good riddance!
This commit was SVN r25749.
2012-01-19 23:53:05 +00:00
Jeff Squyres
45636b0558 Make hwloc 1.3.1 the default. Will likely remove 1.2.2ompi shortly.
This commit was SVN r25748.
2012-01-19 23:18:40 +00:00
Jeff Squyres
1a73ba6ce8 Note the upstream patches that we have in addition to stock hwloc 1.3.1.
This commit was SVN r25708.
2012-01-11 00:22:34 +00:00
Jeff Squyres
4243cb7af0 Bring over hwloc r4102 and r4104 for some upstream patches.
This commit was SVN r25707.

The following SVN revision numbers were found above:
  r4102 --> open-mpi/ompi@8961ca568d

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r4104
2012-01-11 00:21:47 +00:00
Jeff Squyres
50e5b0937c Add hwloc 1.3.1, but it is not yet the default -- it is currently
.ompi_ignored to allow other developers to test with it.  It is
expected that we'll remove the .ompi_ignore here soon, and
simultaneously remove the hwloc 1.2.2ompi component.

There was one very minor patch added to stock hwloc 1.3.1 in
hwloc/config/hwloc.m4:

{{{
--- hwloc-1.3.1/config/hwloc.m4	2011-12-14 
+++ ompi3/opal/mca/hwloc/hwloc131/hwloc/config/hwloc.m4
@@ -583,6 +583,7 @@
         ])
     fi
     AC_SUBST(HWLOC_PCI_LIBS)
+    HWLOC_LIBS="$HWLOC_LIBS $HWLOC_PCI_LIBS"
     # If we asked for pci support but couldn't deliver, fail
     AS_IF([test "$enable_pci" = "yes" -a "$hwloc_pci_happy" = "no"],
           [AC_MSG_WARN([Specified --enable-pci switch, but could
	   not])
}}}

This will be pushed upstream to hwloc.

This commit was SVN r25706.
2012-01-10 23:38:14 +00:00
Samuel Gutierrez
0ca6603fa0 remove some unused cruft in shmem. minor common sm cleanup.
This commit was SVN r25665.
2011-12-16 22:43:55 +00:00
Jeff Squyres
1d3dc0af28 Gah! opal_shmem_base_register_params() ''wasn't'' added for the mmap
on NFS warning -- it was already there!  So put it back so that it can
register base_verbose and RUNTIME_QUERY_hint.

This commit was SVN r25663.
2011-12-15 21:14:34 +00:00
Jeff Squyres
9cef715194 Updates to r25652 -- put this MCA param in the shmem/mmap component.
No need for it to be in the base (we mistakenly thought it was used in
multiple shmem components).

This commit was SVN r25662.

The following SVN revision numbers were found above:
  r25652 --> open-mpi/ompi@7e223b5799
2011-12-15 20:41:14 +00:00
Ralph Castain
7e223b5799 Okay, okay...stop the whining! Put the mca param registration in the shmem base.
This commit was SVN r25652.
2011-12-14 22:25:32 +00:00
Ralph Castain
4303958968 Allow users to silence warning
This commit was SVN r25650.
2011-12-14 21:50:34 +00:00
Shiqing Fan
a58e4ae809 Add a missing .windows file into the tarball.
This commit was SVN r25638.
2011-12-14 13:29:26 +00:00
Jeff Squyres
efd8106d0a Fix typo in help message
This commit was SVN r25628.
2011-12-13 21:55:48 +00:00