1
1
Граф коммитов

1056 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
8b8333da3e Add missing include
This commit was SVN r28114.
2013-02-26 19:56:05 +00:00
Ralph Castain
e413596705 Add the loopexit API to the opal_event definitions
This commit was SVN r28113.
2013-02-26 19:27:26 +00:00
Ralph Castain
bd9265c560 Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data.
A few changes were required to support this move:

1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).

2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.

3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you seperately specify the "store" and "fetch" ordering.

4. the ORTE grpcomm and ess/pmi components, and the nidmap code,  were updated to work with the new db framework and to specify internal/global storage options.

No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.

This commit was SVN r28112.
2013-02-26 17:50:04 +00:00
Brian Barrett
7c3e42a689 Work around issue shown in #3505 by not linking against libpci by default.
This commit was SVN r28076.
2013-02-19 16:19:33 +00:00
Jeff Squyres
acefc1588e Patch for Cygwin support: Use S_IRWXU for shmget() and include
<sys/stat.h>.  Thanks to Marco Atzeri for reporting the issue and
providing an initial patch.

This commit was SVN r28060.
2013-02-15 14:31:58 +00:00
Ralph Castain
037918e7b4 Correctly parse the rank file slot_list when given "S:C" - the first position holds the socket, so start looking for cores at posn=1
This commit was SVN r28054.
2013-02-13 13:06:03 +00:00
Eugene Loh
3d44f97572 Fix hwloc get-cpubind routine for Solaris. FIRST, check
processor_bind to see if we're bound to a single core.
If not, THEN check lgroup affinity.  Already CMR'ed to
v1.6 (trac 3507) and fixed upstream in hwloc (r5295).

This commit was SVN r28040.

The following SVN revision numbers were found above:
  r5295 --> open-mpi/ompi@6df8cb0f02
2013-02-10 04:02:19 +00:00
Nathan Hjelm
05a8958bb0 shmem_RUNTIME_QUERY_hint is not really read_only as it is set from the environment not the default value
This commit was SVN r28005.
2013-01-31 23:41:00 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept seperate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Ralph Castain
f6b4db0b79 Fix rank_file operations. We changed the syntax to use semi-colons between multiple slot assignments so that we could use the comma to separate specific cores, but somehow the flex definitions didn't get updated to accept that character. We also incorrectly zero'd the bitmap between slot assignment sections, and so multiple slot assignments only wound up making the last one in the list.
This commit was SVN r27908.
2013-01-25 18:33:25 +00:00
Brian Barrett
cdf0325589 Keep libevent headers from being installed in the wrong place. The top level
Makefile.am gets them installed in the right place already

This commit was SVN r27903.
2013-01-24 22:23:51 +00:00
Brian Barrett
4f41f5ce5b OPAL_WRAPPER_EXTRA_CPPFLAGS is the wrong variable, want to set
WRAPPER_EXTRA_CPPFLAGS

This commit was SVN r27886.
2013-01-21 23:37:35 +00:00
Ralph Castain
92e297d1fa Pack/unpack the disk and net stats so they get passed along
This commit was SVN r27844.
2013-01-16 21:54:48 +00:00
Samuel Gutierrez
cba06776f1 Fix copy and paste error in linux memory component debug output.
This commit was SVN r27842.
2013-01-16 18:27:57 +00:00
Ralph Castain
f29f1b731c Extend the node statistics to include disk and network traffic data.
This commit was SVN r27834.
2013-01-15 22:42:36 +00:00
Ralph Castain
2379b7369f Hey Jeff - AC_HELP_STRING takes *two* arguments, dude!
This commit was SVN r27820.
2013-01-15 15:25:58 +00:00
Jeff Squyres
e30d9a2bfb The "external" hwloc component didn't have the same fixes applied to
it that the others did: move the "I won!" code up into the POST_CONFIG
macro.  Also, fix a long-standing typo when restoring the $CPPFLAGS (!).

This commit was SVN r27813.
2013-01-14 21:44:47 +00:00
Jeff Squyres
423208932e HWLOC_DO_AM_CONDITIONALS must be run unconditionally.
This commit was SVN r27812.
2013-01-14 21:43:16 +00:00
Jeff Squyres
c17ec83de3 Add some post-v1.5.1 release hwloc bug fixes
This commit was SVN r27805.
2013-01-14 16:25:21 +00:00
Jeff Squyres
c7cb363da9 Remove some more generated files.
This commit was SVN r27800.
2013-01-12 03:30:43 +00:00
Jeff Squyres
4d6f026941 Fix a typo.
This commit was SVN r27799.
2013-01-12 03:30:29 +00:00
Ralph Castain
4d43585a1e Cleanup new hwloc install - remove build products that were accidentally included in the commit, remove non-existent file from Makefile.am
This commit was SVN r27798.
2013-01-12 03:21:53 +00:00
Jeff Squyres
3ce170d463 Update the embedded hwloc from v1.4.2 to v1.5.1.
This commit was SVN r27797.
2013-01-12 02:08:04 +00:00
Jeff Squyres
427c154800 Similar to r27794, simplify the hwloc framework by changing it to
STOP_AT_FIRST.  And move the side-effect-inducing code in
hwloc142/configure.m4 up to POST_CONFIG.

Also change the priority of the external hwloc component to 90 so that
it is evaluated before the internal component (as a direct result of
changing to STOP_AT_FIRST).

This commit was SVN r27796.

The following SVN revision numbers were found above:
  r27794 --> open-mpi/ompi@569a60c2de
2013-01-12 01:48:53 +00:00
Jeff Squyres
a0874b61e6 Remove debugging message.
This commit was SVN r27795.
2013-01-12 01:33:54 +00:00
Jeff Squyres
569a60c2de In short: this commit removes a bunch of code by switching the opal
event framework to STOP_AT_FIRST, and then moves a bunch of
side-effect-inducing code in the libevent2019 configure.m4 up to
POST_CONFIG.

== More detail ==

Change the event framework from STOP_AT_FIRST_PRIORITY to
STOP_AT_FIRST.  This means that only one component can win (vs. all
STOP_AT_FIRST_PRIORITY, in which multiple components of the same
priority can all win).

You still need to ensure that there are no side-effects from the
winner, however, so check for winning during POST_CONFIG, and set
things like the base_include there.

This simplifies the configury quite a bit -- you don't have to assume
that mulitple components can win: zero or one components will win.

Also change the libevent 2019 priority to 50 so that some other
(developer-specific/local) component could win, if it wanted to.

This commit was SVN r27794.
2013-01-12 01:28:37 +00:00
Jeff Squyres
b2d5d1e348 Along with the Automake 1.13.x changes in r27790, rename these third
party configure.in scripts to be configure.ac so that Automake stops
complaining about them.

This commit was SVN r27791.

The following SVN revision numbers were found above:
  r27790 --> open-mpi/ompi@675a2f5c48
2013-01-11 20:26:19 +00:00
Jeff Squyres
675a2f5c48 Updates for Automake 1.13.x. Without these changes, Automake 1.13.x
will error out, due to use of the
previously-deprecated-and-now-removed AM_CONFIG_HEADER macro.

This commit was SVN r27790.
2013-01-11 20:20:02 +00:00
Samuel Gutierrez
4c28c8cbd0 New sm BTL initialization take two. This approach is pretty simple. Instead of
using the modex or RML to share sm initialization information, have node rank 0
create a file containing initialization information in a well-known place. Then
during add_procs, the rest of the node processes requiring sm BTL initialization
will just read from that file to complete their initialization.

This commit was SVN r27789.
2013-01-11 16:24:56 +00:00
Jeff Squyres
e9ae2567f0 Based on a bug report and suggested fix from Darshan maintainer Phil
Carns, change to use access(.., F_OK) instead of stat() to check for
the presence of files.

Also remove redundant check for FAKEROOTKEY, and update all comments
to match.

This commit was SVN r27785.
2013-01-10 14:43:07 +00:00
Ralph Castain
4834fb7e6d Minor change to the way we record test data
This commit was SVN r27749.
2013-01-05 06:31:20 +00:00
Samuel Gutierrez
c4acd20eb9 Backout r27739.
This commit was SVN r27745.

The following SVN revision numbers were found above:
  r27739 --> open-mpi/ompi@a159bfaf25
2013-01-05 01:54:23 +00:00
Samuel Gutierrez
a159bfaf25 sm BTL initialization via modex, as discussed at last year's meeting.
This commit was SVN r27739.
2013-01-03 21:52:20 +00:00
Ralph Castain
cada035f38 Fix the segfault problem in the orteds - turns out it only occurred with progress threads enabled. Ensure the thread gets started at the right time (at the end of init), although the event base gets created earlier. Remove the finalize event as we can instead use the loopbreak call to exit the event loop.
This commit was SVN r27721.
2012-12-25 19:30:18 +00:00
Jeff Squyres
b29b852281 Consolidate all the opal/orte/ompi .m4 files back to the top-level
config/ directory.  We split them apart a while ago in the hopes that
it would simplify things, but it didn't really (e.g., because there
were still some ompi/opal .m4 files in the top-level config/
directory, resulting in developer confusion where any given m4 macro
was defined).

So this commit consolidates them back into the top-level directory for
simplicity.  

There's still (at least) two changes that would be nice to make:

 1. Split any generated .m4 file (e.g., autogen-generated .m4 files)
    into a separate directory somewhere so that a top-level -Iconfig/
    will only get our explicitly defined macros, not the autogen stuff
    (e.g., with libevent2019 needing to get the visibility macro, but
    NOT all the autogen-generated inclusion of component configure.m4
    files).
 1. Change configure to be of the form:
{{{
# ...a small amount of preamble/setup...
OPAL_SETUP
m4_ifdef([project_orte], [ORTE_SETUP])
m4_ifdef([project_ompi], [OMPI_SETUP])
# ...a small amount of finishing stuff...
}}}

I doubt we'll ever get anything as clean as that, but that would be
the goal to shoot for.

This commit was SVN r27704.
2012-12-19 00:00:36 +00:00
Nathan Hjelm
5449c45444 Per RFC: Make mca_base_param_deregister usable by changing its behavior to create a hole in the parameter array instead of deleting the parameter.
The old behavior of mca_base_param_deregister could cause the indices of other mca parameters to change. This could potentially cause problems if a mca user saves and later references an affected index.

This commit was SVN r27633.
2012-11-26 20:55:02 +00:00
Nathan Hjelm
87e5f97400 add missing #include of opal/util/output.h
This commit was SVN r27599.
2012-11-13 07:14:41 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Nathan Hjelm
f3ce12e71a Per RFC fix several leaks in opal and ompi. Details below.
pml/v:
  - If vprotocol is not being used vprotocol_include_list is leaked. Assume vprotocol never takes ownership (see below) and always free the string.

coll/ml:
  - (patch verified) calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
  - Need to free mca_coll_ml_component.config_file_name on component close.

btl/openib:
  - calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.

vprotocol/base:
  - There was no way for pml/v to determine if vprotocol took ownership of vprotocol_include_list. Fix by always never ownership (use strdup).

mca/base:
  - param_lookup will result in storage->stringval to be a newly allocated string if the mca parameter has a string value. ensure this string is always freed.

cmr:v1.7

This commit was SVN r27569.
2012-11-06 18:57:46 +00:00
Nathan Hjelm
906e29ed96 Fix leaks in the opal if posix code. Error paths were not calling OBJ_RELEASE on an opal_if_t created with OBJ_NEW.
This affects both trunk and 1.7 and might affect 1.6.

cmr:v1.7

This commit was SVN r27562.
2012-11-05 20:51:10 +00:00
Ralph Castain
bc54976f13 Silence warnings when threads are enabled
This commit was SVN r27550.
2012-11-01 03:34:51 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Ralph Castain
6aac54b02e Revert r27510, r27509, and r27508.
Not sure what happened here, but the resulting trunk wouldn't even configure. After spending time fixing that problem, I found it wouldn't compile due to multiple syntax errors that had been introduced in both the OPAL and OMPI layer. This raised questions as to the completeness of the work.

Given that the author is departing, I pinged Jeff about it and we agreed to revert this for now. Hopefully, it can either be fixed by the author prior to actual departure, or someone else can pick it up (now that it is in the history) and fix it.

This commit was SVN r27511.

The following SVN revision numbers were found above:
  r27508 --> open-mpi/ompi@12c3c743de
  r27509 --> open-mpi/ompi@79e4a8ca38
  r27510 --> open-mpi/ompi@1ad5ff625a
2012-10-27 16:43:45 +00:00
Shiqing Fan
12c3c743de Per the MemPin RFC, submit the component source files, and update the memchecker macros.
This commit was SVN r27508.
2012-10-27 02:48:20 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Samuel Gutierrez
1f24f1d305 Update the data types used in opaldf to minimize the chance of overflow when
determining the amount of available space. Thanks to Eugene for pointing out the
issue.

This commit was SVN r27436.
2012-10-11 16:11:23 +00:00
Samuel Gutierrez
21be553e21 Add Windows support to opaldf and shmem/windows -- thanks Shiqing. Next commit
will fix issues found by Eugene.

This commit was SVN r27435.
2012-10-11 14:49:41 +00:00
Samuel Gutierrez
dcd4493f54 Properly report the amount of free space on failure.
This commit was SVN r27434.
2012-10-10 19:49:52 +00:00
Samuel Gutierrez
0461826a4b Fix bus errors caused by an inadequate amount of space during
opal_shmem_segment_create by testing whether or not the target mount has enough
space to accommodate the shared-memory backing store. Fixes trac:2827. Will work
with Shiqing to add Windows support (if required).

This commit was SVN r27433.

The following Trac tickets were found above:
  Ticket 2827 --> https://svn.open-mpi.org/trac/ompi/ticket/2827
2012-10-09 20:48:04 +00:00