
4492 Commits

Author SHA1 Message Date
Ralph Castain
8736a1c138 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php

Revamp the ORTE global data structures to reduce memory footprint and add new features. Add the ability to control/set the CPU frequency, though this can only be done if the sysadmin has set up the system to support it (or you run as root).

This commit was SVN r31916.
2014-06-01 16:14:10 +00:00
Ralph Castain
cf2c7381d0 Replace the PML barrier with an RTE barrier for now until we can come up with a better solution for connectionless BTLs.
Refs trac:4643

This commit was SVN r31915.

The following Trac tickets were found above:
  Ticket 4643 --> https://svn.open-mpi.org/trac/ompi/ticket/4643
2014-06-01 16:08:56 +00:00
Ralph Castain
1107f9099e Per the RFC issued here:
http://www.open-mpi.org/community/lists/devel/2014/05/14827.php

Refactor PMI support

This commit was SVN r31907.
2014-06-01 04:28:17 +00:00
Nathan Hjelm
041b72b0cc plm/alps: better workaround for the noisy cray pmi implementation
This commit is a slightly better workaround to prevent messages of
the form:
[unset]:_pmi_alps_get_apid:alps_app_lli_put_request failed
[unset]:_pmi_alps_get_appLayout:pmi_alps_get_apid returned with error: Bad file descriptor

It works by completely disabling PMI in the application process when using
mpirun. This should not be an issue for any apps.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31882.
2014-05-22 16:04:36 +00:00
Oscar Vega-Gisbert
83bdebbf81 Java bindings for OSHMEM.
This commit was SVN r31810.
2014-05-18 21:48:09 +00:00
Nathan Hjelm
73bfecd650 More leak fixes.
Two leaks are fixed in this commit:

 - Do not leak btl component list items.

 - Do not leak the nodename when decoding the pidmap.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31779.
2014-05-15 16:38:13 +00:00
Nathan Hjelm
59d09ad9de orte: fix several small memory leaks
grpcomm: fix memory leaks

We were leaking the caddy object used to pass data to the callback
function. This commit fixes these leaks.

oob,rml: fix memory leaks

This commit fixes several leaks:

 - Both the oob/base and oob/tcp were leaking objects on their peer
   hash tables. Iterate on the hash tables and free any objects.

 - Leaked sent messages because of missing OBJ_RELEASE. I placed the
   release in ORTE_RML_SEND_COMPLETE to catch all the possible
   paths.

ess/base: close the state framework

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31776.
2014-05-15 15:06:27 +00:00
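The rml fix above places the release in the send-completion path so that every way a send can finish is covered. A minimal sketch of that idea follows, assuming a simplified, hypothetical reference-counted message type; `struct msg`, `msg_release()` and `send_complete()` are illustrative names, not the ORTE_RML_SEND_COMPLETE macro or the OPAL object API.

```c
#include <stdlib.h>

/* Hypothetical reference-counted message, loosely modeled on the
 * OBJ_RETAIN/OBJ_RELEASE idiom; this is not the real OPAL object system. */
struct msg {
    int   refcount;
    char *payload;
};

struct msg *msg_new(size_t len)
{
    struct msg *m = calloc(1, sizeof(*m));
    if (m == NULL)
        return NULL;
    m->payload = malloc(len > 0 ? len : 1);
    if (m->payload == NULL) {
        free(m);
        return NULL;
    }
    m->refcount = 1;
    return m;
}

/* Drop one reference; free the message when the last one goes away.
 * Returns 1 if the message was freed, 0 otherwise. */
int msg_release(struct msg *m)
{
    if (--m->refcount > 0)
        return 0;
    free(m->payload);
    free(m);
    return 1;
}

/* Single completion point shared by every send path (success or error).
 * Running the user callback and then dropping the send's reference here
 * means no individual path can forget the release. */
int send_complete(struct msg *m, int status,
                  void (*cbfunc)(int status, struct msg *m))
{
    if (cbfunc != NULL)
        cbfunc(status, m);
    return msg_release(m);
}
```

A callback that needs the message after completion would take its own reference first; otherwise the completion routine frees it.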
Gilles Gouaillardet
5b9364fc12 Fix a memory leak in orte_register_params()
mca_base_var_register (..., MCA_BASE_VAR_TYPE_STRING, ...)
will dup() the orte_set_slots string, so there is no need
to dup() it ourselves beforehand.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31773.
2014-05-15 10:31:19 +00:00
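The ownership rule behind this fix: if a registration call keeps its own copy of a string, the caller should pass the original, because duplicating it first creates a copy that nobody frees. The sketch below illustrates the rule with a hypothetical `register_string_param()` that copies its argument; it is a stand-in for illustration, not the actual mca_base_var_register() interface.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter slot; the registry owns `value`. */
struct string_param {
    char *value;
};

/* Portable replacement for strdup(). */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Hypothetical registration call that duplicates its string default,
 * as the commit says mca_base_var_register() does for
 * MCA_BASE_VAR_TYPE_STRING. Returns 0 on success, -1 on allocation failure. */
int register_string_param(struct string_param *p, const char *default_value)
{
    if (default_value == NULL) {
        p->value = NULL;
        return 0;
    }
    p->value = dup_string(default_value);
    return p->value != NULL ? 0 : -1;
}

/* The registry frees only its own copy. */
void release_string_param(struct string_param *p)
{
    free(p->value);
    p->value = NULL;
}
```

Calling `register_string_param(&p, dup_string(s))` would leak the temporary copy, since the registry frees only its own duplicate; passing `s` directly avoids that.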
Gilles Gouaillardet
5f82c391a6 Fix memory leaks in orte/util/nidmap.c
This patch fixes four memory leaks in orte/util/nidmap.c:
 - hwloc_get_root_obj(opal_hwloc_topology)->userdata was never freed
 - even if bo->bytes is freed in the decode, bo was not freed
 - a job list is populated but never used nor freed

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31770.
2014-05-15 08:28:53 +00:00
Ralph Castain
ad0e8f841d Just pick a module to handle the incoming connection if no direct interface is identified. Siegmar hit it because his IP/netmask is disjoint, but a router was able to make the connection.
Refs trac:4627

This commit was SVN r31763.

The following Trac tickets were found above:
  Ticket 4627 --> https://svn.open-mpi.org/trac/ompi/ticket/4627
2014-05-14 19:23:02 +00:00
Ralph Castain
e605e73379 Close the incoming socket if we aren't going to accept it
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31759.
2014-05-14 16:51:59 +00:00
Ralph Castain
3a1c2fff3e Correct a misplaced bracket - daemons shouldn't be doing app-related operations
This may need a patch for 1.8.2, but we can try to apply it directly.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31754.
2014-05-14 15:23:30 +00:00
Nathan Hjelm
2a57e71a47 plm/alps: fix typo introduced in r31589
This commit was SVN r31747.

The following SVN revision numbers were found above:
  r31589 --> open-mpi/ompi@445b552d3a
2014-05-13 22:36:54 +00:00
Ralph Castain
f55c587a74 Per patch from Tetsuya Mishima, ensure the rank_file mapper accurately tracks number of nodes in the map
Refs trac:4594

This commit was SVN r31725.

The following Trac tickets were found above:
  Ticket 4594 --> https://svn.open-mpi.org/trac/ompi/ticket/4594
2014-05-13 14:36:25 +00:00
Ralph Castain
5388347511 Per Jeff's suggestion, remove the function with duplicate functionality and use a single function to check whether the session_dir directory should be removed.
Refs trac:4584

This commit was SVN r31691.

The following Trac tickets were found above:
  Ticket 4584 --> https://svn.open-mpi.org/trac/ompi/ticket/4584
2014-05-08 17:22:43 +00:00
Ralph Castain
aaae4841e9 Flush the show_help system on our way out - this also restores the opal_show_help function pointer to the OPAL layer for any subsequent processing.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31685.
2014-05-08 14:37:47 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

The only thing left is to treat ompilibdir similarly to what we did for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
05590b6a8c Correct the datastore containing the coprocessor info
This commit was SVN r31677.
2014-05-07 19:29:12 +00:00
Ralph Castain
4def94900a Per RFC: OMPI_INSTALL_BINARIES -> OPAL_INSTALL_BINARIES
This commit was SVN r31634.
2014-05-05 21:43:05 +00:00
Ralph Castain
87d809eefe Add a new "run-time controls" framework for setting controls on processes. Initially, just move the process binding code there under a new "hwloc" component. Additional components to support cgroups, power settings, etc. will follow.
This commit was SVN r31633.
2014-05-05 19:22:06 +00:00
Ralph Castain
fae39a658d Add the third (mode) argument to open() when using O_CREAT. Thanks to "robi" for reporting it and providing a patch.
Fixes trac:4596

Reviewed by rhc, RM-approved

cmr=v1.8.2:reviewer=ompi-gk1.8

This commit was SVN r31626.

The following Trac tickets were found above:
  Ticket 4596 --> https://svn.open-mpi.org/trac/ompi/ticket/4596
2014-05-02 21:58:38 +00:00
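As background for this fix: when O_CREAT is passed, POSIX open() reads a third argument giving the permission bits of the newly created file. Leaving it out is undefined behavior, and in practice the file ends up with unpredictable permissions. A minimal sketch of the correct call pattern follows; the function name `create_private_file` and the owner-only mode are illustrative assumptions, not the code changed by this commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) a file for writing. Because O_CREAT is among the
 * flags, the mode argument must be supplied; here it is read/write for
 * the owner only, further restricted by the process umask.
 * Returns a file descriptor, or -1 on error (errno set by open()). */
int create_private_file(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
}
```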
Ralph Castain
60c554e097 Ugh - protect that --display-devel print with some NULL checks
This commit was SVN r31604.
2014-05-02 14:28:45 +00:00
Ralph Castain
c7f55be387 Per a user request, add binding info to the simple --display-map option
This commit was SVN r31603.
2014-05-02 14:25:59 +00:00
Ralph Castain
ccd33a17b8 Since we cannot block when calling abort, and we want to ensure any "show_help" message at least has a chance to get out before we exit, introduce a slight delay into the abort procedure.
Refs trac:4576

This commit was SVN r31601.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-05-02 10:46:25 +00:00
Ralph Castain
c1383ca1f3 Protect against NULL cpuset when not bound
This commit was SVN r31600.
2014-05-02 10:45:11 +00:00
Ralph Castain
0209cddb5b Revert r31596 and r31595 as they recreate the "abort" problem - all they did was move the blocking send to another point in the code. An alternative solution to the "show_help and abort" problem will come in another commit.
Refs trac:4576

This commit was SVN r31599.

The following SVN revision numbers were found above:
  r31595 --> open-mpi/ompi@2b61f22973
  r31596 --> open-mpi/ompi@712634efd3

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-05-02 10:38:30 +00:00
Ralph Castain
6545e6e9a8 Add one more check for failed mapping that rarely occurs, but results in a hang when it does
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31598.
2014-05-02 10:35:14 +00:00
Ralph Castain
712634efd3 Silence warning
Refs trac:4576

This commit was SVN r31596.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-05-01 23:58:03 +00:00
Ralph Castain
2b61f22973 Now that the abort code no longer involves a blocking rml send section, apps that call show_help followed by abort are not printing their error message. So block them in show_help until that message gets out.
This commit was SVN r31595.
2014-05-01 22:57:17 +00:00
Ralph Castain
445b552d3a Try again to get an error message printed when a daemon fails to successfully report back to mpirun. In this case, there is no guaranteed way for the daemon to output the error report itself - we don't have a connection back to the HNP, and we have tied stderr off to /dev/null (for good reasons). So the HNP has to detect the failure itself and report it.
The HNP can't know the precise reason, of course - all it knows is that the daemon failed. So output a generic error message that provides guidance on probable causes.

Refs trac:4571

This commit was SVN r31589.

The following Trac tickets were found above:
  Ticket 4571 --> https://svn.open-mpi.org/trac/ompi/ticket/4571
2014-05-01 19:48:21 +00:00
Ralph Castain
567ed25938 As per the earlier RFC, move the DB framework to orcm, thus removing it from the OMPI code repo
This commit was SVN r31586.
2014-05-01 15:43:32 +00:00
Ralph Castain
3b64c603b4 First stage of RFC to rename OMPI_foo build system support: change OMPI_CHECK_PACKAGE -> OPAL_CHECK_PACKAGE
This commit was SVN r31582.
2014-05-01 14:24:56 +00:00
Ralph Castain
238ecea311 When we comm_spawn, we really want to respect the original -host directives and not expand the daemon virtual machine unless directed to do so in the comm_spawn command. Otherwise, we will automatically launch daemons on every node in the allocation.
cmr=v1.8.2:reviewer=rhc:subject=respect vm boundaries during comm_spawn

This commit was SVN r31578.
2014-04-30 22:26:18 +00:00
Ralph Castain
d04a102ab8 Silence warnings
This commit was SVN r31573.
2014-04-30 20:55:46 +00:00
Ralph Castain
087b84b0ef Add some further debug to the dstore framework. When doing comm_spawn, we have to exchange any provided cpu bitmaps to ensure both sides compute the same locality, else various mpi frameworks can go bonkers.
This commit was SVN r31572.
2014-04-30 19:29:00 +00:00
Ralph Castain
8cda1b3dc6 Don't store cpu_bitmap unless it is non-NULL
This commit was SVN r31570.
2014-04-30 18:12:48 +00:00
Ralph Castain
7a79b25577 Ensure we cleanup some files so session dirs can be rolled up
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31569.
2014-04-30 17:52:10 +00:00
Ralph Castain
34988ba2a2 Cleanup the MPI_Abort detection
Refs trac:4576

This commit was SVN r31561.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-04-30 00:51:59 +00:00
Ralph Castain
3c9d877c1b Remove debug
This commit was SVN r31560.
2014-04-30 00:08:43 +00:00
Ralph Castain
9402380e1f Fix some errors in transition
This commit was SVN r31559.
2014-04-30 00:07:53 +00:00
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Ralph Castain
1f0efe62a4 Minor cleanup - remove unused RML tag
Refs trac:4576

This commit was SVN r31545.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-04-29 17:34:17 +00:00
Ralph Castain
e05b88fd18 Take another stab at resolving the "called-abort" requirement without getting stuck. Return to "drop a turd" mode, perhaps with a little more intelligence behind it. Don't worry about catching it if session dirs weren't created
cmr=v1.8.2:reviewer=jsquyres:subject=cleanup MPI_Abort hangs

This commit was SVN r31543.
2014-04-29 17:29:46 +00:00
Ralph Castain
2c6234698e Fix the tarball build - need to include the orte_config.h header
This commit was SVN r31540.
2014-04-29 00:05:19 +00:00
Ralph Castain
3723b39f30 Ensure we don't silently fail when unable to make a connection - bark pleasantly first.
Refs trac:4571

This commit was SVN r31537.

The following Trac tickets were found above:
  Ticket 4571 --> https://svn.open-mpi.org/trac/ompi/ticket/4571
2014-04-28 19:16:32 +00:00
Ralph Castain
d642babff6 Derived from patch provided by Artem, cleanup the "abnormal" code path for selecting TCP OOB modules to connect to a remote process. If we can't find a direct interface-to-address match, then assign all the provided addresses to the first available TCP module and let the normal failure process determine if the remote proc is truly reachable.
cmr=v1.8.2:reviewer=artpol:subject=fix abnormal code connection path in tcp oob

This commit was SVN r31536.
2014-04-28 19:05:14 +00:00
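The fallback described in this commit can be sketched as follows, under simplifying assumptions: IPv4 only, each hypothetical `struct tcp_module` owns a single interface address and netmask, and "no direct match" is read as "none of the peer's addresses falls in any local subnet". The names are illustrative and do not correspond to the real oob/tcp data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TCP module: one IPv4 interface (host byte
 * order) plus its netmask. */
struct tcp_module {
    uint32_t if_addr;
    uint32_t netmask;
    int      available;
};

/* Return the index of an available module whose subnet contains addr, or -1. */
static int find_direct_match(const struct tcp_module *mods, size_t n,
                             uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (mods[i].available &&
            (mods[i].if_addr & mods[i].netmask) == (addr & mods[i].netmask))
            return (int)i;
    }
    return -1;
}

/* Choose a module for each peer address. Direct subnet matches win; if no
 * address matched at all, give every address to the first available module
 * and let the normal connection-failure handling decide reachability.
 * choice[k] receives a module index or -1. Returns the number of addresses
 * assigned to a module. */
size_t assign_peer_addrs(const struct tcp_module *mods, size_t nmods,
                         const uint32_t *addrs, size_t naddrs, int *choice)
{
    size_t assigned = 0;

    for (size_t k = 0; k < naddrs; k++) {
        choice[k] = find_direct_match(mods, nmods, addrs[k]);
        if (choice[k] >= 0)
            assigned++;
    }
    if (assigned > 0)
        return assigned;

    /* "Abnormal" path: no interface-to-address match was found. */
    for (size_t i = 0; i < nmods; i++) {
        if (mods[i].available) {
            for (size_t k = 0; k < naddrs; k++)
                choice[k] = (int)i;
            return naddrs;
        }
    }
    return 0; /* no module available */
}
```

Handing every address to one module in the fallback case defers the reachability decision to the ordinary connect-failure handling instead of rejecting the peer up front.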
Ralph Castain
fb61a94804 Follow the lead set by Jeff: no need to run AC_CONFIG_HEADERS on orte_config.h. However, unlike the MPI layer, we don't run that macro on another file in orte/include, so ensure we add that -I path back!
This commit was SVN r31534.
2014-04-28 17:12:15 +00:00
Jeff Squyres
d8715f1e3a Close 3 more fd's that were leaking into child processes.
Child processes now look clean; I can't find any more fd's that are
leaking from the parent to children.

Refs trac:4550

This commit was SVN r31515.

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-24 15:36:24 +00:00
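One common way to stop descriptors from leaking into child processes is to mark them close-on-exec, so that exec() closes them automatically in the child. The sketch below shows that technique for a pipe; it is a general POSIX pattern, assuming children are started with fork()+exec(), and not necessarily how this commit closes the three descriptors (the helper names are hypothetical).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Set FD_CLOEXEC on an existing descriptor so that exec() in a forked
 * child closes it. Returns 0 on success, -1 on error. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Create a pipe whose two ends will not be inherited by exec'd children.
 * On failure, any descriptors already created are closed. */
int make_cloexec_pipe(int fds[2])
{
    if (pipe(fds) == -1)
        return -1;
    if (set_cloexec(fds[0]) == -1 || set_cloexec(fds[1]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}
```

On Linux, `pipe2(fds, O_CLOEXEC)` sets the flag atomically, which avoids a race with a concurrent fork() in multithreaded programs.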