1
1

4490 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
1107f9099e Per the RFC issued here:
http://www.open-mpi.org/community/lists/devel/2014/05/14827.php

Refactor PMI support

This commit was SVN r31907.
2014-06-01 04:28:17 +00:00
Nathan Hjelm
041b72b0cc plm/alps: better workaround for the noisy cray pmi implementation
This commit is a slightly better workaround to prevent mesages of
the form:
[unset]:_pmi_alps_get_apid:alps_app_lli_put_request failed
[unset]:_pmi_alps_get_appLayout:pmi_alps_get_apid returned with error: Bad file descriptor

It works by completely disabling PMI in the application process when using
mpirun. This should not be an issue for any apps.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31882.
2014-05-22 16:04:36 +00:00
Oscar Vega-Gisbert
83bdebbf81 Java bindings for OSHMEM.
This commit was SVN r31810.
2014-05-18 21:48:09 +00:00
Nathan Hjelm
73bfecd650 More leak fixes.
Two leaks are fixed in this commit:

 - Do not leak btl component list items.

 - Do not leak the nodename when decoding the pidmap.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31779.
2014-05-15 16:38:13 +00:00
Nathan Hjelm
59d09ad9de orte: fix several small memory leaks
grpcomm: fix memory leaks

We were leaking the caddy object used to pass data to the callback
function. This commit fixes these leaks.

oob,rml: fix memory leaks

This commit fixes several leaks:

 - Both the oob/base and oob/tcp were leaking objects on their peer
   hash tables. Iterate on the hash tables and free any objects.

 - Leaked sent messages because of missing OBJ_RELEASE. I placed the
   release in ORTE_RML_SEND_COMPLETE to catch all the possible
   paths.

ess/base: close the state framework

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31776.
2014-05-15 15:06:27 +00:00
Gilles Gouaillardet
5b9364fc12 Fix a memory leak in orte_register_params()
mca_base_var_register (..., MCA_BASE_VAR_TYPE_STRING, ...)
will dup() the orte_set_slots string, so there is no need
to do this in the first place.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31773.
2014-05-15 10:31:19 +00:00
Gilles Gouaillardet
5f82c391a6 Fix memory leaks in orte/util/nidmap.c
This patch fixes four memory leaks in orte/util/nidmap.c :
 - hwloc_get_root_obj(opal_hwloc_topology)->userdata was never freed
 - even if bo->bytes is freed in the decode, bo was not freed
 - a job list is populated but never used nor freed

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31770.
2014-05-15 08:28:53 +00:00
Ralph Castain
ad0e8f841d Just pick a module to handle the incoming connection if no direct interface is identified. Siegmar hit it because his IP/netmask is disjoint, but a router was able to make the connection.
Refs trac:4627

This commit was SVN r31763.

The following Trac tickets were found above:
  Ticket 4627 --> https://svn.open-mpi.org/trac/ompi/ticket/4627
2014-05-14 19:23:02 +00:00
Ralph Castain
e605e73379 Close the incoming socket if we aren't going to accept it
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31759.
2014-05-14 16:51:59 +00:00
Ralph Castain
3a1c2fff3e Correct a misplaced bracket - daemons shouldn't be doing app-related operations
This may need a patch for 1.8.2, but we can try to directly apply it

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31754.
2014-05-14 15:23:30 +00:00
Nathan Hjelm
2a57e71a47 plm/alps: fix typo introduced in r31589
This commit was SVN r31747.

The following SVN revision numbers were found above:
  r31589 --> open-mpi/ompi@445b552d3a
2014-05-13 22:36:54 +00:00
Ralph Castain
f55c587a74 Per patch from Tetsuya Mishima, ensure the rank_file mapper accurately tracks number of nodes in the map
Refs trac:4594

This commit was SVN r31725.

The following Trac tickets were found above:
  Ticket 4594 --> https://svn.open-mpi.org/trac/ompi/ticket/4594
2014-05-13 14:36:25 +00:00
Ralph Castain
5388347511 Per Jeff's suggestion, remove function that has duplicate functionality and just use one to check if session_dir directory should be removed.
Refs trac:4584

This commit was SVN r31691.

The following Trac tickets were found above:
  Ticket 4584 --> https://svn.open-mpi.org/trac/ompi/ticket/4584
2014-05-08 17:22:43 +00:00
Ralph Castain
aaae4841e9 Flush the show_help system on our way out - this also restores the opal_show_help function pointer to the OPAL layer for any subsequent processing.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31685.
2014-05-08 14:37:47 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
05590b6a8c Correct the datastore containing the coprocessor info
This commit was SVN r31677.
2014-05-07 19:29:12 +00:00
Ralph Castain
4def94900a Per RFC: OMPI_INSTALL_BINARIES -> OPAL_INSTALL_BINARIES
This commit was SVN r31634.
2014-05-05 21:43:05 +00:00
Ralph Castain
87d809eefe Add a new "run-time controls" framework for setting controls on processes. Initially, just move the process binding code there under a new "hwloc" component. Additional components to support cgroups, power settings, etc. to follow
This commit was SVN r31633.
2014-05-05 19:22:06 +00:00
Ralph Castain
fae39a658d Add third flag for open when using O_CREAT. Thanks to "robi" for reporting it and providing a patch.
Fixes trac:4596

Reviewed by rhc, RM-approved

cmr=v1.8.2:reviewer=ompi-gk1.8

This commit was SVN r31626.

The following Trac tickets were found above:
  Ticket 4596 --> https://svn.open-mpi.org/trac/ompi/ticket/4596
2014-05-02 21:58:38 +00:00
Ralph Castain
60c554e097 Ugh - protect that --display-devel print with some NULL checks
This commit was SVN r31604.
2014-05-02 14:28:45 +00:00
Ralph Castain
c7f55be387 Per a user request, add binding info to the simple --diplay-map option
This commit was SVN r31603.
2014-05-02 14:25:59 +00:00
Ralph Castain
ccd33a17b8 Since we cannot block when calling abort, and we want to ensure any "show_help" message at least has a chance to get out before we exit, introduce a slight delay into the abort procedure.
Refs trac:4576

This commit was SVN r31601.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-05-02 10:46:25 +00:00
Ralph Castain
c1383ca1f3 Protect against NULL cpuset when not bound
This commit was SVN r31600.
2014-05-02 10:45:11 +00:00
Ralph Castain
0209cddb5b Revert r31596 and r31595 as they recreate the "abort" problem - all they did was move the blocking send to another point in the code. An alternative solution to the "show_help and abort" problem. will come in another commit
Refs trac:4576

This commit was SVN r31599.

The following SVN revision numbers were found above:
  r31595 --> open-mpi/ompi@2b61f22973
  r31596 --> open-mpi/ompi@712634efd3

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-05-02 10:38:30 +00:00
Ralph Castain
6545e6e9a8 Add one more check for failed mapping that rarely occurs, but results in a hang when it does
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31598.
2014-05-02 10:35:14 +00:00
Ralph Castain
712634efd3 Silence warning
Refs trac:4576

This commit was SVN r31596.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-05-01 23:58:03 +00:00
Ralph Castain
2b61f22973 Now that the abort code no longer involves a blocking rml send section, apps that call show_help followed by abort are not printing their error message. So block them in show_help until that message gets out.
This commit was SVN r31595.
2014-05-01 22:57:17 +00:00
Ralph Castain
445b552d3a Try again to get an error message printed when a daemon fails to successfully report back to mpirun. In this case, there is no guaranteed way for the daemon to output the error report itself - we don't have a connection back to the HNP, and we have tied stderr off to /dev/null (for good reasons). So the HNP has to detect the failure itself and report it.
The HNP can't know the precise reason, of course - all it knows is that the daemon failed. So output a generic error message that provides guidance on probable causes.

Refs trac:4571

This commit was SVN r31589.

The following Trac tickets were found above:
  Ticket 4571 --> https://svn.open-mpi.org/trac/ompi/ticket/4571
2014-05-01 19:48:21 +00:00
Ralph Castain
567ed25938 As per the earlier RFC, move the DB framework to orcm, thus removing it from the OMPI code repo
This commit was SVN r31586.
2014-05-01 15:43:32 +00:00
Ralph Castain
3b64c603b4 First stage of RFC to rename OMPI_foo build system support: change OMPI_CHECK_PACKAGE -> OPAL_CHECK_PACKAGE
This commit was SVN r31582.
2014-05-01 14:24:56 +00:00
Ralph Castain
238ecea311 When we comm_spawn, we really want to respect the original -host directives and not expand the daemon virtual machine unless directed to do so in the comm_spawn command. Otherwise, we will automatically launch daemons on every node in the allocation.
cmr=v1.8.2:reviewer=rhc:subject=respect vm boundaries during comm_spawn

This commit was SVN r31578.
2014-04-30 22:26:18 +00:00
Ralph Castain
d04a102ab8 Silence warnings
This commit was SVN r31573.
2014-04-30 20:55:46 +00:00
Ralph Castain
087b84b0ef Add some further debug to the dstore framework. When doing comm_spawn, we have to exchange any provided cpu bitmaps to ensure both sides compute the same locality, else various mpi frameworks can go bonkers.
This commit was SVN r31572.
2014-04-30 19:29:00 +00:00
Ralph Castain
8cda1b3dc6 Don't store cpu_bitmap unless it is non-NULL
This commit was SVN r31570.
2014-04-30 18:12:48 +00:00
Ralph Castain
7a79b25577 Ensure we cleanup some files so session dirs can be rolled up
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31569.
2014-04-30 17:52:10 +00:00
Ralph Castain
34988ba2a2 Cleanup the MPI_Abort detection
Refs trac:4576

This commit was SVN r31561.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-04-30 00:51:59 +00:00
Ralph Castain
3c9d877c1b Remove debug
This commit was SVN r31560.
2014-04-30 00:08:43 +00:00
Ralph Castain
9402380e1f Fix some errors in transition
This commit was SVN r31559.
2014-04-30 00:07:53 +00:00
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Ralph Castain
1f0efe62a4 Minor cleanup - remove unused RML tag
Refs trac:4576

This commit was SVN r31545.

The following Trac tickets were found above:
  Ticket 4576 --> https://svn.open-mpi.org/trac/ompi/ticket/4576
2014-04-29 17:34:17 +00:00
Ralph Castain
e05b88fd18 Take another stab at resolving the "called-abort" requirement without getting stuck. Return to "drop a turd" mode, perhaps with a little more intelligence behind it. Don't worry about catching it if session dirs weren't created
cmr=v1.8.2:reviewer=jsquyres:subject=cleanup MPI_Abort hangs

This commit was SVN r31543.
2014-04-29 17:29:46 +00:00
Ralph Castain
2c6234698e Fix the tarball build - need to include the orte_config.h header
This commit was SVN r31540.
2014-04-29 00:05:19 +00:00
Ralph Castain
3723b39f30 Ensure we don't silently fail when unable to make a connection - bark pleasantly first.
Refs trac:4571

This commit was SVN r31537.

The following Trac tickets were found above:
  Ticket 4571 --> https://svn.open-mpi.org/trac/ompi/ticket/4571
2014-04-28 19:16:32 +00:00
Ralph Castain
d642babff6 Derived from patch provided by Artem, cleanup the "abnormal" code path for selecting TCP OOB modules to connect to a remote process. If we can't find a direct interface-to-address match, then assign all the provided addresses to the first available TCP module and let the normal failure process determine if the remote proc is truly reachable.
cmr=v1.8.2:reviewer=artpol:subject=fix abnormal code connection path in tcp oob

This commit was SVN r31536.
2014-04-28 19:05:14 +00:00
Ralph Castain
fb61a94804 Follow the lead set by Jeff: no need to run AC_CONFIG_HEADERS on orte_config.h. However, unlike the MPI layer, we don't run that macro on another file in orte/include, so ensure we add that -I path back!
This commit was SVN r31534.
2014-04-28 17:12:15 +00:00
Jeff Squyres
d8715f1e3a Close 3 more fd's that were leaking into child processes.
Child processes now look clean; I can't find any more fd's that are
leaking from the parent to children.

Refs trac:4550

This commit was SVN r31515.

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-24 15:36:24 +00:00
Jeff Squyres
e1655ae68d opal/util/fd.c: add new convenience function for setting FD_CLOEXEC
Paul Hargrove pointed out that Stevens tells us that we should
FD_GETFL before FD_SETFL.  And so we shall.

Make a new convenience function to do this (opal_fd_set_cloexec()),
just so that we don't have to litter this 2-step process throughout
the code.

Refs trac:4550

This commit was SVN r31513.

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-24 13:04:49 +00:00
Jeff Squyres
410f5bfb91 oob_tcp_listener.c: set both ends of this thread to be close-on-exec
This pipe is used to communicate between threads in this process.
Mark both fd as close-on-exec so that children don't inherit this
pipe.

Refs trac:4550

This commit was SVN r31512.

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-23 21:46:41 +00:00