1
1
Граф коммитов

12290 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
f7a94f17b9 Since we now & in the mask, the value can never be higher than the
mask value.  Also, the value is unsigned, so it can never be less than
0.

This commit was SVN r19719.
2008-10-09 13:12:49 +00:00
Jeff Squyres
46d7ffd298 Remove some redundancy from redundant MCA redundant param names. The
following names are all new for v1.3, and therefore haven't been
officially released yet:

 * btl_openib_of_cq_size
 * btl_openib_of_max_inline_data
 * btl_openib_of_pkey
 * btl_openib_of_psn
 * btl_openib_of_mtu

The "_of_" (for OpenFabrics) in there is redundant.  It used to be
"_ib_", indicating that these values are pretty much passed directly
to the verbs stack.  But I think the "openib" in the name implies this
already; having "_of_" in there just seems redundant, makes the name
longer, and seems redundant.  It's also redundant.

So I took those "_of_"'s out of the MCA names.  The old (v1.2) names
are still valid (but deprecated), such ash btl_openib_ib_cq_size.

This commit was SVN r19718.
2008-10-08 21:34:05 +00:00
Jeff Squyres
b8b7619312 * Remove pkey index as an MCA param
* Change name: mca_btl_openib_of_pkey_value -> mca_btl_openib_of_pkey
   (since now there's no index, the "_value" suffix is somewhat
   superfluous)
 * Put in a better help message for the _pkey MCA param (to agree with
   the new help message in v1.2.8)

This commit was SVN r19716.
2008-10-08 20:55:40 +00:00
Ralph Castain
a7afa869af Bring Jeff's changes over from v1.2 that restores the automatic source of .profile for bash and ksh shells.
This commit was SVN r19709.
2008-10-08 14:21:42 +00:00
Josh Hursey
5b5d557b3d Deactive the C/R thread by default. So if you happed to have compiled Open MPI with FT and with Threads, but ran without activating them then the C/R thread never becomes active.
Fix a finalize bug with the C/R thread. There is a race in the way we were finalizing the C/R thread such that, given a particular interleaving of threads, the pthread_join would stall because the C/R thread was never released properly.

Found in MTT regression testing on Odin.

All and all if you were not compiling with FT & threading support you would never see this problem.

This commit was SVN r19708.
2008-10-08 14:19:37 +00:00
Josh Hursey
6e34994ee1 Fix a problem with persistent sends found in MTT regression testing on Odin.
This commit was SVN r19707.
2008-10-08 14:11:55 +00:00
Pavel Shamis
d6eb6b3a34 replacing maximum pkey value with mask
This commit was SVN r19706.
2008-10-08 10:39:57 +00:00
Pavel Shamis
eefb66a133 Fixing openib partition support.
This commit was SVN r19705.
2008-10-08 09:56:43 +00:00
Pavel Shamis
ceb3e04d0d Fixing compilation failure for case when OMPI_HAVE_RDMACM is not defined.
This commit was SVN r19704.
2008-10-08 09:50:54 +00:00
Rolf vandeVaart
13e8975f83 In the case where we detect a value of 0 in the recvcount
array, fall back to the simpler algorithms.  This is not
the optimal solution, but it works.  

This commit was SVN r19702.
2008-10-07 19:44:51 +00:00
Josh Hursey
b7560c52be Make sure to check the bounds of the communicator array before referencing it. Fixes some of the FT related Intel bugs seen in Odin testing last night.
This commit was SVN r19701.
2008-10-07 19:15:12 +00:00
Ralph Castain
802d14b130 Default job controls to forward IO. Adjust debugger code to not forward IO unless requested.
This commit was SVN r19690.
2008-10-06 17:56:23 +00:00
Jeff Squyres
942b59d572 The endpoint is actually a list_item_t -- not a base object. But only
XOOB uses it, so we didn't notice util recently.

This commit was SVN r19689.
2008-10-06 17:38:25 +00:00
Jeff Squyres
252d370c70 Update svn:ignore
This commit was SVN r19688.
2008-10-06 16:27:44 +00:00
Jeff Squyres
dbb932b619 Remove the missing app mpi_after_finalize from the Makefile.
This commit was SVN r19687.
2008-10-06 14:35:15 +00:00
Jeff Squyres
c42ab8ea37 Fixes trac:1210, #1319
Commit from a long-standing Mercurial tree that ended up incorporating a lot of things:

 * A few fixes for CPC interface changes in all the CPCs
 * Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC
 * #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC)
   * Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag)
   * CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers. 
   * CPCs allocate a CTS buffer only if they're about to make a connection
 * RDMA CM improvements:
   * Use threaded mode openib fd monitoring to wait for for RDMA CM events
   * Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions
   * Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors
   * Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type
   * Renamed many variables to be internally consistent
   * Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator
   * Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them) 
   * Use rdma_create_qp() instead of ibv_create_qp()
 * openib fd monitoring improvements:
   * Renamed a bunch of functions and variables to be a little more obvious as to their true function
   * Use pipes to communicate between main thread and service thread
   * Add ability for main thread to invoke a function back on the service thread 
   * Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3))
   * Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port
   * Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout
 * Other improvements:
   * btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp")
   * Somewhat improved error handling
   * Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements
   * Oodles of little coding style fixes
   * Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction
   * Added some more show_help error messages
   * Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread) 

This commit was SVN r19686.

The following Trac tickets were found above:
  Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
2008-10-06 00:46:02 +00:00
Tim Mattox
4602b0e94a Adjust NEWS items about a SLURM fix.
This commit was SVN r19678.
2008-10-03 19:01:12 +00:00
Ralph Castain
1976992300 Update platform files to correctly deal with visibility and memchecker
This commit was SVN r19677.
2008-10-03 13:41:33 +00:00
Ralph Castain
f4f81c7308 Let the HNP only update the routing tree if necessary. Enable some debug output
This commit was SVN r19676.
2008-10-03 13:41:08 +00:00
Ralph Castain
0cc2e724f8 Separate var declaration from use to remove compiler warnings in non-debug builds
This commit was SVN r19675.
2008-10-03 13:40:31 +00:00
Ralph Castain
15c47a2473 Revise the daemon collective system to handle comm_spawn patterns that cross into new nodes that are not direct children on the routing tree of the HNP.
Refers to ticket #1548. Although this appears to fix the problem, the ticket will be held open pending further test prior to transition to the 1.3 branch.

This commit was SVN r19674.
2008-10-02 20:08:27 +00:00
Rolf vandeVaart
0a0ddfc934 Handle MPI_IN_PLACE correctly in the ompi_coll_tuned_reduce_scatter_intra_ring function.
We were not adjusting the sendbuf in this case so we were reducing garbage.

This fixes ticket #1506.

This commit was SVN r19673.
2008-10-02 20:01:27 +00:00
Aurelien Bouteiller
852c0e35b8 Fixed a stupid type mismatch. Thanks Jeff for noticing.
Aurelien

This commit was SVN r19672.
2008-10-02 00:22:41 +00:00
Jeff Squyres
8cb6ad5cb7 Adjustment to match the ob1 change from r19658.
This commit was SVN r19671.

The following SVN revision numbers were found above:
  r19658 --> open-mpi/ompi@b32e4e7f34
2008-10-01 23:57:26 +00:00
Jeff Squyres
70b02a0178 Sometimes we don't have a valid error code, so don't segv if
ompi_mpi_errnum_get_string() returns a NULL.

This commit was SVN r19670.
2008-10-01 21:42:08 +00:00
Aurelien Bouteiller
77fa3b5d4c Code to connect to event loggers over optimized MPI communication channels.
This commit was SVN r19669.
2008-10-01 18:42:43 +00:00
Aurelien Bouteiller
aded765084 Upgrade to the new mca version
This commit was SVN r19668.
2008-10-01 18:40:44 +00:00
Aurelien Bouteiller
73edda92db Take into account Ralph's comments. There is no duplicate functionnality with the rml base. Also fix a stupid typo bug.
This commit was SVN r19667.
2008-10-01 02:31:14 +00:00
Tim Mattox
1995fb23cb Resync the NEWS file for a 1.2.8 change.
This commit was SVN r19662.
2008-10-01 00:01:01 +00:00
George Bosilca
00d24bf8ab Scalability patch, or slim-fast effect #1. All BML structures just
got a whole lot smaller, decreasing the memory footprint of the
running application. How much it's a good question. Here is a
breakdown:

- in mca_bml_base_endpoint_t: 3 *size_t + 1 * uint32_t
- in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float
                         + 6 * size_t + 9 * (void*)

The decrease in mca_bml_base_endpoint_t is for each peer and the
decrease in mca_bml_base_btl_t is for each BTL for each peer.
So, if we consider the most convenient case where there is only
one network between all peers, this decrease the memory foot print
per peer by
9*size_t + 9*(void*) + 2 * int32_t + 1 * double - 1 * float.
On a 64 bits machine this will be 156 bytes per peer.

Now we access all these fields directly from the underlying BTL
structure, and as this structure is common to multiple BML endpoint,
we are a lot more cache friendly. Even if this do not improve the
latency, it makes the SM performance graph a lot smoother.

This commit was SVN r19659.
2008-09-30 21:02:37 +00:00
George Bosilca
b32e4e7f34 Nothing important, mainly replacing tabs with spaces.
This commit was SVN r19658.
2008-09-30 18:30:35 +00:00
George Bosilca
325d006577 Mostly cleanups, and eventually a little bit more scalable add_procs.
There was an argument that was barely used, and on return at the PML
level it contained nothing usable. It has been removed, so now we're
using less memory ...

This commit was SVN r19657.
2008-09-30 15:47:43 +00:00
Ralph Castain
aa11e0977c Correct a bug in the bookmarking code that incorrectly looked at #slots instead of #slots_allocated, thus causing slot reductions in hostfiles to be ignored when selecting our starting node.
Fixes trac:1527

This commit was SVN r19656.

The following Trac tickets were found above:
  Ticket 1527 --> https://svn.open-mpi.org/trac/ompi/ticket/1527
2008-09-29 14:09:02 +00:00
Ralph Castain
4f89adae0c Prettify the user level display of allocation and map to make it easier to see and understand
This commit was SVN r19655.
2008-09-28 16:44:09 +00:00
Ralph Castain
508cb45583 Add a little more diagnostic info when we cannot do an rml send
This commit was SVN r19654.
2008-09-28 02:13:49 +00:00
Aurelien Bouteiller
89a2eea37a Add functions to access the opaque port_string and to add routes to a remote port. This is usefull for FT, but could also turn usefull when considering MPI3 extentions to the MPI2 dynamics.
This commit was SVN r19653.
2008-09-27 13:22:32 +00:00
Jeff Squyres
8b786cac04 The configure test we had for checking whether openib could build
(related to the presence of posix threads and ptmalloc2) is now a
little outdated: since we don't build ptmalloc2 as part of libopal
anymore, the openib BTL's requirements are not directly tied to
ptmalloc2's anymore.  Specifically, I altered the test to:

 1. At compile time, if no threads are found, the ptmalloc2 component
    is going to be built, '''and the ptmalloc2 component is going to be
    inside libopal,''' then refuse to build the openib BTL.
 1. At run time, if no threads were available at compile time and the
    ptmalloc2 component is part of the process, then refuse to use the
    openib BTL.

Fixes trac:1537.

This commit was SVN r19652.

The following Trac tickets were found above:
  Ticket 1537 --> https://svn.open-mpi.org/trac/ompi/ticket/1537
2008-09-27 11:19:21 +00:00
George Bosilca
2803de5298 Remove the protection around computing the remote size. This has to be done
always in a heterogeneous way in order to be able to support extern32. It
doesn't really matter as it is outside the critical path.

This commit was SVN r19651.
2008-09-26 23:11:53 +00:00
George Bosilca
ed0f5a18ce Typo.
This commit was SVN r19650.
2008-09-26 23:09:48 +00:00
Brian Barrett
76f47aaa1c Fix a bunch of compiler warnings. Refs trac:1458
This commit was SVN r19649.

The following Trac tickets were found above:
  Ticket 1458 --> https://svn.open-mpi.org/trac/ompi/ticket/1458
2008-09-26 16:15:05 +00:00
Jeff Squyres
c11bee41da Add bullet about updated SLURM support
This commit was SVN r19648.
2008-09-26 12:36:59 +00:00
Ralph Castain
edb3d99687 Update SLURM environmental variables used to describe allocation. Retain backwards compatibility to SLURM 1.1 and earlier versions.
This commit was SVN r19647.
2008-09-26 02:38:37 +00:00
Tim Mattox
aaae28b6c7 Resync the NEWS file with the 1.2 branch.
This commit was SVN r19646.
2008-09-25 21:48:17 +00:00
Aurelien Bouteiller
4be474f727 CRS is now an opal framework. It should use OPAL version defines.
This commit was SVN r19643.
2008-09-25 21:01:04 +00:00
Kenneth Matney
91bbc6b919 Change algorithm from spawning a shell that spawns another shell, and
thereby runs apstat twice; and in the process thereof reads the ALPS
appinfo file TWICE; and in addition, experiences a failure sometimes
which causes mpirun to hang.  Change this to a looped read attempt
that breaks on success, thereby avoiding failure (except in the most

This commit was SVN r19642.
2008-09-25 20:44:16 +00:00
Tim Mattox
821d81304f Testing...
This commit was SVN r19641.
2008-09-25 20:34:00 +00:00
Jeff Squyres
8de0663ae0 Increase the size of MPI_MAX_PORT_NAME from 256 to 1024.
Rationale:

 1. This value has already changed since v1.2 (v1.2 MPI_MAX_PORT_NAME
    == 36).  Hence, this commit simply increases the value from a
    previous change.
 1. The changes does increase OMPI's memory footprint slightly, but
    only when using MPI-2 dynamics.  So it is expected that the change
    will have minimal impact on the overall footprint.
 1. The change is helpful for nodes that have 4 or more IP networks
    (e.g., regular ethernet and multiple IP-over-<pick your favorite
    high-speed network> networks).  Without this change, invoking
    MPI_COMM_SPAWN on hosts with 4 or more IP networks will fail
    because we'll exceed 256 bytes for the port name.  Some OMPI
    developer test clusters already have this kind of configuration
    (e.g., Cisco); it is expected that this is not too common in the
    real world yet, but with "manycore" coming, having multiple
    IP-based networks in a single server will likely become more
    common.

This commit was SVN r19638.
2008-09-25 16:47:17 +00:00
Ralph Castain
037231fbcb MOdify the node_rank and local_rank fields to be uint16_t so we can handle more than 256 procs/node. Change the type to a defined one so that any future change can be easily done, if required.
This commit was SVN r19637.
2008-09-25 13:39:08 +00:00
Ralph Castain
55738aeabe Very tiny modification of the output when displaying mca param values to clarify that ones found in the environment could have also been set on the cmd line - we don't have a way to distinguish them internally.
This commit was SVN r19636.
2008-09-25 13:08:17 +00:00
Tim Mattox
6a3e28a3b6 Resync the NEWS file with the 1.3 branch.
This commit was SVN r19635.
2008-09-24 21:06:34 +00:00