Commit graph

4287 commits

Author SHA1 Message Date
Josh Hursey
d460264c79 Fix C/R support in response to r20586. That commit changed the way that bml/r2 is finalized, so the C/R support needed to be updated; otherwise the BTLs were not properly handled on restart.
This commit was SVN r20617.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
  r20586 --> open-mpi/ompi@14a83a6bbc
2009-02-21 13:42:17 +00:00
Jeff Squyres
f1a6d170dc Revert part of r20537: per lengthy discussion on the phone and the
devel list, it ''is'' within the spirit of MPI to allow
MPI_REQUEST_NULL to be passed to MPI_REQUEST_GET_STATUS.  I filed a
ticket proposal with MPI-2.2 to make this officially accepted:

  https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/137

Plus, r20537 didn't revert out all of the machinery for allowing
MPI_REQUEST_NULL or inactive requests, anyway.  So this commit simply
removes the parameter check that was added in r20537, and we're back
to where we were before this whole conversation.  :-)

This commit was SVN r20616.

The following SVN revision numbers were found above:
  r20537 --> open-mpi/ompi@38aab37bb3
2009-02-20 19:57:46 +00:00
Jeff Squyres
7e210fdaf8 Return MPI_ERR_COMM and MPI_ERR_WIN, respectively, for
MPI_COMM|WIN_SET|GET_ERRHANDLER if a bad MPI handle is passed.  Thanks
to Lisandro Dalcín for reporting the issue.

This commit was SVN r20615.
2009-02-20 19:53:48 +00:00
Eugene Loh
463f11f993 Improve shared-memory allocation:
* compute mmap-file size more wisely and pass requested size to allocator
* change MCA parameters:
  - get rid of mpool_sm_per_peer_size
  - get rid of mpool_sm_max_size
  - set default mpool_sm_min_size to 0
* no longer pad sm allocations to page boundaries
* have sm_btl_first_time_init check return codes on free-list creations

Have mca_btl_sm_prepare_src() check to see if it can allocate an EAGER fragment
rather than a MAX fragment if the smaller size works.

Remove ompi/class/ompi_[circular_buffer_]fifo.h and references thereto.

Remove opal/util/pow2.[c|h] and references thereto.

This commit was SVN r20614.
2009-02-20 19:51:57 +00:00
Rainer Keller
02599446d0 - Occurrences of ORTE_PROC_MY_NAME require orte/runtime/orte_globals.h
This commit was SVN r20607.
2009-02-20 03:16:13 +00:00
Rainer Keller
32b7189995 - Make use of BTL_OUTPUT
This commit was SVN r20606.
2009-02-20 03:05:14 +00:00
Jeff Squyres
28f1c995ae Add a decrement to the loop, lest it loop forever.
This commit was SVN r20605.
2009-02-20 02:58:52 +00:00
George Bosilca
97a2296fdd Correct the GET protocol. Thanks to Mike Dubman for finding the problem and
testing my patch.

This commit was SVN r20591.
2009-02-19 16:00:15 +00:00
Jeff Squyres
b8259ba500 Remove unused variable. Thanks for the heads-up, Ralph!
This commit was SVN r20587.
2009-02-19 13:59:38 +00:00
Jeff Squyres
14a83a6bbc Clean up the BML shutdown. Reviewed by George.
This commit was SVN r20586.
2009-02-19 13:17:01 +00:00
Jeff Squyres
3742c3550c Add "sync" collective component. This component is totally
deactivated by default.  It is activated by setting either of the
following two MCA parameters to values greater than 0:

 * coll_sync_barrier_before
 * coll_sync_barrier_after

If coll_sync_barrier_before is >0, then the sync coll component will insert itself
before the underlying collective operations and invoke a barrier
before every Nth collective operation (N == coll_sync_barrier_before).  Similarly for
coll_sync_barrier_after.  Note that N is a _per communicator_ value, not global to the
MPI process.

If both are 0 (which is the default), this component returns NULL for
the comm query, meaning that it is not inserted into the coll module
stack. 

The intent of this component is to provide a workaround for
applications with large numbers of collectives of short messages that
can cause unbounded unexpected messages.  Specifically, it is possible
for some iterative collective communication patterns to cause
unbounded unexpected messages.  Forcing a barrier before or after
every Nth collective operation would prevent that behavior by forcing
applications to synchronize (and thereby consume any outstanding
unexpected messages caused by collectives on the same communicator).

Open MPI still needs to bound unexpected messages resource consumption
at the receiver, but this is a viable workaround for at least some
symptoms of the problem.

Additionally, there has been anecdotal evidence of some applications
that "perform better" when they put barriers after other collective
operations.  This could be due to many factors -- including shortening
the unexpected message queue.  Putting this component in Open MPI
allows people to try this with their own applications and give real
world feedback on this kind of behavior.
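
The counting scheme described in this message can be sketched in plain C. This is only an illustration under assumptions: `sync_comm_state`, `sync_before_op`, and `fake_barrier` are hypothetical names, not the component's actual code, and the barrier is a stub counter standing in for a real barrier call.

```c
/* Hypothetical per-communicator state; N is a per-communicator value. */
struct sync_comm_state {
    int barrier_before;   /* N, mirrors coll_sync_barrier_before (0 = off) */
    int ops_seen;         /* collectives seen since the last barrier */
    int barriers_issued;  /* stand-in for real barrier calls */
};

/* Stub: a real component would invoke the underlying barrier here. */
static void fake_barrier(struct sync_comm_state *s)
{
    s->barriers_issued++;
}

/* Call before each collective; issues a barrier before every Nth one. */
static void sync_before_op(struct sync_comm_state *s)
{
    if (s->barrier_before <= 0) {
        return;  /* deactivated, which is the default */
    }
    if (++s->ops_seen == s->barrier_before) {
        fake_barrier(s);
        s->ops_seen = 0;
    }
}
```

Because the counter lives in the per-communicator struct, two communicators with different traffic patterns synchronize independently, which matches the "per communicator" note in the message.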

This commit was SVN r20584.
2009-02-18 23:32:44 +00:00
Jeff Squyres
563e989b6d Use a bit more friendly language. :-)
This commit was SVN r20583.
2009-02-18 22:12:42 +00:00
George Bosilca
15b60941f3 Cast the req to an opal_list_item_t*
This commit was SVN r20581.
2009-02-18 02:33:37 +00:00
George Bosilca
21f8eba620 There was nothing in item to be added to any list. Instead add
the request that we just removed.

This commit was SVN r20580.
2009-02-18 02:15:57 +00:00
George Bosilca
1b1ed0da37 Always set the frag to NULL.
This commit was SVN r20579.
2009-02-18 02:15:09 +00:00
Eugene Loh
5bbf5ba7d7 First putback of some sm BTL latency optimizations:
* The main thing done here is to convert from multiple FIFOs/queues per
  receiver (each receiver has one FIFO for each sender) to a single FIFO/queue
  per receiver (all senders sharing the same FIFO for a given receiver).
* This requires rewriting the FIFO support, so that
  ompi/class/ompi_[circular_buffer_]fifo.h is no longer used and FIFO
  support is instead in btl_sm.h.
* The number of FIFOs per receiver is actually an MCA tunable parameter,
  but it appears that 1 or possibly 2 FIFOs (even for 112 local processes)
  per receiver is sufficient.

This commit was SVN r20578.
2009-02-17 15:58:15 +00:00
George Bosilca
a0afc9ee29 Always release the allocated memory.
This commit was SVN r20560.
2009-02-14 21:49:06 +00:00
Jeff Squyres
265ac096e8 Restore a few #include's
This commit was SVN r20559.
2009-02-14 15:21:28 +00:00
Rainer Keller
d81443cc5a - On the way to getting the BTLs split out and lessening the dependency on orte:
   Often, orte/util/show_help.h is included although none of its functionality
   is required -- instead, most often opal_output.h or
   orte/mca/rml/rml_types.h is what is needed.
   Please see orte_show_help_replacement.sh, committed next.

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"
   in orte/mca/odls/base/odls_base_default_fns.c and
   in orte/tools/orte-top/orte-top.c
   Manually added these.

   Let's let MTT have the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Jeff Squyres
8b29e27ead Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20543.
2009-02-13 03:45:32 +00:00
Jeff Squyres
91415c2996 Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20542.
2009-02-13 03:45:11 +00:00
Jeff Squyres
c83ef674e3 Some minor valgrind-inspired cleanups: fix some memory leaks.
Also took the opportunity to convert the rdma mpool to use the MCA
register function.

This commit was SVN r20541.
2009-02-13 03:44:29 +00:00
Jeff Squyres
6a1a8311cd Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20540.
2009-02-13 03:43:29 +00:00
Jeff Squyres
661690c273 Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20539.
2009-02-13 03:40:53 +00:00
Jeff Squyres
44092c6a21 Don't allow freeing of predefined datatypes. Thanks to Lisandro
Dalcín for reporting the issue.

This commit was SVN r20538.
2009-02-13 00:00:55 +00:00
Jeff Squyres
38aab37bb3 Be a little tougher looking for MPI_*_NULL cases in some functions.
Thanks to Lisandro Dalcín for reporting the issue.

This commit was SVN r20537.
2009-02-12 23:57:41 +00:00
Jeff Squyres
bcdd3ddbde Ensure to zero out all the pointers in the op so that the destructor
knows what it can and cannot free (these pointers are largely unused
and therefore otherwise uninitialized in user-defined op's and
MPI_REPLACE).
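
The idea of initializing every pointer so the destructor can tell what it owns can be sketched as follows. `demo_op`, `demo_op_construct`, and `demo_op_destruct` are hypothetical names for illustration only; the real op object has different fields.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an op object with rarely used pointers. */
struct demo_op {
    char *name;
    void *scratch;
};

/* Constructor: set every pointer to NULL so unused ones are not garbage. */
static void demo_op_construct(struct demo_op *op)
{
    op->name = NULL;
    op->scratch = NULL;
}

/* Destructor: release only what was actually allocated.
 * Returns the number of buffers released. */
static int demo_op_destruct(struct demo_op *op)
{
    int freed = 0;
    if (NULL != op->name) {
        free(op->name);
        op->name = NULL;
        freed++;
    }
    if (NULL != op->scratch) {
        free(op->scratch);
        op->scratch = NULL;
        freed++;
    }
    return freed;
}
```

Without the constructor step, the destructor would test uninitialized pointers and could pass garbage to `free()`.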

This commit was SVN r20532.
2009-02-12 19:15:37 +00:00
George Bosilca
a0248f736c Move the if around the for loop.
Don't release memory that has not been allocated by the freelist.

This commit was SVN r20530.
2009-02-12 17:29:14 +00:00
Ralph Castain
62dd763a8f Add ability for local slave spawns to pre-position supporting files. Update comm_spawn and comm_spawn_multiple man pages to cover new info_keys.
This commit was SVN r20527.
2009-02-12 15:56:45 +00:00
Ralph Castain
62e08e7212 Add missing header file
This commit was SVN r20526.
2009-02-12 14:15:25 +00:00
George Bosilca
4747a4bb53 ompi_comm_all allocates memory and retains the objects. Therefore, after
each call to ompi_comm_all we should parse the communicator list and
release the objects ...

This commit was SVN r20525.
2009-02-11 21:48:11 +00:00
George Bosilca
3b68ae5ea7 As we do call opal_util_init before calling opal_init we should call
opal_finalize_util after calling the opal_finalize.

This commit was SVN r20523.
2009-02-11 21:01:56 +00:00
George Bosilca
db4a49e3b0 Correctly release the objects, and don't check for NULL.
This commit was SVN r20522.
2009-02-11 21:00:44 +00:00
George Bosilca
0dab6eb93d Release the memory on finalize.
This commit was SVN r20521.
2009-02-11 20:58:41 +00:00
Tim Mattox
9b83df22ec Fix some "is proc on local node?" logic that got accidentally flipped
by r20496 for the sm BTL, openib BTL on iWarp, and the sm & sm2 coll modules.

This commit was SVN r20515.

The following SVN revision numbers were found above:
  r20496 --> open-mpi/ompi@4cdf91a8d4
2009-02-11 15:02:38 +00:00
Jeff Squyres
c596a1bcb3 Fix MPI_File_c2f -- ensure that if you invoke
MPI_File_c2f(MPI_FILE_NULL), you actually get 0, not -1.  Thanks to
Lisandro Dalcín for the bug report.

This commit was SVN r20511.
2009-02-11 00:48:12 +00:00
Shiqing Fan
2f1461419c Add a new feature for checking mca subdirectories, i.e. detecting if there is an exclude file list which indicates the files that shouldn't be added to the source list. By default, the CMake build system will simply add all source files in the required sub folders, without knowing which files have to be excluded. The first use of it is in plm/base/.windows.
Also clean up the nested variable names to make them readable.

This commit was SVN r20498.
2009-02-10 17:20:13 +00:00
Ralph Castain
4cdf91a8d4 Per the RFC, extend the current use of the ompi_proc_t flags field (without changing the field itself).
The prior ompi_proc_t structure had a uint8_t flag field in it, where only one
bit was used to flag that a proc was "local". In that context, "local" was
constrained to mean "local to this node".

This commit provides a greater degree of granularity on the term "local", to include tests
to see if the proc is on the same socket, PC board, node, switch, CU (computing
unit), and cluster.

Add #define's to designate which bits stand for which local condition. This
was added to the OPAL layer to avoid conflicting with the proposed movement of
the BTLs. To make it easier to use, a set of macros have been defined - e.g.,
OPAL_PROC_ON_LOCAL_SOCKET - that test the specific bit. These can be used in
the code base to clearly indicate which sense of locality is being considered.

All locations in the code base that looked at the current proc_t field have
been changed to use the new macros.

Also modify the orte_ess modules so that each returns a uint8_t (to match the
ompi_proc_t field) that contains a complete description of the locality of this
proc. Obviously, not all environments will be capable of providing such detailed
info. Thus, getting a "false" from a test for "on_local_socket" may simply
indicate a lack of knowledge.
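
A minimal sketch of the bit-flag scheme described in this message, assuming made-up names and bit positions: the `DEMO_` macros below are hypothetical and only imitate the style of OPAL_PROC_ON_LOCAL_SOCKET; the real definitions live in the OPAL layer and may differ.

```c
#include <stdint.h>

/* Hypothetical bit assignments within the uint8_t flags field. */
#define DEMO_PROC_ON_SOCKET   0x01
#define DEMO_PROC_ON_BOARD    0x02
#define DEMO_PROC_ON_NODE     0x04
#define DEMO_PROC_ON_SWITCH   0x08
#define DEMO_PROC_ON_CU       0x10
#define DEMO_PROC_ON_CLUSTER  0x20

/* Test macros: each checks exactly one locality bit. */
#define DEMO_PROC_ON_LOCAL_SOCKET(flags) (0 != ((flags) & DEMO_PROC_ON_SOCKET))
#define DEMO_PROC_ON_LOCAL_NODE(flags)   (0 != ((flags) & DEMO_PROC_ON_NODE))
```

As the message notes, a false result from such a test may only mean that the environment could not supply that level of detail, so callers should not read it as proof that the proc is remote.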

This commit was SVN r20496.
2009-02-10 02:20:16 +00:00
Ralph Castain
f0af389910 Enable comm_spawn of slave processes, currently only active for the rsh, slurm, and tm environments. Establish support for local rsh environments in the plm/base so that rsh of local slaves can be done by any environment that supports it. Create new orte_rsh_agent param so users can specify rsh agent from outside of rsh plm, and symlink that to the old plm_rsh_agent and pls_rsh_agent options.
Modify the orte-bootproxy to pass prefix for the remote slave to support hetero/hybrid scenarios.

This commit was SVN r20492.
2009-02-09 20:44:44 +00:00
Ralph Castain
eaa57e29b6 Revert r20480 as this breaks the trunk. The dpm.h include file has defines for OMPI_RML tags that are required for wireup.
This commit was SVN r20482.

The following SVN revision numbers were found above:
  r20480 --> open-mpi/ompi@62282fefe5
2009-02-09 14:14:45 +00:00
Rainer Keller
62282fefe5 - Get rid of #include "ompi/mca/dpm/dpm.h"
This commit was SVN r20480.
2009-02-09 02:56:10 +00:00
Jeff Squyres
f68d2b00d8 Fix one more place where the old name was left over.
This commit was SVN r20473.
2009-02-06 19:21:50 +00:00
Terry Dontje
64ace9ec12 convert bzero calls to memset to remove warnings.
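
For illustration, the conversion amounts to replacing `bzero(buf, len)` with the standard `memset(buf, 0, len)`. The wrapper name `demo_clear` is hypothetical and not part of the code base.

```c
#include <string.h>

/* Zero a buffer with standard memset(); same effect as bzero(buf, len). */
static void demo_clear(void *buf, size_t len)
{
    memset(buf, 0, len);
}
```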
This commit was SVN r20471.
2009-02-06 19:08:22 +00:00
Jeff Squyres
aae930e58b s/__n/converted_n/ -- according to C99, symbols that begin with "__"
are the domain of the compiler.

This commit was SVN r20462.
2009-02-06 01:04:50 +00:00
Jeff Squyres
dfb2d92b37 s/ID/id/ - both work, but if I don't make this change, I'll wonder if
we remembered to use strcasecmp() every time I see this entry in the
file... (we did, but I just don't want to have to keep remembering
that ;-) )

This commit was SVN r20461.
2009-02-06 01:02:25 +00:00
Jeff Squyres
656d8578d0 * Rename (new) MCA parameter to
btl_openib_connect_rdmacm_reject_causes_connect_error (yes, it's
   still long -- on purpose :-) )
 * Add INI file parameter rdmacm_reject_causes_connect_error
 * Now only treat CONNECT_ERROR events as a REJECT if:
   * It's on a connection where we were expecting a REJECT, ''and''
   * The MCA parameter is true ''or'' the INI parameter for this
     device is true
 * Set the INI parameter to true for the NE020
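
The decision rule in the list above reduces to a single boolean expression. The sketch below uses a hypothetical function and parameter names; it is not the openib BTL's actual code.

```c
#include <stdbool.h>

/* Treat a CONNECT_ERROR as a REJECT only when a REJECT was expected on
 * this connection and either the MCA parameter or the per-device INI
 * parameter is true. */
static bool demo_treat_connect_error_as_reject(bool expecting_reject,
                                               bool mca_param,
                                               bool ini_param)
{
    return expecting_reject && (mca_param || ini_param);
}
```

An unexpected CONNECT_ERROR therefore stays an error regardless of how the parameters are set.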

This commit was SVN r20459.
2009-02-06 00:51:04 +00:00
Jeff Squyres
ffc5d8877f Fix a problem where we're accidentally initializing the wrong
errhandler (should be initializing _errors_throw_exceptions, not
_are_fatal).  This bug was not a huge tragedy because the only real
problem is that _are_fatal has the wrong string name with it (because
MPI::Init fixes up the _errors_throw_exceptions later).

This commit was SVN r20458.
2009-02-05 21:36:10 +00:00
Jeff Squyres
50b1fd1392 Per the big discussion on the OpenFabrics list a while ago, some
versions of the NE driver will report the OUI while others will report
the PCI ID.  We'll put in the Intel values when we get them (may not
be for a few more weeks).

This commit was SVN r20457.
2009-02-05 21:19:45 +00:00
Jeff Squyres
66d0a02f90 Work around a problem with some iWARP drivers that don't handle RDMA CM REJECT
properly at all.  NetEffect's current driver (OFED 1.4.0) will return
a CONNECT_ERROR event to the initiator rather than the REJECTED event.
Doh!  Additionally -- unfortunately -- NetEffect's vendor_id and
vendor_part_id are reported as 0 in OFED 1.4.0, so we can't
automatically detect these cards and work around the problem.  So all
we can do is add a new MCA parameter
(btl_openib_connect_rdmacm_ignore_connect_errors -- yes, it's long on
purpose ;-) ) that says that if we get a CONNECT_ERROR, basically
treat it exactly as a REJECT for the WRONG_DIRECTION reason (which is
a "good" reject).  This allows OMPI to function with NetEffect/Intel
cards on OFED 1.4.0.

Note that NetEffect has been bought by Intel; I'm waiting for
information from them to update the ini file for their new OUI/PCI
ID's and/or new vendor_part_id values.

This commit was SVN r20454.
2009-02-05 18:45:59 +00:00
Jeff Squyres
08c35ca135 Somehow this mca param registration code got duplicated; remove one of
the copies

This commit was SVN r20452.
2009-02-05 16:52:30 +00:00