Commit graph

4046 commits

Author SHA1 Message Date
Galen Shipman
44cd373a87 I also forgot to initialize the convertor max_data, george probably copied
this dumb mistake from me. 

This commit was SVN r18653.
2008-06-13 18:33:43 +00:00
George Bosilca
170b9c344e Mea culpa. I forgot to initialize the max_data before the call to the
convertor.

This commit was SVN r18651.
2008-06-12 17:24:39 +00:00
Pavel Shamis
dc3f14736d Fixing QP initialization stuff.
This commit was SVN r18650.
2008-06-11 16:31:39 +00:00
Matthias Jurenz
a9ff2b84f2 Bugfix (Ticket #1318): Implemented copy-constructor of 'FiltHandlerArgument' in 'vt_filthandler.cc' instead of the header file 'vt_filthandler.h'
This commit was SVN r18637.
2008-06-10 09:03:21 +00:00
Galen Shipman
a239877b78 revert my previous boneheadedness
This commit was SVN r18634.
2008-06-10 01:19:04 +00:00
George Bosilca
dc0ab0d0a8 Enable the sendi path.
This commit was SVN r18633.
2008-06-09 23:03:56 +00:00
Galen Shipman
4ef4a9520f remove showhelp..
This commit was SVN r18628.
2008-06-09 20:53:01 +00:00
Aurelien Bouteiller
ebe6df4c06 Moving the pml_v_output global variable inside the pml_v structure. This should avoid one of the missing symbols when visibility is enabled.
This commit was SVN r18627.
2008-06-09 20:38:44 +00:00
Ralph Castain
c13cadc3c7 Refs trac:1255
This commit repairs the debugger initialization procedure. I am not closing the ticket, however, pending Jeff's review of how it interfaces to the ompi_debugger code he implemented. There were duplicate symbols being created in that code, but not used anywhere. I replaced them with the ORTE-created symbols instead. However, since they aren't used anywhere, I have no way of checking to ensure I didn't break something.

So the ticket can be checked by Jeff when he returns from vacation... :-)

This commit was SVN r18625.

The following Trac tickets were found above:
  Ticket 1255 --> https://svn.open-mpi.org/trac/ompi/ticket/1255
2008-06-09 20:34:14 +00:00
Galen Shipman
9efbec0383 fix normal send path
remove unneeded checks

This commit was SVN r18624.
2008-06-09 20:25:27 +00:00
Galen Shipman
dbd282fcad doh.. fix GET protocol..
This commit was SVN r18623.
2008-06-09 19:45:44 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Jeff Squyres
c087b4cd4f * Revert r18067
* Add specific comments about why we're not setting MPI_ERROR here

This commit was SVN r18616.

The following SVN revision numbers were found above:
  r18067 --> open-mpi/ompi@58e31d767e
2008-06-07 02:44:10 +00:00
George Bosilca
2aec094d56 The PML V is a component so it should use OMPI_MODULE_DECLSPEC.
This commit was SVN r18610.
2008-06-06 17:43:57 +00:00
George Bosilca
ae7bca2f4a Update the MPI_ERROR field as well.
This commit was SVN r18607.
2008-06-06 15:53:17 +00:00
Josh Hursey
1de50b523c Fix some Coverity 'Event set_but_not_used' highlights.
Thanks to Jeff for bringing them to my attention.

This commit was SVN r18606.
2008-06-06 14:38:41 +00:00
Jeff Squyres
1a748bc7be First cut at the NetEffect NE020 NIC.
This commit was SVN r18599.
2008-06-05 20:24:24 +00:00
Jeff Squyres
9109f7126a Per CID 988, free some memory that would be leaked in an error condition.
This commit was SVN r18597.
2008-06-05 20:04:38 +00:00
Jeff Squyres
f0d465c30a Slightly simplify the code and remove a compiler warning.
This commit was SVN r18596.
2008-06-05 19:08:08 +00:00
Jeff Squyres
b1999bbba3 * Use inclusive NIC/HCA language
* Add a description of receive_queues

This commit was SVN r18595.
2008-06-05 19:07:22 +00:00
Pavel Shamis
7b9024bc05 Updating Mellanox's Copyright in files touched in 2008
This commit was SVN r18592.
2008-06-05 13:40:26 +00:00
Ralph Castain
6ddcce4085 Apply a patch from Edgar to fix the Intercomm MTT tests.
Fixes ticket #1332

This commit was SVN r18591.
2008-06-05 12:53:12 +00:00
Pavel Shamis
379e00050c Fixing openib btl finalize flow. Bug fix for #1286.
This commit was SVN r18590.
2008-06-05 12:20:13 +00:00
Jeff Squyres
91a281080a Fix a compiler warning for a case that would never really happen
anyway.  Rename a variable to be a bit more descriptive.

This commit was SVN r18585.
2008-06-04 19:10:23 +00:00
Jeff Squyres
bc584dedd6 Remove a compiler warning that would never happen in practice.
This commit was SVN r18584.
2008-06-04 19:03:02 +00:00
Jeff Squyres
6e37dd0ef0 Fix some 32/64 printf errors once and for all
This commit was SVN r18582.
2008-06-04 14:39:37 +00:00
Pavel Shamis
0a8321e08d Calls to APM functions should be protected with OMPI_HAVE_THREADS.
This commit was SVN r18581.
2008-06-04 14:27:41 +00:00
Jeff Squyres
5e918ad25d Add first cut of NetXen iWARP NIC definition. May still be refined
with more experimentation.

This commit was SVN r18580.
2008-06-04 12:11:45 +00:00
Matthias Jurenz
7f5730d073 Bugfix: Removed *unused* constructors of structure 'FirstHandlerArgument' (Ticket #1318)
This commit was SVN r18578.
2008-06-04 11:53:17 +00:00
Matthias Jurenz
f9b2fa95aa Added some words to Open MPI in the section "Introduction"
This commit was SVN r18577.
2008-06-04 11:52:57 +00:00
Ralph Castain
9927b2445c Remove the filter framework - the xml support will have to be provided in a different manner that will be implemented shortly
This commit was SVN r18572.
2008-06-04 09:04:51 +00:00
Pavel Shamis
c73ed2b256 Updating cpc name from xrc to xoob.
This commit was SVN r18571.
2008-06-04 08:50:30 +00:00
Jeff Squyres
75a97ebbf0 Many thanks to Ralf W. for finding a subtle bug in these Makefile.am's
that can *sometimes* cause problems with "make -j [N>1] install".
Ensure to make the target directory before we copy stuff into it --
read the thread starting here for more details:

    http://www.open-mpi.org/community/lists/devel/2008/06/4080.php

This commit was SVN r18570.
2008-06-04 01:28:03 +00:00
Jeff Squyres
cd6f550720 2 more minor fixes from our Debian friends.
This commit was SVN r18560.
2008-06-03 17:39:37 +00:00
Jeff Squyres
a4db97c213 More man pages fixes from our Debian Open MPI package maintainer
friends.  Woo hoo!

This commit was SVN r18559.
2008-06-03 16:44:40 +00:00
George Bosilca
4d8cbbc167 Add Pasha's patch as it correctly solves the issues. In fact in the current
incarnation these functions do not need the inline keyword anymore.

This commit was SVN r18558.
2008-06-03 16:03:36 +00:00
Ralph Castain
c992e99035 Remove the tags from orte_output_open and the filtering operation from orte_output - this will be handled differently to improve the XML output interface
This commit was SVN r18557.
2008-06-03 14:24:01 +00:00
Jeff Squyres
69d78c6739 Fixes trac:1215: adds specific show_help messages about PP vs. SRQ/XRC RNR
retry exceeded errors.

This commit was SVN r18554.

The following Trac tickets were found above:
  Ticket 1215 --> https://svn.open-mpi.org/trac/ompi/ticket/1215
2008-06-02 11:03:48 +00:00
Jeff Squyres
8c267d50a3 Fixes trac:1121.
We already show_help when we fail to create queues, so I just made the
message a little more verbose such that it may be that OMPI is trying
to use a feature that is not supported on the hardware.

This commit was SVN r18553.

The following Trac tickets were found above:
  Ticket 1121 --> https://svn.open-mpi.org/trac/ompi/ticket/1121
2008-05-30 19:03:58 +00:00
George Bosilca
e361bcb64c Send optimizations.
1. The send path gets shorter. The BTL is allowed to return > 0 to specify that the
   descriptor was pushed to the network, and that the memory attached to it is
   available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag
   can be used by the PML to force the BTL to always trigger the callback.
   Unmodified BTLs will continue to work as expected, as they will return OMPI_SUCCESS,
   which forces the PML to have exactly the same behavior as before. Some BTLs have
   been modified: self, sm, tcp, mx.
2. Add send immediate interface to BTL.
   The idea is to have a mechanism of allowing the BTL to take advantage of
   send optimizations such as the ability to deliver data "inline". Some
   network APIs such as Portals allow data to be sent using a "thin" event
   without packing data into a memory descriptor. This interface change
   allows the BTL to use such capabilities and allows for other optimizations
   in the future. All existing BTLs except for Portals and sm have this interface
   set to NULL.

This commit was SVN r18551.
2008-05-30 03:58:39 +00:00
Galen Shipman
4da4c44210 Receive side changes, basically uses multiple active message callbacks rather
than using a single receive callback followed by a switch on the header.
Also fast pathed the matching for small fragments. 

This commit was SVN r18549.
2008-05-30 01:29:09 +00:00
Jeff Squyres
728ee47be4 Just check for the presence of $sysfsdir/class/infiniband and check
that it's a directory.  That's good enough to know that the
OpenFabrics kernel drivers have been loaded.  If you have no RDMA
devices and don't want to see the OMPI warning about not finding any
devices, then don't start the OpenFabrics kernel drivers.

This commit was SVN r18540.
2008-05-29 14:19:51 +00:00
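The check described in this commit message can be sketched as follows. This is a minimal Python illustration, not the C code from the openib BTL: the function name `openfabrics_drivers_loaded` and the `/sys` default for the sysfs directory are assumptions made for the example.

```python
import os


def openfabrics_drivers_loaded(sysfsdir="/sys"):
    """Return True if <sysfsdir>/class/infiniband exists and is a directory.

    Hypothetical helper: the name and the "/sys" default are assumptions
    for illustration only; the real check lives in the openib BTL's C code.
    """
    return os.path.isdir(os.path.join(sysfsdir, "class", "infiniband"))


if __name__ == "__main__":
    # On a host without the OpenFabrics kernel drivers this prints False.
    print(openfabrics_drivers_loaded())
```

A plain file named `infiniband`, or a missing path, both make the function return `False`, which matches the "exists and is a directory" wording of the message.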
Nysal Jan
25ac3629e9 eHCA does not have SRQ. Adding receive_queues value so that it works out of the box
This commit was SVN r18537.
2008-05-29 13:55:39 +00:00
Jeff Squyres
d5bf8fe005 Remove unused variables.
This commit was SVN r18532.
2008-05-29 11:58:16 +00:00
Jeff Squyres
e5ea9d08ca Fixes trac:1305: check to see if $sysfsdir/class/infiniband exists and is
non-empty.  If not, then exit the openib btl silently.  This addresses
the case where libibverbs is installed (which is getting more common)
and therefore the openib BTL was built/installed, but the kernel
drivers are not loaded (assumedly because there is no RDMA hardware
present).  In this case, "mpirun a.out" will not issue a warning.

There appears to be no good way to definitely tell if there are no
RDMA hardware devices present.  For example, if libibverbs/the openib
BTL is installed, there are no RDMA devices present, but the RDMA
hardware kernel drivers ''are'' loaded, OMPI will warn that it was
unable to find suitable devices.  This warning is easily eliminated by
unloading the kernel drivers.

This commit was SVN r18530.

The following Trac tickets were found above:
  Ticket 1305 --> https://svn.open-mpi.org/trac/ompi/ticket/1305
2008-05-28 22:05:47 +00:00
Pavel Shamis
28c763f751 Fixing the error flow when somebody tries to use XRC without XOOB.
This commit was SVN r18527.
2008-05-28 15:56:04 +00:00
Pavel Shamis
2c81b0ab9a Fixing compilation warning in btl_openib_connect_ibcm.c
This commit was SVN r18526.
2008-05-28 15:20:48 +00:00
Ralph Castain
828ae26d90 ORTE-level MCA params are defined in several places. Ompi_info cannot call orte_init due to an issue with the memory allocator, thus making it impossible for ompi_info to display all of the ORTE-level MCA params.
By consolidating them all into one function, ompi_info can call that function and register the desired variables. This also requires, however, that ompi_info call orte_output_init to avoid generating tons of error messages, so make that adjustment too. 

Fixes ticket #1314

In addition, orte_output has a race condition issue whereby calls to orte_output/verbose can occur prior to either the RML being defined/setup, or the HNP being defined. This latter occurs during the initialization of the orte_process_info structure. In both cases, there is no way orte_output can send the output to the HNP. Hence, the message must be simply output locally.

Fixes ticket #1315

This commit was SVN r18524.
2008-05-28 13:29:58 +00:00
Pavel Shamis
879a9fe45c setup_qps() may exit with error.
This commit was SVN r18523.
2008-05-28 11:36:38 +00:00
Pavel Shamis
e657a03143 Fixing broken XRC initialization flow.
This commit was SVN r18522.
2008-05-28 11:31:38 +00:00
Rolf vandeVaart
18879285c7 Fix the selection logic to prevent memory leaks. More work may be done in the priority logic but for now we just fix the leaks and preserve current behavior.
This commit fixes trac:1307.

This commit was SVN r18504.

The following Trac tickets were found above:
  Ticket 1307 --> https://svn.open-mpi.org/trac/ompi/ticket/1307
2008-05-27 14:16:39 +00:00
Pavel Shamis
6596d19c90 Adding new ConnectX vendor_part_id. Fix for ticket #1310.
This commit was SVN r18495.
2008-05-26 12:25:49 +00:00
Gleb Natapov
5fabade090 Use payload_buffer_alignment value for payload alignment.
This commit was SVN r18493.
2008-05-26 08:29:02 +00:00
Jeff Squyres
e1f118d0e6 Remove unused variable
This commit was SVN r18491.
2008-05-24 13:05:04 +00:00
Rolf vandeVaart
5baa733ad5 Fix another warning (using a variable before it was initialized.)
Thanks Jeff for pointing this out.

This commit was SVN r18489.
2008-05-23 13:57:55 +00:00
Rolf vandeVaart
0d8faf7559 Fix the fix for ticket #1298. Thanks George for pointing it out.
This commit was SVN r18488.
2008-05-23 13:33:38 +00:00
Jeff Squyres
1b50e5f6a5 Use the right variable in the output
This commit was SVN r18487.
2008-05-23 13:11:12 +00:00
Rich Graham
b08839f9f5 change reduce-scatter/gather for non-power of 2. Spreading out the
load for the non-power of 2 phase of the reduction.

This commit was SVN r18486.
2008-05-22 21:42:42 +00:00
Rich Graham
f2a4b67809 automate the allreduce selection logic.
This commit was SVN r18484.
2008-05-22 20:53:35 +00:00
Jeff Squyres
8faeeab81a Style cleanup only: s/struct foo/foo_t/g to conform to rest of code
base

This commit was SVN r18483.
2008-05-22 19:26:00 +00:00
Jeff Squyres
1f7f0e1f96 Fixes trac:1281
* s/port/tcp_port/g where relevant to disambiguate TCP port from
   device port
 * Rework ipaddrcheck to make it work in the LMC>0 case

This commit was SVN r18482.

The following Trac tickets were found above:
  Ticket 1281 --> https://svn.open-mpi.org/trac/ompi/ticket/1281
2008-05-22 19:18:15 +00:00
Rolf vandeVaart
8c3b31b181 Need to properly handle zero-length scatters and gathers on intercommunicators. Add a check for the MPI_ROOT and MPI_PROC_NULL processes so they do not enter collective module when count=0.
This commit was SVN r18481.
2008-05-22 19:09:43 +00:00
Rich Graham
5900415a25 for non-powers of 2, distribute the work on the first step among all
the procs doing the work.

This commit was SVN r18480.
2008-05-22 18:50:53 +00:00
Jon Mason
d0e26b1cf6 Add pretty comments for *_iwarp.*
This commit was SVN r18478.
2008-05-22 18:02:20 +00:00
Jeff Squyres
62ac6533e0 * Add proper copyrights
* Ensure _iwarp.h is always included, or you'll get warnings on
   platforms that don't have the RDMACM
 * Add skeleton for function descriptions in comments in iwarp.h

This commit was SVN r18477.
2008-05-22 17:41:43 +00:00
Jeff Squyres
28b56c389a Only check if the opal_ifindex is >= 0 (opal_ifbegin() and
opal_ifnext() return -1 upon completion); don't check it against
opal_ifcount() -- the interface indexes aren't necessarily related to
how many interfaces were found.

This commit was SVN r18476.
2008-05-22 02:10:23 +00:00
Jeff Squyres
27978b29f8 Fixes trac:1302: ensure to also use the LID for identifying an incoming
IBCM request (not just the port number).

This commit was SVN r18475.

The following Trac tickets were found above:
  Ticket 1302 --> https://svn.open-mpi.org/trac/ompi/ticket/1302
2008-05-22 01:28:34 +00:00
George Bosilca
21b940887a Tricky stuff !!! If we post a receive for ZERO bytes and we match it
with something with a different size ... well we segfault. The reason was
that the logic in the PML OB1 called the convertor based on the length
of the data on the wire and not the length of the data that the receiver
expects.

In other words, this is only half a patch :) It fixes the problem, but we
still have to make sure the unpack is not called at all when the receiver
expects ZERO bytes.

This commit was SVN r18474.
2008-05-21 23:31:34 +00:00
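The idea this message points at (size the unpack by what the receiver posted, and skip it entirely for a zero-byte receive) might look roughly like the sketch below. It is an illustration in Python under stated assumptions, not the PML OB1 code: `bytes_to_unpack` and `deliver` are hypothetical names, and the real convertor API is not modeled.

```python
def bytes_to_unpack(wire_length, expected_length):
    """Number of bytes to hand to the unpack step (illustrative only).

    The count is bounded by what the receiver posted, not only by what
    arrived on the wire.
    """
    return min(wire_length, expected_length)


def deliver(payload, expected_length):
    """Hypothetical receive path: returns the bytes the receiver ends up with."""
    n = bytes_to_unpack(len(payload), expected_length)
    if n == 0:
        # The receiver expects ZERO bytes: do not call the unpack at all.
        return b""
    # Stand-in for the unpack step.
    return payload[:n]
```

For example, `deliver(b"abcd", 0)` returns an empty byte string without touching the payload.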
George Bosilca
c31cc5b270 Remove a warning about line being unused.
This commit was SVN r18472.
2008-05-21 20:46:22 +00:00
George Bosilca
df2156568d The Elan BTL is now thread safe, and can be built in all conditions.
This commit was SVN r18471.
2008-05-21 20:44:37 +00:00
Pak Lui
1585789e8b Fix the undeclared variable.
This commit was SVN r18470.
2008-05-21 04:09:54 +00:00
Rich Graham
afd71abde6 remove some useless qualifiers.
This commit was SVN r18469.
2008-05-21 01:11:49 +00:00
Jon Mason
b9c25efbd2 Modify to comply with the "prefix rule" and remove "static inline" for
the non-rdmacm enabled case.   This should fix Ticket #1294.

This commit was SVN r18468.
2008-05-20 23:28:59 +00:00
Jeff Squyres
64f61ebd07 Fixes trac:1285. Really.
This commit has the same commit message as r18450, but without the
extra bonus memory corruption that was introduced.

This commit was SVN r18467.

The following SVN revision numbers were found above:
  r18450 --> open-mpi/ompi@5295902ebe

The following Trac tickets were found above:
  Ticket 1285 --> https://svn.open-mpi.org/trac/ompi/ticket/1285
2008-05-20 21:53:42 +00:00
Edgar Gabriel
0500420bec fixing a bug in the inter-communicator scatter operation, where we
accidentally used rcount instead of scounts.

This commit was SVN r18466.
2008-05-20 21:17:19 +00:00
Rolf vandeVaart
74d0259480 Add new implementation of barrier. This shows better performance on some clusters.
However, no decision logic is changed by this commit so default behavior has not changed.  This
is only selectable by runtime parameters.

This commit was SVN r18464.
2008-05-20 17:37:41 +00:00
Rolf vandeVaart
71091a19c3 Fix bug in spacing of code per https://svn.open-mpi.org/trac/ompi/wiki/CodingStyle.
This commit was SVN r18463.
2008-05-20 14:11:10 +00:00
Jeff Squyres
a9e26c33e0 Ensure that we don't try to call orte_show_help() before orte_init()
succeeds.

This commit was SVN r18458.
2008-05-19 21:57:54 +00:00
Rolf vandeVaart
763f5259a8 Fix memory leak of 88 bytes that occurred on each call to MPI_Comm_dup.
Need to release the items and the item list after selecting the collective
modules that are being used.  Reviewed by Jeff Squyres.

This commit was SVN r18457.
2008-05-19 21:34:01 +00:00
Jeff Squyres
c8c01572d0 ompi_info was erroneously not showing all the paths that it supports
(via compiled-in defaults/configure, or via env variables).

This commit was SVN r18456.
2008-05-19 17:44:56 +00:00
Jeff Squyres
01a7f7eeb6 Switch orte_output* -> OPAL_OUTPUT* for two reasons:
1. We can't use orte_output in the CPC service thread because orte is
    not thread safe
 1. Use the macro version so that they're compiled out of production
    builds 

This commit was SVN r18455.
2008-05-19 17:42:51 +00:00
Jeff Squyres
7154776465 Removed unused variable / compiler warning.
This commit was SVN r18454.
2008-05-19 13:41:45 +00:00
Jeff Squyres
76fc8dd188 Revert r18450 -- there is some memory badness in there somewhere...
This commit was SVN r18451.

The following SVN revision numbers were found above:
  r18450 --> open-mpi/ompi@5295902ebe
2008-05-18 19:11:45 +00:00
Jeff Squyres
5295902ebe Fixes trac:1285:
* allow receive_queues to be specified in the INI file 
 * detect when multiple different receive_queues are specified and 
   gracefully abort 

However, accomplishing these goals ran into multiple difficulties. By 
putting receive_queues in the INI file: 

 1. we may not find the value until we've already traversed multiple HCAs 
 1. we may find multiple different receive_queues values

But since the openib btl initializes as it discovers each HCA/port/LID
(including the BSRQ data), if we find a new receive_queues value late
in the discovery process, then all the BSRQ data that was previously
initialized will likely be invalid. So I had to pull all the BSRQ
initialization out until after the rest of the discovery /
initialization process.

Additionally, note that if the user specifies the MCA parameter
btl_openib_receive_queues, it trumps whatever was in the INI file. So
in this case, there can never be a receive_queues conflict.  This
commit does the following (Jon wrote part of this, too):

 * adapt _ini.c to accept the "receive_queues" field in the file 
 * move 90% of _setup_qps() from _ini.c to _component.c 
 * move what was left of _setup_qps() into the main 
   _register_mca_params() function 
 * adapt init_one_hca() to detect conflicting receive_queues values 
   from the INI file 
 * after the _component.c loop calling init_one_hca(): 
   * call setup_qps() to parse the final receive_queues string value 
   * traverse all resulting btls and initialize their HCAs (if they
     weren't already): setup some lists and call prepare_hca_for_use()

I tested this code on a dual-HCA system where I artificially put in 
differing receive_queues values in the INI file for the two different 
types of HCAs that I have and it all seemed to work.

This commit was SVN r18450.

The following Trac tickets were found above:
  Ticket 1285 --> https://svn.open-mpi.org/trac/ompi/ticket/1285
2008-05-18 18:50:56 +00:00
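The precedence and conflict rules described in this message can be sketched as below. This is a Python illustration under assumptions, not the openib BTL code: `resolve_receive_queues`, `ReceiveQueuesConflict`, and the `default` argument are hypothetical names, and the strings stand in for real receive_queues specifications.

```python
class ReceiveQueuesConflict(Exception):
    """Raised when INI entries disagree on receive_queues (hypothetical)."""


def resolve_receive_queues(mca_value, ini_values, default):
    """Pick the receive_queues string to parse after device discovery.

    mca_value:  value of the user-supplied MCA parameter, or None
    ini_values: one entry per discovered device; None if its INI section
                has no receive_queues field
    default:    fallback when neither source provides a value
    """
    # An explicit MCA parameter trumps whatever was in the INI file,
    # so in that case there can never be a conflict.
    if mca_value is not None:
        return mca_value
    # Otherwise every device that specifies a value must agree.
    distinct = {value for value in ini_values if value is not None}
    if len(distinct) > 1:
        raise ReceiveQueuesConflict(sorted(distinct))
    if distinct:
        return distinct.pop()
    return default
```

Resolving only after all devices have been seen mirrors the point made above: a value found late in discovery would otherwise invalidate queue data that was already initialized.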
Jeff Squyres
87d4201bdf From our faithful Debian package maintainers: remove some lint-quality
lines from the man pages.

This commit was SVN r18449.
2008-05-16 14:58:52 +00:00
Jeff Squyres
1cc663ebf6 Change this back to use opal_init_util() -- using orte_init() mucks
with the C++ memory allocator.  Let's not go there.

This commit was SVN r18447.
2008-05-16 14:18:56 +00:00
Jeff Squyres
caacaadb0a Minor shuffling of code: no need to query the GID in the iWARP case.
This commit was SVN r18446.
2008-05-16 03:36:48 +00:00
Jeff Squyres
9f1b5237fe Ensure to return an error rather than continue
This commit was SVN r18445.
2008-05-16 03:36:11 +00:00
Jeff Squyres
6546898f09 Minor style cleanups; nothing very important in this commit.
This commit was SVN r18444.
2008-05-16 03:28:20 +00:00
Jeff Squyres
5c91f53848 Fix a minor memory leak
This commit was SVN r18443.
2008-05-16 03:27:42 +00:00
Rolf vandeVaart
375406e1fa Remove the ignore files as decided at Tuesday's developers conference call. Now, hierarchical collectives will be compiled in but the priority is still at 0 requiring a user to set mca parameters to enable them.
This commit was SVN r18440.
2008-05-15 01:26:52 +00:00
Josh Hursey
35a2af28d1 Cleanup the CRCP Coord timing functionality. Provides a rough assessment of time each element of the algorithm is taking.
There are more details in the code regarding how to use this feature.

Also shift a few of the orte_output back to opal_output. I'm experiencing an odd problem with locks in the oob/tcp when using orte_output. I haven't had time to track it down yet.

This commit was SVN r18439.
2008-05-14 19:54:20 +00:00
Jeff Squyres
671f0c379d Remove a whole pile of orte/util/show_help.h's that I missed. :-(
This commit was SVN r18437.
2008-05-14 11:32:33 +00:00
Jeff Squyres
fb17097de4 Make ompi_info correctly display "filter" components
This commit was SVN r18435.
2008-05-13 20:56:20 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already made the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not the opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so if you explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent via the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing of messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
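The duplicate-suppression behavior described for orte_show_help() and the orte_base_help_aggregate parameter can be sketched as follows. This is a Python illustration under assumptions, not the ORTE implementation: the class `HelpAggregator`, its method names, the message `key`, and the wording of the summary line are all hypothetical; only the "show once, then count new copies and report roughly every 5 seconds" behavior comes from the message above.

```python
class HelpAggregator:
    """Toy model of help-message aggregation at a single collection point."""

    def __init__(self, aggregate=True, interval=5.0):
        self.aggregate = aggregate   # mirrors orte_base_help_aggregate (1 = True)
        self.interval = interval     # seconds between summary reports
        self.seen = set()            # keys already displayed once
        self.new_copies = {}         # key -> duplicates since the last report
        self.last_report = 0.0

    def receive(self, key, text):
        """Return the text to display now, or None if it is a duplicate."""
        if not self.aggregate or key not in self.seen:
            self.seen.add(key)
            return text
        self.new_copies[key] = self.new_copies.get(key, 0) + 1
        return None

    def report(self, now):
        """Return summary lines if the interval elapsed and duplicates arrived."""
        if now - self.last_report < self.interval or not self.new_copies:
            return []
        lines = [
            f"{count} new copies of help message {key!r}"
            for key, count in sorted(self.new_copies.items())
        ]
        self.new_copies.clear()
        self.last_report = now
        return lines
```

Passing `now` explicitly keeps the sketch deterministic; with `aggregate=False` every message is returned for display, which corresponds to the original behavior.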
Jon Mason
125eb5a2ed Convert from the Linux ifaddrs to the OMPI ifaddrs, which should unbreak Solaris.
This commit was SVN r18433.
2008-05-13 18:34:22 +00:00
Jeff Squyres
d8e5608053 Remove all retransmission code; the IBCM kernel module handles all of
that for us.

This commit was SVN r18432.
2008-05-13 16:10:34 +00:00
Jon Mason
74bf1ae25f Fix compiler warnings
This commit was SVN r18431.
2008-05-13 16:01:58 +00:00
Jon Mason
4ead9442b5 Add in IDs for all Chelsio iWARP capable adapters
This commit was SVN r18428.
2008-05-12 21:59:03 +00:00
Jeff Squyres
6b26895ad4 A little style update -- constants on the left...
This commit was SVN r18426.
2008-05-12 12:05:16 +00:00
Jeff Squyres
16cde0e5fa Fix compile error on older OFED systems
This commit was SVN r18425.
2008-05-12 11:56:14 +00:00
Gleb Natapov
6844ff32ba Return OMPI_ERR_RESOURCE_BUSY from sm->btl_send() function if there is no place in cb. This will prevent OB1 from doing early completion of small sends.
This commit was SVN r18424.
2008-05-12 07:15:29 +00:00
Gleb Natapov
31d2797a2f If RDMA PUT is received before ACK and registration of memory fails don't
start sending fragment by copy in/out before ACK is received as we don't
know pointer to receive request yet.

Pipeline protocol sometimes doesn't send ACK though, so this case is still
broken.

This commit was SVN r18423.
2008-05-11 12:40:55 +00:00
Gleb Natapov
0827e537fa Don't include rdma/rdma_cma.h if !OMPI_HAVE_RDMACM.
This commit was SVN r18422.
2008-05-11 11:58:02 +00:00
Rainer Keller
4b89706dfe - Properly check for valid output parameters...
This commit was SVN r18419.
2008-05-09 08:39:24 +00:00
Jon Mason
99ab66e131 RDMACM code cleanup
This patch adds some much needed comments, reduces the amount of code
wrapping, and rearranges and removes redundant code.

This commit was SVN r18417.
2008-05-08 21:20:12 +00:00
Josh Hursey
da2f1c58e2 Some checkpoint/restart cleanup.
* Remove the opal_only option. This was suffering from bit rot, and no one uses it. It can be added back fairly easily if wanted.
 * Cleanup metadata interactions at the local level.
 * Touch up some of the INC functionality (fix typos and a minor ordering issue)

This commit was SVN r18416.
2008-05-08 18:47:47 +00:00
Jon Mason
88e5f2a339 Abstract iWARP subnet ID functions (sans build break)
The iWARP subnet ID determination should not be in the RDMACM cpc, as
it was in the previous version, as this violates the cpc abstraction that is
present throughout the code.  Also, this patch uses the opal_list_t
data struct instead of using its own linked lists.

This attempt includes *iwarp.c and *iwarp.h

This commit was SVN r18414.
2008-05-08 14:38:14 +00:00
Jeff Squyres
60f39a30f6 Revert r18409; that commit broke the build because it forgot to add
the btl_openib_iwarp.c and btl_openib_iwarp.h files.

This commit was SVN r18410.

The following SVN revision numbers were found above:
  r18409 --> open-mpi/ompi@056bbb68c8
2008-05-08 00:22:21 +00:00
Jon Mason
056bbb68c8 Abstract iWARP subnet ID functions
The iWARP subnet ID determination should not be in the RDMACM cpc, as
it was in the previous version, as this violates the cpc abstraction that is
present throughout the code.  Also, this patch uses the opal_list_t
data struct instead of using its own linked lists.

This commit was SVN r18409.
2008-05-07 23:59:43 +00:00
Ralph Castain
7c7b9b0486 Do a little cleanup on the opal graph class and opal carto framework to conform to OMPI naming conventions and avoid potential conflict with user applications - no change in functionality, passes carto test program
This commit was SVN r18407.
2008-05-07 19:33:49 +00:00
Jeff Squyres
e76d2dd518 This is an internal function; this error won't happen.
This commit was SVN r18406.
2008-05-07 19:00:34 +00:00
Jeff Squyres
157cea378f * A few fixes to make IP address and port number comparisons work properly
* A few indenting and style fixes

This commit was SVN r18405.
2008-05-07 16:56:07 +00:00
Jeff Squyres
bfae8ea828 The comment wasn't long enough; I felt the need to make it longer (and
explain a little more ;-) ).

This commit was SVN r18404.
2008-05-07 16:53:05 +00:00
Shiqing Fan
19fe973095 The most important thing, otherwise it won't work properly.
This commit was SVN r18403.
2008-05-07 16:38:09 +00:00
Jeff Squyres
63abb3eb9b Clarify a comment / fix typos.
This commit was SVN r18402.
2008-05-07 14:51:36 +00:00
Shiqing Fan
897fe404c0 Get rid of the warning message from the compiler.
This commit was SVN r18401.
2008-05-07 14:31:42 +00:00
Shiqing Fan
8088ec8bce More for non-blocking communication.
This commit was SVN r18400.
2008-05-07 13:00:28 +00:00
Shiqing Fan
8393fb5d47 Use the new memchecker_call function for memory checking of non-blocking communication.
This commit was SVN r18399.
2008-05-07 12:28:51 +00:00
Shiqing Fan
aed20e213b Use a more general approach for the memchecker call, similar to the convertor call. The convertor call depends on the convertor structure, which is not available in some places. Now we depend only on the datatype, which gives the same behavior as the convertor but is more flexible.
This commit was SVN r18398.
2008-05-07 12:23:07 +00:00
Ralph Castain
ff70636024 Allgather_list needs its own tag to avoid conflicting with the allgather modex operation.
All spawned procs must decode the port of the spawning process so they can communicate in direct routed mode.

This fixes comm_spawn for all routing modes.

This commit was SVN r18395.
2008-05-07 03:03:56 +00:00
Rolf vandeVaart
0e32dd1022 Add MPI_Alltoallv to tuned collectives and add a pairwise implementation of MPI_Alltoallv. However, do not change the default behavior for now. The only way to use new pairwise implementation is via mca parameters.
This commit was SVN r18394.
2008-05-07 02:31:24 +00:00
Jon Mason
502d164908 Create subnet ID's for iWARP.
This enables subnet differentiation for iWARP devices, and rearranges
initialization so that the services are available when they are needed.

This commit was SVN r18393.
2008-05-06 22:43:52 +00:00
Jon Mason
9c724128f8 Handle no IP Address in rdmacm more resiliently
If there is no IP Address, have rdmacm log the correct error and let
another cpc have a go at it.  This is being done by splitting off the
IP address checking logic for the modex message creation, and having
it log the correct error in the error case.

This commit was SVN r18392.
2008-05-06 22:31:29 +00:00
Jon Mason
46bfd42c09 Fix compile warnings in rdmacm
Fix some reported compiler warnings and make the code a little prettier.

This commit was SVN r18391.
2008-05-06 22:19:28 +00:00
Jon Mason
9066168cd1 Prevent iWARP qp flush errors.
For iWARP, the TCP connection is tied to the QP once the QP is in RTS.  
And destroying the QP is thus tied to connection teardown for iWARP.  
This is a key distinction from IB, I think.   Anyway, to destroy the 
connection in iWARP you must move the QP out of RTS, either into CLOSING 
for a nice graceful close, or to ERROR if you want to be rude.  In both 
cases, all pending non-completed SQ and RQ WRs must be flushed.

This patch ignores all flush errors reaped by the cq and removes an
earlier attempt to work around this in the rdmacm cpc.

This commit was SVN r18388.
2008-05-06 21:57:40 +00:00
Josh Hursey
9971bc9d95 Merge in the mca_base_select changes per RFC:
http://www.open-mpi.org/community/lists/devel/2008/04/3779.php

{{{
svn merge -r 18276:18380 https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play .
}}}

Any components not in the trunk, but in one of the affected frameworks *must* be
updated. Contact the list, look at the RFC, or look at the diff for how to do this.

Sorry for the early commit of this, but I wanted to get it in today (per RFC) and
didn't know if I would have a chance later today.

This commit was SVN r18381.
2008-05-06 18:08:45 +00:00
Jeff Squyres
a06d4023b8 Oops -- missed one sys_errlist -> strerror().
This commit was SVN r18378.
2008-05-06 13:22:36 +00:00
Jeff Squyres
4154e587de strerror() is much better.
This commit was SVN r18376.
2008-05-05 21:06:07 +00:00
Shiqing Fan
f35a06119c Use the memchecker_convertor_call function instead of the old one. Move the function to a place where we can use the convertor.
This commit was SVN r18370.
2008-05-05 13:57:27 +00:00
Jon Mason
a3bf503e01 Remove error on rdma cm
If there are multiple QP's, RDMACM will not send a message if the
qpnum != 0.  In doing so, it will log an error unnecessarily.  This
removes that.

This commit was SVN r18363.
2008-05-02 20:12:01 +00:00
Jon Mason
3989981578 Enable support of num_proc > num_nodes
Add the logic to support using port numbers, instead of simply using
the IP address of the sending node to determine which endpoint to
connect.  Since each process calls the cpc query function, it will
generate its own port to listen on, thus enabling this to work.

This commit was SVN r18362.
2008-05-02 16:20:28 +00:00
Jeff Squyres
ba5615a18f Merge in /tmp-public/cpc3 branch to trunk. oob/xoob still remains the
default CPC.

This commit was SVN r18356.
2008-05-02 11:52:33 +00:00
Donald Kerr
843a35094f adding local work queue accounting
This commit was SVN r18352.
2008-05-01 21:01:51 +00:00
George Bosilca
a69ac964df Allow any order in the list of Elan vpid.
This commit was SVN r18350.
2008-05-01 20:32:03 +00:00
Josh Hursey
dcd21d7d07 Some checkpoint/restart fixes in response to r18338 (changes in modex).
Things should be working now.

This commit was SVN r18348.

The following SVN revision numbers were found above:
  r18338 --> open-mpi/ompi@3e55fe6f6d
2008-05-01 17:48:13 +00:00
Terry Dontje
8dd0421015 Moved ident lines to ompi_mpi_init.c and created new ompi_version_string
variable.

This commit was SVN r18345.
2008-05-01 15:06:10 +00:00
Ralph Castain
3e55fe6f6d Fold in the revised modex scheme. Move the ompi_proc_t modex portions to the RTE level since the daemons already have that info. Provide each process with the equivalent of a "nidmap" - both a map of what nodes are in the job, and a map of which node each process is on. This enables the use of static ports, though that hasn't been turned "on" in this commit.
Update the rsh tree spawn capability so we spawn the next wave of daemons before launching our own local procs.

Add an ability to encode nodenames for large clusters with contiguous node name numbering schemes - this allows communication of all node names in a few bytes instead of tens-of-bytes/node.

This commit was SVN r18338.
2008-04-30 19:49:53 +00:00
Pavel Shamis
61cc8843bf The r17940 broke the XRC code.
The endpoint may be appended to list during XOOB connection bring up.

This commit was SVN r18328.

The following SVN revision numbers were found above:
  r17940 --> open-mpi/ompi@ebfdd133f5
2008-04-29 13:22:40 +00:00
Galen Shipman
ced88a338b include portals modex fun in the distro
This commit was SVN r18325.
2008-04-28 18:51:54 +00:00
Brad Penoff
c699236be2 updating SCTP BTL to configure properly with FreeBSD 7
This commit was SVN r18324.
2008-04-28 04:19:10 +00:00
George Bosilca
6e6c370917 Rollback r18274 as it's legal to have a sequence number smaller than the
expected one. It doesn't necessarily mean the message is duplicated,
it can simply signify the message is out of sequence and the counter
overflowed.

This commit was SVN r18323.

The following SVN revision numbers were found above:
  r18274 --> open-mpi/ompi@73c9de3af9
2008-04-27 18:35:54 +00:00
Aurelien Bouteiller
611d52fa95 Fix a bug that prevented using the same port (as returned by Open_port) for several Comm_accept calls.
This commit was SVN r18303.
2008-04-25 20:41:44 +00:00
Aurelien Bouteiller
c20b020ea6 Fix ticket #1275. The pml v can now be correctly deactivated on the configure command line. Also fix a dist target under some unusual circumstances.
This commit was SVN r18291.
2008-04-24 21:42:54 +00:00
Josh Hursey
2c736873bb Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors.
The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit ORTE would shutdown and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using has been recycled so it closed it on restart. This caused the ORTE level to break as the newly created socket descriptor was closed without its knowledge.

The fix is to have the OMPI level shut down tcp connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit.

Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it.

 * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level.
 * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components.
 * Update ft_event functions in PML and BML to handle the new restart state.
 * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging.

This commit was SVN r18276.
2008-04-24 17:54:22 +00:00
George Bosilca
3ccac4f803 Oops ...
This commit was SVN r18275.
2008-04-24 15:54:52 +00:00
George Bosilca
73c9de3af9 Bark if we got a wrong sequence number. Here wrong means that the
seq number is smaller than what we expect.

This commit was SVN r18274.
2008-04-24 15:48:43 +00:00
Rich Graham
4d1ae7b05f accidentally made a change in the wrong place.
This commit was SVN r18262.
2008-04-23 17:32:05 +00:00
Rich Graham
293dd6ad4e add myself to list of people building this module.
This commit was SVN r18261.
2008-04-23 17:25:36 +00:00
Rich Graham
7658cc79e4 Pass in the correct module to the reduction call.
This commit was SVN r18260.
2008-04-23 17:23:30 +00:00
Adrian Knoth
c53d3c3c22 reverted r18169,r18170 due to connection reset by peer on odin/sif
This commit was SVN r18255.

The following SVN revision numbers were found above:
  r18169 --> open-mpi/ompi@20473bfda2
  r18170 --> open-mpi/ompi@d34dfbe12c
2008-04-23 15:26:15 +00:00
Josh Hursey
cc83d41ad9 Merge in tmp/jjh-scratch
{{{
 svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}

Contains:
 * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
 * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
 * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
 * Cleanup mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
 * Some other sundry cleanup items all dealing with C/R functionality in the trunk.

This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Tim Mattox
0215474cb8 Fix two bugs in coll_sm_module.c from bit-rot:
Fixed a selection bug, and removed a bogus "free(proc)" call
which ultimately caused MPI_Finalize to crash.

This commit was SVN r18235.
2008-04-22 18:41:21 +00:00
Jeff Squyres
c40740947f Fix minor spelling error.
This commit was SVN r18229.
2008-04-22 13:11:50 +00:00
Galen Shipman
27c425b304 make portals level ack's optional (require ACK by default)
This commit was SVN r18228.
2008-04-21 22:22:18 +00:00
Rich Graham
df35223603 add selection logic for barrier and reduce.
This commit was SVN r18215.
2008-04-19 22:40:04 +00:00
Rich Graham
bee8b42f29 remove debug code that would not let people run.
Add infrastructure for blocking-barrier.

This commit was SVN r18214.
2008-04-19 01:34:04 +00:00
Galen Shipman
92e3b8671f nasty memory bug...
This commit was SVN r18207.
2008-04-18 03:01:53 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Tim Prins
eb94fa48ce the port name is only relevant at the root, so only look at it there.
This commit was SVN r18188.
2008-04-17 12:37:10 +00:00
Tim Prins
3582e11200 cleanup some warnings on 32 bit systems
This commit was SVN r18187.
2008-04-17 12:25:05 +00:00
Tim Prins
b2acb51d04 make comm_join work again. Allocate memory to the correct pointer.
This commit was SVN r18186.
2008-04-17 11:56:53 +00:00
Rich Graham
6c77fa4921 add a blocking shared memory algorithm.
This commit was SVN r18185.
2008-04-16 22:10:23 +00:00
Ralph Castain
7b91f8baff Cleanup and fix bugs in the MPI dynamics section. Modify the dpm API so it properly takes ports instead of process names (as correctly identified by Aurelien). Fix race conditions in the use of ompi-server. Fix incompatibilities between the mpi bindings and the dpm implementation that could cause segfaults due to uninitialized memory.
Fix the ompi-server -h cmd line option so it actually tells you something!

Add two new testing codes to the orte/test/mpi area: accept and connect.

This commit was SVN r18176.
2008-04-16 14:27:42 +00:00
Shiqing Fan
aa616b9530 Check whether the debugger is running and whether the convertor is valid.
Add a loop to skip the DT_LOOP element. 

This commit was SVN r18175.
2008-04-16 13:58:58 +00:00
Shiqing Fan
1c4c7e0f2f Add memchecker support for osc rdma communication.
This commit was SVN r18173.
2008-04-16 13:29:55 +00:00
Shiqing Fan
79da2fdd2c Use the new memchecker convertor function.
Remove some unnecessary memchecker calls.

This commit was SVN r18172.
2008-04-16 13:24:35 +00:00
Adrian Knoth
d34dfbe12c fixed misleading comment.
This commit was SVN r18170.
2008-04-16 11:26:15 +00:00
Adrian Knoth
20473bfda2 on incoming connections, compare with every possible source address.
Rationale (taken from the code):

    /* This is PITA. We never know which source address an 
    * incoming/outgoing packet will have, so even with 
    * btl_tcp_if_include/exclude on the remote end, we 
    * might get a different source address. 
    * 
    * If this address isn't included in btl_proc->proc_addrs, 
    * we would erroneously drop the connection 
    */ 

merge -r18165:18167 to the trunk.

This commit was SVN r18169.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r18165
  r18167
2008-04-16 11:24:09 +00:00
Adrian Knoth
e981a259bb btl_tcp_disable_family=4 and btl_tcp_disable_family=6 are mutually
exclusive, so this should result in "unreachable" when set differently
between peers.

This commit was SVN r18168.
2008-04-16 10:14:58 +00:00
Adrian Knoth
75c54616c7 renamed opal_sockaddr2str to opal_net_get_hostname for WANT_PEER_DUMP=1
This commit was SVN r18154.
2008-04-15 19:23:47 +00:00
Jeff Squyres
72af302360 Remove unused variable.
This commit was SVN r18151.
2008-04-15 14:58:32 +00:00
Aurelien Bouteiller
0f311ed824 Make sure the function returns NULL when no elan adapter is available instead of a random value.
This commit was SVN r18136.
2008-04-11 21:03:01 +00:00
Aurelien Bouteiller
20592cbcbf Fixes a warning about mallocing 0 bytes when no elan adapter is available.
This commit was SVN r18135.
2008-04-11 20:59:12 +00:00
Aurelien Bouteiller
921a6ce3d4 Processes with different jobids can now connect/accept to each other.
This commit was SVN r18134.
2008-04-11 15:40:59 +00:00
Rich Graham
249445d61f added reduce-scatter followed by gather to root.
This commit was SVN r18133.
2008-04-11 13:49:08 +00:00
Rich Graham
a6bdbfab97 implement allreduce as reduce-scatter, followed by an allgather.
This commit was SVN r18132.
2008-04-11 04:06:29 +00:00
Jon Mason
08ead87604 Potential double free of locks
mca_btl_openib_endpoint_post_rr_nolock is freeing the endpoint lock on
the error case, but most/all of the functions calling this free the lock
regardless of its error case.  This results in a double free of the
lock.

This commit was SVN r18131.
2008-04-10 21:15:01 +00:00
Rich Graham
70f3aab5f2 remove some code that is not needed.
This commit was SVN r18128.
2008-04-10 17:32:04 +00:00
Rich Graham
5c7db1e315 remove 2 race conditions in the buffer recycling logic.
This commit was SVN r18127.
2008-04-10 17:20:52 +00:00
Edgar Gabriel
4964434205 reverting commit 18122, since the commit was executed accidentally in the
wrong directory. The UH copyrights do belong in this file (i.e. because of
the fix which is in the 1.2 branch, the UH copyright notes are in the header
there already), but I want to have the proper log for that.

This commit was SVN r18124.
2008-04-10 15:09:31 +00:00
Edgar Gabriel
5989fa570c Sorry, previous commit was in the wrong directory. This is the real fix (have
to undo 1822).

The verification of recvcount==0 and rank = root was breaking
inter-communicator scatter, since the root (root==MPI_ROOT) might very well
have recvcount=0. The same fix has been applied to gather.c just the other way
round. 
 
Fixes the bug reported on the mailing list by Martin Audet. If there is a
1.2.7 this fix might be worthwhile porting it over.

Please note, that while the test works now for basic and for inter, we get a
0byte malloc warning from the inter module, which we still have to fix in a
separate patch.

This commit was SVN r18123.
2008-04-10 15:03:14 +00:00
Edgar Gabriel
f87830767a the verification of recvcount==0 and rank = root was breaking
inter-communicator scatter, since the root (root==MPI_ROOT) might very well
have recvcount=0. The same fix has been applied to gather.c just the other way
round. 
 
Fixes the bug reported on the mailing list by Martin Audet. If there is a
1.2.7 this fix might be worthwhile porting it over.

Please note, that while the test works now for basic and for inter, we get a
0byte malloc warning from the inter module, which we still have to fix in a
separate patch.

This commit was SVN r18122.
2008-04-10 14:58:51 +00:00
Ralph Castain
3a0d09300b Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations.
Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study.

This commit was SVN r18115.
2008-04-09 22:10:53 +00:00
Rich Graham
c6783549ef getting old
This commit was SVN r18110.
2008-04-09 16:55:16 +00:00
Rich Graham
1a20c3ce51 more debug.
This commit was SVN r18109.
2008-04-09 16:19:52 +00:00
Rich Graham
e7e18303f6 more debug.
This commit was SVN r18108.
2008-04-09 15:10:58 +00:00
Rich Graham
b14c6b17d5 adding debug output.
This commit was SVN r18107.
2008-04-09 13:32:01 +00:00
Rich Graham
10434fb2f1 add barrier synchronization at the end of the module init, to
avoid initializing shared memory variables in use.

This commit was SVN r18105.
2008-04-09 03:44:40 +00:00
Rich Graham
19bb1a2e86 fix initialization bug.
This commit was SVN r18104.
2008-04-08 23:34:06 +00:00
Donald Kerr
38e298cc9a report error message in all libs, not just debug
This commit was SVN r18103.
2008-04-08 22:58:28 +00:00
Rich Graham
a69a8d9626 initialize the flags.
This commit was SVN r18102.
2008-04-08 22:16:39 +00:00
Rich Graham
8765a2bbdd more debug code.
This commit was SVN r18101.
2008-04-08 20:38:20 +00:00
Rich Graham
08becf33b5 add more debugging.
This commit was SVN r18100.
2008-04-08 18:44:50 +00:00
Rich Graham
aa1b7dd406 more debug
This commit was SVN r18099.
2008-04-08 03:56:47 +00:00
Rich Graham
0c18bdeff7 more debug code.
This commit was SVN r18098.
2008-04-08 03:04:20 +00:00
Rich Graham
9d5a7238df Add some debugging code.
This commit was SVN r18097.
2008-04-07 23:20:15 +00:00
Rich Graham
fa696734d5 add some debug code.
This commit was SVN r18096.
2008-04-07 21:03:23 +00:00
Shiqing Fan
28746bbcdb Remove the memchecker macro in pml base request, used in req_wait.c, which actually is in the wrong place. Instead, one simple call from send_request_free and recv_request_free(already done) will do all the work, fast and clean.
This commit was SVN r18095.
2008-04-07 17:46:50 +00:00
George Bosilca
9e0bc441a6 Make this header ISO C compliant.
This commit was SVN r18090.
2008-04-07 14:47:13 +00:00
Shiqing Fan
d22de11e8e Remove the running debugger function.
This commit was SVN r18087.
2008-04-07 10:40:02 +00:00
Shiqing Fan
c74b488cdb Forgot to comment this function out for the moment.
This commit was SVN r18086.
2008-04-07 10:33:11 +00:00
Shiqing Fan
a1e5df1cc9 Use the new memchecker function call which is based on convertor.
Remove one unnecessary call.

This commit was SVN r18085.
2008-04-07 07:52:04 +00:00
Shiqing Fan
a913a60c24 Add a new function for setting memory states based on structure convertor.
The benefits of this function are lower memory use, compactness, and better performance. Thanks to George.
Keep the old memchecker function as well in case the convertor is not available.

This commit was SVN r18084.
2008-04-07 07:47:27 +00:00
Gleb Natapov
713a27dc71 Counter of created RDMA channels should be incremented immediately after channel
creation (not in control message completion) otherwise more than max_eager_rdma
channels may be created.

This commit was SVN r18082.
2008-04-06 13:48:45 +00:00
Rich Graham
1b54e8b76e fix buffer management for nb-barrier.
This commit was SVN r18081.
2008-04-05 21:59:04 +00:00
Ralph Castain
5e6dc24e62 Fix ompi-server so it works with unity routed module - still not working with tree routing.
Cleanup debug flag so it activates debugging on the data server code itself

This commit was SVN r18080.
2008-04-04 19:17:28 +00:00
Tim Prins
313edd8955 - Fix a problem reported on the users list where we would segfault in finalize after calling spawn if the user did not call MPI_Comm_disconnect
- Fix the app context constructor so it initializes all the fields.

This commit was SVN r18079.
2008-04-04 15:07:39 +00:00
Aurelien Bouteiller
3d0ed3dfe8 Small typo in manpage.
This commit was SVN r18078.
2008-04-04 01:02:51 +00:00
Jeff Squyres
7072a32703 * Properly protect XRC stuff
* A few minor style fixes

This commit was SVN r18076.
2008-04-02 19:52:03 +00:00
Rich Graham
94f8fd365c a few reduction optimizations. Add bcast.
This commit was SVN r18075.
2008-04-02 19:02:33 +00:00
George Bosilca
a00ca20446 More cleanups.
This commit was SVN r18069.
2008-04-02 06:38:33 +00:00
George Bosilca
944453c4c1 Cleanups.
This commit was SVN r18068.
2008-04-02 06:37:42 +00:00
George Bosilca
58e31d767e Cleanup.
This commit was SVN r18067.
2008-04-02 06:35:24 +00:00
George Bosilca
9738ee7784 Add the logicalx types to fortran.
This commit was SVN r18066.
2008-04-02 06:34:46 +00:00
Rich Graham
eb5d6096f1 add reduction routine - fix buffer recycling logic which was totally
broken.

This commit was SVN r18065.
2008-04-01 22:56:18 +00:00
Matthias Jurenz
1b021eb63f Bugfix for LIBC's I/O tracing: fileno(stream) is called only if stream != NULL
This commit was SVN r18053.
2008-04-01 07:09:36 +00:00
Jeff Squyres
d944d5ec52 Just in case something goes drastically wrong, don't segv.
This commit was SVN r18049.
2008-03-31 21:55:07 +00:00
Edgar Gabriel
f7c8bb78fd move the coll_base_comm_select functions after dpm has been opened and
selected, but before we check whether we have been spawned. This is necessary
in order for the hierarch collective component to work. This component might
create new communicators already in MPI_Init(), which then have to execute the
dpm.mark_dyncomm function. If dpm is not initialized at that point, we
segfault. 

This commit was SVN r18045.
2008-03-31 19:37:37 +00:00
George Bosilca
5adaa88241 Cleanup the code and make it a little faster.
This commit was SVN r18038.
2008-03-31 17:12:03 +00:00
Matthias Jurenz
879fdc4feb merging VampirTrace-5.4.5 into the main branch
This commit was SVN r18030.
2008-03-31 12:48:35 +00:00
Matthias Jurenz
a33831c1f8 Pass OMPI's configure option '--[enable|disable]-binaries' to VT's configure
This commit was SVN r18029.
2008-03-31 12:46:27 +00:00
George Bosilca
60111ce66d Few less warnings.
This commit was SVN r18025.
2008-03-30 19:06:49 +00:00
George Bosilca
b4f828f389 We need a newline at the end of the file, or some compilers bark.
This commit was SVN r18023.
2008-03-30 19:05:56 +00:00
Gleb Natapov
b42234461a Cleanup shared file creation on unix/linux.
This commit was SVN r18021.
2008-03-30 13:41:47 +00:00
Lenny Verkhovsky
7e45d7e134 Few updates due to RMAPS rank_file component changes
1. applied prefix rule to functions and variables of RMAPS rank_file component
2. cleaned ompi_mpi_init.c from paffinity code
3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file
4. added opal_paffinity_slot_list mca parameter

This commit was SVN r18019.
2008-03-30 11:52:11 +00:00
Jeff Squyres
d0f12f3df0 Make a better error message.
This commit was SVN r18014.
2008-03-29 12:54:24 +00:00
Rich Graham
3b42d2268d add functions to handle two different input buffers and a separate
output buffer.  User defined data types have no way to make use
of these.

This commit was SVN r18012.
2008-03-28 23:45:44 +00:00
Rich Graham
90e53ca9ee debug the pipeline algorithm.
This commit was SVN r18008.
2008-03-28 15:10:07 +00:00
Aurelien Bouteiller
77653ac787 Missing .h file in makefile broke nightly tarball distcheck...
This commit was SVN r18006.
2008-03-28 14:36:56 +00:00
Aurelien Bouteiller
c16339944a Fix a coverity warning about using unsafe sprintf.
This commit was SVN r17999.
2008-03-27 21:24:27 +00:00
Aurelien Bouteiller
e11237aadb Introduction of the "progress" sender_based method to replace the slow isend-self method.
This commit was SVN r17998.
2008-03-27 21:19:45 +00:00
Aurelien Bouteiller
93db01871e This is part of the previous patch.
This commit was SVN r17997.
2008-03-27 21:06:14 +00:00
Aurelien Bouteiller
f8bf6f2c6a Code cleanup.
sender_based.h is now split in two files, to solve cyclic .h files inclusion. 
Most macros are now inline functions.
Variable names have been changed from places to places.
Various other small things... 

This commit was SVN r17996.
2008-03-27 21:05:44 +00:00
George Bosilca
691806680a I guess this wasn't really intended ...
This commit was SVN r17995.
2008-03-27 18:41:06 +00:00
George Bosilca
303941f642 Avoid a deadlock. The comment explains how this might happen.
This commit was SVN r17994.
2008-03-27 18:37:11 +00:00
George Bosilca
be4b153f0d Another patch for thread safety in the TCP BTL (thanks to Pierre).
This commit was SVN r17993.
2008-03-27 18:36:08 +00:00
Tim Prins
c5736e3f9a Remove old constants used with the registry.
This commit was SVN r17991.
2008-03-27 17:13:20 +00:00
Ralph Castain
6166278e18 Improve the scalability of the modex operation and fix a bug reported by Tim P
The bug was a race condition in the barrier operation that caused the barrier in MPI_Finalize to fail on very short programs.

Scalability was improved by using the daemons to aggregate modex and barrier messages before sending them to the rank=0 proc. Improvement is proportional to ppn, of course, but there really wasn't a scaling problem at low ppn anyway. This modification also paves the way for better allgather operations since now all the data for each node is sitting at the daemon level, and the daemons are now aware that a collective operation on the OOB is underway (so they -can- participate in a collective of their own to support it).

Also added better diagnostics to map out the timing associated with MPI_Init - turned on by -mca orte_timing 1.

This commit was SVN r17988.
2008-03-27 15:17:53 +00:00
Gleb Natapov
cf40674369 Decide if sends should be throttled at the receiver and pass this to the sender
in an ACK message. The decision can't be done reliably at the sender.

This commit was SVN r17987.
2008-03-27 08:56:43 +00:00
Rich Graham
e2ad9c4be2 adjust to change in orte_process_info.
This commit was SVN r17986.
2008-03-27 01:25:28 +00:00
Rich Graham
441fb9fb9e checkpoint.
This commit was SVN r17985.
2008-03-27 01:16:32 +00:00
Jeff Squyres
a2795fe43d Very minor modification against r17980: check the whole string against
"all", not just the first 3 chars (i.e., if someone sets the value
"allfoo", we should still error).

This commit was SVN r17981.

The following SVN revision numbers were found above:
  r17980 --> open-mpi/ompi@b3ef774d46
2008-03-26 19:10:02 +00:00
Josh Hursey
b3ef774d46 A fix for r17956.
r17956 broke the ability for the user to override the 'opal_event_include'
parameter. This commit checks to see if the user specified a value before
forcing the "all" value on the event engine.

This commit fixes Checkpoint/Restart support in the trunk which requires
this feature.

This commit was SVN r17980.

The following SVN revision numbers were found above:
  r17956 --> open-mpi/ompi@763218e754
2008-03-26 14:54:09 +00:00
Rainer Keller
b7efc2b18e - Coverity issues CID 42:
Event var_deref_model: Variable "array_of_integers" tracked as NULL was
   passed to a function that dereferences it. [model]
   The arrays passed down type_get_contents may be NULL, only iff max_* is 0...
   If the max_* parameter does not fit, an error is returned, anyhow.
   One could improve the checks of MPI_PARAM_CHECK, but to be on the
   safe side, fix in dt_args.c.

This commit was SVN r17974.
2008-03-26 09:07:06 +00:00
Rainer Keller
334b64e760 - Coverity issue CID 35:
Event var_deref_op: Variable "requests" tracked as NULL was
   dereferenced.
   Only check requests[i] for NULL, if requests is != NULL itself.

This commit was SVN r17973.
2008-03-26 08:19:55 +00:00
Rainer Keller
56f3d59f2a - Coverity issues 939, 940, 941:
Event uninit_use_in_call: Using uninitialized value "tag" in call to
   function "(ompi_dpm).connect_accept" and others
   The tag is set and used in get_rport only on root...

This commit was SVN r17972.
2008-03-26 08:09:11 +00:00
George Bosilca
a01f3f762c Check if extra is NULL or not ...
This commit was SVN r17967.
2008-03-25 22:43:46 +00:00
George Bosilca
bea5c0f734 Don't allocate anything if we don't really need it, and avoid leaking memory.
This commit was SVN r17966.
2008-03-25 22:43:11 +00:00
Jeff Squyres
763218e754 Fix #1253: default libevent to use select/poll and only use the other
mechanisms (such as epoll) if someone (ompi_mpi_init()) requests
otherwise.  See big comment in opal/event/event.c for a full
explanation.

This commit was SVN r17956.
2008-03-25 17:18:17 +00:00
Ralph Castain
90107f3c14 Fix an issue with comm_spawn over who sent/recv first in the modex. The modex assumes that the first name on the list is the "root" that will serve as the allgather collector/distributor. The dpm was putting that entity last, which forced us to pre-inform the parent procs of the child proc's contact info since the parent was trying to send to the child.
Clarify the setting of send_first in the MPI bindings (trivial, I know, but helpful)

Remove the extra xcast of child contact info to the parent job.

This commit was SVN r17952.
2008-03-25 14:57:34 +00:00
Ralph Castain
cca449e379 Move an OMPI RML tag to the OMPI layer
This commit was SVN r17950.
2008-03-25 13:30:48 +00:00
Jeff Squyres
5320c91ab3 Oops -- fix the constructor to also use opal_object_t instead of
opal_list_item_t.

This commit was SVN r17945.
2008-03-25 11:59:50 +00:00
Galen Shipman
0116041133 BTL shouldn't own the passive side's descriptor in the PML get protocol. The BTL
doesn't know when to free it on the passive side. 

This commit was SVN r17943.
2008-03-25 01:43:41 +00:00
Ralph Castain
ebea4d04e4 Remove defunct error constant - we no longer have a GPR that can hold corrupt data!
This commit was SVN r17942.
2008-03-24 21:05:14 +00:00
Jeff Squyres
ebfdd133f5 AFAICT, we never put endpoints on a list.
This commit was SVN r17940.
2008-03-24 18:32:55 +00:00
Jeff Squyres
004c3a5b09 Ensure to cover all cases when either ORTE or OMPI is not yet
initialized.  For example, there is a period of time during
ompi_mpi_init when orte_initialized==true, but
ompi_mpi_initialized==false (and therefore communicators are not setup
yet, etc.).

This commit was SVN r17937.
2008-03-24 16:25:14 +00:00
Ralph Castain
dc7f45dafd Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.

This commit was SVN r17926.
2008-03-23 23:10:15 +00:00
Jeff Squyres
dee561d29e Per recent off-list discussions about the build system, I have done
some cleanups and standardizations in the various */tools/*/ 
Makefile.am files.  This commit:

 * Somewhat simplifies the tool Makefile.am's
 * Makes the tool Makefile.am's consistent with each other (do similar
   actions in similar ways)
 * Updates the tool Makefile.am's to remove old cruft that was required
   by older versions of AM (trunk requires AM >=1.10)

This commit was SVN r17921.
2008-03-22 02:04:05 +00:00
Brian Barrett
f176c67cd2 Set the nodeid to something somewhat sane if we're not using modex, and
don't set the LOCAL flag just because both procs have an invalid nodeid.

This commit was SVN r17917.
2008-03-21 20:20:00 +00:00
Brian Barrett
5a7ebf5f25 Do not try to update the local process with modex information (from the local
process) as it stomps on information if the modex doesn't exist for the
current platform

This commit was SVN r17916.
2008-03-21 19:20:47 +00:00
Jeff Squyres
4fbcb75ce8 With 5 commits over a 16 hour period and 3 broken tarball builds and a
still-broken trunk build on common platforms (e.g., 64 bit Linux
RHEL4U4), I think it's clear that this code is not ready for
prime-time.

I'm backing out all the commits in the trunk/ompi/op tree from r17901
onwards.  This code can be re-committed when it compiles and runs on
common platforms.

cd ompi/op
svn merge -r 17907:17900 https://svn.open-mpi.org/svn/ompi/trunk/ompi/op .

This commit was SVN r17908.

The following SVN revision numbers were found above:
  r17901 --> open-mpi/ompi@b9520e61dc
2008-03-21 14:47:01 +00:00
Jeff Squyres
8284f64af1 With r17906, this commit should make the trunk compile again.
This commit was SVN r17907.

The following SVN revision numbers were found above:
  r17906 --> open-mpi/ompi@df4a6c3fc5
2008-03-21 13:49:23 +00:00
Rich Graham
df4a6c3fc5 fix function prototypes for new 3 buffer routines.
This commit was SVN r17906.
2008-03-21 13:44:15 +00:00
Ralph Castain
b2655ab585 Per Brian's suggestion, remove unnecessary library dependency - libtool automagically picks up the other libraries when we include libmpi
This commit was SVN r17905.
2008-03-21 12:47:04 +00:00
Rich Graham
0974160e29 correct several of the new macros.
This commit was SVN r17904.
2008-03-21 03:45:43 +00:00
Rich Graham
a7c836a2b0 fix location of the restrict key word.
Make the tag in the fan-in/fan-out algorithm be fragment based.

This commit was SVN r17903.
2008-03-21 01:40:36 +00:00
Rich Graham
2c66d396b7 take care of some bit-rot with the fanin-fanout method.
This commit was SVN r17902.
2008-03-21 01:08:49 +00:00
Rich Graham
b9520e61dc get the sm optimized allreduce working for all but user defined
operations.  Added to the reduction operations a set of reduction
functions that take 2 input buffers and one output buffer to avoid
some extra memory copies.  These can't be used with user defined
operations.  The intel c collective suite passes both original, and
new (new, not the user defined operations).

This commit was SVN r17901.
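
The difference between the usual two-buffer reduction and the three-buffer form described above can be sketched roughly as follows. This is an illustrative sketch for an integer sum only; the names sum_2buf and sum_3buf are hypothetical and are not the routines added by this commit.

```c
#include <stddef.h>

/* Two-buffer form: inout[i] = in[i] + inout[i]. The result overwrites one
 * operand, so a caller that must preserve both inputs has to copy one of
 * them into a scratch buffer first. */
void sum_2buf(const int *in, int *inout, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        inout[i] += in[i];
    }
}

/* Three-buffer form: out[i] = in1[i] + in2[i]. Both inputs are left
 * untouched and the result lands directly in the destination, which
 * avoids the extra copy. */
void sum_3buf(const int *in1, const int *in2, int *out, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        out[i] = in1[i] + in2[i];
    }
}
```

User-defined operations in MPI follow the two-buffer (in, inout) signature, which is consistent with the commit's note that the three-buffer functions cannot be used with them.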
2008-03-20 23:51:16 +00:00
Galen Shipman
dcac824f59 Fix problem in releasing fragments during GET_END event (didn't check that
portals btl has ownership and therefore didn't free the frag as it should); this
causes leakage and hangs in MPI_Finalize.

Also added a bit more debugging. 

This commit was SVN r17900.
2008-03-20 22:46:32 +00:00
Jeff Squyres
4314609a00 * Remove a meaningless clause (it could never be true)
* Fix an error message to correctly display if we were before
   MPI_INIT or after MPI_FINALIZE (refs trac:1243)

This commit was SVN r17873.

The following Trac tickets were found above:
  Ticket 1243 --> https://svn.open-mpi.org/trac/ompi/ticket/1243
2008-03-18 22:26:43 +00:00
George Bosilca
efa89bfa3f Revert r17857. The context should be set in one case ... when we call prepare_{src|dst}
without calling a get or put. So, just keep it here until a better solution is
found.

This commit was SVN r17872.

The following SVN revision numbers were found above:
  r17857 --> open-mpi/ompi@d460ccfbf9
2008-03-18 19:01:27 +00:00
Ralph Castain
f39ce707b5 Remove an ORTE debug flag from an MPI function
This commit was SVN r17871.
2008-03-18 18:25:45 +00:00
Jeff Squyres
a9028d21dd This file is generated; it should not be in SVN.
This commit was SVN r17867.
2008-03-18 16:46:53 +00:00
Ralph Castain
32a82349df More fixes to cleanup compiler warnings for rank_file code
This commit was SVN r17863.
2008-03-18 13:21:38 +00:00
Lenny Verkhovsky
647bce6d3e Support for new RMAPS rank mapping component
This commit was SVN r17860.
2008-03-18 09:39:07 +00:00
George Bosilca
8943ae0b4e Cleanup plus some typos.
This commit was SVN r17858.
2008-03-18 03:03:33 +00:00
George Bosilca
d460ccfbf9 No need to check for NULL there. The bml_btl is set correctly
on the upper level.

This commit was SVN r17857.
2008-03-18 03:02:31 +00:00
George Bosilca
39353ebb44 Cleanup.
This commit was SVN r17855.
2008-03-18 02:56:50 +00:00
George Bosilca
76deec135e The .h file is not used anymore (it contained the descriptor cache). Update the
Makefile.am file as well.

This commit was SVN r17854.
2008-03-18 02:50:24 +00:00
George Bosilca
1d04ec4ded Correct the connection logic for TCP. Now we have not only a cleaner
connection, but a more thread safe one. Thanks to Pierre for his
help on this.

This commit was SVN r17853.
2008-03-18 02:42:16 +00:00
Jeff Squyres
61290c0e51 Remove a useless file.
This commit was SVN r17852.
2008-03-18 01:50:47 +00:00
Ralph Castain
be7d0a8a4d Fix a problem introduced by the conversion of orte_pointer_array to opal_pointer_array. We used to derive the app context's index from the returned index of the orte_pointer_array_add function - this parameter was lost in the transition to opal_pointer_array_add. As a result, we no longer knew the index of the app_context, so everything is launched with app0.
This commit was SVN r17851.
2008-03-17 23:48:10 +00:00
Jeff Squyres
12426b64ea Per MPI-2 ballot 3, the definition of MPI::BOTTOM has changed. w00t!
Fixes trac:1175.

This commit was SVN r17850.

The following Trac tickets were found above:
  Ticket 1175 --> https://svn.open-mpi.org/trac/ompi/ticket/1175
2008-03-17 21:42:27 +00:00
Edgar Gabriel
570bbea5e0 fixing the allgather problem reported on the mailing list. The problem was
that at one location we had the local size instead of the remote size as a
receive argument.

This commit was SVN r17849.
2008-03-17 19:42:18 +00:00
Gleb Natapov
9b6db25182 Fix compilation warning.
This commit was SVN r17839.
2008-03-17 13:37:57 +00:00
Matthias Jurenz
613de1bff6 bugfix in VT_COMM_ID: return static comm. id (1) for MPI_COMM_SELF
This commit was SVN r17837.
2008-03-17 11:55:40 +00:00
Pavel Shamis
54ad8d7446 The issue was reported/fixed by Jon Mason one month ago but the fix was not committed. So I'm committing it now.
This commit was SVN r17835.
2008-03-17 11:13:06 +00:00
Brad Penoff
be13b86fc5 Clarifying and fixing SCTP btl_sctp_if_11 parameter
This commit was SVN r17834.
2008-03-17 09:18:31 +00:00
Gleb Natapov
f488b94899 More SM BTL initialization cleanups.
This commit was SVN r17833.
2008-03-16 10:01:56 +00:00
Rich Graham
27182afb67 get the timers in correctly.
This commit was SVN r17832.
2008-03-16 03:25:16 +00:00
Rich Graham
afcd1016fd move temp buffer allocation out of the iteration loop - i.e. always use the
same temp loop.  The algorithm is rather synchronous already...

This commit was SVN r17831.
2008-03-16 03:20:46 +00:00
Rich Graham
a1766b29f6 fix some barrier addressing errors.
This commit was SVN r17830.
2008-03-15 22:46:19 +00:00
Rich Graham
0453e7d2f4 bug in management memory allocation - too much memory allocated.
This commit was SVN r17829.
2008-03-15 18:12:20 +00:00
Rich Graham
3c2f1eb8bf reduce the number of temp buffers used.
This commit was SVN r17828.
2008-03-15 17:23:04 +00:00
Rich Graham
0f9d642d51 temp buffer pointers are computed when they are set up. A bit more
efficient, but more important, it is much easier to play around with
memory layout now.

This commit was SVN r17827.
2008-03-15 16:36:35 +00:00
Rich Graham
e3e336b5ab check point
This commit was SVN r17826.
2008-03-15 13:31:21 +00:00
Jeff Squyres
6c77c995c2 Add missing dependencies in the static build case.
This commit was SVN r17825.
2008-03-15 12:11:36 +00:00
George Bosilca
5e229fe688 Thanks Ma for the patch. Correct the multi-rail support and
rename some fields to something more clear.

This commit was SVN r17824.
2008-03-14 19:17:28 +00:00
George Bosilca
ecebd5ae77 Update the Elan BTL to take into account multiple networks, and correctly deal
with the node position in the network.

This commit was SVN r17822.
2008-03-14 17:32:35 +00:00
Matthias Jurenz
6fe53bb5c2 merging VampirTrace-5.4.4.5 into the main branch
This commit was SVN r17821.
2008-03-14 16:23:52 +00:00
Gleb Natapov
772772b944 Remove unneeded include.
This commit was SVN r17813.
2008-03-12 10:01:20 +00:00
George Bosilca
17317faed4 Make visible the exported functions.
This commit was SVN r17810.
2008-03-11 19:26:38 +00:00
Edgar Gabriel
c11957fbb4 the ompi_group_get_proc_ptr has to be OMPI_DECLSPECed, since else it won't
work if 
 - visibility is enabled (now enabled by default)
 - sparse groups by default.

Thanks to Mohamad for locating the problem, and Rainer for locating the solution.

This commit was SVN r17809.
2008-03-11 18:53:18 +00:00
Gleb Natapov
90c70e37b9 Clean up SM btl startup code. Remove no longer needed code leftovers from two
BTL times. Remove old and no longer correct comment.

This commit was SVN r17805.
2008-03-11 14:39:10 +00:00
Gleb Natapov
3a9652ffc4 Endpoint array may not exist if in add_proc() we failed to find suitable
btl for communication with a proc. Don't segfault in this case.

This commit was SVN r17804.
2008-03-11 08:13:37 +00:00
Matthias Jurenz
b9c8e46d8b Removed dubious AC_CACHE_CHECK constructs
This commit was SVN r17800.
2008-03-10 14:08:31 +00:00
Gleb Natapov
ffa09c44fd Pass correct pointer to mpool_base function.
This commit was SVN r17795.
2008-03-09 13:22:12 +00:00
Gleb Natapov
b0b21c68b4 Remove trailing spaces from SM BTL.
This commit was SVN r17794.
2008-03-09 13:17:13 +00:00
Rich Graham
ebcf928c24 add some diagnostics.
This commit was SVN r17789.
2008-03-07 22:27:41 +00:00
Rich Graham
9131461511 move some test code to another machine.
This commit was SVN r17785.
2008-03-07 19:18:02 +00:00
Rich Graham
c230b65543 fix a couple of bugs. Recursive doubling seems to be working.
This commit was SVN r17777.
2008-03-07 02:51:38 +00:00
Rich Graham
70157166f9 checkpoint - compiles, now need to debug.
This commit was SVN r17775.
2008-03-07 00:39:59 +00:00
Ralph Castain
b110a247be Fix comm_spawn (maybe).
Comm_spawn was sticking during spawn_multiple because of a problem in the dpm - the modex there is asking processes to talk to each other in an allgather_list operation, but the procs don't have the required contact info to do so. The solution here was to ensure that all parent procs have full contact info for procs in the child job.

Admittedly, this isn't the long-term answer. We would like to have the contact info given to only the parent procs that were involved in the comm_spawn. There is a way to do that, but this will suffice to keep things working until that can be implemented and tested.

This commit was SVN r17772.
2008-03-06 21:56:00 +00:00
Rich Graham
4eace9d020 starting to implement recursive doubling algorithm.
This commit was SVN r17765.
2008-03-06 18:38:58 +00:00
Tim Prins
5de3e1965e Remove the orte_proc_table. Migrate all users of it to the opal_hash_table and a new name hash function in orte.
Everything should work, however I am unable to compile and test the sctp BTL.

This commit was SVN r17751.
2008-03-05 22:44:35 +00:00
Tim Prins
f9916811ae Make it so we do not mangle the options the user passes to their executable. Fixes trac:1124
The change also:
 - cleans up and simplifies the command line processing code
 - adds an error output if more than one hostfile passed for a single app context
 - gets rid of the superfluous orte_app_context_map_t type, and instead use a simple argv of -host options

This commit was SVN r17750.

The following Trac tickets were found above:
  Ticket 1124 --> https://svn.open-mpi.org/trac/ompi/ticket/1124
2008-03-05 22:12:27 +00:00
Donald Kerr
ef8f807c1c was not passing correct variable to dat_strerror
This commit was SVN r17749.
2008-03-05 21:45:16 +00:00
Matthias Jurenz
cdf25e2b12 - merging VampirTrace-5.4.4.4 into the main branch
- fixed ticket #1212 (Make VT use OMPI autogen)

This commit was SVN r17739.
2008-03-05 15:19:29 +00:00
Matthias Jurenz
36211ad385 Fixed ticket #1212
(Make VT use OMPI autogen)

This commit was SVN r17738.
2008-03-05 15:17:10 +00:00
Josh Hursey
612ebdc2ac Cleanup some symbol visibility issues.
This commit was SVN r17733.
2008-03-05 13:59:25 +00:00
Tim Prins
10c2ce7d35 Export needed symbol
This commit was SVN r17731.
2008-03-05 12:46:59 +00:00
Jeff Squyres
597266fdec Present state of MPI debugger work:
 * New/improved bootstrapping technique for DLLs
 * First cut of the MPI handle debugging interface. It is still
   evolving, but the interface is getting more stable.
 * Some minor bugs were fixed in the unity topo component (brought to
   light because of the new MPI handle debugging stuff).

Fixes trac:1209.

This commit was SVN r17730.

The following Trac tickets were found above:
  Ticket 1209 --> https://svn.open-mpi.org/trac/ompi/ticket/1209
2008-03-05 12:22:34 +00:00
Josh Hursey
3b4073e32c This commit fixes the checkpoint/restart functionality on the trunk. Included in this commit are:
 * Extension to the ESS framework to support C/R
 * Fixed support for {{{snapc_base_establish_global_snapshot_dir}}}
 * Fixed FileM support
 * Misc. minor code modifications

There are some outstanding visibility issues that I want to fix next.

This commit was SVN r17725.
2008-03-05 04:57:23 +00:00
Jeff Squyres
ea5c0cb4a2 Now that the nightly tarball has safely been made, let's try this
commit again.  Remove the svn:ignore from problematic directories and
try a merge from /tmp-public/plpa-merge-area2.

This commit was SVN r17718.
2008-03-05 02:45:15 +00:00
Tim Prins
1b34620d8e Make the default to enable symbol visibility.
Fixes trac:1222

This commit was SVN r17712.

The following Trac tickets were found above:
  Ticket 1222 --> https://svn.open-mpi.org/trac/ompi/ticket/1222
2008-03-05 01:30:32 +00:00
Galen Shipman
3a59cbd4a7 not sure how this got missed..
This commit was SVN r17710.
2008-03-05 01:23:43 +00:00
Christian Bell
987de57c9c Looks like orte/ns is now gone
This commit was SVN r17706.
2008-03-05 00:55:43 +00:00
Jeff Squyres
8189fcc7d5 Back out r17702; it went very badly.
This commit was SVN r17704.

The following SVN revision numbers were found above:
  r17702 --> open-mpi/ompi@3df754ebd7
2008-03-05 00:42:39 +00:00
Jeff Squyres
3df754ebd7 Bring over PLPA v1.1 from /tmp-public/plpa-v1.1 branch.
This commit was SVN r17702.
2008-03-05 00:16:49 +00:00
Christian Bell
c3d0a81cd3 Add new QLogic adapters to hca-params.init
This commit was SVN r17699.
2008-03-04 22:14:27 +00:00
Ralph Castain
55c727cea4 Fix compiler warning
This commit was SVN r17684.
2008-03-04 15:46:37 +00:00
Rich Graham
67ad9b6d6b increase max data segments size.
This commit was SVN r17677.
2008-03-02 19:11:09 +00:00
Gleb Natapov
08abafdaa1 Initialize ib_pd to NULL.
This commit was SVN r17674.
2008-03-02 09:11:23 +00:00
Rich Graham
53126fa7bd add calls to opal_progress()
This commit was SVN r17673.
2008-02-29 23:25:09 +00:00
Rich Graham
d37db14901 get the shared memory collectives working again with the new
version of orte.

This commit was SVN r17672.
2008-02-29 22:28:57 +00:00
Rich Graham
c253a7bda1 simplify the code a bit.
This commit was SVN r17664.
2008-02-29 03:55:12 +00:00
Rich Graham
1632d8b299 revert to an older (not previously checked in) version to get around a
regression.

This commit was SVN r17663.
2008-02-29 03:12:12 +00:00
Rich Graham
827e8d877e fix bug in node type, and some memory copy optimizations.
This commit was SVN r17661.
2008-02-29 01:20:11 +00:00
Rich Graham
940d6732c9 remove compiler warnings.
This commit was SVN r17656.
2008-02-28 22:01:19 +00:00
Tim Prins
84b2099fe8 Remove the now-unused orte_value_array. As this is the last 'class' split between orte and ompi, remove the big comment about the split in ompi_bitmap.
Also, update some properties (source files should not be executable...), and remove a couple unneeded inclusions of orte_proc_table.h

This commit was SVN r17655.
2008-02-28 21:39:42 +00:00
Rich Graham
2b5fab9d51 avoid 0 byte malloc.
This commit was SVN r17653.
2008-02-28 21:11:42 +00:00
Ralph Castain
8d819cf3d3 Move carto open/close/finalize to opal layer so that ORTE can get access to topo info. This will be used to support a topo grpcomm that optimizes communications in non-uniform topologies like RR.
This commit was SVN r17652.
2008-02-28 21:04:30 +00:00
Rich Graham
4b26adef00 remove some debug output.
This commit was SVN r17650.
2008-02-28 20:54:35 +00:00
Ralph Castain
48e5840c50 Restore a placeholder to make non-SVN SCM's happy.
This commit was SVN r17648.
2008-02-28 20:19:22 +00:00
Rich Graham
5df6c6d043 fix several race conditions.
This commit was SVN r17645.
2008-02-28 19:40:19 +00:00
George Bosilca
d9937cca81 Only declare ret in the block where it is used (avoid a warning about
unused variable).

This commit was SVN r17638.
2008-02-28 06:18:57 +00:00
George Bosilca
9d421bea2a Replace all occurrences of orte_pointer_array by opal_pointer_array. Remove the
implementation of orte_pointer_array.

This commit was SVN r17636.
2008-02-28 05:32:23 +00:00
George Bosilca
678e6c7f0d This is a Mercurial file.
This commit was SVN r17635.
2008-02-28 05:18:06 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00