Commit Graph

4504 Commits

Author SHA1 Message Date
Mike Dubman
9556310bd0 cosmetic: add comment with rationale for malloc.h include
This commit was SVN r28614.
2013-06-12 05:58:32 +00:00
Nathan Hjelm
9b1f32bf12 BTL: add flags for signaled BTL operations
As per discussion in the June 2013 developer meeting these
flags will be used by the PML in the future to request
asynchronous progress on an operation. The naming was chosen
to reflect that a BTL supports this mode (MCA_BTL_FLAG_SIGNALED)
and that a descriptor should "signal" the remote side to wake
up and progress the message (MCA_BTL_DES_FLAG_SIGNAL).

Future commits will update OB1 to take advantage of this
feature when performing the RDMA get or RDMA rendezvous
protocols.

This commit was SVN r28612.
2013-06-11 21:52:20 +00:00
Mike Dubman
d18b3ae1a7 fix malloc deprecation error with gcc 4.6.3 on ubuntu/fedora
This commit was SVN r28605.
2013-06-09 18:13:16 +00:00
George Bosilca
d789423d34 Typo.
This commit was SVN r28603.
2013-06-08 10:44:02 +00:00
Vishwanath Venkatesan
0b727f84da Avoid malloc of zero bytes by adding a check.
This commit was SVN r28597.
2013-06-06 14:08:57 +00:00
Edgar Gabriel
2d4655a05a Logic has been revised compared to the previous implementation.
This commit was SVN r28594.
2013-06-05 23:47:42 +00:00
Edgar Gabriel
03c1db7a3a fix the calculation of the UNIFORM flag.
This commit was SVN r28593.
2013-06-05 23:18:50 +00:00
Vishwanath Venkatesan
7d6a05982a Removing the gather_array based on the flag UNIFORM FVIEW for read all operations (dynamic/static),
+ Disabling Timing data extraction by default in dynamic write all

This commit was SVN r28592.
2013-06-05 21:35:37 +00:00
Vishwanath Venkatesan
55878674d7 1. Removing the allgather_array based on the flag UNIFORM FVIEW. This is not really an optimization.
2. Fixing some of the outdated debug printfs.

This commit was SVN r28591.
2013-06-05 21:30:15 +00:00
Jeff Squyres
713e3aa3db Refs trac:3626: that ticket specifically refers to the v1.6 branch; this
commit is the trunk version of what is needed for #3626.

Add the "ignore_device" field to the INI file.  This allows us to
specifically list devices that should be ignored by the openib BTL
(such as the Intel Phi, at least as of May 2013 -- see #3626).  

Also add the Intel Phi to the ini file, and set its ignore_device=1.

Finally, add the concept of counting intentionally ignored verbs
devices.  Devices are ignored for one of two reasons:

 * If the number of allowed ports on that device is 0 (i.e., if
   if_include/if_exclude was set such that we're intentionally
   ignoring this device).
 * If the INI ignore_device field for this device is set to 1.

Once we have the count of devices that were intentionally ignored,
only show the "Hey, there's verbs devices that you're not using!"
show_help message if there are devices that were ''unintentionally''
ignored.

This commit was SVN r28589.

The following Trac tickets were found above:
  Ticket 3626 --> https://svn.open-mpi.org/trac/ompi/ticket/3626
2013-06-05 12:12:09 +00:00
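The counting rule described in 713e3aa3db can be sketched roughly as follows. This is a minimal illustration, not the openib BTL's code: `struct device_info`, its fields, and `should_warn_unused` are hypothetical names invented here. It only shows the decision the message describes: warn only when some unused device was not ignored on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device summary; not the openib BTL's real structure. */
struct device_info {
    int allowed_ports;  /* ports left after if_include/if_exclude filtering */
    bool ini_ignore;    /* ignore_device=1 was set for this device in the INI file */
    bool in_use;        /* the BTL is actually using this device */
};

/* A device is intentionally ignored for either of the two reasons listed. */
static bool intentionally_ignored(const struct device_info *d)
{
    return d->allowed_ports == 0 || d->ini_ignore;
}

/* Returns true when at least one unused device was NOT ignored on purpose,
 * i.e. when the "unused verbs devices" help message should be shown. */
bool should_warn_unused(const struct device_info *devs, size_t n)
{
    size_t unused = 0, intentional = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (devs[i].in_use) {
            continue;
        }
        ++unused;
        if (intentionally_ignored(&devs[i])) {
            ++intentional;
        }
    }
    return unused > intentional;
}
```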
Jeff Squyres
3019b7a3f8 Oops! Remove duplicate registration.
This commit was SVN r28588.
2013-06-05 11:55:19 +00:00
Jeff Squyres
1de00b17ad Properly check the return status from registering the MCA params.
This commit was SVN r28587.
2013-06-05 11:53:18 +00:00
Jeff Squyres
d692aba672 Remove the DR PML. It was abandoned long ago. It had a nice life,
a few papers, and now a decent demise with respect.  

This commit was SVN r28582.
2013-06-04 19:36:16 +00:00
Edgar Gabriel
87b3782b7f arghh, copy-and-paste error, status->_ucount has to be set to 0 not max_data for count=0.
This commit was SVN r28576.
2013-05-30 22:00:29 +00:00
Edgar Gabriel
9daec82f17 - make a fileview of 0 bytes work in ompio
- fixes the bug reported in ticket 3619 (which is already closed) also for ompio

This commit was SVN r28575.
2013-05-30 21:33:13 +00:00
Rolf vandeVaart
3d1d158a80 Do not abort in BTL. Rather, callback into PML error function. Thanks George for review.
This commit was SVN r28559.
2013-05-23 18:45:23 +00:00
Nathan Hjelm
721779d7ab Per RFC: remove old MCA parameter system.
This commit was SVN r28541.
2013-05-20 15:36:13 +00:00
Ralph Castain
889bf60c64 Fix bad merge
This commit was SVN r28540.
2013-05-18 01:29:55 +00:00
Jeff Squyres
089c632cce Remove a bunch of dead code: gcc 4.7 warns of set-but-unused
variables.  So get rid of them.

This commit was SVN r28538.
2013-05-17 21:45:49 +00:00
Edgar Gabriel
1b1051da6c fix a bug in the calculation of the explicit offset. Use the opportunity to
clean up the code a bit.

This commit was SVN r28537.
2013-05-17 20:22:00 +00:00
Ralph Castain
3e6e1046a3 fix a correctness issue by returning an error if waitall fails and invoking the mpi error handler
cmr:v1.7.2:reviewer=jsquyres

This commit was SVN r28533.
2013-05-16 15:04:37 +00:00
Rolf vandeVaart
91fdb423d7 Fix warning in CUDA-aware code.
This commit was SVN r28511.
2013-05-14 21:04:15 +00:00
Rolf vandeVaart
52ebb0b17f Change some opal_output to OPAL_OUTPUT per CMR review.
This commit was SVN r28510.
2013-05-14 20:49:42 +00:00
Nathan Hjelm
32a8ff5255 btl/openib: bump up udcm priority
This commit was SVN r28505.
2013-05-14 20:02:40 +00:00
Edgar Gabriel
d5cae9aced - fix the mca stripe size and stripe depth parameter logic in the pvfs2
component
- correctly recognize and handle the corresponding info objects.

This commit was SVN r28497.
2013-05-14 16:11:39 +00:00
Yossi Etigin
64d98e0438 Fix data corruption in MXM by registering to OPAL memory release hooks and removing any mappings created by mxm
This commit was SVN r28489.
2013-05-14 12:27:44 +00:00
Rolf vandeVaart
9d569f1487 Fix warning when compiling in CUDA aware code.
This commit was SVN r28476.
2013-05-10 21:29:08 +00:00
Nathan Hjelm
422331b4da btl/openib: fix unconnected datagram connection method (udcm)
The primary issue with udcm is that the immediate data in message
acks were often bogus. This caused the sender to keep trying even
though a message was received and acked. The fix is to use the
source LID and QP to determine which message is being acked. In
most cases this should work well since only one message will be
in flight to any peer.

This commit was SVN r28444.
2013-05-03 17:11:38 +00:00
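The fix described in 422331b4da, matching an ack by its source LID and QP instead of by immediate data, might look something like the sketch below. The `struct pending_msg` layout and the `match_ack` function are assumptions made for illustration and are not taken from the udcm source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a message that is still waiting for an ack. */
struct pending_msg {
    uint16_t peer_lid;  /* LID of the peer the message was sent to */
    uint32_t peer_qpn;  /* QP number of that peer */
    int acked;          /* non-zero once an ack has been matched */
};

/* Find the in-flight message an ack refers to by the ack's source address
 * (LID and QP number). Returns NULL when nothing matches. If several
 * messages to the same peer are in flight, the first unacked one wins. */
struct pending_msg *match_ack(struct pending_msg *msgs, size_t n,
                              uint16_t src_lid, uint32_t src_qpn)
{
    assert(msgs != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (!msgs[i].acked &&
            msgs[i].peer_lid == src_lid &&
            msgs[i].peer_qpn == src_qpn) {
            return &msgs[i];
        }
    }
    return NULL;
}
```

As the commit message notes, this kind of matching relies on there usually being only one message in flight to any peer.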
Jeff Squyres
c8258c06e2 In coll_sm, we alloc a huge chunk of shared memory, divvy it into lots
of individual regions (each region is a multiple of page size in
length), and each process claims its own regions by binding it to its
local memory.  Each process would end up membinding something like 16
individual regions in the overall shmem segment.

There were two errors in this code relating to the memory affinity
pinning.  Some combination of these two errors would lead to kernel
panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed
shared memory (not posix or sysv shared memory, curiously enough):

1. The shared memory segment is initially divided into two regions:
control and data.  The control starts at the beginning of the shmem
segment, the data starts after that.  The data portion, unfortunately,
was ''not'' aligned to a page.  So all the multiple-of-page-size
regions that we divvy up were also not aligned on page boundaries.  And
therefore all the regions we tried to membind were not on page
boundaries.

The solution was to ensure that the data portion started on a page
boundary.  Then all of the individual regions were on page boundaries,
too.

That being said, in my tests, Linux mbind() fails gracefully when the
address is not on a page boundary.  So I'm not sure how this worked at
all / led to a kernel panic...

2. There was some bad pointer math that resulted in membinding regions
larger than they should have been, resulting in region overlaps.
There were definitely overlaps between regions in the same process;
it's likely that there were overlaps between regions of multiple
processes, too -- I'm not sure (and don't care to figure out :-) ).

The solution was to fix the pointer math so that each region membinds
exactly itself and no neighboring/overlapping regions.

cmr:v1.7.2:reviewer=samuel

This commit was SVN r28442.
2013-05-03 12:49:35 +00:00
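The two fixes in c8258c06e2 (start the data portion on a page boundary, and compute each region so that it covers only itself) can be illustrated with a small sketch. The function names and parameters below are hypothetical, and the sketch assumes the segment base itself is page-aligned, as a mapping returned by mmap would be. It does not call mbind(); it only shows the address arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of page_size (must be a power of two). */
size_t align_up(size_t offset, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (offset + page_size - 1) & ~(page_size - 1);
}

/* Start of region `index` in the data portion. The data portion begins at
 * the first page boundary at or after the control area, and every region is
 * exactly pages_per_region pages long, so consecutive regions never overlap. */
unsigned char *region_start(unsigned char *segment, size_t control_len,
                            size_t page_size, size_t pages_per_region,
                            size_t index)
{
    size_t data_offset = align_up(control_len, page_size);
    return segment + data_offset + index * pages_per_region * page_size;
}
```

With this layout, the length to bind for any region is simply `pages_per_region * page_size`, which is also the distance to the next region's start.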
Alex Mikheev
9e2fdc7d56 - correction of r28440
This commit was SVN r28441.

The following SVN revision numbers were found above:
  r28440 --> open-mpi/ompi@93ce233530
2013-05-02 12:52:58 +00:00
Alex Mikheev
93ce233530 - btl_openib: changed default SRQ settings:
- increase number of wqe to minimize number of RNRs
    - it is better to have high watermark and post relatively small number of wqes
    - increased TX queue size

This commit was SVN r28440.
2013-05-02 12:46:35 +00:00
Alex Mikheev
f76680fbd0 - btl_openib: fix total registered memory calculation for ConnectIB and Ofed 2.0
This commit was SVN r28432.
2013-05-01 13:39:29 +00:00
Jeff Squyres
d92a8e01f8 Use the _SAFE list traversal macro so that we can remove each item
from the list (just for good measure), and then free() it (without
using _SAFE, we were accessing memory that was just free()'d to get to
the next item).  Also be a little more thorough -- DESTRUCT the list
when we're all done.

This commit was SVN r28429.
2013-05-01 12:26:16 +00:00
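The bug fixed in d92a8e01f8 is the classic one of reading an item's next pointer after the item has been freed. The sketch below shows the idea behind a _SAFE-style traversal on a toy singly linked list. It is not the OPAL list API; `struct node`, `push`, and `free_all` are names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy singly linked list node, for illustration only. */
struct node {
    struct node *next;
    int value;
};

/* Push a new node on the front of the list and return the new head. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/* Free every node and return how many were freed. The next pointer is
 * saved BEFORE the current node is freed, which is what makes the
 * traversal safe; reading cur->next after free(cur) would touch freed
 * memory. */
size_t free_all(struct node **head)
{
    size_t freed = 0;
    struct node *cur = *head;

    while (cur != NULL) {
        struct node *next = cur->next;
        free(cur);
        cur = next;
        ++freed;
    }
    *head = NULL;  /* leave the list in a clean, empty state */
    return freed;
}
```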
George Bosilca
8b0335380a Fix the error messages to reference the correct function.
This commit was SVN r28425.
2013-04-30 23:26:03 +00:00
George Bosilca
6a75c84fa8 Remove useless define.
This commit was SVN r28424.
2013-04-30 23:24:59 +00:00
Ralph Castain
9de82aba55 Revert r28417 - given the non-standard way vprotocol is implemented, I see no way to use the framework verbosity here. Best to just leave it alone as those who use it know what they need to do to get debug output
This commit was SVN r28418.

The following SVN revision numbers were found above:
  r28417 --> open-mpi/ompi@b00de5be8b
2013-04-30 16:37:17 +00:00
Nathan Hjelm
b00de5be8b vprotocol: remove the old output and use the framework output
This commit was SVN r28417.
2013-04-30 15:21:42 +00:00
Ralph Castain
ceb4061214 Fix BTL_VERBOSE - when the MCA param change was committed, it left the base verbosity variable declared so things compiled. Sadly, the verbosity was now being set to a new variable, so debug never was output.
This commit was SVN r28414.
2013-04-30 01:15:52 +00:00
Nathan Hjelm
f384263de7 btl/openib: fix typo
This commit was SVN r28413.
2013-04-29 22:21:25 +00:00
Ralph Castain
5d7a93c032 Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed.
Interesting how many includes had to be fixed here and there to fill in missing dependencies :-)

This commit was SVN r28411.
2013-04-29 17:02:37 +00:00
Ralph Castain
8996ecb128 Add missing include
This commit was SVN r28405.
2013-04-27 00:09:36 +00:00
Jeff Squyres
f55cea1a5b If there are no BTLs, do ''not'' actually shut down the fd listener,
because a) it may still be needed to shut down the CPCs, and b) it
will be shut down during component_close().

This commit was SVN r28402.
2013-04-26 15:31:50 +00:00
Jeff Squyres
99b7a0f20d Remove unused variables.
This commit was SVN r28401.
2013-04-26 15:29:42 +00:00
Vishwanath Venkatesan
c902624b59 Using ompi_type_destroy to free ompi_datatype. This had to be updated in all the collective algorithms.
Hopefully this will fix all warnings.

This commit was SVN r28385.
2013-04-24 19:27:26 +00:00
Nathan Hjelm
2edff7f784 btl/openib: don't free string handle by MCA variable system
This commit was SVN r28383.
2013-04-24 18:59:18 +00:00
Alex Margolin
aebd794bf6 Fixed macro definition order in MXM component headers
This commit was SVN r28378.
2013-04-24 16:51:43 +00:00
Vishwanath Venkatesan
bba4a93f63 Got this wrong while replacing MPI function with OMPI functions. Fixed it now.
This commit was SVN r28350.
2013-04-22 19:58:25 +00:00
Rolf vandeVaart
5e1dde419c Fix some compile errors that have crept into the CUDA-aware code.
This commit was SVN r28346.
2013-04-18 15:34:16 +00:00
Vishwanath Venkatesan
53753622d4 Changing some of the MPI_ functions to ompi_ equivalents.
This commit was SVN r28342.
2013-04-17 21:06:36 +00:00
Alex Margolin
0ab7675019 Fix MXM connection establishment flow
This commit was SVN r28329.
2013-04-12 16:37:42 +00:00
Steve Wise
134baaf2fa Add Chelsio T5 device. This fixes trac:3552 and should be added to cmr:v1.6:reviewer=jsquyres and cmr:v1.7:reviewer=jsquyres
This commit was SVN r28327.

The following Trac tickets were found above:
  Ticket 3552 --> https://svn.open-mpi.org/trac/ompi/ticket/3552
2013-04-11 19:30:53 +00:00
George Bosilca
2d33c9ee39 Stop complaining about an overwritten default parameter.
This commit was SVN r28322.
2013-04-10 19:44:37 +00:00
Jeff Squyres
8405975bf6 Be a little more conservative about initializing devices and modules
(i.e., ensure that more data items get zeroed out/set to NULL) so that
if something goes wrong during initialization, we don't try to clean
up something that isn't there (and segv).

The chance of this happening on the trunk is very low (and will also
be low once the verbs improvements are brought over to v1.7).  But it
can actually happen in the v1.6 branch (e.g., if no CPC is available,
we'll try to get the length of the endpoints list, but the endpoints
list is NULL).  

Hence, even though the real goal is to get this functionality over to
v1.6, I figured I'd commit to the trunk/CMR to v1.7 just to try to
keep commonality in the openib between all three where possible.

This commit was SVN r28317.
2013-04-09 21:55:31 +00:00
Ralph Castain
45af6cf59e The move of the orte_db framework to opal required that we create an opaque opal_identifier_t type as OPAL cannot know anything about the ORTE process name. However, passing a value down to opal and then having the db components reference it causes alignment issues on Solaris Sparc platforms. So pass the pointer instead and do the old "memcpy" trick to avoid the problem.
This commit was SVN r28308.
2013-04-08 23:34:16 +00:00
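The "memcpy trick" mentioned in 45af6cf59e is a standard way to read a 64-bit value through a pointer that may not be suitably aligned, which matters on strict-alignment platforms such as SPARC. A minimal sketch follows, using a plain `uint64_t` and a hypothetical helper name `load_id` rather than the project's own types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a 64-bit identifier out of a possibly unaligned location.
 * Dereferencing a misaligned uint64_t pointer directly can fault on
 * strict-alignment CPUs; memcpy into a properly aligned local avoids that. */
uint64_t load_id(const void *src)
{
    uint64_t id;

    assert(src != NULL);
    memcpy(&id, src, sizeof id);
    return id;
}
```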
Nathan Hjelm
4e95d691a7 pml/ob1: do not reset the convertor if one was not created (size = 0).
This macro is only used on the failure path so the additional if statement
should not have any effect on performance.

cmr:v1.7

This commit was SVN r28292.
2013-04-05 01:40:11 +00:00
Pavel Shamis
fed6e60131 Fixing OpenIB BTL compilation failure for cases when
BTL_OPENIB_MALLOC_HOOKS_ENABLED is disabled.

This commit was SVN r28290.
2013-04-04 20:17:18 +00:00
Pavel Shamis
aa1f5697b4 In order to prevent name conflicts in XRC (MOFED) enabled mode
OFACM's ib_address_t was renamed to ofacm_ib_address_t

This commit was SVN r28289.
2013-04-04 20:02:17 +00:00
Nathan Hjelm
e8d9944456 sbgp/ibnet: fix param -> var update errors
This commit was SVN r28284.
2013-04-03 20:17:18 +00:00
Nathan Hjelm
75093155ab bcol/iboffload: fix still more errors from param -> var updates
This commit was SVN r28283.
2013-04-03 19:57:03 +00:00
Nathan Hjelm
47a1897710 bcol/iboffload: fix more errors from param -> var updates
This commit was SVN r28281.
2013-04-03 18:55:46 +00:00
Nathan Hjelm
31a498c2a1 bcol/iboffload: fix errors from param -> var updates
This commit was SVN r28280.
2013-04-03 18:33:19 +00:00
Ralph Castain
66f3a81488 Cleanup warnings found when building v1.7
cmr:v1.7

This commit was SVN r28279.
2013-04-03 17:37:02 +00:00
Vishwanath Venkatesan
74c418b860 Adding typecasting with intptr_t to remove warnings.
This commit was SVN r28278.
2013-04-03 17:07:43 +00:00
Vishwanath Venkatesan
784337aab1 typecasting with intptr_t to remove warnings
This commit was SVN r28276.
2013-04-03 17:06:02 +00:00
Jeff Squyres
64d39a4e97 Technically speaking, we're creating a QP with 1 send WQE and 1
receive WQE, so it's good form to have a CQ with 2 entries, not 1.

This commit was SVN r28256.
2013-03-28 13:11:31 +00:00
George Bosilca
9c6374b515 Swap the open and register.
This commit was SVN r28253.
2013-03-27 22:19:57 +00:00
Nathan Hjelm
f1fa290157 btl/vader: add missing return statement
This commit was SVN r28252.
2013-03-27 22:16:21 +00:00
Nathan Hjelm
113fadd749 btl/vader: do not use common/sm for shared memory fragments
This commit was SVN r28250.
2013-03-27 22:10:02 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
c041156f60 Update ORTE frameworks to use the MCA framework system.
This commit was SVN r28240.
2013-03-27 21:14:43 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE flag enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few components actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
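The ownership rule for string variables in cf377db823 (the value is duplicated at registration, the caller frees the original, and the variable system frees the copy) can be modelled with a toy registry. Everything named `toy_*` below is hypothetical and is not the mca_base_var API; the sketch only mirrors the ownership behavior the notes describe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy variable: just remembers where the caller's backing store lives.
 * The backing store must live at least as long as the variable. */
struct toy_var {
    char **storage;
};

/* Register a string variable. The current value is duplicated and the
 * backing store is pointed at the copy, which the toy system now owns.
 * Returns 0 on success, -1 if the copy could not be allocated. */
int toy_var_register_string(struct toy_var *var, char **storage)
{
    assert(var != NULL && storage != NULL);
    var->storage = storage;
    if (*storage != NULL) {
        size_t len = strlen(*storage) + 1;
        char *copy = malloc(len);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, *storage, len);
        *storage = copy;
    }
    return 0;
}

/* Deregister: the system frees its own copy; the user must not. */
void toy_var_deregister(struct toy_var *var)
{
    assert(var != NULL && var->storage != NULL);
    free(*var->storage);
    *var->storage = NULL;
    var->storage = NULL;
}
```

In this model the caller keeps a pointer to the original string only if it needs to free it, and never frees the value seen through the backing store after registration.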
Ralph Castain
317915225c Finish the binding cleanup by removing the no-longer-used binding level scheme. This proved to be fallible as there is no guarantee that the hierarchy it used matched physical reality of the machine (e.g., is L3 "above" the socket or not). Still have to complete the ppr update, but get the rest of it correct.
This commit was SVN r28223.
2013-03-26 20:09:49 +00:00
Jeff Squyres
44e371a65d Remove (bogus) port number from the opal_output -- there's no port
number associated with creating a QP.

This commit was SVN r28222.
2013-03-26 19:48:50 +00:00
Vishwanath Venkatesan
e092cc34e0 Fixing the read all bugs discovered by Coverity
This commit was SVN r28189.
2013-03-20 20:27:09 +00:00
Samuel Gutierrez
8ce2041102 Cleanup in error path. Fixes CID 967211. Thanks, Jeff.
This commit was SVN r28183.
2013-03-19 20:00:08 +00:00
Jeff Squyres
2513122d31 Remove extraneous semicolon.
This commit was SVN r28180.
2013-03-18 23:58:11 +00:00
Jeff Squyres
7ac02fb9d4 Two fixes for the ROMIO io module:
 * Don't call PMPI_* anything from our module code; that's terribly
   bad form (and disallowed!).  Instead, do the proper back-end stuff
   to reset the error handler on the file handle.
 * If we've already started to MPI_Finalize, then just give up and
   don't actually perform all the file closing actions (because
   ROMIO's file close calls MPI_Barrier, which will obviously fail if
   MPI_Finalize has already been invoked).  Bad user behavior should
   be punished (by leaking resources, not closing the file properly,
   etc.).

This commit was SVN r28177.
2013-03-18 20:11:20 +00:00
Vasily Filipov
7bda23dd84 SBGP, BCOL: add missing "show_help.h" includes.
This commit was SVN r28163.
2013-03-10 09:11:09 +00:00
Brian Barrett
65109de931 Fix leak of comm and datatype references for mprobe/improbe and fix a request leak in improbe
This commit was SVN r28157.
2013-03-07 21:55:22 +00:00
Brian Barrett
db858827df Fill in more of the process info structure when using PMI
This commit was SVN r28152.
2013-03-06 19:32:47 +00:00
Brian Barrett
a67d768ee4 quick hack to get things compiling again. Still need to fill in the fixme parts. sigh.
This commit was SVN r28150.
2013-03-06 18:33:25 +00:00
Nathan Hjelm
3c5cd95087 mtl/psm: add missing header for opal_show_help (one more)
This commit was SVN r28147.
2013-03-05 00:18:51 +00:00
Nathan Hjelm
25d0d97d6b mtl/psm: add missing header for opal_show_help
This commit was SVN r28146.
2013-03-05 00:17:48 +00:00
Nathan Hjelm
213cb79fab mtl/psm: add missing header for opal_show_help
This commit was SVN r28145.
2013-03-05 00:15:11 +00:00
Rolf vandeVaart
037729dcbb Add a search path. Refactor code.
This commit was SVN r28142.
2013-03-01 21:50:56 +00:00
Rolf vandeVaart
5c761d701d Remove tabs for spaces, fix some error messages.
This commit was SVN r28141.
2013-03-01 19:13:06 +00:00
Rolf vandeVaart
ebe63118ac Remove dependency on libcuda.so when building in CUDA-aware support. Dynamically load it if needed.
This commit was SVN r28140.
2013-03-01 13:21:52 +00:00
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00
Nathan Hjelm
b5a2cd1cce remove csum pml
This commit was SVN r28133.
2013-02-28 00:17:56 +00:00
Brian Barrett
1370d4569a workaround for case when MD can't span all of memory (sigh)
This commit was SVN r28132.
2013-02-27 17:02:45 +00:00
Vasily Filipov
f897c8a1e0 MTL MXM: STREAM support for isend and irecv.
This commit was SVN r28122.
2013-02-27 13:21:30 +00:00
Ralph Castain
8d2fa3693b First cut at removing the native Windows support. Remove all the Windows-specific components, and the .windows files sprinkled around. Remove the Windows platform files and MTT scripts. Update the NEWS to point Windows users to the cygwin package.
This commit was SVN r28116.
2013-02-26 20:44:56 +00:00
Ralph Castain
bd9265c560 Per the meeting on moving the BTLs to OPAL, move the ORTE database "db" framework to OPAL so the relocated BTLs can access it. Because the data is indexed by process, this requires that we define a new "opal_identifier_t" that corresponds to the orte_process_name_t struct. In order to support multiple run-times, this is defined in opal/mca/db/db_types.h as a uint64_t without identifying the meaning of any part of that data.
A few changes were required to support this move:

1. the PMI component used to identify rte-related data (e.g., host name, bind level) and package them as a unit to reduce the number of PMI keys. This code was moved up to the ORTE layer as the OPAL layer has no understanding of these concepts. In addition, the component locally stored data based on process jobid/vpid - this could no longer be supported (see below for the solution).

2. the hash component was updated to use the new opal_identifier_t instead of orte_process_name_t as its index for storing data in the hash tables. Previously, we did a hash on the vpid and stored the data in a 32-bit hash table. In the revised system, we don't see a separate "vpid" field - we only have a 64-bit opaque value. The orte_process_name_t hash turned out to do nothing useful, so we now store the data in a 64-bit hash table. Preliminary tests didn't show any identifiable change in behavior or performance, but we'll have to see if a move back to the 32-bit table is required at some later time.

3. the db framework was a "select one" system. However, since the PMI component could no longer use its internal storage system, the framework has now been changed to a "select many" mode of operation. This allows the hash component to handle all internal storage, while the PMI component only handles pushing/pulling things from the PMI system. This was something we had planned for some time - when fetching data, we first check internal storage to see if we already have it, and then automatically go to the global system to look for it if we don't. Accordingly, the framework was provided with a custom query function used during "select" that lets you separately specify the "store" and "fetch" ordering.

4. the ORTE grpcomm and ess/pmi components, and the nidmap code,  were updated to work with the new db framework and to specify internal/global storage options.

No changes were made to the MPI layer, except for modifying the ORTE component of the OMPI/rte framework to support the new db framework.

This commit was SVN r28112.
2013-02-26 17:50:04 +00:00
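The fetch ordering described in bd9265c560 (look in internal storage first, then fall back to the global system) can be sketched as below. The store layout, the `toy_db_fetch` name, and the stand-in global lookup are all assumptions for illustration; the real framework selects components rather than calling a single function pointer, and the sketch deliberately does not cache the global result because the commit message does not say whether that happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCAL_SLOTS 8

/* Toy internal store keyed by an opaque 64-bit identifier. */
struct local_store {
    uint64_t ids[LOCAL_SLOTS];
    int values[LOCAL_SLOTS];
    size_t used;
};

/* Signature of a hypothetical global lookup (a PMI-style fetch). */
typedef int (*global_fetch_fn)(uint64_t id, int *value);

/* Counts how often the stand-in global lookup below is consulted. */
int global_calls = 0;

/* Stand-in global lookup: id 0 is "not found", anything else yields id * 2. */
int example_global_fetch(uint64_t id, int *value)
{
    ++global_calls;
    if (id == 0) {
        return -1;
    }
    *value = (int)(id * 2);
    return 0;
}

static int local_fetch(const struct local_store *s, uint64_t id, int *value)
{
    for (size_t i = 0; i < s->used; ++i) {
        if (s->ids[i] == id) {
            *value = s->values[i];
            return 0;
        }
    }
    return -1;
}

/* Check internal storage first; only go to the global system on a miss. */
int toy_db_fetch(const struct local_store *s, global_fetch_fn global,
                 uint64_t id, int *value)
{
    assert(s != NULL && value != NULL);
    if (local_fetch(s, id, value) == 0) {
        return 0;
    }
    if (global == NULL) {
        return -1;
    }
    return global(id, value);
}
```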
Ralph Castain
70a28c8a27 Now that we are using local ranks in OMPI, we need to define an ompi_local_rank_t and equate it to orte_local_rank_t. Change the sm btl to use the correct abstraction.
This commit was SVN r28098.
2013-02-22 17:48:53 +00:00
Samuel Gutierrez
af5ed9b25c OMPI_NODE_RANK_INVALID ==> OMPI_LOCAL_RANK_INVALID
This commit was SVN r28096.
2013-02-21 18:28:07 +00:00
Samuel Gutierrez
4bf0134901 Remove debug.
This commit was SVN r28095.
2013-02-21 18:21:22 +00:00
Samuel Gutierrez
b7791963f2 Fix sm BTL initialization for MPI_Comm_spawn and friends. Thanks to Jeff for
finding the issue.

This commit was SVN r28094.
2013-02-21 18:19:46 +00:00
Nathan Hjelm
55cf850eca Add comment about r28083
This commit was SVN r28084.

The following SVN revision numbers were found above:
  r28083 --> open-mpi/ompi@5411e28c00
2013-02-20 21:42:13 +00:00
Nathan Hjelm
5411e28c00 btl/openib: don't align fragments on 2 byte boundaries (changed to 8)
cmr:v1.6,v1.7

This commit was SVN r28083.
2013-02-20 21:27:01 +00:00
Rolf vandeVaart
da3e9ff906 Add show_help.h where needed.
This commit was SVN r28071.
2013-02-19 15:42:09 +00:00
Brian Barrett
3c83618799 fix a missing header file issue with IB
This commit was SVN r28070.
2013-02-18 18:29:14 +00:00
Vasily Filipov
52a9241859 MTL MXM: adapt to mxm 2.0 api changes - flags are only for send requests, and SYNC is part of the opcode.
This commit was SVN r28069.
2013-02-17 10:04:19 +00:00
Vasily Filipov
8270d8f52a MTL MXM: add #include "opal/util/show_help.h".
This commit was SVN r28068.
2013-02-17 09:51:03 +00:00
Ralph Castain
ebad55b933 Apply patches from ORNL to fix compile issues - minor stuff. Thanks to Geoffroy Vallee for the patches.
This commit was SVN r28065.
2013-02-15 22:14:23 +00:00
Jeff Squyres
bbddd6ea03 Add header file for opal_show_help().
This commit was SVN r28056.
2013-02-13 16:31:59 +00:00
Brian Barrett
312f37706e In talking about this with Jeff and Ralph, we don't actually need
ompi_show_help, because opal_show_help is replaced with an 
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.

This commit was SVN r28051.
2013-02-12 21:10:11 +00:00
Joshua Ladd
70ad711337 Backing out the Open SHMEM project
This commit was SVN r28050.
2013-02-12 17:45:27 +00:00
Mike Dubman
ff384daab4 Added new project: oshmem.
This commit was SVN r28048.
2013-02-12 15:33:21 +00:00
Mike Dubman
55cb00f8a3 Remove references to nonexistent files:
    ompi/mca/common/netpatterns/
    ompi/mca/common/commpatterns/

This commit was SVN r28044.
2013-02-12 13:21:47 +00:00
Pavel Shamis
a31bc57849 Moving mca/common/netpatterns and commpatterns to ompi/patterns.
This commit was SVN r28035.
2013-02-05 21:52:55 +00:00
Brian Barrett
d80218996f Rather than setting up the direct call stuff in ompi_mca (which requires
modifying ompi_mca for every interface that is direct called), do it in
the framework's .m4 file.

This commit was SVN r28031.
2013-02-04 23:26:42 +00:00
Vasily Filipov
21b170b43b MTL MXM: push commit r27987 back, now with right user.
r27987 - MTL MXM: ver. 2.0 interface changes.

This commit was SVN r28026.

The following SVN revision numbers were found above:
  r27987 --> open-mpi/ompi@2735658d81
2013-02-04 06:59:24 +00:00
Vasily Filipov
aa5e436479 Revert revision r27986 because it was submitted with the wrong user name.
This commit was SVN r28025.

The following SVN revision numbers were found above:
  r27986 --> open-mpi/ompi@729caaf0cd
2013-02-04 06:54:24 +00:00
Jeff Squyres
c8dc1905f0 Fixes trac:3494: If we get 0 bytes back for the ACK, it doesn't
necessarily mean an error -- it could (and usually does) mean that the
peer realized that we both initiated a connect at the same time, and
therefore it decided to hang up.

I also added a friendly show_help error message for other cases where
recv_blocking() fails (i.e., "Something went wrong. Kaboom! Your job
will abort...").

This commit was SVN r28023.

The following Trac tickets were found above:
  Ticket 3494 --> https://svn.open-mpi.org/trac/ompi/ticket/3494
2013-02-02 01:19:03 +00:00
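The distinction drawn in c8dc1905f0, that a zero-byte read of the ACK usually means the peer hung up on purpose and is not an error, can be expressed as a small classification step. The enum and the `classify_ack` function are hypothetical, and treating a short non-zero read as an error is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

enum ack_result {
    ACK_OK,            /* the full ACK arrived */
    ACK_PEER_HUNG_UP,  /* 0 bytes: peer closed, e.g. after a simultaneous connect */
    ACK_ERROR          /* a real failure that deserves a help message */
};

/* Classify the byte count returned by a blocking receive of the ACK. */
enum ack_result classify_ack(ssize_t nread, size_t expected)
{
    assert(expected > 0);
    if (nread == 0) {
        return ACK_PEER_HUNG_UP;  /* not an error; drop this connection attempt quietly */
    }
    if (nread < 0 || (size_t)nread != expected) {
        return ACK_ERROR;
    }
    return ACK_OK;
}
```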
Jeff Squyres
f05b7aa6d8 As the help message states, it's not an ''error'' if the specified
interface is not found.  It should just be skipped.

This commit was SVN r28016.
2013-02-01 20:17:43 +00:00
Ralph Castain
afb0db5b6f Okay, Jeff - just for you...flow the show help thru the orte functions so help messages will be aggregated
This commit was SVN r28007.
2013-02-01 00:35:48 +00:00
Ralph Castain
e6555408f4 When we say abort, we mean ABORT!! Actually implement the ompi_rte_abort and ompi_rte_show_help functions in the ORTE module.
This commit was SVN r28004.
2013-01-31 23:12:11 +00:00
Igor Usarov
8d80af6c10 Support FCA v3.0
This commit was SVN r27988.
2013-01-31 11:14:27 +00:00
Pavel Shamis
2735658d81 MTL MXM: ver. 2.0 interface changes.
This commit was SVN r27987.
2013-01-31 08:38:08 +00:00
Rolf vandeVaart
729caaf0cd Remove any dependency on libcuda.so in opal layer. All changes are within OMPI_CUDA_SUPPORT code.
This commit was SVN r27986.
2013-01-30 23:07:32 +00:00
Rolf vandeVaart
aa04de4f1e Add run-time parameter to enable and disable CUDA GPU support.
This commit was SVN r27970.
2013-01-29 20:24:04 +00:00
Rolf vandeVaart
de5b7f5c6a Add mpool_base_verbose parameter. All the other base components appear to have this and it can help with debug.
This commit was SVN r27968.
2013-01-29 17:52:18 +00:00
Brian Barrett
49b2b5bf4f Fix double-install issue when --with-devel-headers is used
This commit was SVN r27967.
2013-01-29 17:23:18 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept separate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components were
narrowed to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Rolf vandeVaart
b5672927f2 Fix build issue when building with --disable-dlopen.
This commit was SVN r27945.
2013-01-28 20:14:59 +00:00
Rolf vandeVaart
c6412f6dff Add new rte headers in files that need them.
This commit was SVN r27943.
2013-01-28 19:32:33 +00:00
Pavel Shamis
1f1e1efb7b Removing leftovers of old infrastructure.
cmr:v1.7

This commit was SVN r27942.
2013-01-28 19:11:42 +00:00
Vishwanath Venkatesan
5be992f445 The pointer to the structure was also never allocated before retrieving
the stripe size. Fixing that too.

This commit was SVN r27941.
2013-01-28 07:21:22 +00:00
Vishwanath Venkatesan
817f6cd868 To remove the warning due to uninitialized variable.
This commit was SVN r27940.
2013-01-28 06:55:46 +00:00
George Bosilca
4defdea9f2 The shortest lifespan for a BTL.
This commit was SVN r27939.
2013-01-28 03:43:23 +00:00
George Bosilca
1b7dff3f2f A copy for posterity of the Open MPI Sicortex BTL.
This commit was SVN r27938.
2013-01-28 03:42:52 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Brian Barrett
14f4aa1198 Fix memory leak in nbc init
This commit was SVN r27884.
2013-01-21 22:45:59 +00:00
Brian Barrett
407714a85a Fix a memory leak in the RDMA one-sided component. Thanks to Victor Vysotskiy
for letting us know about this one.

This commit was SVN r27883.
2013-01-21 22:45:37 +00:00
George Bosilca
42753b4690 Make the TCP BTL really fail-safe. It now triggers the error callback on
all pending fragments when the destination goes down. This allows the PML
to recalibrate its behavior: either find an alternate route or just give up.

This commit was SVN r27881.
2013-01-21 11:41:08 +00:00
George Bosilca
d2281cc672 Remove the CMA related warnings.
This commit was SVN r27872.
2013-01-19 14:26:43 +00:00
Rolf vandeVaart
f63c88701f Improve CUDA GPU transfers over openib BTL. Use asynchronous copies.
This is the RFC that was submitted in July and December of 2012.

This commit was SVN r27862.
2013-01-17 22:34:43 +00:00
Rolf vandeVaart
a07a4bb3f7 Update smcuda to match recent changes in sm BTL.
This commit was SVN r27803.
2013-01-14 14:42:19 +00:00
Rolf vandeVaart
34d1f0a585 Add some comments to the #ifdefs for clarity. No functional changes.
This commit was SVN r27802.
2013-01-13 16:08:48 +00:00
Alex Mikheev
344d407ed4 fixed compilation warning
always send signalled when BTL_OPENIB_FAILOVER is defined

This commit was SVN r27801.
2013-01-13 10:11:03 +00:00
Jeff Squyres
b2d5d1e348 Along with the Automake 1.13.x changes in r27790, rename these third
party configure.in scripts to be configure.ac so that Automake stops
complaining about them.

This commit was SVN r27791.

The following SVN revision numbers were found above:
  r27790 --> open-mpi/ompi@675a2f5c48
2013-01-11 20:26:19 +00:00
Jeff Squyres
675a2f5c48 Updates for Automake 1.13.x. Without these changes, Automake 1.13.x
will error out, due to use of the
previously-deprecated-and-now-removed AM_CONFIG_HEADER macro.

This commit was SVN r27790.
2013-01-11 20:20:02 +00:00
Samuel Gutierrez
4c28c8cbd0 New sm BTL initialization take two. This approach is pretty simple. Instead of
using the modex or RML to share sm initialization information, have node rank 0
create a file containing initialization information in a well-known place. Then
during add_procs, the rest of the node processes requiring sm BTL initialization
will just read from that file to complete their initialization.

This commit was SVN r27789.
2013-01-11 16:24:56 +00:00
Brian Barrett
b817166072 Use a process name instead of a name list in bcol_basesmuma
This commit was SVN r27779.
2013-01-09 16:43:49 +00:00
Joshua Ladd
77df51c516 Fixes the definition of the first fragment and does not assume that first frag has offset_into_user_buff equal to zero. This fix should be added to cmr:v1.7.1:reviewer=pasha
This commit was SVN r27775.
2013-01-08 20:24:58 +00:00
Alex Mikheev
fe672f255f request signal when sending over SRQ and number of SRQ sd_credits is 0
This commit was SVN r27767.
2013-01-08 14:00:29 +00:00
Samuel Gutierrez
c4acd20eb9 Backout r27739.
This commit was SVN r27745.

The following SVN revision numbers were found above:
  r27739 --> open-mpi/ompi@a159bfaf25
2013-01-05 01:54:23 +00:00
Nathan Hjelm
84e34ee0d7 Fix a bug in the uGNI btl that could cause certain descriptor callbacks to be called twice.
There was a race condition in the eager get protocol where the RDMA complete message could be received before the local completion of the SMSG message that started the eager get protocol.

cmr:v1.7

This commit was SVN r27740.
2013-01-03 23:11:13 +00:00
Samuel Gutierrez
a159bfaf25 sm BTL initialization via modex, as discussed at last year's meeting.
This commit was SVN r27739.
2013-01-03 21:52:20 +00:00
Mike Dubman
889d46e966 support for FCA v3.0 and up
This commit was SVN r27731.
2012-12-31 05:49:22 +00:00
Mike Dubman
b6d50a5733 Performance optimizations by alexm:
* btl sendi(): if the message can be sent inline, try to avoid requesting a signal
* a signal is requested once per 64, or when
    there are no send WQEs
    the message cannot be sent inline
    any BTL method other than sendi() is used
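
One possible reading of the signaling rule above, written as a minimal C sketch. Everything except the number 64 is an assumption made for illustration: the names `send_queue_t`, `should_request_signal`, and `SIGNAL_INTERVAL` are hypothetical and are not taken from the openib BTL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; only the interval of 64 comes from the commit text. */
#define SIGNAL_INTERVAL 64

typedef struct {
    unsigned unsignaled;   /* sends posted since the last signaled send */
} send_queue_t;

/* Decide whether the next send should request a completion signal.
 * A signal is requested when the send does not come from sendi(),
 * when the message does not fit inline, when no send WQEs are free,
 * or when this is the 64th send since the last signaled one. */
bool should_request_signal(send_queue_t *q, bool is_sendi,
                           bool fits_inline, int free_send_wqes)
{
    assert(q != NULL);

    bool must_signal = !is_sendi
                    || !fits_inline
                    || free_send_wqes <= 0
                    || q->unsignaled + 1 >= SIGNAL_INTERVAL;

    if (must_signal) {
        q->unsignaled = 0;     /* a signaled send resets the counter */
    } else {
        q->unsignaled++;       /* one more send without a signal */
    }
    return must_signal;
}
```

With a zero-initialized queue, 63 consecutive inline sendi() calls return false and the 64th returns true, assuming free send WQEs remain throughout.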

This commit was SVN r27724.
2012-12-26 10:19:12 +00:00
George Bosilca
ed77868984 No need for event.h in the SM BTL.
This commit was SVN r27718.
2012-12-23 19:33:53 +00:00
Nathan Hjelm
ef49fcea25 Remove debug printfs.
cmr:v1.7

This commit was SVN r27680.
2012-12-17 16:34:07 +00:00
Nathan Hjelm
ba5b2b0540 btl/vader: fix bug in single copy code that could cause ob1 sends to not get marked complete.
cmr:v1.7

This commit was SVN r27671.
2012-12-13 23:18:53 +00:00
Mike Dubman
a454341e2b add support for mxm 2.0
This commit was SVN r27661.
2012-12-09 22:58:37 +00:00
Nathan Hjelm
3e1b13b13a Re-add support for old flex (2.5.4a and earlier) while still cleaning up properly in new flex.
This commit was SVN r27657.
2012-12-07 00:12:43 +00:00
Brian Barrett
702451111b Remove Portals 3.3 support
This commit was SVN r27656.
2012-12-06 20:11:27 +00:00
Jeff Squyres
c00e6a7abf Remove the OFUD BTL. It doesn't work, and isn't included in 1.7.
An upcoming BTL from Cisco used ofud as a starting point, and should
probably be used as a starting point for any future UD-based BTL.

And this OFUD BTL is obviously still in history if anyone ever wants
to resurrect it.

This commit was SVN r27655.
2012-12-06 17:43:28 +00:00
Steve Wise
176a5a9b3b Update the Chelsio T4 openib device params. This fixes trac:3414 and should be added to cmr:v1.6:reviewer=jsquyres and cmr:v1.7:reviewer=jsquyres
This commit was SVN r27648.

The following Trac tickets were found above:
  Ticket 3414 --> https://svn.open-mpi.org/trac/ompi/ticket/3414
2012-11-30 16:32:34 +00:00
Jeff Squyres
ad15fb5437 Developer enhancement: if a BTL component returns a NULL in its array
of modules, print a BTL_ERROR and exit(1) (previous behavior was to
segv).  This at least explicitly tells the developer that their BTL
component is behaving badly.

This commit was SVN r27634.
2012-11-26 21:19:02 +00:00
Vishwanath Venkatesan
9320beaacc Modified/improved implementation of the dynamic segmentation algorithm to avoid merging in
fbtl modules. This implementation, in alignment with all other collective modules, tries to
keep all the file-ops as contiguous as possible.

This commit was SVN r27611.
2012-11-15 00:59:10 +00:00
Vishwanath Venkatesan
ac1dfae007 Perror should say readv in preadv fbtl, currently says writev
This commit was SVN r27610.
2012-11-15 00:57:13 +00:00
Nathan Hjelm
312857ea86 remove unused allocator output declaration (missed by r27599)
This commit was SVN r27600.

The following SVN revision numbers were found above:
  r27599 --> open-mpi/ompi@87e5f97400
2012-11-13 07:21:10 +00:00
Nathan Hjelm
87e5f97400 add missing #include of opal/util/output.h
This commit was SVN r27599.
2012-11-13 07:14:41 +00:00
Pavel Shamis
cbe6d6548a Cleaning warnings in ml, sbgp, bcol.
cmr:v1.7

This commit was SVN r27598.
2012-11-12 22:30:32 +00:00
Vishwanath Venkatesan
f91340f648 Fixes for the 2 GB limitation. This fixes problems in both the static and two-phase
algorithms.

This commit was SVN r27596.
2012-11-12 17:39:58 +00:00
Nathan Hjelm
e0f5137e46 add prototypes for lex destroy functions
This commit was SVN r27580.
2012-11-09 22:00:27 +00:00
Mike Dubman
9392f1894e Fixes MPI_Allgather when sendbuf is MPI_IN_PLACE. fixes 3342
This commit was SVN r27579.
2012-11-09 18:05:10 +00:00
Nathan Hjelm
8658bbc902 instead of relying on yyterminate to clean up the lex context call the destroy functions directly (after closing the file)
This commit was SVN r27577.
2012-11-09 16:10:55 +00:00
Aleksey Senin
ae92f64842 Check that the MXM runtime version matches the compiled version.
Reviewed by Mike Dubman.

This commit was SVN r27575.
2012-11-07 14:44:33 +00:00
Nathan Hjelm
51bf75f8e9 fix typo
This commit was SVN r27573.
2012-11-06 21:25:19 +00:00
Nathan Hjelm
1400bab466 fix a couple of errors in r27569
This commit was SVN r27572.

The following SVN revision numbers were found above:
  r27569 --> open-mpi/ompi@f3ce12e71a
2012-11-06 20:06:54 +00:00
Nathan Hjelm
7fb5caea92 Remove the finish_parsing function from various .l files. The function is incomplete (doesn't clean up the lex state) and should be replaced by *_yylex_destroy which correctly cleans up the state.
Checked with flex 2.5.35. Verified with valgrind that this fixes several "still reachable" leaks.

cmr:v1.7

This commit was SVN r27571.
2012-11-06 19:26:14 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream, so it is a little unexpected that mca_base_components_close closes one. In addition, several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function, and others closed their output a second time. This change improves the semantics of mca_base_components_open/close, as they are now symmetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Nathan Hjelm
f3ce12e71a Per RFC fix several leaks in opal and ompi. Details below.
pml/v:
  - If vprotocol is not being used vprotocol_include_list is leaked. Assume vprotocol never takes ownership (see below) and always free the string.

coll/ml:
  - (patch verified) calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.
  - Need to free mca_coll_ml_component.config_file_name on component close.

btl/openib:
  - calling mca_base_param_lookup_string after mca_base_param_reg_string is unnecessary. The call to mca_base_param_lookup_string causes the value returned by mca_base_param_reg_string to be leaked.

vprotocol/base:
  - There was no way for pml/v to determine if vprotocol took ownership of vprotocol_include_list. Fix by never taking ownership (always use strdup).

mca/base:
  - param_lookup will result in storage->stringval being a newly allocated string if the mca parameter has a string value. Ensure this string is always freed.

cmr:v1.7

This commit was SVN r27569.
2012-11-06 18:57:46 +00:00
Jeff Squyres
d6e9a14b14 Fix minor issue: the argv_delete may change the top list pointer. So
be sure to save it.

This commit was SVN r27568.
2012-11-06 16:05:58 +00:00
Mike Dubman
d47d550dfc performance optimization: process completions in the batch manner
This commit was SVN r27559.
2012-11-05 14:02:37 +00:00
Mike Dubman
ca308974e0 Switched FCA collectives component from dlopen to compile-time linking to libfca
This commit was SVN r27557.
2012-11-02 17:30:00 +00:00
Jeff Squyres
4569f77645 Remove redundant common_verbs.h include.
This commit was SVN r27556.
2012-11-02 14:16:55 +00:00
Mike Dubman
5cdb3654d7 SRQ now supported in ConnectIB
This commit was SVN r27552.
2012-11-01 08:13:56 +00:00
Brian Barrett
25ccfe1bdd Add autogen.sh to list of files to be distributed, so autogen works off a
tarball

This commit was SVN r27548.
2012-11-01 02:26:28 +00:00
Vishwanath Venkatesan
0e6378bfc9 Modifying the static component accordingly for the modification of interfaces in io_ompio.c
This commit was SVN r27546.
2012-10-31 22:09:21 +00:00
Vishwanath Venkatesan
d1fc22883a Changing the dynamic component accordingly for the modified interfaces
This commit was SVN r27545.
2012-10-31 22:08:25 +00:00
Vishwanath Venkatesan
67463de96f Changing the two-phase component accordingly for the modified interfaces.
This commit was SVN r27544.
2012-10-31 22:07:02 +00:00
Vishwanath Venkatesan
2922fa28a6 Changes to the interface for extracting timing information,
to avoid accessing datastructures across frameworks.

This commit was SVN r27543.
2012-10-31 22:03:05 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occurring before the command line parser changes, but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Brian Barrett
c0f1775620 Fix warnings in nbc
This commit was SVN r27514.
2012-10-29 19:52:43 +00:00
Ralph Castain
6aac54b02e Revert r27510, r27509, and r27508.
Not sure what happened here, but the resulting trunk wouldn't even configure. After spending time fixing that problem, I found it wouldn't compile due to multiple syntax errors that had been introduced in both the OPAL and OMPI layer. This raised questions as to the completeness of the work.

Given that the author is departing, I pinged Jeff about it and we agreed to revert this for now. Hopefully, it can either be fixed by the author prior to actual departure, or someone else can pick it up (now that it is in the history) and fix it.

This commit was SVN r27511.

The following SVN revision numbers were found above:
  r27508 --> open-mpi/ompi@12c3c743de
  r27509 --> open-mpi/ompi@79e4a8ca38
  r27510 --> open-mpi/ompi@1ad5ff625a
2012-10-27 16:43:45 +00:00
Shiqing Fan
1ad5ff625a Don't know why these lines were modified, probably by subversion. Although it is harmless, change them back to what they were.
This commit was SVN r27510.
2012-10-27 03:04:45 +00:00
Shiqing Fan
12c3c743de Per the MemPin RFC, submit the component source files, and update the memchecker macros.
This commit was SVN r27508.
2012-10-27 02:48:20 +00:00
Ralph Castain
b1e119fe2c Add missing header files to tarball
This commit was SVN r27504.
2012-10-26 23:06:31 +00:00
Edgar Gabriel
3291e3abe3 make ROMIO operational with OpenMPI for PVFS2 file systems.
This commit was SVN r27501.
2012-10-26 18:43:59 +00:00
Brian Barrett
8b40c0de9b * Lock around tag management, so that it's thread safe
* Only register the progress function on first call to a non-blocking
  collective operation, to try to reduce overall performance impact
* Fix tag management in roll-over case

This commit was SVN r27498.
2012-10-26 15:36:09 +00:00
Brian Barrett
51a3ec2d7b remove now unused (after r27485) fake mpool component
This commit was SVN r27486.

The following SVN revision numbers were found above:
  r27485 --> open-mpi/ompi@a1a52c9e90
2012-10-25 21:52:17 +00:00
Brian Barrett
a1a52c9e90 Rather than use the fake mpool for handling callbacks into the MX library,
use the memory hooks interface (which does allow for multiple callbacks
to be registered) directly.

This commit was SVN r27485.
2012-10-25 21:50:07 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Yael Dayan
d6f7e4eb73 openib: modified Mellanox ConnectIB max_inline_data param
This commit was SVN r27457.
2012-10-18 15:59:18 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Vishwanath Venkatesan
6e3c9754a9 Initialization issue + fixing error in determining minimum value.
This commit was SVN r27443.
2012-10-11 23:41:25 +00:00
Vishwanath Venkatesan
95d38fdaf5 # Extracting timing information for the static collective write/read algorithms.
# The processes register their information and continue.
# Actual printing of timing information happens at file close.
# Triggered by MCA parameter at runtime

This commit was SVN r27442.
2012-10-11 21:27:47 +00:00
Vishwanath Venkatesan
240d56feeb # Extracting timing information for the two-phase collective write/read algorithms.
# The processes register their information and continue.
# Actual printing of timing information happens at file close.
# Triggered by MCA parameter at runtime

This commit was SVN r27441.
2012-10-11 21:25:30 +00:00
Vishwanath Venkatesan
7bc35f7862 # Extracting timing information for the dynamic collective write/read algorithms.
# The processes register their information and continue.
# Actual printing of timing information happens at file close.
# Triggered by MCA parameter at runtime

This commit was SVN r27440.
2012-10-11 21:23:24 +00:00
Vishwanath Venkatesan
9eeb3b5d50 # Extracting timing information for individual components of collective algorithm using a generic queue.
# This is triggered based on an MCA parameter and can be used with all collective modules.
# Individual queues maintained for read and write.
# The additional communication to combine data is done at file-close so that the 
  actual timing of collective-operations will not get affected. 
# The queues are initialized in file-open

This commit was SVN r27439.
2012-10-11 21:14:07 +00:00
Jeff Squyres
2ab48b997e Give a better verbose message if we're able to make an RC QP and we
didn't want one.

This commit was SVN r27438.
2012-10-11 19:21:21 +00:00
George Bosilca
9984a7143f Reorder the loop index.
This commit was SVN r27423.
2012-10-08 21:34:26 +00:00
George Bosilca
b46167fc4a Fix some issues with the MPI_IN_PLACE support.
This commit was SVN r27422.
2012-10-08 21:34:04 +00:00
Vishwanath Venkatesan
c89c9e40be Code to extract neighbouring offsets information from OMPIO into a file. Driven by an MCA parameter,
turned off by default.

This commit was SVN r27407.
2012-10-04 21:53:26 +00:00
Vishwanath Venkatesan
c86e5f6263 set status->_ucount correctly for two-phase collective read and write operations in the module
This commit was SVN r27406.
2012-10-04 21:39:37 +00:00
Vishwanath Venkatesan
12363d50bd Setting the default stripe size to 1MB as 64KB is too small for a generic scenario.
This commit was SVN r27405.
2012-10-04 21:26:29 +00:00
Vishwanath Venkatesan
2ddbfbeb26 Setting the default stripe size for lustre to 64KB. This is an MCA parameter, can be overridden at runtime if needed.
This commit was SVN r27404.
2012-10-04 20:56:26 +00:00
Eugene Loh
25ad84b925 Ensure that MPI_Status objects have proper alignment:
- fix the Fortran layer to use new macros to convert Fortran-to-C status
- change the C internals to pull out old OMPI_SET_STATUS* macros

Also, change name of "status" argument in topo_test_f.c to "topo_type".

This commit was SVN r27403.
2012-10-04 14:39:51 +00:00
Yael Dayan
0122cf6cbb openib: added Mellanox ConnectIB device ID and params to the device parameters ini file
This commit was SVN r27402.
2012-10-04 13:20:47 +00:00
Pavel Shamis
e9df33c10b Fixing a couple of issues in the iboffload code. It should fix the XRC compilation issue.
This commit was SVN r27400.
2012-10-03 15:44:51 +00:00
Jeff Squyres
8c369224bf More common/verbs improvements:
 * Add OMPI_COMMON_VERBS_FLAGS_NOT_RC, which looks for a device that
   does ''not'' support RC
 * Add ompi_common_verbs_find_max_inline(), and remove that code from
   the openib BTL component

This commit was SVN r27393.
2012-10-03 00:57:39 +00:00
Nathan Hjelm
9200473a59 mtl/psm: do not let psm set processor affinity
This commit was SVN r27389.
2012-10-02 15:56:51 +00:00
Vishwanath Venkatesan
76dc8a7c55 Fix for two_phase_read_all for multiple cycles.
This commit was SVN r27388.
2012-10-01 19:14:14 +00:00
Ralph Castain
54db4c35eb Get the trunk to build again when --without-hwloc is specified. Move a couple of key type definitions and utilities out from under the HAVE_HWLOC test so they are always available as they don't really depend on hwloc's presence. Tell two components not to build if hwloc is disabled:
ompi/mca/sbgp/basesmsocket
orte/mca/rmaps/lama

Remove stale configure.params files from the sbgp framework as the OMPI build system no longer looks at those files.

This commit was SVN r27377.
2012-09-26 23:24:27 +00:00
George Bosilca
48f528f142 icc complains about the missing prototype.
This commit was SVN r27373.
2012-09-26 09:56:14 +00:00
George Bosilca
890dedf13f Cleanup.
This commit was SVN r27372.
2012-09-26 09:44:46 +00:00
Vishwanath Venkatesan
c6751eaf70 Fixing all two_phase read_all bugs,
1. Multiple aggregator with non-contiguous datatype,
2. Memory corruption bugs.

Cleaned version, with proper initialization and memory management.

This commit was SVN r27370.
2012-09-26 00:16:08 +00:00
Vishwanath Venkatesan
2e2a46b2be Fixing two-phase write all bug for non-contiguous file-type.
This commit was SVN r27369.
2012-09-26 00:10:40 +00:00
Jeff Squyres
30d9c36275 FreeBSD detection improvement. Thanks to Brooks Davis for the patch.
This commit was SVN r27334.
2012-09-13 13:25:04 +00:00
Jeff Squyres
3cc8b0461a More updates to common verbs infrastructure:
* Moved "check basics" sanity check from openib BTL to common/verbs
   (which also allows us to have openib ''not'' include
   <infiniband/driver.h>, which is a Very Good Thing)
 * Add new ompi_common_verbs_qp_test() function, which tests to see
   whether a device supports RC and/or UD QPs.  The openib BTL now
   uses this function to ensure that the device supports RC QPs.
 * Rename ompi_common_verbs_find_ibv_ports() to be
   ompi_common_verbs_find_ports() -- the "ibv" was redundant.
 * Re-work ompi_common_verbs_find_ports() to use
   ompi_common_verbs_qp_test() instead of testing for RC/UD QPs itself
 * Add bunches of opal_output_verbose() to the find_ports() routine
   (to help diagnose connectivity problems -- imagine running with
   --mca btl_base_verbose 10; you'll see all the find_ports() test
   results)
 * Make ompi_common_verbs_qp_test() warn if devices/ports are supplied
   in the if_include/if_exclude strings that do not exist (quite
   similar to what the openib BTL does today).
 * Add ompi_common_verbs_mca_register() function, which registers
   common verbs MCA params.  It will also register MCA param synonyms
   for these MCA params to upper-level components (e.g.,
   btl_<upper-level-component>_<the-mca-param>). 
   * common_verbs_warn_nonexistent_if: warn if
     if_include/if_exclude-specified devices or ports do not exist.  

This commit was SVN r27332.
2012-09-12 20:47:47 +00:00
Pavel Shamis
1e7b958c2a Cleaning warning in collectives code
This commit was SVN r27331.
2012-09-12 19:47:23 +00:00
Jeff Squyres
171f6efd70 Don't free heap objects!
This commit was SVN r27326.
2012-09-12 15:11:56 +00:00
Jeff Squyres
fb2e543a57 Refs trac:3275.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't).  If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!".  Kaboom.

The only problem is that it didn't give you any indication of where
this value was being set.  Quite maddening, from a user perspective.

So we changed how ompi_info handles this case.  If any framework open
function returns OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions returned OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received an error, and then dump out the parameters
that it has received so far in the framework that had a problem.

At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).  

We updated ompi_info to check for O???_ERR_BAD_PARAM from each of
the framework opens.  Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior.  And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes.  Why?  I think we used them in exactly one
place in the code base (mca_base_components_open.c).  So we deleted
those and just used the normal OPAL_* codes instead.

While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).

This commit was SVN r27306.

The following Trac tickets were found above:
  Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
2012-09-11 20:47:24 +00:00
Shiqing Fan
ec4cf39925 Windows doesn't need to exclude any interface by default. This will avoid tcp warnings.
This commit was SVN r27291.
2012-09-11 15:39:37 +00:00
Shiqing Fan
0c4c2a5f5d Revert r27283. A better solution is found. Thanks to Ralph anyway.
This commit was SVN r27290.

The following SVN revision numbers were found above:
  r27283 --> open-mpi/ompi@38bcd86ae4
2012-09-11 15:37:22 +00:00
Ralph Castain
38bcd86ae4 Per request by Shiqing, specifically exclude the "lo" interface from the TCP btl. Apparently, Windows sometimes fails to resolve the 127.0.0.1 to "lo", causing subsequent failures.
This commit was SVN r27283.
2012-09-10 16:22:46 +00:00
Vishwanath Venkatesan
6d9d0f2968 Initialize the iov_count; leaving it uninitialized crashes static write/read on certain platforms while decoding the datatype
This commit was SVN r27273.
2012-09-08 00:40:21 +00:00
Jeff Squyres
dd254cc202 OMPI_HAVE_IBV_LINK_LAYER does not exist. Instead, check defined(HAVE_IBV_LINK_LAYER_ETHERNET).
This commit was SVN r27251.
2012-09-06 18:25:36 +00:00
Jeff Squyres
0d2962ebf0 Fixes trac:3294: space for the periods has already been allocated by
ompi_comm_split(), and the entire set of periods from the old
communicator have already been copied to the new communicator.  But up
here in mca_topo_base_cart_sub(), we need to subset the periods that
are actually stored on the new communicator according to remain_dims
(just like we did for the set of dimensions).

This commit renames a few variables to be a little less misleading,
and then adds a loop to copy over the periods information.  I could
have added this into the first loop (that subset-copies the
dimensions), but this code is already confusing enough and this is not
a performance-critical section: so I made it a new loop.
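
The subset-copy described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual mca_topo_base_cart_sub() code: the function name `subset_cart_topology` and its parameter names are hypothetical, and plain int arrays stand in for whatever the communicator really stores.

```c
#include <assert.h>

/* Hypothetical helper, not the Open MPI function: keep only the
 * dimensions selected by remain_dims, and keep the matching
 * periodicity flags alongside them. Returns the number kept. */
int subset_cart_topology(int ndims, const int *remain_dims,
                         const int *old_dims, const int *old_periods,
                         int *new_dims, int *new_periods)
{
    int i, j;

    assert(ndims >= 0);

    /* First loop: subset-copy the dimension sizes. */
    for (i = 0, j = 0; i < ndims; ++i) {
        if (remain_dims[i]) {
            new_dims[j++] = old_dims[i];
        }
    }

    /* Second, separate loop: subset-copy the periods the same way,
     * so new_periods[k] always describes new_dims[k]. */
    for (i = 0, j = 0; i < ndims; ++i) {
        if (remain_dims[i]) {
            new_periods[j++] = old_periods[i];
        }
    }

    return j;
}
```

For a 2x3x4 grid with periods {1, 0, 0} and remain_dims {1, 0, 1}, the sketch keeps dimensions {2, 4} with periods {1, 0}. Merging the two loops would work equally well; they are kept apart here only to mirror the structure the commit message describes.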

Note that all the topo code will be revamped a bit when the new
MPI-2.2 topo stuff (currently off in a mercurial branch) finally makes
it back to the SVN trunk.  But that new stuff will only get to v1.7 --
this commit will need to be CMR'ed to v1.6.x.

cmr:v1.7
cmr:v1.6.2

This commit was SVN r27248.

The following Trac tickets were found above:
  Ticket 3294 --> https://svn.open-mpi.org/trac/ompi/ticket/3294
2012-09-06 14:16:29 +00:00
Vishwanath Venkatesan
b75d877a3f Removing .ompi_ignore for the lustre component.
This commit was SVN r27247.
2012-09-05 22:20:18 +00:00
Vishwanath Venkatesan
640aca6654 Modifying the file view generation to remove the merging of offset-length pair.
It's no longer needed as the default file view makes sure the chunks are large enough.

This commit was SVN r27246.
2012-09-05 21:00:47 +00:00
Brian Barrett
fa4c2af9ed The Portals 4 reference implementation will sometimes return a NI_FLOWCTL for both a
send and an ack.  I'm not sure whether this violates the spec, so work around until
we decide...

This commit was SVN r27244.
2012-09-05 19:36:19 +00:00
Jeff Squyres
9feb8d8879 Oops; the error paths were not correct on the initial commit. Fixed.
This commit was SVN r27228.
2012-09-04 15:48:44 +00:00
Aleksey Senin
33ae1fe6c7 Fix uninitialized return code in ompi_mtl_mxm_add_procs function.
This commit was SVN r27216.
2012-09-02 13:17:49 +00:00
Yevgeny Kliteynik
3fe239702a Fixed compilation error
Thanks to Alex Margolin for the fix

This commit was SVN r27215.
2012-09-02 08:26:30 +00:00
Jeff Squyres
341ce2f9a4 Per some discussions between LANL, Cisco, ORNL, and Mellanox, move
some new common OpenFabrics functionality to ompi/mca/common/verbs.
Also move everything that was in ompi/mca/common/ofautils under
ompi/mca/common/verbs.  

 * Move ofautils -> verbs
 * Add new functionality in ompi/mca/common/verbs (see doxygen
   comments in ompi/mca/common/verbs/common_verbs.h for details):
   * ompi_common_verbs_find_ibv_ports()
   * ompi_common_verbs_port_bw()
   * ompi_common_verbs_mtu()
   * '''If you're writing verbs-based code, you should be using this
     common functionality'''
 * Adapt openib BTL to use some trivial common functionality in
   common/verbs
 * Don't use "#ifdef OMPI_HAVE_RDMAOE"; use
   "#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)"
 * Update the following to include/link against common/verbs
   * bcol/iboffload
   * sbgp/ibnet
   * btl/openib

This commit was SVN r27212.
2012-09-01 01:42:37 +00:00
Pavel Shamis
888b04ab36 Fixing gcc 4.7.1 warning in ptpcoll bcol. Refs trac:3243.
This commit was SVN r27209.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-31 21:16:58 +00:00
Ralph Castain
6319014ab0 Sigh - get the end of the loop at the right place
This commit was SVN r27197.
2012-08-31 15:54:11 +00:00
Ralph Castain
7ac257e169 At least prevent the segfault if a proc isn't found in a sparse group
This commit was SVN r27196.
2012-08-31 15:13:52 +00:00
Pavel Shamis
ecbbfcd6dd Fixing typo in iboffload code.
Refs trac:3243

This commit was SVN r27182.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-29 21:48:47 +00:00
Pavel Shamis
8cf3c95494 Fixing ML COLL compilation issues on some SUN platforms. For more detail see following mail thread:
http://www.open-mpi.org/community/lists/devel/2012/08/11448.php
A lot of thanks to Paul Hargrove for the issue analysis and patch testing.
Refs trac:3243

This commit was SVN r27178.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-29 14:10:42 +00:00
Aleksey Senin
68e0894a58 MXM send/recv request changes.
Adapt OMPI to the latest MXM changes in send/recv request.
Use memory handle structure instead of memory key.

This commit was SVN r27155.
2012-08-28 05:57:36 +00:00
Vishwanath Venkatesan
6ee377c4f5 Modifying the file-open to use the amode argument instead of file structure values for lustre component
This commit was SVN r27154.
2012-08-27 21:13:23 +00:00
Vishwanath Venkatesan
91104cbdca Modifying the file-open to use the amode argument instead of file structure values.
This commit was SVN r27153.
2012-08-27 21:12:56 +00:00
Vishwanath Venkatesan
bf58af295b Changes to the two_phase implementation, for supporting the
data-sieving feature of the two-phase algorithm.

This commit was SVN r27152.
2012-08-27 21:11:05 +00:00
Vishwanath Venkatesan
960c47f604 Changes to io_ompio.c to support data-sieving in two-phase I/O.
This commit was SVN r27151.
2012-08-27 21:09:08 +00:00
Yevgeny Kliteynik
8b5d634231 Enable support for FCA v2.5
This commit was SVN r27145.
2012-08-26 15:20:46 +00:00
Jeff Squyres
e5babf830a Fixes trac:3258: add btl_openib_abort_not_enough_reg_mem MCA parameter
that causes MPI jobs to abort if there is not enough registered memory
available (vs. just warning).

This commit was SVN r27140.

The following Trac tickets were found above:
  Ticket 3258 --> https://svn.open-mpi.org/trac/ompi/ticket/3258
2012-08-25 11:39:06 +00:00
Ralph Castain
e0c39c94e8 Complete the cleanup of the preload files system. Remove the dest_dir option as moving things to arbitrary locations - especially absolute paths - can prove disastrous. Remove the preload_libs option as these can be treated as just files. Cleanup some of the pack/unpack code as the dss handles NULL strings just fine. Deal a little better with absolute paths, noting that tar now strips the leading '/' for us (showing my age as it didn't used to do so).
Remove the odls_base_state.c file as that code is now covered by the new broadcast form of preload_files.

This commit was SVN r27127.
2012-08-24 02:28:29 +00:00
Shiqing Fan
d141d94bd7 Include the new .windows files into the tarball.
This commit was SVN r27121.
2012-08-23 12:50:51 +00:00
Pavel Shamis
0c10bc9853 Fixing iboffload compilation issues on some MLNX platforms (on behalf of Joshua Ladd). Refs trac:3243
This commit was SVN r27120.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-23 12:33:23 +00:00
Shiqing Fan
cc20409f60 A few more header protections.
Replace "ERROR" with "error".
Remove redefinitions of mca_sbgp_base_module_t.

This commit was SVN r27109.
2012-08-22 14:53:46 +00:00
Shiqing Fan
95b9552546 include several components for Windows build.
This commit was SVN r27108.
2012-08-22 14:46:49 +00:00
Shiqing Fan
9986cea044 BEGIN_C_DECLS is missing.
This commit was SVN r27107.
2012-08-22 14:14:45 +00:00
Shiqing Fan
f746fe152f * change variable iov_len to iovec_len, in order to fix the conflict with the io vector support on Windows.
* several include header protection
* do not use ERROR; it's reserved for Visual Studio. Use error instead.

This commit was SVN r27106.
2012-08-22 13:36:23 +00:00
Shiqing Fan
b0ef486304 exclude one file that is not compatible for Windows.
This commit was SVN r27105.
2012-08-22 13:06:33 +00:00
Pavel Shamis
5cedbb843c Fixing compilation problems in ML collective component on SUN's systems. Thank you to Eugene Loh (Oracle) for discovering the problem and pin-pointing the solution. Refs trac:3243.
This commit was SVN r27100.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-21 17:43:24 +00:00
Jeff Squyres
c8cee23ee7 Priorities really shouldn't be less than 0.
This commit was SVN r27098.
2012-08-21 15:47:15 +00:00
Ralph Castain
dacb07000d Turn udcm and ud oob off by default, but allow them to build and be used if someone wants to test them
cmr:v1.7

This commit was SVN r27097.
2012-08-21 15:18:34 +00:00
Pavel Shamis
d5628fa62b More warnings clean up in the collectives code on behalf of Joshua Ladd. Refs trac:3243.
This commit was SVN r27090.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-17 17:05:31 +00:00
Pavel Shamis
6fac989588 Cleaning warnings in collectives code. Refs trac:3243.
This commit was SVN r27089.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-17 15:36:13 +00:00
Jeff Squyres
7642656aa7 Add more missing files so that dist tarballs aren't borked. Refs trac:3243.
This commit was SVN r27086.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-17 00:47:10 +00:00
Jeff Squyres
2102c05504 Add missing .windows files. Refs trac:3243.
This commit was SVN r27083.

The following Trac tickets were found above:
  Ticket 3243 --> https://svn.open-mpi.org/trac/ompi/ticket/3243
2012-08-16 23:38:03 +00:00
Ralph Castain
69753c37ef Turn off one place that won't compile if ompi progress threads enabled because it calls a non-existent function
This commit was SVN r27082.
2012-08-16 22:53:14 +00:00
Jeff Squyres
fc3ecd5d5a Remove generated file.
This commit was SVN r27080.
2012-08-16 22:08:04 +00:00
Ralph Castain
eda4cd5aa7 Cleanup warnings for improper use of C++ comment style, set ignores
This commit was SVN r27079.
2012-08-16 21:52:14 +00:00
Pavel Shamis
b89f8fabc9 Adding Hierarchical Collectives project to the Open MPI trunk.
The project includes following components and frameworks: 
- ML Collective component
- NETPATTERNS and COMMPATTERNS common components
- BCOL framework
- SBGP framework

Note: By default the ML collective component is disabled. In order to enable
new collectives user should bump up the priority of ml component (coll_ml_priority)

=============================================

Primary Contributors (in alphabetical order):

Ishai Rabinovich (Mellanox)
Joshua S. Ladd (ORNL / Mellanox)
Manjunath Gorentla Venkata (ORNL)
Mike Dubman (Mellanox)
Noam Bloch (Mellanox)
Pavel (Pasha) Shamis (ORNL / Mellanox)
Richard Graham (ORNL / Mellanox)
Vasily Filipov (Mellanox)

This commit was SVN r27078.
2012-08-16 19:11:35 +00:00
Jeff Squyres
a4e97fb4c0 Ensure we assign "err" properly when invoking MCA_PML_CALLs. Although
technically this is a necessary thing to do, it wasn't a tragedy that
we didn't have it because err was initialized to 0 at the beginning of
the functions where this problem occurred.  Also, OMPI will likely
abort if one of the MCA_PML_CALLs actually incurs an error (or, even
if it doesn't, MPI doesn't define the behavior anyway ;-) ).  

But looking forward to an FT-aware world, fixing this issue is a Good
Thing.  Many thanks to Hristo Iliev for pointing out the issue.

This commit was SVN r27070.
2012-08-16 17:49:48 +00:00
Yael Dayan
b3b8a2a23a function mca_btl_openib_endpoint_post_send can return 3 statuses:
- OMPI_SUCCESS
- OMPI_ERROR
- OMPI_ERR_RESOURCE_BUSY

If an "OMPI_ERR_OUT_OF_RESOURCE" occurs, the request is added to the pending list, and will be handled later. An error message 
should not be printed to the user in this case. This is not an error, but rather a notification of a possible valid condition.
Only in the case of "OMPI_ERROR" should it be printed to the user.

This commit was SVN r27065.
2012-08-16 07:04:40 +00:00
Christopher Yeoh
cc091f4979 Adds synchronisation between main thread and service thread in
btl_openib_connect_udcm when notifying not to listen to an fd to ensure
that the main thread does not continue until the service thread has
processed the message

Adds ability to send message to openib async thread to tell it to
ignore the ERR state on a specific QP. Adds this call to udcm_module_finalize
so when we set the error state on the QP it doesn't cause the 
openib async thread to abort the mpi program prematurely

Fixes trac:3161

This commit was SVN r27064.

The following Trac tickets were found above:
  Ticket 3161 --> https://svn.open-mpi.org/trac/ompi/ticket/3161
2012-08-16 03:56:21 +00:00
Samuel Gutierrez
7867330dcc Fix the PSM MTL in trunk by gathering node locality information differently.
This commit was SVN r27063.
2012-08-16 00:50:24 +00:00
Nathan Hjelm
4bde7f3efe silence warning
This commit was SVN r27019.
2012-08-13 17:43:36 +00:00
Nathan Hjelm
702e6d5a68 pml/ob1: fix bugs in mca_pml_ob1_recv_request_progress_rget
This commit was SVN r27018.
2012-08-13 16:26:06 +00:00
Jeff Squyres
02e2c88224 Back out r26869 (i.e., put back a single per-peer QP in the default
receive queues value) so that we don't break the use of RDMA CM, and
therefore break RoCE.

This commit was SVN r27017.

The following SVN revision numbers were found above:
  r26869 --> open-mpi/ompi@fe0e7f81df
2012-08-13 15:57:21 +00:00
Samuel Gutierrez
6188d97e1a Getting out of bed this morning was a bad idea... Reverting the sm update once more because it breaks direct launch. Will address this issue and commit the update once it has all been tested. Sorry everyone!
This commit was SVN r27001.
2012-08-10 22:20:38 +00:00
Jeff Squyres
7390ab8a23 Many updates and bug fixes for the Fortran bindings. Sorry these
aren't separated out into individual commits; they represent a few
months of work in the Mercurial branch, and it seemed error-prone to
try to break them up into multiple SVN commits.

 * Remove 2nd overloaded interfaces for MPI_TESTALL, MPI_TESTSOME,
   MPI_WAITALL, and MPI_WAITSOME in the "mpi" module implementations
   (because we're not allowed to have them, anyway -- it causes
   complications in the profiling interface).  This forced an MPI-2.2
   errata in the MPI Forum; we applied the errata here (the array of
   statuses parameter could not have a specific dimension specified in
   the dummy argument).  Fixes trac:3166.
 * Similarly, fix type for MPI_ARGVS_NULL in Fortran
 * Add MPI-3.0 function MPI_F_SYNC_REG (Fortran interfaces only).
 * Add MPI-3.0 MPI_MESSAGE_NO_PROC in the mpi_f08 module.
 * Added mpi_f08 handle comparison operators, per MPI-3.0 addendum to
   the F08 proposal at the last Forum meeting.  
 * Added missing type(MPI_File) and type(Message) in mpi_f08 module.
 * Fix --disable-mpi-io configure switch with all Fortran interfaces
 * Re-factor the Fortran header files to be fundamentally simpler and
   easier to maintain.  Fortran constant values in the header files
   are now generated by a script named mpif-values.pl during
   autogen.pl (they were previously generated by mpif-common.pl, but
   it was quite a bit more subtle/complex).  A second commit will
   follow this one to update svn:ignore values (just to ensure we
   don't muck up the first commit with the SVN client getting confused
   by the changed ignore values and new/changed files).
 * Fix some dependencies for compile ordering in
   ompi/mpi/fortran/use-mpi-ignore-tkr/Makefile.am. 
 * Fix bad wording in several places (.m4 file name, ompi_info output,
   etc.): we previously said "F08 assumed shape" when we really meant
   "F08 assumed rank" (for Fortran gurus, those are very different
   things). 
 * Removed the GREEK/SVN version string from mpif.h.  It really had no
   purpose being there.

Still to be done:

 * Handling of 2D array of strings in MPI_COMM_SPAWN_MULTIPLE still
   isn't right yet.  Not sure how many people really care about this
   :-), but it is still broken.

This commit was SVN r26997.

The following Trac tickets were found above:
  Ticket 3166 --> https://svn.open-mpi.org/trac/ompi/ticket/3166
2012-08-10 21:19:47 +00:00
Samuel Gutierrez
159bd2e62e Let's try this again: sm BTL initialization via modex.
This commit was SVN r26989.
2012-08-10 20:12:36 +00:00
Samuel Gutierrez
6a70063812 Yikes - that's not right! Back out 26987. I'll try again in a bit... Sorry!
This commit was SVN r26988.
2012-08-10 19:57:51 +00:00
Samuel Gutierrez
2c80273246 sm BTL initialization via modex.
This commit was SVN r26987.
2012-08-10 19:51:41 +00:00
Yael Dayan
79e6b9c91d Adapt OMPI to use newer version of MXM.
This commit was SVN r26974.
2012-08-08 15:29:38 +00:00
Yael Dayan
954bcdc0a5 adapt the way to find amount of local processes to OMPI trunk.
This commit was SVN r26973.
2012-08-08 15:26:28 +00:00
George Bosilca
3e288aaef6 Indentation.
This commit was SVN r26961.
2012-08-07 12:46:47 +00:00
Shiqing Fan
2f442799f8 fix several typecasts
This commit was SVN r26957.
2012-08-07 10:41:53 +00:00
Yael Dayan
7895cd1114 adding a fragmentation mechanism to the Get flow in function mca_pml_ob1_recv_request_progress_rget
This commit was SVN r26956.
2012-08-07 07:15:21 +00:00
Vasily Filipov
fc712182db MTL MXM: make MXM use MXM_VERSION macro for MXM version checking.
This commit was SVN r26952.
2012-08-06 06:35:57 +00:00
Ralph Castain
05845214b8 Add missing include file
This commit was SVN r26934.
2012-08-01 04:08:47 +00:00
Vasily Filipov
c386847d9a MTL MXM: Adding MXM version protect for Mprobe, Mrecv resources.
This commit was SVN r26922.
2012-07-31 07:57:25 +00:00
Vishwanath Venkatesan
dccfd18481 1. Removing two-phase support functions
2. Moving nbc headers to a separate header file and modifying
io_ompio_nbc.c accordingly.

This commit was SVN r26921.
2012-07-31 04:39:13 +00:00
Vishwanath Venkatesan
539571171b Moving support functions of two-phase to the two_phase component.
This commit was SVN r26920.
2012-07-31 04:37:04 +00:00
Edgar Gabriel
fb64322dc3 this code section was supposed to be commented out...
This commit was SVN r26918.
2012-07-30 20:46:07 +00:00
Edgar Gabriel
1078f13ad2 set status->_ucount correctly for collective read and write operations in the module
This commit was SVN r26916.
2012-07-30 20:14:36 +00:00
Edgar Gabriel
91c8577d9d fix in the offset calculation for explicit offset operations.
This commit was SVN r26915.
2012-07-30 20:08:00 +00:00
Edgar Gabriel
81a050add9 simplify the individual fcoll module by just calling the
mca_io_ompio_file_read/write functions directly. Avoid replicating the code in 
both places.

This commit was SVN r26909.
2012-07-30 15:44:22 +00:00
Edgar Gabriel
66c5a80dfd - get rid of a warning about an unused variable
- return MPI_ERR_OTHER instead of MPI_SUCCESS for the functions that are not
  yet implemented 
- add another field to the mca_io_ompio_file_t structure to point back to the
  ompi_file_t structure.

This commit was SVN r26908.
2012-07-30 15:29:59 +00:00
Jeff Squyres
9f8265eccb The files for automake to generate are specified via AC_CONFIG_FILES
in the */configure.m4 files.  configure.params files are obsolete.

This commit was SVN r26897.
2012-07-27 14:33:17 +00:00
Yevgeny Kliteynik
a6458da4ba Using 8K as a minimal CQ length
- For now we'll use 8192 as a base value
 - We leave the adjust_cq() as is
 - For the long term we can work on an appropriate setting to expose through the INI file.

8K CQEs are 512K per process, which is 8MB for ppn=16

This commit was SVN r26877.
2012-07-26 21:06:18 +00:00
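The memory figures quoted in the CQ-length commit above (a6458da4ba) can be checked with a little arithmetic. The sketch below assumes a CQE size of 64 bytes, which the commit message does not state but which 512K / 8192 implies; the function name is illustrative only and is not part of Open MPI.

```python
# Rough check of the CQ memory estimate from the commit message.
# ASSUMPTION: 64 bytes per CQE, inferred from "8K CQEs are 512K";
# the real size depends on the HCA and driver.
CQE_BYTES = 64


def cq_memory_bytes(cq_len, procs_per_node=1, cqe_bytes=CQE_BYTES):
    """Bytes of CQ memory on a node, for one CQ of cq_len entries per process."""
    return cq_len * cqe_bytes * procs_per_node


per_process = cq_memory_bytes(8192)                  # 524288 bytes = 512 KiB
per_node = cq_memory_bytes(8192, procs_per_node=16)  # 8388608 bytes = 8 MiB
print(per_process // 1024, "KiB per process")
print(per_node // (1024 * 1024), "MiB per node at ppn=16")
```

Under that assumption the numbers in the message are consistent: 8192 entries cost 512 KiB per process and 8 MiB per node at ppn=16.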
Nathan Hjelm
8736953c7f btl/openib/connect improve the help message printed when a queue pair can not be created
This commit was SVN r26876.
2012-07-26 20:36:46 +00:00
Shiqing Fan
204fbfe4b1 update the wv btl component.
This commit was SVN r26872.
2012-07-26 15:35:01 +00:00
George Bosilca
3a8478827b Fix the MPI_Cancel issue identified by Fujitsu. And a typo.
This commit was SVN r26871.
2012-07-26 14:06:24 +00:00
Vasily Filipov
4e66ff030b MTL MXM Mrecv: adding missed return message to a free list.
This commit was SVN r26870.
2012-07-26 11:22:22 +00:00
Nathan Hjelm
fe0e7f81df btl/openib: as discussed remove the per-peer queue pair from the default configuration
This commit was SVN r26869.
2012-07-25 22:53:58 +00:00
Jeff Squyres
5ec6a65a72 After I spent a while looking in libibverbs for
ibv_get_device_list_compat() and not finding it, I finally realized
that it was a function in OMPI.  So let's name it with a proper ompi_
prefix, not an ibv_ prefix.

This commit was SVN r26867.
2012-07-25 16:32:51 +00:00
Vasily Filipov
ef9bd8e4cb MTL MXM: adding MPI_Mprobe, MPI_Mrecv implementation for MXM.
This commit was SVN r26866.
2012-07-25 13:26:40 +00:00
George Bosilca
118e30a2ac Typo.
This commit was SVN r26865.
2012-07-25 12:42:31 +00:00
George Bosilca
b6f4bc9656 size_t not int everywhere. Correctly compute with size_t (don't initialize
it to a negative number). Get rid of the multiplication in the critical
path, and keep the functions as simple as possible.

This commit was SVN r26864.
2012-07-25 12:41:53 +00:00
Yael Dayan
8cad1d6481 while working on a fix for the Get flow in pml, I've encountered a problem in "mca_pml_ob1_compute_segment_length" function, at pml_ob1.h file.
The return value from this function was truncated from size_t to int. This fix changes the return value to type size_t.

This commit was SVN r26863.
2012-07-25 12:08:41 +00:00
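The truncation described in the commit above (8cad1d6481) can be shown with a small simulation. The sketch assumes a 32-bit int with two's-complement wraparound, which is typical but implementation-defined in C; to_int32 is a hypothetical helper and does not mirror mca_pml_ob1_compute_segment_length itself.

```python
def to_int32(value):
    """Simulate converting a non-negative size to a 32-bit signed int.

    ASSUMPTION: two's-complement wraparound, the common behaviour
    when a size_t is narrowed to int.
    """
    value &= 0xFFFFFFFF        # keep only the low 32 bits
    if value >= 0x80000000:    # sign bit set: value reads as negative
        value -= 0x100000000
    return value


three_gib = 3 * 1024**3
print(to_int32(1024))          # 1024: small lengths survive
print(to_int32(three_gib))     # -1073741824: large length turns negative
print(to_int32(2**32 + 5))     # 5: high bits silently dropped
```

Returning size_t avoids both failure modes, since the full length is carried through without narrowing.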
Jeff Squyres
9cb3f31b50 Odd; this compiled on OS X without needing #include "opal_stdint.h".
Linux appears to need it.  Shrug.

This commit was SVN r26858.
2012-07-24 13:47:24 +00:00
Jeff Squyres
6f5fd6245f Add missing %d
This commit was SVN r26857.
2012-07-24 13:33:11 +00:00
Jeff Squyres
0b4a659683 Stomp some compiler warnings; use proper printf sequences for uint64_t.
This commit was SVN r26856.
2012-07-24 13:03:55 +00:00
Jeff Squyres
e66d386441 Add a new missing field to the template BTL module that was causing a
bunch of compiler warnings.

This commit was SVN r26855.
2012-07-24 12:55:12 +00:00
Mike Dubman
4784253f5c revert commit, breaks backwards compatibility, will be revised
This commit was SVN r26852.
2012-07-24 11:48:18 +00:00
Vasily Filipov
99bd5977bd MTL MXM: small fix in the mxm_req_probe func interface.
This commit was SVN r26850.
2012-07-24 08:46:38 +00:00
George Bosilca
6ebbacb054 Complete the dump function for the SM BTL. Now we can see all fragments in all
the queues as long as the BTL is dump-friendly (only SM right now).

This commit was SVN r26849.
2012-07-24 00:22:22 +00:00
George Bosilca
55bc3c4763 Fix the copyright.
This commit was SVN r26848.
2012-07-24 00:20:24 +00:00
George Bosilca
1ad6c82015 Implement the dump function for the PML OB1.
This commit was SVN r26847.
2012-07-24 00:19:18 +00:00
Samuel Gutierrez
76d94bf9bf Plug leak. Thanks, Nathan.
This commit was SVN r26846.
2012-07-23 21:11:21 +00:00
Samuel Gutierrez
8096852a16 Towards RML-less shared-memory initialization (primarily for eventual BTL
move).  Extended common sm API with: mca_common_sm_module_create and
mca_common_sm_module_attach. Please note that the new routines aren't currently
used -- but will be...

This commit was SVN r26845.
2012-07-23 19:38:13 +00:00
Eugene Loh
10e3dc396b Add a missing return value.
This commit was SVN r26815.
2012-07-20 01:32:06 +00:00
Brian Barrett
2518014037 Fix a number of issues with IN_PLACE
This commit was SVN r26814.
2012-07-19 21:29:43 +00:00
Nathan Hjelm
cd2cbdca09 btl/openib: limit each process to a ppn fraction of the available registered memory when using mellanox hardware (mlx4 and mthca). fixed
This commit was SVN r26811.
2012-07-19 17:52:21 +00:00
Ralph Castain
66fe57f746 Revert r26804 so openib can build again
This commit was SVN r26810.

The following SVN revision numbers were found above:
  r26804 --> open-mpi/ompi@610be870f9
2012-07-19 16:16:38 +00:00
Ralph Castain
44bd855717 Silence warnings
This commit was SVN r26808.
2012-07-19 14:29:32 +00:00
Vasily Filipov
597a422272 MTL: make MXM work with read (in blocking send case) call-backs.
This commit was SVN r26807.
2012-07-19 13:28:06 +00:00
Nathan Hjelm
610be870f9 btl/openib: limit each process to a ppn fraction of the available registered memory when using mellanox hardware (mlx4 and mthca)
This commit was SVN r26804.
2012-07-18 17:29:48 +00:00
Nathan Hjelm
4a97ecbdd2 btl/openib: remove tab characters
This commit was SVN r26803.
2012-07-18 17:29:37 +00:00
Eugene Loh
a3e02fdaff With non-blocking collectives, a "round schedule" could fall on any address
alignment, which typically causes problems on SPARC.  Further, the pointer
manipulation to access elements in a round schedule was clumsy.  This change
introduces macros to facilitate addressing and make it more portable.

This commit was SVN r26802.
2012-07-18 17:08:24 +00:00
Nathan Hjelm
771b427027 udcm: unmonitor the fd BEFORE tearing down the listen qp
This commit was SVN r26800.
2012-07-18 14:22:45 +00:00
Nathan Hjelm
35de50b823 remove the elan btl
This commit was SVN r26798.
2012-07-17 14:51:41 +00:00
Nathan Hjelm
fc1b295606 udcm: evict from the lru of the openib device's grdma mpool if a qp can not be created. Note: there doesn't appear to be a standard way to differentiate between ibv_create_qp failing because the node is out of registered memory and failing because no more qps are available
This commit was SVN r26797.
2012-07-14 01:58:29 +00:00
Nathan Hjelm
3798f38386 do not print out an error message if ibv_reg_mr fails
This commit was SVN r26796.
2012-07-14 01:35:45 +00:00
Abhishek Kulkarni
1878f276cd Replace the pattern while(flag) { opal_progress() }; in the C/R code
with the ORTE_WAIT_FOR_COMPLETION macro.

This commit was SVN r26794.
2012-07-13 23:31:56 +00:00
Nathan Hjelm
4d1920ee87 Fix a bug on 32-bit systems introduced by r26626. This fix ensures that all supported btls (with exception of wv-- shiqing will need to help bring that one up to date with r26626) set the lval in prepare_src/dst when preparing a put or get segment. This fix also ensures a consistent use of lval in put and get for both local and remote segments.
This commit was SVN r26793.

The following SVN revision numbers were found above:
  r26626 --> open-mpi/ompi@249066e06d
2012-07-13 21:19:16 +00:00
Nathan Hjelm
344fe61616 remove assertion in udcm
This commit was SVN r26790.
2012-07-13 15:14:48 +00:00
Jeff Squyres
e719d6ab78 It turns out that "sppp" on the Oracle Mx000 series of servers (where x =
{3, 4, 5, 9}, SPARC VI-based machines) is not a 127.x.y.z interface,
so it needs to stay in the exclude list.

This commit was SVN r26789.
2012-07-13 12:11:41 +00:00
Jeff Squyres
196bc0a53e Update the TCP BTL MCA param btl_tcp_if_exclude default value to use
CIDR notation 127.0.0.1/8 to ignore localhost devices instead of the
imprecise (and not always correct!) "lo,sppp".

This commit was SVN r26788.
2012-07-12 15:13:08 +00:00
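To illustrate why a CIDR entry is more precise than a list of interface names, the following sketch tests addresses against 127.0.0.1/8 with Python's standard ipaddress module. It is not Open MPI's matching code; the helper name and the sample addresses are made up for the example.

```python
import ipaddress


def in_excluded_network(addr, cidr="127.0.0.1/8"):
    """Return True if addr falls inside the network described by cidr."""
    # strict=False accepts a CIDR with host bits set, so "127.0.0.1/8"
    # is treated as the network 127.0.0.0/8.
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(addr) in network


# Every 127.x.y.z address matches, whatever the interface is called.
print(in_excluded_network("127.0.0.1"))     # True
print(in_excluded_network("127.10.20.30"))  # True
# A routable address is kept (hypothetical example address).
print(in_excluded_network("192.168.1.10"))  # False
```

An interface whose address lies outside 127.0.0.0/8, such as the sppp case noted in r26789, is not caught by this kind of test and still has to be excluded by name.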
Edgar Gabriel
92271c571d set the status field for collective read and write operations.
This commit was SVN r26786.
2012-07-12 10:26:27 +00:00
Nathan Hjelm
b79a61a360 move btl_vader.c to btl_vader_module.c
This commit was SVN r26785.
2012-07-11 20:14:19 +00:00
Terry Dontje
6f3195faca add some missing casts
This commit was SVN r26779.
2012-07-10 18:03:29 +00:00
Nathan Hjelm
05c5c1f412 remove unused i_initiate function from udcm
This commit was SVN r26778.
2012-07-10 17:22:19 +00:00
Jeff Squyres
bb13e21538 Roll back r26730, but bump the default CQ length base up to 1500, not
1000.  Refs trac:3154.

IB/iWarp vendors need to get together to figure out a real fix.

This commit was SVN r26777.

The following SVN revision numbers were found above:
  r26730 --> open-mpi/ompi@5315c91baf

The following Trac tickets were found above:
  Ticket 3154 --> https://svn.open-mpi.org/trac/ompi/ticket/3154
2012-07-10 16:53:27 +00:00
Nathan Hjelm
4c0c937953 Remove use of ompi_ptr_ltop in BTLs. This fixes a crash seen on big-endian 32-bit platforms with MPI one-sided.
This commit was SVN r26776.
2012-07-10 16:18:53 +00:00
George Bosilca
7d6006a5a6 Fix various compiler warnings.
This commit was SVN r26774.
2012-07-10 15:57:15 +00:00
Abhishek Kulkarni
2ca8292f46 Fix a typo in the sm btl (related to CMA support).
This commit was SVN r26772.
2012-07-10 00:12:05 +00:00
Abhishek Kulkarni
5c58a1c9c1 Fix C/R support in the trunk.
Among other things, this patch deals with the following issues:
* fix ompi-checkpoint argument parsing
* ompi-restart -showme prints an extraneous "Restarted child with PID" 
  message. Move around the debug statement to avoid this.
* fixes for the state machine changes

This commit was SVN r26770.
2012-07-09 23:34:13 +00:00
Terry Dontje
43314776ae add cast to correct a type mismatch warning
This commit was SVN r26767.
2012-07-09 18:32:39 +00:00
Edgar Gabriel
d18dad7109 remove the file io_ompio_coll_offset. These routines were the predecessors of
the routines in io_ompio_coll_array. No need to keep both versions around.

This commit was SVN r26766.
2012-07-09 17:12:46 +00:00
Edgar Gabriel
8ae22cacc1 - remove two functions that were not used anymore
- change the location where we mark the file view as contiguous and the
  condition on how it is determined to be contiguous
- remove the unnecessary include statements

This commit was SVN r26763.
2012-07-08 12:57:17 +00:00