1
1
Граф коммитов

15365 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
4859bb82e2 * Update to support direct call
* Add missing cancel (not that it does anything useful)
* Fix bug in opal_output call

This commit was SVN r24269.
2011-01-19 20:49:28 +00:00
Brian Barrett
8f6a19b0fc export component/module interface so that direct call works again
This commit was SVN r24268.
2011-01-19 20:47:17 +00:00
Brian Barrett
b98afd298b update to remove unneeded fields
This commit was SVN r24267.
2011-01-19 20:46:06 +00:00
Rolf vandeVaart
f22f76a6ff Add byte swapping macro for failover control message per jsquyres review.
This commit was SVN r24266.
2011-01-19 19:58:35 +00:00
Rolf vandeVaart
e75b86d3ab Fix some issues from jsquyres review.
1. Use asprintf instead of snprintf
2. Return remote_proc where possible.
3. Remove dead code.
4. Fix two comment typos.

This commit was SVN r24265.
2011-01-19 16:09:17 +00:00
Sylvain Jeaugey
0e921bba7f Romio Refresh from mpich2-1.3.1. Work by Pascal Deveze, tested through bitbucket by Jeff Squyres (https://bitbucket.org/devezep/new-romio-for-openmpi).
This commit was SVN r24264.
2011-01-19 15:55:10 +00:00
Shiqing Fan
b2f3a5b7c2 Correctly check system specific datatypes on Windows.
This commit was SVN r24257.
2011-01-18 09:40:58 +00:00
Jeff Squyres
189b541dbd Add a proper help message for the mca_verbose MCA param (and shuffle
the code to be slightly more efficient).

This commit was SVN r24256.
2011-01-14 20:18:06 +00:00
Abhishek Kulkarni
fd7ef7a1f1 Fixes broken trunk compile: call process status notify
only when ft-enable-cr is selected.

This commit was SVN r24255.
2011-01-14 18:37:07 +00:00
Jeff Squyres
32e722e4d9 Add first bullets about 1.5.2.
This commit was SVN r24254.
2011-01-14 15:15:23 +00:00
George Bosilca
29c7f2fba5 Update the tests to match the new datatype engine.
This commit was SVN r24252.
2011-01-14 07:58:50 +00:00
Abhishek Kulkarni
87d2c9b31d Few fault tolerance updates related to the CIFTS project (http://www.mcs.anl.gov/research/cifts/)
* Improve the FTB notifier to publish (C/R, process/communication failure) events to the FTB with the
   OMPI jobid as the associated payload.
 * Add notifier calls for C/R events and process status events in SnapC and ErrMgr components.
 * Fix a bug where the SnapC states and process states collide before being thrown out over the notifier.

This commit was SVN r24251.
2011-01-13 20:13:49 +00:00
George Bosilca
5390fd6f33 Reshape the datatype engine. The basic types are built down in OPAL. MPI types are
either direct link to these basic predefined types, or a combination of them.
Anyway, the first items in the datatype list belong to OPAL, the second round
are MPI datatypes created by composing basic OPAL datatypes, and the last
batch are mapped datatype (direct correspondance between an OMPI datatype and
an OPAL one such as int -> int32_t).

Modify the op to fit this new scheme.

This commit was SVN r24247.
2011-01-13 06:08:54 +00:00
Ralph Castain
b09f57b03d Update the multicast subsystem - ported from Cisco branch
This commit was SVN r24246.
2011-01-13 01:54:05 +00:00
Terry Dontje
f3aaa885a3 corrected a couple places in orte where it said cpu_model when it should have been cpu_type.
This commit was SVN r24221.
2011-01-11 19:56:26 +00:00
Terry Dontje
56c03a3853 removing a file I should not have added
This commit was SVN r24220.
2011-01-11 19:02:08 +00:00
Terry Dontje
a374661ead add configure.params to solaris sysinfo module to allow it to be built
This commit was SVN r24219.
2011-01-11 18:31:55 +00:00
Jeff Squyres
cd8f12d8e5 Remove a few useless files that were missed last night.
This commit was SVN r24218.
2011-01-11 14:15:31 +00:00
Jeff Squyres
54cb4eb2b5 Merge over new version of hwloc 1.1 from the vendor branch. Update
the module to use the new hwloc bitmap API (the cpuset API is both
klunkier and deprecated), which simplified a few things.

This commit was SVN r24217.
2011-01-11 01:41:10 +00:00
Jeff Squyres
f08433c1e1 Fixes trac:2669.
Apparently, gcc 4.4.x and 4.5.x complain about the ''possibility'' of
us calling free() on a non-heap variable.  We know that this case can
never happen because the refcount will absolutely not go to zero
here.  We think it may be gcc being a bit too aggressive on the
warnings.

However, since this happens with gcc 4.4.x and 4.5.x, and since gcc
4.5.x ship in RHEL6 and Fedora 14 (and others), someone '''will'''
complain about this in the future, so we might as well code around it
so that we don't have to keep explaining "despite the warning, it's
really ok."

The workaround is pretty simple: just OBJ_RELEASE the values from
ompi_mpi_comm_parent before it is re-assigned to the new
intercommunicator.  Then the compiler's static code analysis can't
possibly tell that it's not a heap variable, and we're ok.

So yes, we are still calling OBJ_RELEASE on a non-heap variable.  But
free() '''will never be called''' on it because of the refcount.

This commit was SVN r24214.

The following Trac tickets were found above:
  Ticket 2669 --> https://svn.open-mpi.org/trac/ompi/ticket/2669
2011-01-10 21:12:27 +00:00
Abhishek Kulkarni
11ffa854ff Update the FTB notifier
* fix indentation issues
 * update the name of one of the fault events published to the FTB (per the FTB MPI standard)

This commit was SVN r24213.
2011-01-10 18:58:31 +00:00
Ralph Castain
ac1853b5d8 Took me a couple of days, but finally tracked this one down. Some compilers/glibc's don't like composite test statements in a return and just randomly pick one of the two options.
So....don't do that!!!

This commit was SVN r24212.
2011-01-10 16:29:42 +00:00
Nathan Hjelm
02e60d326e removed csum from rr p1 architecture conf files
This commit was SVN r24211.
2011-01-10 16:04:48 +00:00
Ethan Mallove
82054cb02c Include <stdlib.h> instead of <malloc.h>. This avoids a compiler error
on some systems caused by the definition of malloc in
opal_config_bottom.h getting expanded in the system malloc.h when
OPAL_ENABLE_MEM_DEBUG is set to 1.

This commit was SVN r24210.
2011-01-06 18:16:36 +00:00
Jeff Squyres
58445f3775 After being hit by "why is openib not working?" ''again'', add a
verbose statement that shows up when you --mca btl_base_verbose 100.
It clearly states that the openib BTL disqualifies itself when
MPI_THREAD_MULTIPLE is used.

This commit was SVN r24209.
2011-01-05 22:01:15 +00:00
Eugene Loh
9bbcd51c5a Properly initialize ep->btl_max_send_size, ep->btl_pipeline_send_length, and
ep->btl_send_limit in mca_bml_r2_del_proc_btl() so that the loops will correctly
compute new endpoint max/min after the BTL has been removed.  See
http://www.open-mpi.org/community/lists/devel/2011/01/8829.php

This commit was SVN r24202.
2011-01-04 20:35:33 +00:00
Jeff Squyres
7648c36023 Correct an error in MPI_FILE_SET_VIEW man page: the fh does ''not''
have to have an identical ''value'' (it must be the same file handle,
but that means nothing about its actual value).  But the datarep must
have an identical value.  Additionaly, the etype does not need to be
identical, but the extent of all the etypes supplied must be
identical.

See MPI-2.2 p401-402 for further details.

This commit was SVN r24201.
2011-01-04 18:10:52 +00:00
Josh Hursey
2bdff63e6f move the INIT to after the error handler, so it matches MPI_INIT. Thanks to Jeff for catching this
This commit was SVN r24200.
2011-01-03 18:16:53 +00:00
Mike Dubman
4a2e29eb32 updated Makefile with a new file
This commit was SVN r24199.
2011-01-01 14:11:49 +00:00
Mike Dubman
c56e3141cb fca: fix segmentation fault when no underlying collective implementation is found
This commit was SVN r24198.
2010-12-31 12:03:49 +00:00
Ralph Castain
80ef1af8ba Add psm key generator program
This commit was SVN r24197.
2010-12-30 20:54:58 +00:00
Josh Hursey
bbfdf04a81 Fix a couple of 'unused variable' warnings, and one return value warning.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}

{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}

This commit was SVN r24196.
2010-12-30 15:37:50 +00:00
Doron Shoham
834625cc51 Currently the service lever is passed as static parameter (ib_service_level), but the service level is possibly dynamic and if so the only way to get a proper value is to ask the SA.
New mca parameter is added (ib_path_rec_service_level) - positive value means that we should get the SL from the SA.

This is usable for torus topologies where different SL value is used for different endpoints.

A cache is kept of ib queue pairs used to communicate with the SA for a particular device and port and path record SL values retrieved from that SA.

The interaction with the cache assumes that there are no recursive calls to these routines. This must be solved either by code flow, by using higher level locks, or by adding a locking mechanism to these routines along with some method for avoiding deadlock.

This code use a UD queue pair to talk to the SA, and not need to chmod /dev/infiniband/umad* for use by normal users.  

The request to the SA is a SubnAdmGet(), not a SubnAdmGetTable().
In the future we might add a support of a SubnAdmGetTable(), but it will require implementing RMPP (Reliable Multi-Packet Transaction Protocol) and I'm not sure we want to do that.

This patched is based on the work of David McMillen <davem@systemfabricworks.com>.

This commit was SVN r24195.
2010-12-30 08:20:24 +00:00
Josh Hursey
0b514e234b MPI_Init_thread is used in place of MPI_Init, so for the checkpoint/restart functionality it must correctly init the C/R functionality instead of simply making a critical section. This allows the C/R thread to be started properly.
Thanks to Takayuki Seki for finding this bug.

This commit was SVN r24194.
2010-12-29 15:37:30 +00:00
Mike Dubman
3d517c0285 ABI cleanups
This commit was SVN r24193.
2010-12-28 07:11:46 +00:00
Mike Dubman
b339a7a07b Add FCA 1.2/2.0 backward compatibility, depending on OMPI_FCA_VERSION_xx macro definition.
This commit was SVN r24192.
2010-12-27 21:32:34 +00:00
Doron Shoham
bfe611d3bd This patch fixes bugs #2627 (1.5.2) and #2623 (1.4.2) - Sending large messages over RDMA fails.
The patch includes the following:
 *  Add new mca parameter - btl_openib_max_hw_msg_size - Maximum size (in bytes) of a single fragment of a long message when using the RDMA protocols (must be > 0 and <= hw capabilities).
 *  If btl_openib_max_hw_msg_size is larger than the maximum hw limitation print error message.
 *  Change the default openib flags to include only PUT and not GET.
 *  Print error message if user choose manually GET flag in openib btl.
 *  In prepare_dst: limit the message size to be the minimum of both endpoint's hw_limitation and the user limitation (if requested).

This commit was SVN r24191.
2010-12-23 11:48:43 +00:00
Ethan Mallove
9251785161 Emit an error (instead of a SEGV) if the "compiler" parameter is not set
in the wrapper data file.

This commit was SVN r24190.
2010-12-21 19:01:39 +00:00
Brian Barrett
621344cce4 Remove duplicate DT tests
This commit was SVN r24189.
2010-12-20 23:38:36 +00:00
Brian Barrett
9876e65137 Fix race condition in unlock code, as well as a small memory leak.
Somehow they got fixed in the pt2pt implementation, but not the RDMA
implementation.  Thanks to Guillaume Thouvenin for finding this issue.

This commit was SVN r24188.
2010-12-20 22:15:29 +00:00
Nathan Hjelm
c082d05ecb Reset the timer on MPIR_being_debugged only if MPIR_being_debugged is not set. Fix typo in return code.
This commit was SVN r24187.
2010-12-20 21:00:49 +00:00
Rolf vandeVaart
a57e5587f6 Brief description of bfo functionality.
This commit was SVN r24186.
2010-12-17 19:12:00 +00:00
Jeff Squyres
a525e70f46 Convert "opal_show_help" to be a global variable pointer.
It is statically initialized to the real back-end OPAL show_help
function.  During orte_show_help_init(), the variable is re-assigned
with the value of the back-end ORTE show_help function (the one that
does error message aggregation).  

Therefore, anything that calls opal_show_help() after a certain point
in orte_init() will have their show_help messages be aggregated.
w00t!  Even code down in OPAL -- that has no knowledge of ORTE -- will
have their messages aggregated.  '''Double w00t!'''

During orte_show_help_finalize(), we restore the original pointer
value so that it something calls opal_show_help() after
orte_finalize(), it'll still work properly (but it won't be
aggregated).  

This commit was SVN r24185.
2010-12-16 23:00:25 +00:00
Terry Dontje
80c1e9acac added the format parameter to OMPI_Affinity_str call in README
This commit was SVN r24184.
2010-12-16 19:22:59 +00:00
Rolf vandeVaart
b9f11a874d Bump revision in bfo utility script.
This commit was SVN r24183.
2010-12-16 18:14:18 +00:00
Terry Dontje
6da16ab0d7 add format parameter and layout format to OMPI_Affinity_str
This commit was SVN r24182.
2010-12-16 15:11:17 +00:00
Jeff Squyres
b113b1a382 Add the btl_tcp_if_seq MCA parameter. From the help string:
If specified, a comma-delimited list of TCP interfaces.  Interfaces
  will be assigned, one to each MPI process, in a round-robin fashion
  on each server.  For example, if the list is "eth0,eth1" and four
  MPI processes are run on a single server, then local ranks 0 and 2
  will use eth0 and local ranks 1 and 3 will use eth1.

This feature is only useful for environments with virtual ethernet
interfaces on the same network.  For example, if eth0 and eth1 are
virtual interfaces to the same NIC on the same subnet, and if the NIC
provides different hardware resources to eth0 and eth1 (not just
different kernel resources), some HOL blocking and congestion issues
can be eased in a modest fashion.

This commit was SVN r24181.
2010-12-16 00:54:32 +00:00
Jeff Squyres
741ba6518b Install ompi_config.h (and friends) into /openmpi, not /openmpi/ompi/include. This is consistent with prior releases and the OPAL and ORTE <foo>_config.h file installation locations.
This commit was SVN r24180.
2010-12-15 20:54:22 +00:00
Jeff Squyres
b46e0291c2 Add 1.5.1 section. Update 1.5 section to reflect what the actual text
that was shipped in 1.5.

This commit was SVN r24176.
2010-12-15 18:35:59 +00:00
Shiqing Fan
883aeedd26 Add support for nanosleep function using Sleep on Windows. The accuracy of the sleep function on Windows is 1 millisecond mentioned in MSDN doc.
This commit was SVN r24175.
2010-12-15 15:43:25 +00:00