1
1
Граф коммитов

4504 Коммитов

Автор SHA1 Сообщение Дата
Rolf vandeVaart
4cd1958deb Fix so we do not get warnings when running on system without CUDA software installed and CUDA-aware compiled in.
This commit was SVN r30032.
2013-12-20 20:39:25 +00:00
Dave Goodell
bd901a68ed usnic: fix 'fls' warnings+errors
The old version caused compilation errors on Solaris.  Thanks to Paul
Hargrove for testing and reporting the bug:

  http://www.open-mpi.org/community/lists/devel/2013/12/13520.php

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30025.
2013-12-20 17:37:22 +00:00
Ralph Castain
6959ba5577 Add missing include file.
Thanks to Paul Hargrove for spotting it.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29998.
2013-12-19 23:39:21 +00:00
Dave Goodell
0c6b292442 romio: pick "infinitely stale" fix from upstream
Some NFS scenarios can result in an infinite ESTALE return, which will
hang ROMIO.  This commit causes ROMIO to error out after a large number
of retries instead of spinning forever.

This is MPICH commit b250d338:

http://git.mpich.org/mpich.git/commit/b250d338e66667a8a1071a5f73a4151fd59f83b2

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r29993.
2013-12-19 22:55:26 +00:00
Ralph Castain
b745078535 Support user-provided envars for comm_spawn using info key "env"
Thanks to Tom Fogal for the request

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29990.
2013-12-19 20:59:50 +00:00
Jeff Squyres
3a14adef63 Remove the comments around these assignments; otherwise, we won't get
function pointers set to the _map functions, and we get segv's in MTT
testing (e.g., the C++ suite, which actually calls MPI_Cart_map and
MPI_Graph_map).

cmr=v1.7.4:reviewer=bosilca:subject=Fix topo _map function pointer assignments

This commit was SVN r29988.
2013-12-19 20:41:32 +00:00
Yossi Etigin
6ab4aba9e6 Fix missing include of show_help.h in mtl mxm.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29987.
2013-12-19 19:37:21 +00:00
Tom Naughton
3aefca32b0 + update rte db_fetch comments with change from r29931
This commit was SVN r29971.

The following SVN revision numbers were found above:
  r29931 --> open-mpi/ompi@0995a6f3b9
2013-12-19 01:16:58 +00:00
Jeff Squyres
bb59b07321 Remove CFLAGS setting that was really only intended for the v1.6
branch (it's not necessary on trunk/v1.7 because they require C99,
which allows variadic macros).

Also fix another compiler warning (using %p to print a (void*)).

Submitted by Jeff, reviewed by Dave.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=two usnic BTL fixes

This commit was SVN r29966.
2013-12-19 00:19:05 +00:00
Nathan Hjelm
b9765a380f Update NEWS with new MPI-3 features and a note about the new ROMIO
version.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29965.
2013-12-19 00:16:07 +00:00
Jeff Squyres
515fd00411 CSCul95082: DMAR faults during mtt testing
usnic_channel_finalize() was deregistering recv buffers before
destroying the QP to which they were posted. The QP needs to be
destroyed first so that the NIC does not attemp tto write to
deregistered memory, causing the DMAR messages.

Submitted by Reese, reviewed by Jeff.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29963.
2013-12-19 00:01:35 +00:00
Yossi Etigin
ecfb122c97 Fix segfault in osc pt2pt completion handler, when the request is canceled during finalization.
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29938.
2013-12-17 17:30:14 +00:00
Ralph Castain
0995a6f3b9 Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly:
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

This commit was SVN r29931.

The following SVN revision numbers were found above:
  r29917 --> open-mpi/ompi@1a972e2c9d
2013-12-17 03:26:00 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
George Bosilca
1a972e2c9d Don't be greedy, just do what we asked for.
This commit was SVN r29917.
2013-12-15 16:54:01 +00:00
George Bosilca
430a13719f Only if OMPI_BTL_SM_HAVE_CMA is set to 1.
This commit was SVN r29916.
2013-12-15 16:49:27 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Jeff Squyres
f4afa4fd1f Add missing include, exposed in "external libevent" work.
Refs trac:3694

This commit was SVN r29898.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:21:30 +00:00
Jeff Squyres
bcfe2156d5 Bring over m4 quoting fix from v1.7 branch (in r29894) that was
discovered when removing some components.

This commit was SVN r29895.

The following SVN revision numbers were found above:
  r29894 --> open-mpi/ompi@58ed00296c
2013-12-13 20:27:33 +00:00
Ralph Castain
f763be26c4 Closes trac:2433. Check for hetero architecture and disqualify sm connections if that is found as the sm btl currently doesn't support hetero operations.
cmr=v1.7.4:reviewer=brbarret:subject=Disqualify sm btl for hetero procs

This commit was SVN r29882.

The following Trac tickets were found above:
  Ticket 2433 --> https://svn.open-mpi.org/trac/ompi/ticket/2433
2013-12-13 15:23:33 +00:00
Mike Dubman
fb3f94a16e remove debug print
Refs trac:3969

This commit was SVN r29876.

The following Trac tickets were found above:
  Ticket 3969 --> https://svn.open-mpi.org/trac/ompi/ticket/3969
2013-12-13 06:08:44 +00:00
Mike Dubman
21be95c9b5 Initialize sm global variables in mca_btl_sm_component_open(), because they are destructed in mca_btl_sm_component_close(), and init() function might not be called or fail.
For exammple, mca_btl_sm.knem_fd remained 0, and mca_btl_sm_component_close() ended up doing closing fd 0 which belongs to someone else.

fixed by Yossi, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29875.
2013-12-13 06:01:24 +00:00
Jeff Squyres
bac67e0d81 Per discussion @Chicago OMPI dev meeting Dec 2013: remove all MX support.
This commit was SVN r29873.
2013-12-12 18:54:47 +00:00
Nathan Hjelm
3262080391 Cleanup udcm structures to avoid issues with nesting structures with
flexible members.

UDCM is ready to go for 1.7.4 with this patch.

cmr=v1.7.4:ticket=3940

This commit was SVN r29861.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-12 05:24:37 +00:00
Nathan Hjelm
e0e94a6029 Fix warning caused by typo in r29815
This commit was SVN r29860.

The following SVN revision numbers were found above:
  r29815 --> open-mpi/ompi@d556b60b21
2013-12-11 21:45:39 +00:00
Nathan Hjelm
6ab69c758b Fix warnings in udcm.
cmr=v1.7.4:reviewer=rhc:ticket=3940

This commit was SVN r29859.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-11 21:40:06 +00:00
Rolf vandeVaart
3ae88f8a24 Ensure no fork support with GDR. CUDA-aware code only.
This commit was SVN r29854.
2013-12-10 18:08:53 +00:00
Rolf vandeVaart
1cc55f305f Add extra check for GDR. Adjust some names and replace opal_output with opal_show_help.
This commit was SVN r29853.
2013-12-10 16:04:08 +00:00
Jeff Squyres
0f61bb651e Technically, the PORT_ACTIVE is not a "bad" event.
Note that this event should never happen within a single OMPI job,
because OMPI will ignore usnic ports that are down.  The PORT_ACTIVE
event should only occur if a port ''was'' down and is now ''up''.  But
what the heck -- if we ever do get this event, it is harmless -- just
ignore it.

This commit was SVN r29852.
2013-12-09 20:45:55 +00:00
Edgar Gabriel
c253c2eec6 fix the condition for the lazy open of shared filepointers.
This commit was SVN r29850.
2013-12-09 19:37:21 +00:00
Mike Dubman
9a65e0d8c6 cosmetic fixed fpr hcol autotools
Refs: #3694

This commit was SVN r29841.
2013-12-08 09:45:13 +00:00
Mike Dubman
2e124454b4 cosmitic fix to remove redundant -lfca
use CPP extra flags var which propagated to coll/fca and scoll/fca
Refs: #3694

This commit was SVN r29832.
2013-12-07 15:00:54 +00:00
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Rolf vandeVaart
d556b60b21 Chnage some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Nathan Hjelm
231ebb09c9 Update romio configury to remove a warning message.
cmr=v1.7.4:ticket=3158

This commit was SVN r29811.

The following Trac tickets were found above:
  Ticket 3158 --> https://svn.open-mpi.org/trac/ompi/ticket/3158
2013-12-06 00:12:35 +00:00
Dave Goodell
da26226e3c usnic: add some extra debug-build sanity checks
On the off chance that the PML is twiddling fields that it really
shouldn't be...

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29804.
2013-12-05 00:28:11 +00:00
Jeff Squyres
ba018b3603 Protect the container_of #define.
MOFED apparently has a /usr/include/infiniband/verbs.h that also
defines a (slightly different but fully compatible) container_of
macro.  So put proper #ifndef protection around our definition of
container_of.

Thanks to Rolf vandeVaart for pointing out the issue.

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29799.
2013-12-04 14:24:56 +00:00
Yossi Etigin
a913b00f89 mtl mxm: update configuration parsing api to mxm 2.1, drop
older version support (1.0 and 1.1), and cleanup the code.

reviewed by miked.

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29797.
2013-12-04 09:11:55 +00:00
Jeff Squyres
c74c1e86d3 Per suggestion from Paul Kapinos, report in BTL verbosity if a device
is skipped because it is too far away.

(see thread starting here:
http://www.open-mpi.org/community/lists/devel/2013/06/12470.php)

This commit was SVN r29790.
2013-12-03 22:44:11 +00:00
Rolf vandeVaart
218c05a4d1 Make sure synchronous copies are complete before moving the data.
This commit was SVN r29789.
2013-12-03 21:20:14 +00:00
Rolf vandeVaart
ab77435d9b Fix the CUDA-aware case where we are not sending any GPU data.
This commit was SVN r29788.
2013-12-03 20:25:58 +00:00
Devendar Bureddy
4554770ee4 hcol fixes
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29787.
2013-12-03 20:21:40 +00:00
Nathan Hjelm
fe327d9859 udcm: cleanup code and improve the ack handling
Originally udcm acks used the immediate data to indicate which message was
being acknowleged. This data was (mysteriously) junk when using QLogic HCAs so I
updated udcm to use the source info (slid, qp, etc) to determine which message was being
acked. This works as long as we don't have two messages simultaneously in flight
to a particular peer and then loose the first of the two messages. The chances of this
happening are tiny. To fix this case I updated the udcm message header to include
a pointer to the in flight message. This pointer is then sent back to the sending
process to ack receipt.

cmr=v1.7.4:ticket=trac:3940

This commit was SVN r29775.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-02 20:18:46 +00:00
Jeff Squyres
3a7af4ab40 Fix another clang warning: sendreq is undefined if proc==NULL.
cmr=v1.7.4:reviewer=hjelmn:subject=fix ob1 undefined sendreq value

This commit was SVN r29774.
2013-12-02 19:44:42 +00:00
Nathan Hjelm
fb0b0442c4 openib/connect: re-enable xrc support in the openib btl
This commit updates the udcm cpc to support xrc. The steps followed by udcm
mimic those in the removed xoob cpc. This update has been tested with both XRC
and RC.

Mellanox, this is intended to go into 1.7.4. Please review carefully and let
me know if there are any issues.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29767.
2013-11-27 22:28:04 +00:00
George Bosilca
cb24277737 Restrict the usage of MPI_Type_extent only to receiving processes
(aka the root). This commit is based on a patch provided by Pierre 
Jolivet.
Fix all the output to match the failing MPI call.

This commit was SVN r29761.
2013-11-27 12:09:31 +00:00
Rolf vandeVaart
aa98b0333b Call function from function table. Discovered during static build.
This commit was SVN r29755.
2013-11-25 22:46:07 +00:00
Ralph Castain
ac9820c46f Link against common cuda library
Thanks to Jorg Bornschein for pointing it out

cmr=v1.7.4:reviewer=rolfv

This commit was SVN r29750.
2013-11-24 17:06:51 +00:00
George Bosilca
68268377af Fix an error message for the igather and the usage of the extent on
non non-root processes for the iscatter. Thanks to Pierre
Jolivet for the bug report and the patch.

This commit was SVN r29736.
2013-11-23 00:59:22 +00:00