1
1

22667 Коммитов

Автор SHA1 Сообщение Дата
Rolf vandeVaart
d6d7184703 Enhance verbose message 2015-04-09 12:29:09 -04:00
Ralph Castain
acc2c7937c Thanks Nathan - decrement the counter to ensure singleton's startup correctly 2015-04-08 11:23:35 -07:00
Nathan Hjelm
eb56117405 Merge pull request #513 from hjelmn/mca_bug_fixes
opal: fix multiple bugs in MCA and opal
2015-04-08 10:29:44 -06:00
Rolf vandeVaart
7163c41a4d Fix pathname 2015-04-08 08:52:17 -04:00
Nathan Hjelm
9cd955badf opal: fix multiple bugs in MCA and opal
This commit fixes the following bugs:

 - opal_output_finalize did not properly set internal state. This
   caused problems when calling the sequence opal_output_init (),
   opal_output_finalize (), opal_output_init ().

 - opal_info support called mca_base_open () but never called the
   matching mca_base_close (). mca_base_open () and mca_base_close ()
   have been updated to use a open count instead of an open flag to
   allow mca_base_open to be called through multiple paths (as may be
   the case when MPI_T is in use).

 - orte_info support did not register opal variables. This can cause
   orte-info to not return opal variables.

 - opal_info, orte_info, and ompi_info support have been updated to
   use a register count.

 - When opening the dl framework the reference count was added to
   ensure the framework stuck around. The framework being closed
   prematurely was a bug in the MCA base that has since been
   corrected. The increment (and associated decrement) have been
   removed.

 - dl/dlopen did not set the value of
   mca_dl_dlopen_component.filename_suffixes_mca_storage on each call
   to register. Instead the value was set in the component
   structure. This caused the value to be lost when re-loading the
   component. Fixed by setting the default value in register.

 - Reset shmem framework state on close to avoid returning a stale
   component after reloading opal/shmem.

 - MCA base parameters were not properly deregistered when the MCA
   base was closed.

This commit may fix #374.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-07 19:13:20 -06:00
Howard Pritchard
5ee18f4f00 Merge pull request #514 from hppritcha/topic/mpi_win_lock_all_man
man pages: fix problem with MPI_Win_lock_all
2015-04-07 17:17:30 -06:00
Howard Pritchard
291c775e74 man pages: fix problem with MPI_Win_lock_all
thanks to Thomas Jahns for pointing this out -

http://www.open-mpi.org/community/lists/users/2015/04/26633.php

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-07 16:29:00 -06:00
Nathan Hjelm
2409715fc3 Merge pull request #511 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix synchronization bugs
2015-04-07 09:14:00 -06:00
Howard Pritchard
fc3a0f60c5 Merge pull request #512 from hppritcha/topic/java_better_dlopen_error
ompi/java: better error message if dlopen fails
2015-04-06 14:08:10 -06:00
Howard Pritchard
18039b34b4 ompi/java: better error message if dlopen fails
The error message emitted by ompi/java when dlopen
fails is misleading and not very informative.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-06 13:35:09 -06:00
Nathan Hjelm
80ed805a16 osc/pt2pt: fix synchronization bugs
The fragment flush code tries to send the active fragment before
sending any queued fragments. This could cause osc messages to arrive
out-of-order at the target (bad). Ensure ordering by alway sending
the active fragment after sending queued fragments.

This commit also fixes a bug when a synchronization message (unlock,
flush, complete) can not be packed at the end of an existing active
fragment. In this case the source process will end up sending 1 more
fragment than claimed in the synchronization message. To fix the issue
a check has been added that fixes the fragment count if this situation
is detected.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-06 08:39:19 -06:00
Ralph Castain
108bcb70b0 Update NEWS with 1.8.5 items 2015-04-05 11:30:56 -07:00
Ralph Castain
c32609b1c7 Bring over open-mpi/hwloc@f714f8d
linux: only use the device-tree on Power machines

It's available on ARM but the assumption that cpus' "reg" start at 0
is invalid.
We could make that work but the device-tree doesn't currently
bring anything better than sysfs on ARM, so don't bother for now.
2015-04-04 09:30:21 -07:00
rhc54
657490c763 Merge pull request #510 from jithinjosepkl/pr/mtl-opt-pr
Optimizations to PML-CM, MTL-OFI
2015-04-04 09:22:55 -07:00
Jithin Jose
9c937d44ae Inline MTL-OFI
Signed-off-by: Jithin Jose <jithin.jose@intel.com>

Conflicts:
	ompi/mca/mtl/ofi/mtl_ofi_recv.c
2015-04-03 15:19:30 -07:00
Jithin Jose
50304dfe05 Inline mtl-datatype pack/unpack
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-04-03 15:19:21 -07:00
Jithin Jose
c09582a3ff - CM blocking send/recv optimizations
This patch tries to do as little as possible in the PML CM blocking
    send/receive routines.  Basically, avoid creating and filling in an
    entire request object.  An OMPI-level request is still needed, but we
    can create that on the stack instead of going to a free list.

Signed-off-by: Andrew Friedley <andrew.friedley@intel.com>
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-04-03 15:19:08 -07:00
Jeff Squyres
5f19436cd2 Merge pull request #508 from jsquyres/pr/usnic-libfabric-eagain-fix
usnix: fix the CQ-reading logic for -FI_EAGAIN
2015-04-02 19:08:22 -04:00
Jeff Squyres
d825ec7cc7 usnic: fix the CQ-reading logic for -FI_EAGAIN 2015-04-02 15:56:50 -07:00
Jeff Squyres
5aabee2644 libfabric: a few fixes since 1.0rc3
Including a critical atomic initialization fix for the usnic provider.
2015-04-02 15:54:01 -07:00
Howard Pritchard
db680058e3 Merge pull request #507 from hppritcha/topic/coverity_fixes
fcoll/static: coverity fixes
2015-04-02 16:12:52 -06:00
Howard Pritchard
05324e32ff fcoll/static: coverity fixes
Fix CIDs 72138, 72139, 72143

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-02 14:51:44 -06:00
Ralph Castain
0c043dbdc9 Fix typo in var name 2015-04-02 02:32:42 -07:00
rhc54
6408c87aa0 Merge pull request #506 from rhc54/topic/retry
Support attempts to connect async processes
2015-04-02 01:33:06 -07:00
Ralph Castain
a4b466efc4 Support attempts to connect async processes by allowing the oob/tcp connection to retry the attempt to connect to a peer. Off by default, operates if someone specifies how long to wait between retry attempts. 2015-04-01 20:21:23 -07:00
Ralph Castain
9f8ae59162 Properly enclose the different && clauses 2015-04-01 18:48:25 -07:00
Ralph Castain
57c21d5209 Ensure the DVM flows thru the "daemons reported" state 2015-04-01 16:47:34 -07:00
Jeff Squyres
4ad102bb4d README: whitespace cleanup -- no content change 2015-04-01 15:40:42 -07:00
Jeff Squyres
3b998781df make_dist_tarball: bump up the minimum versions 2015-04-01 15:05:48 -07:00
Jeff Squyres
d6d8ab01e5 libfabric: the fi_log.h file moved 2015-04-01 14:43:07 -07:00
Jeff Squyres
99754afd25 orterun.c: re-justify the output message text
The type-A personality / english lit major in me compells me to
re-justify the text.  :-)
2015-04-01 10:57:23 -07:00
Devendar Bureddy
6ddc7ac35c HCOLL: Fix assertion
hcoll context may not be destroyed if it is cached.
2015-04-01 20:33:28 +03:00
Jeff Squyres
26b3c48ccb usnic: update to API change in libfabric 2015-04-01 06:43:08 -07:00
Jeff Squyres
5e47eb81bf libfabric: update component configury for new libfabric test 2015-04-01 06:43:08 -07:00
Jeff Squyres
a89a5872c2 libfabric: update to official 1.0.0rc3 release
One change was made to the 1.0.0rc3 tarball: remove an errand
debugging printf that accidentally made its way into the tarball (but
isn't in git).
2015-04-01 06:43:08 -07:00
Mike Dubman
8914a9c070 Merge pull request #494 from elenash/modifiers
changed mindist mapping policy specifier
2015-04-01 16:31:46 +03:00
Mike Dubman
af63c1815b Merge pull request #505 from nkogteva/master
grpcomm rcd:remove unnecessary malloc warning when number of daemons == 1
2015-04-01 15:41:56 +03:00
Elena
1e913c76c4 changed mindist mapping policy specifier from map-bt dist:device,modifiers to --map-by dist:modifiers -mca rmaps_dist_device device 2015-04-01 15:07:35 +03:00
Nadezhda Kogteva
2d49d9bd45 grpcomm rcd: remove unnecessary malloc warning for case when number of daemons == 1 2015-04-01 11:07:44 +03:00
Mike Dubman
58d002098b Merge pull request #474 from elenash/master
Introduce -tune command line option to set env vars and mca params from ...
2015-04-01 08:23:34 +03:00
Ralph Castain
b468f6a503 Okay, Jeff - use opal_setenv 2015-03-31 20:34:02 -07:00
Ralph Castain
6f9140a341 Add a little more debug to launch 2015-03-31 20:10:21 -07:00
Ralph Castain
e5d96417e7 Update warnings for run-as-root 2015-03-31 17:55:28 -07:00
Ralph Castain
41dd65d6cd Per Jeff's request, tone down the comments and "standardize" the warning 2015-03-31 17:54:54 -07:00
Ralph Castain
f04eb6a9c0 Extend the root-user protection to some more ORTE tools 2015-03-31 10:34:35 -07:00
Ralph Castain
f863147b05 Per the telecon and chat with Jeff, let root only do the version option without warning. Otherwise, require that the user specifically indicate allow-use-as-root 2015-03-31 10:34:35 -07:00
Nathan Hjelm
b6043ec459 Merge pull request #503 from hjelmn/vader_32bit_fix
btl/vader: fix fast box support for 32-bit architectures
2015-03-31 09:12:22 -06:00
Ralph Castain
b209c9efa5 Move the "dvm ready" message to stdout so it is easier to trap 2015-03-30 20:12:56 -07:00
Ralph Castain
6d205a3c80 Ensure that singletons pickup the oob/tcp component 2015-03-30 18:10:08 -07:00
Ralph Castain
2fa56fb329 Ensure that orte-submit picks the correct ess module as it is -never- allowed to be used as a distributed tool
Thanks to Mark Santcroos for diagnosing this one.
2015-03-30 18:08:34 -07:00