Commit graph

22453 commits

Author SHA1 Message Date
Nathan Hjelm
9cd955badf opal: fix multiple bugs in MCA and opal
This commit fixes the following bugs:

 - opal_output_finalize did not properly set internal state. This
   caused problems when calling the sequence opal_output_init (),
   opal_output_finalize (), opal_output_init ().

 - opal_info support called mca_base_open () but never called the
   matching mca_base_close (). mca_base_open () and mca_base_close ()
   have been updated to use an open count instead of an open flag to
   allow mca_base_open to be called through multiple paths (as may be
   the case when MPI_T is in use).

 - orte_info support did not register opal variables. This can cause
   orte-info to not return opal variables.

 - opal_info, orte_info, and ompi_info support have been updated to
   use a register count.

 - When opening the dl framework the reference count was added to
   ensure the framework stuck around. The framework being closed
   prematurely was a bug in the MCA base that has since been
   corrected. The increment (and associated decrement) have been
   removed.

 - dl/dlopen did not set the value of
   mca_dl_dlopen_component.filename_suffixes_mca_storage on each call
   to register. Instead the value was set in the component
   structure. This caused the value to be lost when re-loading the
   component. Fixed by setting the default value in register.

 - Reset shmem framework state on close to avoid returning a stale
   component after reloading opal/shmem.

 - MCA base parameters were not properly deregistered when the MCA
   base was closed.

This commit may fix #374.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-07 19:13:20 -06:00
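
The open-count change described in the commit above can be illustrated with a minimal sketch. This is not the actual mca_base code: every name here (`demo_base_open`, `demo_base_close`, `demo_is_open`, `demo_open_count`) is hypothetical, and the real setup and teardown work is reduced to a boolean. It only shows why a counter, unlike a flag, lets several callers open and close the same base safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the real mca_base symbols. */
static int demo_open_count = 0;        /* number of outstanding opens */
static bool demo_resources_ready = false;

/* Open the base. Only the first caller performs the real setup. */
static int demo_base_open(void)
{
    if (demo_open_count++ > 0) {
        return 0;                      /* already opened through another path */
    }
    demo_resources_ready = true;       /* placeholder for real initialization */
    return 0;
}

/* Close the base. Only the last matching close performs the teardown. */
static int demo_base_close(void)
{
    if (demo_open_count == 0) {
        return -1;                     /* close without a matching open */
    }
    if (--demo_open_count > 0) {
        return 0;                      /* other users still hold the base open */
    }
    assert(demo_open_count == 0);
    demo_resources_ready = false;      /* placeholder for real teardown */
    return 0;
}

static bool demo_is_open(void)
{
    return demo_resources_ready;
}
```

With a plain open flag, the first close would tear the base down even if a second path (for example a tool interface) still needed it; with the count, teardown waits for the last close.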
Ralph Castain
108bcb70b0 Update NEWS with 1.8.5 items 2015-04-05 11:30:56 -07:00
Ralph Castain
c32609b1c7 Bring over open-mpi/hwloc@f714f8d
linux: only use the device-tree on Power machines

It's available on ARM but the assumption that cpus' "reg" start at 0
is invalid.
We could make that work but the device-tree doesn't currently
bring anything better than sysfs on ARM, so don't bother for now.
2015-04-04 09:30:21 -07:00
rhc54
657490c763 Merge pull request #510 from jithinjosepkl/pr/mtl-opt-pr
Optimizations to PML-CM, MTL-OFI
2015-04-04 09:22:55 -07:00
Jithin Jose
9c937d44ae Inline MTL-OFI
Signed-off-by: Jithin Jose <jithin.jose@intel.com>

Conflicts:
	ompi/mca/mtl/ofi/mtl_ofi_recv.c
2015-04-03 15:19:30 -07:00
Jithin Jose
50304dfe05 Inline mtl-datatype pack/unpack
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-04-03 15:19:21 -07:00
Jithin Jose
c09582a3ff - CM blocking send/recv optimizations
This patch tries to do as little as possible in the PML CM blocking
    send/receive routines.  Basically, avoid creating and filling in an
    entire request object.  An OMPI-level request is still needed, but we
    can create that on the stack instead of going to a free list.

Signed-off-by: Andrew Friedley <andrew.friedley@intel.com>
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-04-03 15:19:08 -07:00
Jeff Squyres
5f19436cd2 Merge pull request #508 from jsquyres/pr/usnic-libfabric-eagain-fix
usnic: fix the CQ-reading logic for -FI_EAGAIN
2015-04-02 19:08:22 -04:00
Jeff Squyres
d825ec7cc7 usnic: fix the CQ-reading logic for -FI_EAGAIN 2015-04-02 15:56:50 -07:00
Jeff Squyres
5aabee2644 libfabric: a few fixes since 1.0rc3
Including a critical atomic initialization fix for the usnic provider.
2015-04-02 15:54:01 -07:00
Howard Pritchard
db680058e3 Merge pull request #507 from hppritcha/topic/coverity_fixes
fcoll/static: coverity fixes
2015-04-02 16:12:52 -06:00
Howard Pritchard
05324e32ff fcoll/static: coverity fixes
Fix CIDs 72138, 72139, 72143

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-02 14:51:44 -06:00
Ralph Castain
0c043dbdc9 Fix typo in var name 2015-04-02 02:32:42 -07:00
rhc54
6408c87aa0 Merge pull request #506 from rhc54/topic/retry
Support attempts to connect async processes
2015-04-02 01:33:06 -07:00
Ralph Castain
a4b466efc4 Support attempts to connect async processes by allowing the oob/tcp connection to retry the attempt to connect to a peer. Off by default, operates if someone specifies how long to wait between retry attempts. 2015-04-01 20:21:23 -07:00
Ralph Castain
9f8ae59162 Properly enclose the different && clauses 2015-04-01 18:48:25 -07:00
Ralph Castain
57c21d5209 Ensure the DVM flows thru the "daemons reported" state 2015-04-01 16:47:34 -07:00
Jeff Squyres
4ad102bb4d README: whitespace cleanup -- no content change 2015-04-01 15:40:42 -07:00
Jeff Squyres
3b998781df make_dist_tarball: bump up the minimum versions 2015-04-01 15:05:48 -07:00
Jeff Squyres
d6d8ab01e5 libfabric: the fi_log.h file moved 2015-04-01 14:43:07 -07:00
Jeff Squyres
99754afd25 orterun.c: re-justify the output message text
The type-A personality / English lit major in me compels me to
re-justify the text.  :-)
2015-04-01 10:57:23 -07:00
Devendar Bureddy
6ddc7ac35c HCOLL: Fix assertion
hcoll context may not be destroyed if it is cached.
2015-04-01 20:33:28 +03:00
Jeff Squyres
26b3c48ccb usnic: update to API change in libfabric 2015-04-01 06:43:08 -07:00
Jeff Squyres
5e47eb81bf libfabric: update component configury for new libfabric test 2015-04-01 06:43:08 -07:00
Jeff Squyres
a89a5872c2 libfabric: update to official 1.0.0rc3 release
One change was made to the 1.0.0rc3 tarball: remove an errant
debugging printf that accidentally made its way into the tarball (but
isn't in git).
2015-04-01 06:43:08 -07:00
Mike Dubman
8914a9c070 Merge pull request #494 from elenash/modifiers
changed mindist mapping policy specifier
2015-04-01 16:31:46 +03:00
Mike Dubman
af63c1815b Merge pull request #505 from nkogteva/master
grpcomm rcd:remove unnecessary malloc warning when number of daemons == 1
2015-04-01 15:41:56 +03:00
Elena
1e913c76c4 changed mindist mapping policy specifier from --map-by dist:device,modifiers to --map-by dist:modifiers -mca rmaps_dist_device device 2015-04-01 15:07:35 +03:00
Nadezhda Kogteva
2d49d9bd45 grpcomm rcd: remove unnecessary malloc warning for case when number of daemons == 1 2015-04-01 11:07:44 +03:00
Mike Dubman
58d002098b Merge pull request #474 from elenash/master
Introduce -tune command line option to set env vars and mca params from ...
2015-04-01 08:23:34 +03:00
Ralph Castain
b468f6a503 Okay, Jeff - use opal_setenv 2015-03-31 20:34:02 -07:00
Ralph Castain
6f9140a341 Add a little more debug to launch 2015-03-31 20:10:21 -07:00
Ralph Castain
e5d96417e7 Update warnings for run-as-root 2015-03-31 17:55:28 -07:00
Ralph Castain
41dd65d6cd Per Jeff's request, tone down the comments and "standardize" the warning 2015-03-31 17:54:54 -07:00
Ralph Castain
f04eb6a9c0 Extend the root-user protection to some more ORTE tools 2015-03-31 10:34:35 -07:00
Ralph Castain
f863147b05 Per the telecon and chat with Jeff, let root only do the version option without warning. Otherwise, require that the user specifically indicate allow-use-as-root 2015-03-31 10:34:35 -07:00
Nathan Hjelm
b6043ec459 Merge pull request #503 from hjelmn/vader_32bit_fix
btl/vader: fix fast box support for 32-bit architectures
2015-03-31 09:12:22 -06:00
Ralph Castain
b209c9efa5 Move the "dvm ready" message to stdout so it is easier to trap 2015-03-30 20:12:56 -07:00
Ralph Castain
6d205a3c80 Ensure that singletons pick up the oob/tcp component 2015-03-30 18:10:08 -07:00
Ralph Castain
2fa56fb329 Ensure that orte-submit picks the correct ess module as it is -never- allowed to be used as a distributed tool
Thanks to Mark Santcroos for diagnosing this one.
2015-03-30 18:08:34 -07:00
Nathan Hjelm
17b80a987e btl/vader: fix fast box support for 32-bit architectures
On 32-bit architectures loads/stores of fast box headers may take
multiple instructions. This can lead to a data race between the
sender/receiver when reading/writing the sequence number. This can
lead to a situation where the receiver could process incomplete
data. To fix the issue this commit re-orders the fast box header to
put the sequence number and the tag in the same 32-bits to ensure they
are always loaded/stored together.

Fixes #473

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-03-30 16:28:16 -06:00
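
The idea behind the fast box fix above, keeping the sequence number and the tag in one 32-bit word so they are always read and written together, can be sketched as follows. This is an illustration under assumptions, not the btl/vader header: the type `demo_fbox_hdr_t`, the functions `demo_fbox_publish` and `demo_fbox_poll`, and the 16/16 bit split are all hypothetical, and C11 atomics are used here only to make the single-word store and load explicit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout: low 16 bits = sequence, high 16 bits = tag. */
typedef struct {
    _Atomic uint32_t seq_tag;   /* sequence number and tag share one word */
    uint32_t size;              /* payload size, written before seq_tag */
} demo_fbox_hdr_t;

/* Sender: fill in the size, then publish seq and tag with one 32-bit store. */
static void demo_fbox_publish(demo_fbox_hdr_t *hdr, uint16_t seq,
                              uint16_t tag, uint32_t size)
{
    assert(hdr != NULL);
    hdr->size = size;
    uint32_t word = ((uint32_t) tag << 16) | (uint32_t) seq;
    atomic_store_explicit(&hdr->seq_tag, word, memory_order_release);
}

/* Receiver: one 32-bit load yields a matching seq/tag pair or nothing.
 * Returns 1 and fills tag/size if the expected sequence number is present. */
static int demo_fbox_poll(demo_fbox_hdr_t *hdr, uint16_t expected_seq,
                          uint16_t *tag, uint32_t *size)
{
    assert(hdr != NULL && tag != NULL && size != NULL);
    uint32_t word = atomic_load_explicit(&hdr->seq_tag, memory_order_acquire);
    if ((uint16_t) (word & 0xffffu) != expected_seq) {
        return 0;               /* nothing new published yet */
    }
    *tag = (uint16_t) (word >> 16);
    *size = hdr->size;
    return 1;
}
```

Because the receiver never sees a new sequence number paired with an old tag, it cannot start processing a message whose header is only half written, which is the failure mode the commit message describes for 32-bit architectures.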
rhc54
bc016617a0 Merge pull request #501 from rhc54/topic/sec2
Support authentication across security domains
2015-03-30 09:59:43 -07:00
Ralph Castain
79b90a54b6 Remove stale and unused component 2015-03-30 09:56:06 -07:00
Howard Pritchard
0c553c2693 Merge pull request #502 from nkogteva/master
sm dstore: set pmix segment size to proper value
2015-03-30 09:05:35 -06:00
Nadezhda Kogteva
a828eada98 sm dstore: set pmix segment size to proper value 2015-03-30 13:34:25 +03:00
Ralph Castain
d07dc362d5 Ensure we can authenticate when crossing security domains by including all available credentials, and letting the receiver use the highest priority one they have in common. 2015-03-28 20:34:26 -07:00
Ralph Castain
b67b3619fc If we are using the default bindings, and one or more nodes are not set up to support binding, then don't error out - just don't bind.
Thanks to Annu Desari for pointing out the problem.
2015-03-28 08:20:24 -07:00
Ralph Castain
2f365720b0 Allow root to request the version and help from mpirun without having to override the run-as-root protection.
Thanks to Robert McLay for pointing this out
2015-03-28 08:17:44 -07:00
Ralph Castain
d2d02a1642 ckpt 2015-03-28 07:59:20 -07:00
Jeff Squyres
89e14f5ad6 usnic: fix comment typos 2015-03-27 17:21:53 -07:00