Gilles Gouaillardet
5c81658d58
pmix: fix big endian arch
...
Use the appropriate 64-bit type; otherwise data gets incorrectly
truncated on big-endian architectures.
2014-10-15 17:17:09 +09:00
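As a rough illustration of the failure mode that the pmix commit above describes (not the actual pmix code), the sketch below lays a 64-bit value out in memory byte by byte and then reads only the first four bytes, which is what a 32-bit access at the same address would see. All helper names (store_be64, store_le64, load_be32, load_le32) are hypothetical and exist only for this example. The byte order is simulated explicitly so the behaviour is the same on any host.

```c
#include <stdint.h>

/* Hypothetical helper: write v into buf most significant byte first,
 * i.e. the memory layout a big-endian machine uses for a 64-bit value. */
static void store_be64(unsigned char buf[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (unsigned char)(v >> (56 - 8 * i));
    }
}

/* Hypothetical helper: write v into buf least significant byte first,
 * i.e. the little-endian layout. */
static void store_le64(unsigned char buf[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (unsigned char)(v >> (8 * i));
    }
}

/* Read only the first 4 bytes as a big-endian 32-bit value. This models
 * accessing a 64-bit slot through a 32-bit type on a big-endian host. */
static uint32_t load_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Same 4-byte read, interpreted in little-endian order. */
static uint32_t load_le32(const unsigned char *buf)
{
    return  (uint32_t)buf[0]        | ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}
```

With a small value such as 42, the little-endian 4-byte read still returns 42 because the low-order bytes come first, so the size mismatch can go unnoticed there. The big-endian 4-byte read returns the high-order half, which is 0. Using a type whose width matches the stored value avoids the problem on both layouts.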
Mike Dubman
ab22dcb875
Merge pull request #229 from nkogteva/master
...
oshmem mmap: new mca parameters were introduced - sshmem_mmap_anonymous,...
2014-10-15 10:24:29 +03:00
bureddy
3d77abaa1f
Merge pull request #234 from bureddy/master
...
OSHMEM: Fix application abort
2014-10-14 13:07:10 -07:00
Edgar Gabriel
0219c87039
set the fs_ptr to NULL in case of an error, to avoid an invalid free on file_close.
2014-10-14 13:09:06 -05:00
Devendar Bureddy
cbb3e95ce9
OSHMEM: Fix application abort
...
register on_exit() hook to know exit status in order to
skip shmem_finalize destructor in case of non-zero exit status
2014-10-14 21:07:28 +03:00
Nathan Hjelm
083a659217
Correct some typos in Cray PMI detection
2014-10-14 10:28:36 -06:00
Alex Mikheev
314ba245e9
Merge branch 'topic/oshmem_spml_ikrit_hw_rdma_channel'
2014-10-14 16:21:06 +03:00
Alex Mikheev
643e64497d
OSHMEM: spml ikrit: hw rdma channel is disabled by default
2014-10-14 16:09:51 +03:00
Alex Mikheev
74ab30b738
OSHMEM: spml ikrit: improve mxm transport sanity check
...
Do not allow combination of transports that is not compliant with
shmem spec. Especially do not allow mix of hw and software atomic
ops
Issue: 4721
Change-Id: Ide382f7510495df3d385f2a5ae5f9def6ef5332c
2014-10-14 15:44:57 +03:00
Alex Mikheev
1bcc88cfb1
OSHMEM: spml ikrit: hardware rdma endpoint
...
Create additional endpoint that is capable of true
one sided RDMA transfers.
MXM atomics component now uses this endpoint
2014-10-14 15:31:09 +03:00
Alina Sklarevich
1eb6286547
OSHMEM: fix the makefile.
...
(oshmem/mca/sshmem/base/Makefile.am)
2014-10-14 11:57:46 +03:00
Gilles Gouaillardet
e3f74aca1c
Correctly move the pointer back by the true_lb.
...
Fixes #231
2014-10-14 16:26:54 +09:00
Nadezhda Kogteva
b2a93943dc
oshmem mmap: set lvl4 for sshmem_mmap_anonymous and sshmem_mmap_fixed variables, define MAP_ANONYMOUS returned.
2014-10-14 08:54:44 +03:00
Gilles Gouaillardet
0f983d5a4f
add a disable function for coll module
2014-10-14 14:46:36 +09:00
Devendar Bureddy
7a6b4c36b0
HCOLL: Update the proc structure dereference
...
Update the proc structure dereference to reflect the new opal_proc_t
super field
2014-10-13 20:49:19 +03:00
Devendar Bureddy
b8d2a15be9
HCOLL: by default off
2014-10-13 20:49:09 +03:00
Mike Dubman
ec1f761d8e
OSHMEM: add missing help file, got lost during merge. Thanks to Yossi/Igor for finding it.
...
Change-Id: I466e40a3fea70e8045dd1e897edcc50ccf0451a3
Conflicts:
oshmem/mca/sshmem/base/Makefile.am
oshmem/mca/sshmem/base/help-oshmem-sshmem.txt
2014-10-13 16:58:35 +03:00
Alex Mikheev
8fcbcba516
Merge branch 'topic/oshmem_shared_mr_fix'
2014-10-13 15:24:12 +03:00
Alex Mikheev
cd67642183
OSHMEM: sshmem verbs: workaround shared_mr procfs bug
...
dereg shared_mr before doing dereg on its mr.
2014-10-13 15:14:34 +03:00
Nadezhda Kogteva
c68c4b45b5
Merge remote-tracking branch 'upstream/master'
2014-10-13 15:12:39 +03:00
Mike Dubman
a1db93077d
Merge pull request #230 from nkogteva/oshmem_refactor_macro_style
...
oshmem: refactor of oshmem/mca/sshmem/*.[ch] files to use #if MACRO style
2014-10-13 13:33:32 +03:00
Nadezhda Kogteva
de68d58a9e
oshmem: refactor of oshmem/mca/sshmem/*.[ch] files to use #if MACRO style
2014-10-13 13:12:16 +03:00
Nadezhda Kogteva
3e7002e8aa
oshmem mmap: copyrights for memheap_base_alloc.c files updated
2014-10-13 11:41:35 +03:00
Nadezhda Kogteva
ce4ee2aa8d
oshmem mmap: new mca parameters were introduced - sshmem_mmap_anonymous, sshmem_mmap_fixed and sshmem_base_backing_file_dir - for runtime mmap management.
...
(cherry picked from Mellanox-v1.8 repo commit 4c391a)
2014-10-13 11:39:26 +03:00
Mike Dubman
6372ac926c
tools: fix cli args parsing
...
No need to "shift" if argument does not expect parameter on the command line.
2014-10-13 11:33:26 +03:00
Vasily Filipov
a215a4831d
MTL/MXM: disable "bulk_connect" by default.
2014-10-13 09:47:56 +03:00
Ralph Castain
3ef94a0675
Per email thread on devel list:
...
Revert "OPAL: drop dead with core on bad flow. rarely happens with helloworld on large scale."
This reverts commit 86f1d5af3e.
Will be reconsidered via RFC as it represents a significant change in behavior
2014-10-12 21:13:42 -07:00
Mike Dubman
113f40b0ec
OSHMEM: sshmem verbs: allocate memory at fixed address
...
Use experimental verbs to allocate memory at fixed base
virtual address.
verbs will disqualify itself if shared_mr is disabled
or not supported and it is impossible to allocate memory
starting at fixed base virtual address.
verbs contig pages allocator did not guarantee fixed va, now it does.
(cherry picked from commit fd77ebd452)
Apply Jeff's comments
Update with Jeff commits
(cherry picked from commit open-mpi/ompi-release@4dc487fc3d )
2014-10-12 09:53:48 +03:00
Ralph Castain
4d27eb70f2
Extend the dstore framework to include a new "update_handle" API so the attributes of an existing handle can be changed. We can't just open a new handle as the upper layers won't know where to find the info. :-(
2014-10-10 12:40:32 -07:00
Ralph Castain
1ae34da5e5
Add an attributes parameter to the dstore.open function so we can pass directives to the active storage component. This can, for example, include the backing file info for a new shared memory segment.
2014-10-10 12:13:25 -07:00
Ralph Castain
63f619f871
Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build.
2014-10-10 11:39:08 -07:00
Ralph Castain
1be1654e5f
Correctly identify the synonym for orte_direct_modex_cutoff as ompi_hostname_cutoff
2014-10-10 06:05:06 -07:00
Gilles Gouaillardet
8eb2d62919
coll/sm: fix another memory leak
2014-10-10 19:54:45 +09:00
Gilles Gouaillardet
5d44a30111
coll/sm: fix minor memory leaks
...
port 4488.1.patch attached in #196 to master
2014-10-10 14:21:34 +09:00
Ralph Castain
4fc4a8346b
Fix a couple of minor issues. Ensure usock isn't used if the session dirs aren't setup. Protect an oddball case where orte_xml_fp is NULL.
2014-10-09 20:58:46 -07:00
Nathan Hjelm
169a1866b8
Modify Cray PMI check to detect PMI on older systems
2014-10-09 17:01:31 -06:00
Ralph Castain
b1a58726ac
Clean up the PMI m4 syntax with respect to -a, and look for libpmi* so we can pick up .a, .la, and whatever other extensions that particular system might use.
2014-10-09 14:04:43 -07:00
Nathan Hjelm
a31cf3b740
btl/vader: missing include
2014-10-09 13:57:21 -06:00
Nathan Hjelm
9e0c07e4ce
btl/ugni: improve the handling of eager get fragments when the btl runs out
...
of preregistered buffers
Before this change eager gets were retried on each progress loop. This commit
modifies the protocol to only retry eager gets when another eager get has
completed. This commit also cleans up some callback code that is no longer
needed.
2014-10-09 13:57:21 -06:00
Howard Pritchard
ebc368d26b
remove GNI_RDMAMODE_FENCE bit in GNI_PostRdma
...
The GNI_RDMAMODE_FENCE bit was a leftover from
async progress work that is not needed at this point
in the gni BTL. Removing the bit also allows
for the removal of the GNI_CDM_MODE_BTE_SINGLE_CHANNEL
bit from the GNI_CdmCreate call.
2014-10-09 12:41:19 -06:00
Ralph Castain
ce8e33447f
Silence warning
2014-10-09 10:45:25 -07:00
Joshua Ladd
1cabd73522
Adding a new OPAL hash table routine. Please read the algorithm description in opal/class/opal_hash_table.c for more precise details on the design and implementation. This algorithm was contributed by David Linden of H.P. in partnership with Mellanox Technologies. This contribution achieves two objectives:
...
1. It's actually hashing now, whereas the old OPAL hash table was not. Thus, it is a bug fix and, as such, should be included in the 1.8 series.
2. It is dynamic and can grow and shrink the number of buckets in accordance with job size, whereas the old OPAL hash table had a fixed number of buckets which resulted in poor retrieval performance at large scale.
This scheme has been deployed in the field on very large H.P./Mellanox systems and has been demonstrated to significantly decrease job start-up time (~ 20% improvement) when launching applications directly with srun in SLURM environments. However, neither SLURM nor direct launch are prerequisites to take advantage of this change as any entity that utilizes OPAL hash table objects can benefit (at least partially) from this contribution.
2014-10-09 17:24:23 +02:00
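To make the second objective above concrete, here is a minimal sketch of a hash table whose bucket count grows with the number of stored entries. It is not the algorithm in opal/class/opal_hash_table.c, which the commit message points to for the real design. It assumes linear probing, a 75% load-factor threshold, and a doubling policy, and every name in it (table_t, table_set, and so on) is hypothetical. Only growth is shown; shrinking is omitted for brevity.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not the OPAL data structures. */
typedef struct { uint64_t key; int value; int used; } slot_t;
typedef struct { slot_t *slots; size_t capacity; size_t count; } table_t;

/* Mix the key bits so that nearby keys spread over different slots. */
static size_t hash_u64(uint64_t k, size_t capacity)
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    return (size_t)(k % capacity);
}

/* capacity must be non-zero. Returns 0 on success, -1 on allocation failure. */
static int table_init(table_t *t, size_t capacity)
{
    t->slots = calloc(capacity, sizeof(slot_t));
    t->capacity = capacity;
    t->count = 0;
    return t->slots ? 0 : -1;
}

static void table_free(table_t *t)
{
    free(t->slots);
    t->slots = NULL;
    t->capacity = 0;
    t->count = 0;
}

static int table_set(table_t *t, uint64_t key, int value);

/* Double the number of slots and reinsert every entry. The new table is
 * at most 3/8 full afterwards, so the reinserts never trigger another grow. */
static int table_grow(table_t *t)
{
    table_t bigger;
    if (table_init(&bigger, t->capacity * 2) != 0) {
        return -1;
    }
    for (size_t i = 0; i < t->capacity; i++) {
        if (t->slots[i].used) {
            table_set(&bigger, t->slots[i].key, t->slots[i].value);
        }
    }
    free(t->slots);
    *t = bigger;
    return 0;
}

/* Insert or update. Grows first if the table would exceed 75% full, which
 * also guarantees that probing always finds an empty slot. */
static int table_set(table_t *t, uint64_t key, int value)
{
    if ((t->count + 1) * 4 > t->capacity * 3 && table_grow(t) != 0) {
        return -1;
    }
    size_t i = hash_u64(key, t->capacity);
    while (t->slots[i].used && t->slots[i].key != key) {
        i = (i + 1) % t->capacity;      /* linear probing */
    }
    if (!t->slots[i].used) {
        t->slots[i].used = 1;
        t->slots[i].key = key;
        t->count++;
    }
    t->slots[i].value = value;
    return 0;
}

/* Look up key. Returns 0 and stores the value on a hit, -1 on a miss. */
static int table_get(const table_t *t, uint64_t key, int *value)
{
    size_t i = hash_u64(key, t->capacity);
    while (t->slots[i].used) {
        if (t->slots[i].key == key) {
            *value = t->slots[i].value;
            return 0;
        }
        i = (i + 1) % t->capacity;
    }
    return -1;
}
```

A caller would initialize the table with a small capacity, for example table_init(&t, 4), and let table_set enlarge it as entries arrive. Because the slot count tracks the entry count, probe sequences stay short regardless of job size, which is the property the commit message contrasts with a fixed number of buckets.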
Nadezhda Kogteva
ffa8674e01
Fix bugs in PMI configure: set correct include path, fix test command with multiple conditions.
2014-10-09 17:23:56 +03:00
Elena
b937b31693
fix for multiple spawn test
2014-10-09 06:18:16 +02:00
Elena
3d65799236
pmix: fixed ugly bug which caused many strange hangs
2014-10-09 06:17:03 +02:00
Elena
c905fe9b78
pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff
2014-10-09 06:15:31 +02:00
Elena
e319c95267
fixes for grpcomm rcd/brucks algorithms
2014-10-09 06:12:26 +02:00
Howard Pritchard
9947758d98
initial thread safety for ugni btl
...
This commit adds initial ugni thread safety support.
With this commit, sun thread tests (excepting MPI-2 RMA)
pass with various process counts and threads/process.
Also osu_latency_mt passes.
2014-10-08 10:13:22 -06:00
Mike Dubman
81917412a8
tools: add flag to avoid git pull during tarball create
...
it breaks jenkins scripts because jenkins fetches the branch and disconnects it from the repo.
hence git pull fails
2014-10-08 12:16:08 +03:00
Nathan Hjelm
23cb00d7d5
Merge pull request #225 from hjelmn/master
...
osc/rdma: fix issue identified by Berk Hess
2014-10-07 12:04:40 -06:00