1
1
Граф коммитов

21190 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
5c81658d58 pmix: fix big endian arch
use the appropriate 64 bits type otherwise data gets incorrectly
truncated on big endian arch
2014-10-15 17:17:09 +09:00
Mike Dubman
ab22dcb875 Merge pull request #229 from nkogteva/master
oshmem mmap: new mca parameters were introduced - sshmem_mmap_anonymous,...
2014-10-15 10:24:29 +03:00
bureddy
3d77abaa1f Merge pull request #234 from bureddy/master
OSHMEM: Fix application abort
2014-10-14 13:07:10 -07:00
Edgar Gabriel
0219c87039 set the fs_ptr to NULL in case of an error, to avoid a malicious free on file_close. 2014-10-14 13:09:06 -05:00
Devendar Bureddy
cbb3e95ce9 OSHMEM: Fix application abort
register on_exit() hook to know exit status inorder to
skip shmem_finalize destructor in case of non-zero exit status
2014-10-14 21:07:28 +03:00
Nathan Hjelm
083a659217 Correct some typos in Cray PMI detection 2014-10-14 10:28:36 -06:00
Alex Mikheev
314ba245e9 Merge branch 'topic/oshmem_spml_ikrit_hw_rdma_channel' 2014-10-14 16:21:06 +03:00
Alex Mikheev
643e64497d OSHMEM: spml ikrit: hw rdma channel is disabled by default 2014-10-14 16:09:51 +03:00
Alex Mikheev
74ab30b738 OSHMEM: spml ikrit: improve mxm transport sanity check
Do not allow combination of transports that is not compliant with
shmem spec. Especially do not allow mix of hw and software atomic
ops

Issue: 4721
Change-Id: Ide382f7510495df3d385f2a5ae5f9def6ef5332c
2014-10-14 15:44:57 +03:00
Alex Mikheev
1bcc88cfb1 OSHMEM: spml ikrit: hardware rdma endpoint
Create additional endpoint that is capable of true
one sided RDMA transfers.

MXM atomics component now uses this endpoint
2014-10-14 15:31:09 +03:00
Alina Sklarevich
1eb6286547 OSHMEM: fix the makefile.
(oshmem/mca/sshmem/base/Makefile.am)
2014-10-14 11:57:46 +03:00
Gilles Gouaillardet
e3f74aca1c Correctly mote the pointer back by the true_lb.
Fixes #231
2014-10-14 16:26:54 +09:00
Nadezhda Kogteva
b2a93943dc oshmem mmap: set lvl4 for sshmem_mmap_anonymous and sshmem_mmap_fixed variables, define MAP_ANONYMOUS returned. 2014-10-14 08:54:44 +03:00
Gilles Gouaillardet
0f983d5a4f add a disable function for coll module 2014-10-14 14:46:36 +09:00
Devendar Bureddy
7a6b4c36b0 HCOLL: Update the proc structure dereference
Update the proc structure dereference to reflect the new opal_proc_t
super field
2014-10-13 20:49:19 +03:00
Devendar Bureddy
b8d2a15be9 HCOLL: by default off 2014-10-13 20:49:09 +03:00
Mike Dubman
ec1f761d8e OSHMEM: add missing help file, got lost during merge. Thanks to Yossi/Igor for finding it.
Change-Id: I466e40a3fea70e8045dd1e897edcc50ccf0451a3

Conflicts:
	oshmem/mca/sshmem/base/Makefile.am
	oshmem/mca/sshmem/base/help-oshmem-sshmem.txt
2014-10-13 16:58:35 +03:00
Alex Mikheev
8fcbcba516 Merge branch 'topic/oshmem_shared_mr_fix' 2014-10-13 15:24:12 +03:00
Alex Mikheev
cd67642183 OSHMEM: sshmem verbs: workaround shared_mr procfs bug
dereg shared_mr before doing dereg on its mr.
2014-10-13 15:14:34 +03:00
Nadezhda Kogteva
c68c4b45b5 Merge remote-tracking branch 'upstream/master' 2014-10-13 15:12:39 +03:00
Mike Dubman
a1db93077d Merge pull request #230 from nkogteva/oshmem_refactor_macro_style
oshmem: refactor of oshmem/mca/sshmem/*.[ch] files to use #if MACRO style
2014-10-13 13:33:32 +03:00
Nadezhda Kogteva
de68d58a9e oshmem: refactor of oshmem/mca/sshmem/*.[ch] files to use #if MACRO style 2014-10-13 13:12:16 +03:00
Nadezhda Kogteva
3e7002e8aa oshmem mmap: copyrights for memheap_base_alloc.c files updated 2014-10-13 11:41:35 +03:00
Nadezhda Kogteva
ce4ee2aa8d oshmem mmap: new mca parameters were introduced - sshmem_mmap_anonymous, sshmem_mmap_fixed and sshmem_base_backing_file_dir - for runtime mmap management.
(cherry picked up from Mellanox-v1.8 repo commit 4c391a)
2014-10-13 11:39:26 +03:00
Mike Dubman
6372ac926c tools: fix cli args parsing
No need to "shift" if argument does not expect parameter on the command line.
2014-10-13 11:33:26 +03:00
Vasily Filipov
a215a4831d MTL/MXM: disable "bulk_connect" by default. 2014-10-13 09:47:56 +03:00
Ralph Castain
3ef94a0675 Per email thread on devel list:
Revert "OPAL: drop dead with core on bad flow. rarely happens with helloworld on large scale."

This reverts commit 86f1d5af3e.

Will be reconsidered via RFC as it represents a significant change in behavior
2014-10-12 21:13:42 -07:00
Mike Dubman
113f40b0ec OSHMEM: sshmem verbs: allocate memory at fixed address
Use experimental verbs to allocate memory at fixed base
virtual address.

verbs will disqualify itself if shared_mr is disabled
or not supported and it is impossible to allocate memory
starting at fixed base virtual address.

verbs contig pages allocator did not guarantee fixed va, now it does.
(cherry picked from commit fd77ebd452)

Apply Jeff`s comments

Update with Jeff commits
(cherry picked from commit open-mpi/ompi-release@4dc487fc3d)
2014-10-12 09:53:48 +03:00
Ralph Castain
4d27eb70f2 Extend the dstore framework to include a new "update_handle" API so the attributes of an existing handle can be changed. We can't just open a new handle as the upper layers won't know where to find the info. :-( 2014-10-10 12:40:32 -07:00
Ralph Castain
1ae34da5e5 Add an attributes parameter to the dstore.open function so we can pass directives to the active storage component. This can, for example, include the backing file info for a new shared memory segment. 2014-10-10 12:13:25 -07:00
Ralph Castain
63f619f871 Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build. 2014-10-10 11:39:08 -07:00
Ralph Castain
1be1654e5f Correctly identify the synonym for orte_direct_modex_cutoff as ompi_hostname_cutoff 2014-10-10 06:05:06 -07:00
Gilles Gouaillardet
8eb2d62919 coll/sm: fix an other memory leak 2014-10-10 19:54:45 +09:00
Gilles Gouaillardet
5d44a30111 coll/sm: fix minor memory leaks
port 4488.1.patch attached in #196 to master
2014-10-10 14:21:34 +09:00
Ralph Castain
4fc4a8346b Fix a couple of minor issues. Ensure usock isn't used if the session dirs aren't setup. Protect an oddball case where orte_xml_fp is NULL. 2014-10-09 20:58:46 -07:00
Nathan Hjelm
169a1866b8 Modify Cray PMI check to detect PMI on older systems 2014-10-09 17:01:31 -06:00
Ralph Castain
b1a58726ac Cleanup the PMI m4 syntax with respect to -a, and look for libpmi* so we can pickup both .a, .la, and whatever other extensions that particular system might use. 2014-10-09 14:04:43 -07:00
Nathan Hjelm
a31cf3b740 btl/vader: missing include 2014-10-09 13:57:21 -06:00
Nathan Hjelm
9e0c07e4ce btl/ugni: improve the handling of eager get fragments when the btl runs out
of preregistered buffers

Before this change eager gets we retried on each progress loop. This commit
modifies the protocol to only retry eager gets when another eager get has
completed. This commit also cleans up some callback code that is no longer
needed.
2014-10-09 13:57:21 -06:00
Howard Pritchard
ebc368d26b remove GNI_RDMAMODE_FENCE bit in GNI_PostRdma
The GNI_RDMAMODE_FENCE bit was a left over from
async progress work that is not needed at this point
in the gni BTL.  Removing the bit also allows
for the removal of the GNI_CDM_MODE_BTE_SINGLE_CHANNEL
bit from the GNI_CdmCreate call.
2014-10-09 12:41:19 -06:00
Ralph Castain
ce8e33447f Silence warning 2014-10-09 10:45:25 -07:00
Joshua Ladd
1cabd73522 Adding a new OPAL hash table routine. Please read the algorithm description in opal/class/opal_hash_table.c for more precise details on the design and implementation. This algorithm was contributed by David Linden of H.P. in partnership with Mellanox Technologies. This contribution achieves two objectives:
1. It's actually hashing now, whereas the old OPAL hash table was not. Thus, it is a bug fix for and, as such, should be included in the 1.8 series.

2. It is dynamic and can grow and shrink the number of buckets in accordance with job size, whereas the old OPAL hash table had a fixed number of buckets which resulted in poor retrieval performance at large scale.

This scheme has been deployed in the field on very large H.P./Mellanox systems and has been demonstrated to significantly decrease job start-up time (~ 20% improvement) when launching applications directly with srun in SLURM environments. However, neither SLURM nor direct launch are prerequisites to take advantage of this change as any entity that utilizes OPAL hash table objects can benefit (at least partially) from this contribution.
2014-10-09 17:24:23 +02:00
Nadezhda Kogteva
ffa8674e01 Fix bugs in PMI configure: set correct include path, fix test command with multiple conditions. 2014-10-09 17:23:56 +03:00
Elena
b937b31693 fix for multiple spawn test 2014-10-09 06:18:16 +02:00
Elena
3d65799236 pmix: fixed ugly bug which caused many strange hangs 2014-10-09 06:17:03 +02:00
Elena
c905fe9b78 pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff 2014-10-09 06:15:31 +02:00
Elena
e319c95267 fixes for grpcomm rcd/brucks algorithms 2014-10-09 06:12:26 +02:00
Howard Pritchard
9947758d98 initial thread safety for ugni btl
This commit adds initial ugni thread safety support.
With this commit, sun thread tests (excepting MPI-2 RMA)
pass with various process counts and threads/process.
Also osu_latency_mt passes.
2014-10-08 10:13:22 -06:00
Mike Dubman
81917412a8 tools: add flag to avoid git pull during tarball create
it breaks jenkins scripts because jenkins fetches the branch and disconnects it from the repo.
hence git pull fails
2014-10-08 12:16:08 +03:00
Nathan Hjelm
23cb00d7d5 Merge pull request #225 from hjelmn/master
osc/rdma: fix issue identified by Berk Hess
2014-10-07 12:04:40 -06:00