
1867 commits

Author SHA1 Message Date
Howard Pritchard
bd9d185951 pmix/cray: remove workaround for OBJ_RELEASE
Per feedback from rhc, manually set the base_ptr member
of the opal_buffer_t variable to NULL prior to calling
OBJ_RELEASE.  opal_dss.load has a similar feature, so likewise
reset base_ptr to NULL before invoking it.

Hopefully the opal_buffer_t struct does not change
frequently.

Minor cleanups to reduce output when the pmix_base_verbose
MCA parameter is set.
2015-02-13 07:47:26 -08:00
Jeff Squyres
f7b4b23383 usnic: ensure to NULL-terminate the string/not overflow
This was CID 1269921.
2015-02-12 13:41:30 -08:00
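Below is a minimal sketch of the usual pattern behind fixes like this one, assuming a fixed-size destination buffer. It bounds the copy with `strncpy()` and then writes the terminator explicitly, because `strncpy()` does not add one when the source is at least as long as the destination. The helper name `copy_bounded` is hypothetical and does not come from the usnic code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes) without
 * overflowing, and always leave dst NUL-terminated. */
static void copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* Copy at most dst_size - 1 characters; strncpy() zero-pads shorter
     * sources but does NOT terminate a source that fills the buffer. */
    strncpy(dst, src, dst_size - 1);
    /* The explicit terminator closes the gap strncpy() leaves. */
    dst[dst_size - 1] = '\0';
}
```

With a 4-byte buffer, copying "usnic_0" yields the truncated but terminated string "usn".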
Jeff Squyres
8febd41a39 usnic: fix minor memory leak
This was CID 1269859.
2015-02-12 13:41:30 -08:00
Jeff Squyres
4c074da1c2 usnic: fix minor memory leak
This was CID 1269853.
2015-02-12 13:41:30 -08:00
Jeff Squyres
a7ce2d406c usnic: don't bother comparing unsigned values for <0
This was CID 1269812.
2015-02-12 13:41:30 -08:00
Jeff Squyres
caacc6ad91 usnic: properly differentiate data pool vs. malloc
usnic_fls() can actually return 0, leading us to incorrectly free() a
buffer instead of OMPI_FREE_LIST_RETURN_MT'ing it.

So add an explicit bool in the struct that tracks whether the buffer
came from malloc or a freelist.

This was CID 1269660.
2015-02-12 13:41:30 -08:00
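The fix replaces an inference (guessing the buffer's origin from usnic_fls()) with explicit state. The sketch below shows that idea. The struct, the one-slot pool and the function names are hypothetical stand-ins for the usnic data pool and OMPI_FREE_LIST_RETURN_MT, not the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer descriptor: it records where its memory came from
 * instead of deriving that from the size class. */
typedef struct {
    void *data;
    bool from_malloc; /* true: free() it; false: return it to the pool */
} buf_desc_t;

/* Hypothetical single-slot "free list" standing in for the data pool. */
static char pool_storage[256];
static bool pool_in_use = false;

static bool get_buffer(buf_desc_t *desc, size_t len)
{
    if (len <= sizeof(pool_storage) && !pool_in_use) {
        pool_in_use = true;
        desc->data = pool_storage;
        desc->from_malloc = false;
    } else {
        desc->data = malloc(len);
        desc->from_malloc = true;
    }
    return desc->data != NULL;
}

static void release_buffer(buf_desc_t *desc)
{
    if (desc->from_malloc) {
        free(desc->data);
    } else {
        /* Never free() pool memory; hand it back instead. */
        assert(desc->data == pool_storage);
        pool_in_use = false;
    }
    desc->data = NULL;
}
```

The release path branches only on the stored flag, so a size-class helper returning 0 can no longer misroute a pool buffer to free().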
Jeff Squyres
3b39535ebb usnic: ensure that the string is NULL-terminated
This was CID 1269666.
2015-02-12 13:41:30 -08:00
Jeff Squyres
41c6e26a38 usnic: ensure the copied string is NULL-terminated
This was CID 1269667
2015-02-12 13:41:30 -08:00
Jeff Squyres
81585c0a7c usnic: strengthen the check-if-accept()-failed test
This was Coverity CID 1269801.
2015-02-12 13:41:30 -08:00
Jeff Squyres
117e6feaa1 shmem sysv: ensure we don't shmdt(NULL)
This was CID 71999.
2015-02-12 13:41:30 -08:00
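Below is a minimal sketch of a guard like the one this commit describes. The wrapper name `detach_segment` is hypothetical. As an extra assumption beyond the commit message, it also skips the `(void *) -1` value that `shmat()` returns on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/shm.h>

/* Hypothetical wrapper: detach a System V segment only if it was actually
 * attached. shmat() reports failure as (void *) -1, and a segment that was
 * never attached may still be NULL. */
static int detach_segment(void **addr)
{
    assert(addr != NULL);
    if (*addr == NULL || *addr == (void *) -1) {
        return 0; /* nothing to detach */
    }
    int rc = shmdt(*addr);
    if (rc == 0) {
        *addr = NULL; /* makes a second call a no-op */
    }
    return rc;
}
```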
Jeff Squyres
6d3a84514f mca_base_cmd_line.c: fix minor memory leak
This was CID 1269874.
2015-02-12 13:41:29 -08:00
Jeff Squyres
f8e334357d mca_base_pvar.c: protect removal from list
Only remove it from the list if it is actually on the list.

This was CID 1269758.
2015-02-12 13:41:29 -08:00
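"Only remove it from the list if it is actually on the list" needs some way to test membership. The sketch below uses a hypothetical intrusive list in which each item records its owning list. It is not opal_list_t, whose internals are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list;

/* Hypothetical intrusive doubly linked list item that remembers which
 * list (if any) currently holds it. */
typedef struct item {
    struct item *prev, *next;
    struct list *owner;
} item_t;

typedef struct list {
    item_t sentinel; /* circular: sentinel.next is the head */
    size_t length;
} list_t;

static void list_init(list_t *l)
{
    l->sentinel.prev = l->sentinel.next = &l->sentinel;
    l->sentinel.owner = l;
    l->length = 0;
}

static void list_append(list_t *l, item_t *it)
{
    assert(it->owner == NULL); /* must not already be on a list */
    it->prev = l->sentinel.prev;
    it->next = &l->sentinel;
    l->sentinel.prev->next = it;
    l->sentinel.prev = it;
    it->owner = l;
    l->length++;
}

/* Unlink it from l only if it is actually on l; report whether it was. */
static bool list_remove_if_member(list_t *l, item_t *it)
{
    if (it->owner != l) {
        return false;
    }
    it->prev->next = it->next;
    it->next->prev = it->prev;
    it->prev = it->next = NULL;
    it->owner = NULL;
    l->length--;
    return true;
}
```

A second removal of the same item returns false instead of corrupting neighbouring links.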
Nathan Hjelm
f1dc29b145 btl/vader: fix modex size when xpmem is in use
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-12 14:06:24 -07:00
Nathan Hjelm
49ba150972 mca/base: fix path string parsing
CID 993709
2015-02-12 13:03:46 -07:00
Jeff Squyres
00c878957c mca_base_var.c: add debug check for another programming error
Coverity alerted us to the fact that there are places where
the synonym_for param is hard-coded to -1 when calling
register_variable().  It would be a coding error if synonym_for==-1
and (flags & MCA_BASE_VAR_FLAG_SYNONYM)>0, so let's add that to the
debug-only check at the top of the function.

This was CID 993717.
2015-02-12 10:24:02 -08:00
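The sketch below expresses the precondition described above. It uses standard `assert()` as the debug-only mechanism, which is an assumption: the actual check in mca_base_var.c may use a different debug guard. `VAR_FLAG_SYNONYM` and `register_variable_sketch` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical flag value; the real MCA_BASE_VAR_FLAG_SYNONYM is defined
 * in the MCA base headers and may differ. */
#define VAR_FLAG_SYNONYM 0x1

/* Debug-only precondition: requesting a synonym while passing
 * synonym_for == -1 (i.e. "not a synonym of anything") is a coding error.
 * assert() compiles away when NDEBUG is defined. */
static int register_variable_sketch(int synonym_for, int flags)
{
    assert(!(synonym_for == -1 && (flags & VAR_FLAG_SYNONYM)));
    /* ... actual registration would happen here ... */
    return 0;
}
```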
Jeff Squyres
332943f1c3 pstat linux: ensure to close the file
This was CID 71983.
2015-02-12 10:24:02 -08:00
Jeff Squyres
6a64fe85a1 pstat linux: ensure read() returns >=0
This was CID 71182.
2015-02-12 10:24:02 -08:00
Jeff Squyres
8be0e0b0ca usnic: don't close fp upon error
Let the caller close fp.  Properly check for errors when calling
subroutines.

This was Coverity CID 1269995.
2015-02-12 10:24:01 -08:00
Howard Pritchard
0cf2b478e0 Merge pull request #391 from hppritcha/topic/cray_pmi_kvs
pmix/cray: initial kvs removal work
2015-02-11 19:55:34 -07:00
Howard Pritchard
9955834ff1 pmix/cray: initial kvs removal work
Remove use of the Cray PMI KVS within cray/pmix.  The KVS is designed
for a lightweight MPI that exchanges only a minimal amount of
connection info (about 128 bytes per rank).  Use Cray PMI
collective extensions instead.

This is the first of several steps to accelerate launch of
Open MPI on Cray systems using either native aprun or nativized
slurm.
2015-02-11 15:14:55 -08:00
Rolf vandeVaart
08dceda2c0 Fix logic for handling priority and eager RDMA.
Some refactoring in this code ended up changing the logic used to set
up eager RDMA: rather than setting up eager RDMA with a high-priority
message, it did it the other way around.  For some reason, CUDA-aware
support did not like this, so restore the logic to the way it was
prior to the refactoring, which did not intend to change it.  Lightly
reviewed by hjelmn.
2015-02-11 16:38:36 -05:00
Jeff Squyres
4f1996df5d various: remove $(LTDLINCL) from Makefile.am's that didn't need it 2015-02-11 12:25:20 -08:00
Ralph Castain
3de8c5c7c6 Cleanup the munge support - the credential cannot be reused for multiple connections 2015-02-10 20:34:35 -08:00
George Bosilca
e173f9b0c0 Somehow we lost one of the most critical parameters
allowing the PML to decide how to order the different
interconnects. Bring it back!
2015-02-10 20:32:05 -05:00
Ralph Castain
3ae3b96c17 Fix master compilation - a buried header dependency must have been removed. 2015-02-10 07:22:10 -08:00
Mike Dubman
6816e3421f Merge pull request #377 from regrant/ib_wr_fix
fix problem with get_pathrecord posting too many recv requests
2015-02-10 08:47:23 +02:00
Ralph Castain
bef830efef Fix debug output 2015-02-09 20:49:04 -08:00
Ralph Castain
07134f5b17 Add munge security 2015-02-09 20:49:03 -08:00
Ralph Castain
a3275aa867 Once again, fix the blasted singleton comm_spawn 2015-02-05 17:34:25 -08:00
Jeff Squyres
0dbbffb753 pmix_base_frame: use the "= { 0 }" initializer
Per open-mpi/ompi#381, convert the specific initialization of opal_pmix
to use the generic "= { 0 }" initializer.  This form can be used to
initialize any type when the intent is just to zero it out / assign
*some* value.
2015-02-05 17:51:06 -05:00
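The sketch below shows why "= { 0 }" works for any type. The struct is hypothetical and is not the real opal_pmix module. The C standard initializes every member not named in a brace initializer as if it had static storage duration, so the one spelling zeroes integers, floating-point values, pointers and nested arrays alike.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module struct with mixed member types. */
typedef struct {
    int (*init)(void);
    const char *name;
    double timeout;
    int flags[4];
} module_t;

static module_t make_module(void)
{
    /* Automatic storage: without "= { 0 }" these members would be
     * indeterminate. The first member gets the explicit 0 (a null function
     * pointer); all the rest are zero-initialized implicitly. */
    module_t m = { 0 };
    return m;
}
```

Because the initializer names no member explicitly, it keeps working if members are added or reordered.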
Ralph Castain
4d882796b6 Silence warnings 2015-02-05 11:41:00 -08:00
Howard Pritchard
e508a4078e Merge pull request #376 from regrant/ib_error_fix
fixes OpenIB connect error reporting for ibv_* calls that return an errn...
2015-02-04 10:22:03 -07:00
Jeff Squyres
621af3aa07 pmix_base: fix global opal_pmix symbol for static linking on OS X
OS X has weirdness when static linking.  If a symbol is not
initialized, it is put into the common block section, and Weird Things
happen (linking when trying to use that global symbol will fail).
If you initialize the variable, it goes into a different section (and
linking to it will work).

This link (that might go stale someday) has some information about OS
X linker scope and treatment of symbol definitions:
https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/executing_files.html#//apple_ref/doc/uid/TP40001829-98432-TPXREF120

Fixes #375.
2015-02-04 12:12:31 -05:00
Ryan Grant
de93497789 fix problem with get_pathrecord posting too many recv requests 2015-02-04 09:53:58 -07:00
Ryan Grant
5d5e9bc1f8 fixes OpenIB connect error reporting for ibv_* calls that return an errno 2015-02-04 09:09:14 -07:00
Jeff Squyres
a3728f09af libfabric: add another missing file to the Makefile.am 2015-02-04 04:02:27 -08:00
Jeff Squyres
66a680879e libfabric: fix header file name in Makefile.am 2015-02-03 19:41:25 -08:00
Jeff Squyres
cb7cc171f9 usnic: update README.txt notes
Update notes about copying the usnic BTL between master and the v1.8
branch.
2015-02-03 15:54:36 -08:00
Jeff Squyres
edf7232e00 usnic: enable building with an external libfabric 2015-02-03 13:46:06 -08:00
Jeff Squyres
bfa54d5d7b usnic: update to match new libfabric 2015-02-03 13:46:06 -08:00
Jeff Squyres
d2490d2fd8 libfabric: update Makefile.am to match new libfabric drop 2015-02-03 13:46:05 -08:00
Jeff Squyres
3dc0abfbc4 libfabric: update to (just past) 1.0rc1
Updated to Github ofiwg/libfabric@6b005d0d19.
2015-02-03 13:46:05 -08:00
Ralph Castain
d3267c200f Add missing OMPI-changes to libevent 2.0.22 2015-02-02 20:57:40 -08:00
Jeff Squyres
965ccab6cc libfabric: remove a few warnings
Embedding libfabric is a temporary measure; I'm removing some warning
notifications so that the output isn't so cluttered (we're getting
the real warnings fixed upstream, but the OMPI community doesn't
really care/need to see the warnings in the meantime).
2015-01-29 17:38:02 -08:00
Todd Kordenbrock
37e6096fe7 Copyright update. 2015-01-29 11:08:13 -06:00
Todd Kordenbrock
ca30e129e8 Add the option to use the Portals4 logical to physical table.
This commit adds an MCA variable to select Portals4 logical
addressing, populates the logical-to-physical mapping table and
initializes the NI in this mode.
2015-01-29 11:08:13 -06:00
George Bosilca
b9a63cbe7a One less warning. 2015-01-27 13:25:55 -05:00
Ralph Castain
294ebc907a Fix singleton operations so they can work inside a slurm environment 2015-01-27 09:29:42 -06:00
Ralph Castain
ba25e8a0ce Fix singletons 2015-01-27 09:29:42 -06:00
Ralph Castain
028b00154d Complete implementation of the schizo framework to support OMPI component 2015-01-27 09:29:42 -06:00
Jeff Squyres
436223959d usnic: update to match new libfabric APIs 2015-01-24 05:49:36 -08:00
Jeff Squyres
7d5755f62b libfabric: update to ofiwg/libfabric@b3f7af4c67
Pull down a new embedded copy of libfabric from
https://github.com/ofiwg/libfabric.
2015-01-24 05:48:48 -08:00
Howard Pritchard
056daa05bf btl/ugni: use PMIX_GLOBAL for modex_send in ugni
Using PMIX_REMOTE is not the right thing for the ugni
BTL when it is possible that spawned ranks end up
on the same node as some of the spawnee ranks.
2015-01-22 06:53:45 -08:00
Gilles Gouaillardet
9f80aa2d28 btl/openib: regression fix when rdmacm or udcm are disabled
This fixes a regression introduced in open-mpi/ompi@661c35ca67

Thanks to Mark Santcroos for reporting this issue
2015-01-20 11:31:50 +09:00
Rolf vandeVaart
66f6026214 Improve error message to help user figure out what to do 2015-01-16 13:55:27 -05:00
Jeff Squyres
65a279019e usnic: fix typo in memchecker usage 2015-01-16 09:42:19 -08:00
Jeff Squyres
3969fe3a94 libfabric: ensure wrapper libs are loaded for static builds
For static builds, we need to also set
<framework>_<component>_WRAPPER_EXTRA_LIBS so that the wrappers know
what other libraries to add to link executables.
2015-01-16 09:29:52 -08:00
Gilles Gouaillardet
661c35ca67 cleanup dead code caused by the removal of the --with-threads configure option 2015-01-16 19:13:59 +09:00
Nathan Hjelm
006074c48d Merge pull request #332 from hjelmn/openib_updates
Openib updates
2015-01-15 15:05:18 -06:00
Jeff Squyres
d13c14ec82 CSCus22527: fix off-by-one error in checking the number of VFs
Ensure to count *this* process when checking for how many VFs we need
on the local server.

(cherry picked from commit 386c01934e98cb8dcb48ff648ecdfb0c8677baa9)
2015-01-15 11:44:29 -08:00
Jeff Squyres
4685767b2d libfabric: update usnic configury
Use new common m4 macro for choosing between libnl3 and libnl.
2015-01-15 07:12:39 -08:00
Jeff Squyres
400b02e566 libfabric: update to github:ofiwg/libfabric HEAD
Specifically: bbf0f3ea8e92c92a7cee56473ecdbbbb34cceb7d (15 Jan 2015)
2015-01-15 07:11:54 -08:00
Aurélien Bouteiller
f49981bb2a Disable coalescing until pull request #332 gets in. 2015-01-14 14:12:47 -05:00
Nathan Hjelm
cf4975501d rcache/vma: fix parent class of mca_rcache_vma_t
There was a mismatch between the structure for mca_rcache_vma_t and
the OBJ_CLASS_INSTANCE. One was opal_list_item_t and the other was
ompi_free_list_item_t. The super class in the structure looks like it
is the correct one. Changed the superclass in OBJ_CLASS_INSTANCE to
match.
2015-01-14 10:21:24 -07:00
Jeff Squyres
e4e5e7dbc0 usnic: ensure to clean up nicely in case of low resources
If there are not enough resources (e.g., low VFs), we can end up
calling finalize_one_channel() on the same channel multiple times.  So
ensure to NULL out fields that we have freed already so that we do not
try to free them a second time.

Fixes CSCus26648.
2015-01-13 14:37:31 -08:00
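The sketch below shows the "NULL out what you freed" idiom behind this fix. `channel_t` and `finalize_channel` are hypothetical stand-ins for the usnic channel and finalize_one_channel(). Because `free(NULL)` is a no-op, resetting each pointer after freeing it makes repeated finalization harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical channel with two heap-allocated resources. */
typedef struct {
    int *recv_bufs;
    int *send_bufs;
} channel_t;

/* Safe to call more than once: each freed pointer is reset to NULL, and
 * free(NULL) does nothing, so a second call cannot double-free. */
static void finalize_channel(channel_t *ch)
{
    assert(ch != NULL);
    free(ch->recv_bufs);
    ch->recv_bufs = NULL;
    free(ch->send_bufs);
    ch->send_bufs = NULL;
}
```

This also covers a channel whose setup stopped partway (e.g., low VFs), provided unallocated fields start out as NULL.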
Jeff Squyres
8807ae2497 usnic libfabric: also set the us_netmask_be field.
From libfabric upstream commit ofiwg/libfabric@3976745.

Part of the fix for CSCus22495.
2015-01-13 12:04:57 -08:00
Jeff Squyres
d00cede718 usnic: fix if_include/exclude of CIDR-specified networks
Fix the ordering so that we obtain the usnic netmask information
*before* we do the filtering based on CIDR-specified networks.

Also requires upstream Github libfabric commit 3976745.

Fixes CSCus22495.
2015-01-13 12:04:51 -08:00
Jeff Squyres
a220b92cf8 usnic: fix function name in opal_output 2015-01-13 12:04:07 -08:00
Gilles Gouaillardet
955f3c2730 configury: check existence of the atomic_init function in libfabric
Intel compilers implement atomic_init in C++ only,
so disable C11 atomics in libfabric for now
2015-01-13 16:39:41 +09:00
Gilles Gouaillardet
cbe0d26b2d configury: do test the __STDC_NO_ATOMICS__ macro for libfabric 2015-01-13 16:06:37 +09:00
Jeff Squyres
5ed688a074 usnic: ensure that we only get "usnic"-named providers
Also, a minor update to a verbose message.
2015-01-12 13:21:22 -08:00
Jeff Squyres
881b1dcf19 usnic: document libfabric abstractions
Handy tips to remember the libfabric abstractions and what they
correspond to in usnic/VIC terms.
2015-01-09 15:21:51 -08:00
Gilles Gouaillardet
194d9f84d3 btl/usnic: move call to check_reg_mem_basics()
avoid annoying memlock related messages when there is no usnic device.
2015-01-09 11:37:45 +09:00
George Bosilca
1344097d35 Turn OFF the TCP dump mechanism. 2015-01-08 18:50:49 -05:00
George Bosilca
8ddd3b3b09 Cleanup the TCP dump mechanism. 2015-01-08 18:50:05 -05:00
Nathan Hjelm
c65f026fee btl/vader: fix typo in xpmem setup 2015-01-08 12:52:38 -07:00
Gilles Gouaillardet
4c29d8e247 btl/openib: silence warning (unused code) 2015-01-08 17:18:07 +09:00
Gilles Gouaillardet
8ab605d9c5 btl/tcp: fix overflow in mca_btl_tcp_endpoint_dump() 2015-01-08 15:40:16 +09:00
Nathan Hjelm
7d206ae769 btl/ugni: fix a couple of bugs
Two fixes:

 - Do not try to return a mailbox to the free list if one wasn't
   allocated.

 - Do not try to tear down IRQ CQs if they were not created.
2015-01-07 13:48:17 -07:00
Dave Goodell
49069bc661 usnic: fix fi_av_insert (ARP resolution) bugs
We had several problems in the old code:

1. We were specifying an arbitrary timeout (100 ms) and then abandoning
   all remaining pending AV insert operations.  We would then free the
   endpoint buffer that we gave to fi_av_insert(), usually causing
   libfabric's progress thread to write to a freed buffer.

2. We were claiming in a show_help message that the timeout was
   controllable via an MCA parameter.  This commit removes that
   parameter, since there's no good method for us to specify a timeout
   like this to libfabric right now.

3. We also weren't waiting for the correct number of fi_av_insert()
   operations to complete.  We were waiting for nprocs, which is
   accidentally fine for 2 procs on separate hosts, but not for most
   other proc counts.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
2015-01-07 08:25:17 -08:00
Gilles Gouaillardet
06e071454e btl/openib: cleanup duplicate code 2015-01-07 14:07:30 +09:00
Gilles Gouaillardet
135ecce0eb btl/openib: rename OPAL_HAVE_XRCD macro into OPAL_HAVE_CONNECTX_XRC_DOMAINS 2015-01-07 13:27:25 +09:00
George Bosilca
bf62bed65f Typo in the poll/epoll ops declaration. 2015-01-06 21:21:25 -05:00
Ralph Castain
a7c5ff2ace Update to libevent 2.0.22-stable 2015-01-06 16:37:25 -08:00
Nathan Hjelm
6733d89cf9 btl/vader: fix return code check when opening ptrace_scope file 2015-01-06 15:17:56 -07:00
Nathan Hjelm
cde79bfa60 btl/openib: misc cleanup (tabs, etc) and put credit code into a common place (was duplicated in the send and sendi paths) 2015-01-06 11:39:23 -07:00
Nathan Hjelm
9bae131589 btl/openib: fix message coalescing
There was a bug in the openib btl handling this valid sequence of
calls:

desc = btl_alloc ();
btl_free (desc);

When triggered, the bug would cause either fragment loss or undefined
behavior (SEGV, etc.). The problem occurred because btl_alloc contained
the logic to modify the pending fragment (length, etc.) and these
changes were not undone if the fragment was freed instead of sent.

To fix this issue I 1) moved some of the coalescing logic to the
btl_send function, and 2) retry the coalesced fragment on btl_free
if it was never sent. This appears to completely address the issue.
2015-01-06 11:39:16 -07:00
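The sketch below models only the first half of the fix described above: alloc no longer touches the pending fragment, and send commits the change. The second half, retrying the coalesced fragment on btl_free, is not modeled. All types and names are hypothetical and are not taken from the openib BTL. The model also assumes one outstanding descriptor at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a coalescing send path: small messages are packed
 * into one pending fragment before it goes on the wire. */
typedef struct {
    size_t len; /* bytes already committed to the pending fragment */
    size_t cap;
} pending_frag_t;

typedef struct {
    pending_frag_t *target; /* fragment this descriptor would coalesce into */
    size_t offset;          /* where its payload would start */
    size_t size;
} desc_t;

/* alloc only checks that the payload fits; it leaves the pending
 * fragment untouched, so alloc followed by free needs no undo. */
static bool frag_alloc(pending_frag_t *pf, size_t size, desc_t *d)
{
    if (pf->len + size > pf->cap) {
        return false;
    }
    d->target = pf;
    d->offset = pf->len;
    d->size = size;
    return true;
}

/* send is where the pending fragment is actually modified. */
static void frag_send(desc_t *d)
{
    assert(d->target->len == d->offset); /* one outstanding descriptor */
    d->target->len += d->size;
}

static void frag_free(desc_t *d)
{
    d->target = NULL; /* nothing to roll back */
}
```

The alloc-then-free sequence from the commit message now leaves the pending fragment's length unchanged.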
Nathan Hjelm
9aaac11648 btl/openib: fix receive queue source detection 2015-01-06 11:39:11 -07:00
Howard Pritchard
7df648f1cf btl/openib: fix problems from commit b3617e73
For systems with OFEDs lacking XRC support, commit b3617e73
broke the build of the openib BTL.  This commit addresses
the issues introduced by that commit.
2015-01-06 11:31:12 -07:00
Gilles Gouaillardet
b3617e736e btl/openib: add XRC support with OFED 3.12+
based on an original patch contributed by Bull.
2015-01-06 15:30:52 +09:00
Howard Pritchard
c857cc926c Merge pull request #327 from hppritcha/topic/async_progress
Topic/async progress
2015-01-05 16:20:44 -07:00
Gilles Gouaillardet
9e9261e90a pmix: correctly set locality flags in proc_flags
do not use opal_process_info.cpuset which is not
set at that time.
2014-12-26 15:37:08 +09:00
Howard Pritchard
0a6f841d5f xpmem/config: simple xpmem search on Cray's
Use the pkg-config related m4 functions to find out where
Cray's xpmem.h and libxpmem are located on a system.

With this commit, there is no longer any need to explicitly
indicate an xpmem install location on the configure
line, at least for Cray systems running CLE 4.X and 5.X.
2014-12-24 14:40:06 -07:00
Howard Pritchard
065c756860 btl/ugni: improve error handling
Improve error handling when pthread functions return errors.
Remove stale debug code.
2014-12-24 11:50:24 -07:00
Howard Pritchard
f8e354ce00 btl/ugni: add a request_progress_thread mca param
Replace temporary environment variables with an MCA
parameter for the ugni btl.  A user wishing to
use the ugni btl async. progress thread needs to
set the request_progress_thread param to true.
For example, using env. variable format:

export OMPI_MCA_btl_ugni_request_progress_thread=1
2014-12-24 11:50:24 -07:00
Howard Pritchard
8b250cc15b btl/ugni: more debug cleanup 2014-12-24 11:50:24 -07:00
Howard Pritchard
f0c519517b btl/ugni: switch to using opal_progress
Switch to invoking opal_progress from the async progress
thread, rather than calling ugni btl specific progress.
2014-12-24 11:50:24 -07:00
Howard Pritchard
47747c1b27 btl/ugni: remove some debug output 2014-12-24 11:50:24 -07:00
Howard Pritchard
2d14c2a204 btl/ugni: switch to using tx cq irqs for rdma
Verified via unit tests, etc. that BTE TX descriptors using CQs
configured to generate IRQs do in fact work correctly on Cray XC.
Disable sending a message back to self and just use the IRQs
generated by completion of TX descriptors posted to the BTE.
2014-12-24 11:50:24 -07:00
Howard Pritchard
acd07d98da btl/ugni: turn off chatty debug in irq cq setup 2014-12-24 11:50:24 -07:00
Howard Pritchard
0dec2f4af7 btl/ugni: mark btl frags for irqs as btl owned
Make sure frags allocated to generate irqs to wake
the progress thread, etc. set the MCA_BTL_DES_FLAGS_BTL_OWNERSHIP
flag.
2014-12-24 11:50:23 -07:00
Howard Pritchard
d188f0bc6f btl/ugni: honor enable_mpi_threads
Honor enable_mpi_threads setting to enable the ugni btl
async progress thread.  If the app doesn't request thread-multiple
the thread will not be created.
2014-12-24 11:50:23 -07:00
Howard Pritchard
43cdcb745f btl/ugni: add missing mutex lock 2014-12-24 11:50:23 -07:00
Howard Pritchard
83bcbd1cf9 btl/ugni: compilation fixes
Fix compilation problems in ugni btl associated with
async progress additions.
2014-12-24 11:50:23 -07:00
Howard Pritchard
13ab8a9e5a btl/ugni: use MCA_BTL_DES_FLAGS_SIGNAL
Use MCA_BTL_DES_FLAGS_SIGNAL frag flag to indicate
whether or not an interrupt needs to be delivered
along with a control message going through smsg.
2014-12-24 11:50:23 -07:00
Howard Pritchard
3fc7b389ff initial async progress changes for gni 2014-12-24 11:50:23 -07:00
Devendar Bureddy
ccafc62c07 OMPI: btl openib: fix max registerable memory calculation
- By default, allow registering the maximum possible amount of memory
  (i.e., 2 * total_memory). This behavior can be turned off using the
  MCA parameter "btl_openib_allow_max_memory_registration".

- In the fallback case, use device-specific parameters to calculate
  the memory limit.
2014-12-23 23:35:54 +02:00
Howard Pritchard
ffbf9738a3 btl/vader: disable SGI UV xpmem for now
This commit allows master to build again on SGI UV systems.

Fixes #322
2014-12-23 12:04:25 -07:00
Gilles Gouaillardet
f6da257477 configury: test external hwloc version is 1.8 or greater
hwloc_topology_dup is only available from hwloc 1.8
2014-12-22 13:42:38 +09:00
Jeff Squyres
40dd4c5b76 configury: manually remove some stamp-h? files
Due to what might be a bug in Automake, we need to remove stamp-h?
files manually.  See
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19418.
2014-12-20 08:32:57 -08:00
Jeff Squyres
d5b3e5802e libfabric configury: add more tests
Properly test for some dependent libraries; don't just assume
elsewhere in Open MPI's configury will find those libraries.  Also
consolidate some CPPFLAGS and clarify some comments.
2014-12-20 08:32:47 -08:00
Jeff Squyres
012e008649 libfabric configury: make AC_CONFIG_FILES be unconditional
Also add the generated config.h file to .gitignore.
2014-12-20 08:32:47 -08:00
Jeff Squyres
45ef0352d7 libfabric: do a proper check for intrinsic atomics 2014-12-20 08:32:46 -08:00
Jeff Squyres
ff1364cbe4 Revert "libfabric: add missing header file"
That wasn't a missing header file; in fact, it should have been
.gitignored!

This reverts commit 35bf5fc60c.
2014-12-19 17:39:30 -08:00
Jeff Squyres
35bf5fc60c libfabric: add missing header file 2014-12-19 17:33:11 -08:00
Jeff Squyres
e0f660cb9e libfabric: fix clang compile error in usnic provider
From ofiwg/libfabric@0078c93ae4
2014-12-19 15:45:16 -08:00
Jeff Squyres
75797c4f30 libfabric: update embedded libfabric configury
To support the newly-copied libfabric downloaded from github
ofiwg/libfabric@8da3957de3.
2014-12-19 14:45:30 -08:00
Jeff Squyres
e2362988a9 libfabric: update to ofiwg/libfabric@8da3957de3
Pull down a new embedded copy of libfabric from
https://github.com/ofiwg/libfabric.
2014-12-19 14:45:21 -08:00
Howard Pritchard
91b0d03bf2 pmix/cray: remove dead code 2014-12-19 13:08:23 -08:00
Ralph Castain
123fdd603f If we are using hwthread cpus, then default to binding there, letting the user override to whatever they want 2014-12-19 08:04:28 -08:00
Rolf vandeVaart
26482db736 Bump up max send size. Gives much better performance for GPU transfers while only decreasing host transfer performance by a small amount. 2014-12-18 13:22:58 -08:00
Jeff Squyres
c621d1e622 libfabric: don't LIBADD the common library in the static case
Adding the libfabric common library in the --disable-dlopen case will
result in duplicate symbols.
2014-12-18 11:04:08 -08:00
Jeff Squyres
140bb3d421 hwloc configure: fix typo -- add missing $
Arrgh!  Missed a "$" in the last commit, making the test always
false.
2014-12-18 10:25:43 -08:00
Jeff Squyres
be6d46490f hwloc: only add CPPFLAGS if hwloc is actually being built
As pointed out by @ggouaillardet, we were adding some unnecessary -I
flags to CPPFLAGS when --without-hwloc was being used.  This commit
slightly updates the hwloc191 component configury to only add such
things when the component is, in fact, going to be
compiled/installed.
2014-12-18 08:56:49 -08:00
Jeff Squyres
c205c70f39 usnic libfabric: remove useless "config.h" includes
This change was also committed upstream in libfabric.
2014-12-18 08:47:59 -08:00
Jeff Squyres
269d7f9713 openib: don't use opal_using_threads() in component_init
Use the flag that was passed in, instead.
2014-12-17 15:08:43 -08:00
Jeff Squyres
c1b43b6753 libfabric: the LIBADD should be unconditional
The LIBADD for the common libfabric library does not belong down in
the providers; it needs to be set when the libfabric core itself
decides to build.
2014-12-17 14:02:08 -08:00
Jeff Squyres
f1a5d3a90d configury: propagate a libtool shared lib version for libfabric 2014-12-17 13:36:01 -08:00
Jeff Squyres
d6f059f538 configury: add some descriptive output messages in configure
Ensure that the ofi MTL and the usnic BTL have good descriptive output
messages in configure.
2014-12-17 13:36:01 -08:00
Jeff Squyres
6edc19d78d libfabric: ensure that shell variables are initialized
Ensure that the <provider>_happy shell variables are initialized to
0.  Without this, the --without-libfabric case would leave them
uninitialized, resulting in "test: -eq operator expecting a value" kinds
of errors.
2014-12-17 13:36:01 -08:00
Rolf vandeVaart
f55de452ab Change the way we register the sm memory pool with CUDA. Rather than just registering local free lists, register the entire pool, as the local process does not know which memory the remote processes are using for free lists. Fixes a performance problem we were seeing with copying out of memory (since the host piece was not pinned). 2014-12-17 14:21:34 -05:00
George Bosilca
830df07202 Fix the indentation. 2014-12-16 16:07:42 -05:00
George Bosilca
146ab96e29 These variables are now unnecessary. 2014-12-16 16:05:00 -05:00
Aurélien Bouteiller
ee3b090316 The fallback case when yama is not installed was not correct in CMA vader 2014-12-16 14:39:14 -05:00
Aurélien Bouteiller
0bf860ef02 indentation 2014-12-16 14:22:26 -05:00
Jeff Squyres
95da4a5a0e usnic: no longer use opal_using_threads()
Instead, use the flag that is passed in.
2014-12-16 08:49:01 -08:00
George Bosilca
357daa834e Stay on the safe side: Only one thread is allowed
to handle an event_base.
2014-12-15 23:19:51 -05:00
George Bosilca
2fec570fe7 There is no need to keep track of these events. They are scheduled
as triggers in libevent, so one set of bookkeeping should be enough.
2014-12-15 22:35:29 -05:00
George Bosilca
46baab350c The event is automatically deleted by default. 2014-12-15 21:59:20 -05:00
George Bosilca
b01abfa0d7 Don't over-do it! 2014-12-15 21:33:32 -05:00
George Bosilca
f87a4b691b Solve another handshake problem, where one thread was calling del_event
while cleaning up after receiving a zero byte on the connect socket
(locally started connection), while another was trying to accept a
new connection from the same peer. Create a zero-timed event and
delocalize the accept into a timer_event.
Add support for registering an error callback that can be used when a
connection is discovered as failed during the initialization process.
2014-12-15 20:27:32 -05:00
George Bosilca
e20413c885 Rearrange the code to remove a compiler complaint about
the missing return from a non-void function.
2014-12-15 15:42:57 -05:00
Ralph Castain
573a574a3c Remove an unused dstore type that was redundant with another one. Define a corresponding PMIX_NODE_ID type (contains the vpid of the daemon hosting the proc) and ensure that the PMIx server includes that info in its process map 2014-12-15 12:11:13 -08:00
Ralph Castain
9658256a98 Restore the passing of the complete job map to the local proc on first get_attr so the info can be used by the MPI layer without continual calls back to the server. We'll find a more memory efficient method later. 2014-12-13 18:44:09 -08:00
George Bosilca
2edbe16c47 Add the necessary infrastructure to allow the dumping of all TCP
information related to an endpoint (status and all pending fragments).
Do some minor space cleanup.
2014-12-13 01:59:55 -05:00
George Bosilca
5b8616d890 Fix the race condition in endpoint connection initialization. The race
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).

The proposed solution lets the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
its job. As the event is not executed, it will trigger again at the
next opportunity, so this solution introduces a minimal delay in the
connection establishment.
2014-12-13 01:45:00 -05:00
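One common way to "let the event callback fail" instead of blocking on a lock that another thread holds is `pthread_mutex_trylock()`. The commit message does not say which primitive is used, so this is an assumption. `endpoint_t` and `recv_callback` are hypothetical and are not the TCP BTL's real types. The sketch shows how backing off breaks the lock-order cycle described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical endpoint with its own receive lock. */
typedef struct {
    pthread_mutex_t recv_lock;
    int bytes_processed;
} endpoint_t;

/* Event callback sketch: if another thread already holds recv_lock (and
 * may be waiting for the event-base lock we are running under), do not
 * block. Back off and let the still-pending read event fire again later,
 * trading a small delay for freedom from the lock-order deadlock. */
static bool recv_callback(endpoint_t *ep, int nbytes)
{
    assert(ep != NULL);
    if (pthread_mutex_trylock(&ep->recv_lock) != 0) {
        return false; /* not processed; the event will trigger again */
    }
    ep->bytes_processed += nbytes;
    pthread_mutex_unlock(&ep->recv_lock);
    return true;
}
```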
Ralph Castain
bffb2b7a4b Correct some issues with variables used before being set 2014-12-12 17:23:32 -08:00
Ralph Castain
0630680f36 Two cleanups required for transfer to 1.8.4:
* Use %d format for the topo signature as some systems apparently have problems with %u
* Use correct variable in show_help message
2014-12-12 17:23:32 -08:00
Howard Pritchard
6cf258638a mpool/udreg: minor comment improvement 2014-12-12 14:05:18 -07:00
Nathan Hjelm
38d66272c5 btl/vader: fix compile on SGI UV 2014-12-12 09:09:01 -07:00