openmpi

Author	SHA1	Message	Date
Nathan Hjelm	beb8d8ce32	pmi return code wtf This commit was SVN r25336.	2011-10-20 17:51:24 +00:00
Ralph Castain	43e35486a4	Correct flag type - thanks George! This commit was SVN r25335.	2011-10-20 04:00:13 +00:00
Nathan Hjelm	e1e8837992	add a uintptr_t to the seg_key union This commit was SVN r25334.	2011-10-19 21:48:52 +00:00
George Bosilca	78751b3b2d	Put back the OPI errors after the ORTE one. This commit was SVN r25333.	2011-10-19 20:57:13 +00:00
Ralph Castain	84713d5a84	Fix singletons again - must have been broken for a very long time, which only shows how little anyone cares about this capability. This commit was SVN r25332.	2011-10-19 20:19:08 +00:00
Ralph Castain	b44f8d4b28	Complete implementation of the ess.proc_get_locality API. Up to this point, the API was only capable of telling if the specified proc was sharing a node with you. However, the returned value was capable of telling you much more detailed info - e.g., if the proc shares a socket, a cache, or numa node. We just didn't have the data to provide that detail. Use hwloc to obtain the cpuset for each process during mpi_init, and share that info in the modex. As it arrives, use a new opal_hwloc_base utility function to parse the value against the local proc's cpuset and determine where they overlap. Cache the value in the pmap object as it may be referenced multiple times. Thus, the return value from orte_ess.proc_get_locality is a 16-bit bitmask that describes the resources being shared with you. This bitmask can be tested using the macros in opal/mca/paffinity/paffinity.h Locality is available for all procs, whether launched via mpirun or directly with an external launcher such as slurm or aprun. This commit was SVN r25331.	2011-10-19 20:18:14 +00:00
George Bosilca	1bc5da0911	These are supposed to be OPAL level errors. This commit was SVN r25326.	2011-10-19 14:22:09 +00:00
Ralph Castain	72a4b0bd8a	Fix constants This commit was SVN r25325.	2011-10-19 14:14:58 +00:00
George Bosilca	a5f24bcdcf	The error here is meaningless. This commit was SVN r25324.	2011-10-19 13:04:46 +00:00
George Bosilca	efd88e10d7	Cleanup the error codes. Get rid of all the useless ones, and mark the distinction between ORTE and OMPI errors. This commit was SVN r25323.	2011-10-19 03:51:53 +00:00
Ralph Castain	2958f3de34	Add some clarifying comments and a small efficiency improvement This commit was SVN r25322.	2011-10-18 18:30:43 +00:00
Ralph Castain	b771114086	Fix the fix :-) If the errmgr is going to try and hold the orted until all routes and children are gone, then the exit cmd must do the same. Otherwise, the orted exits immediately without waiting for routes to be dismantled, which is why we don't see the connections close. Also cleanup some diagnostics and add some debug to more clearly see what's going on. This commit was SVN r25321.	2011-10-18 17:56:37 +00:00
Nathan Hjelm	adf950f4ab	LANL: don't use per-peer receive queues on rr-class This commit was SVN r25320.	2011-10-18 16:45:44 +00:00
Nathan Hjelm	9155f1ba1f	LANL: up cq size This commit was SVN r25319.	2011-10-18 16:40:35 +00:00
Nathan Hjelm	e16559983e	LANL: match tlcc QP settings with tlcc2 This commit was SVN r25318.	2011-10-18 16:32:05 +00:00
Nathan Hjelm	607d387088	LANL: use only shared receive queues on tlcc This commit was SVN r25317.	2011-10-18 16:23:46 +00:00
Nathan Hjelm	90c55c5b35	LANL: use pmi on tlcc This commit was SVN r25316.	2011-10-18 16:12:14 +00:00
Ralph Castain	ae8e556d14	Okay, once again let's fix the vpid calculator. Identified problem with prior commit (some rmaps components already place their procs in the jdata->procs array, and others don't), so account for those variations. This commit was SVN r25315.	2011-10-18 15:50:11 +00:00
George Bosilca	749b63c09d	Provide a generic fix for the termination issue instead of r25248. The termination condition is to be checked at the daemon/HNP level not down in the routing. This commit was SVN r25313. The following SVN revision numbers were found above: r25248 --> open-mpi/ompi@b42ccc89b8	2011-10-18 03:07:37 +00:00
George Bosilca	c453614f8b	A more meaningful name for this function (mpi_proc_complete_init instead of ompi_proc_set_arch). Change the comment to reflect the real behavior of the function. This commit was SVN r25312.	2011-10-18 02:54:38 +00:00
George Bosilca	f28890fbb7	Revert r25302 as it breaks the --bynode option. This commit was SVN r25311. The following SVN revision numbers were found above: r25302 --> open-mpi/ompi@d7a8553179	2011-10-18 02:48:17 +00:00
Ralph Castain	0bf4f48aa3	Don't need priority in this framework This commit was SVN r25308.	2011-10-17 22:39:15 +00:00
Ralph Castain	2fdd9c6dea	Ensure mpirun doesn't pick this component This commit was SVN r25307.	2011-10-17 22:28:28 +00:00
Nathan Hjelm	ad9005820f	fixed typo in last commit This commit was SVN r25306.	2011-10-17 21:35:22 +00:00
Nathan Hjelm	e6ead53eef	add opal_atomic_swap_xx for amd64 This commit was SVN r25305.	2011-10-17 21:33:44 +00:00
Ralph Castain	8f0ef54130	Complete implementation of pmi support. Ensure we support both mpirun and direct launch within same configuration to avoid requiring separate builds. Add support for generic pmi, not just under slurm. Add publish/subscribe support, although slurm's pmi implementation will just return an error as it hasn't been done yet. This commit was SVN r25303.	2011-10-17 20:51:22 +00:00
Ralph Castain	d7a8553179	Fix the mapping algo for computing vpids - it was borked for bynode operations when using nperxxx directives This commit was SVN r25302.	2011-10-17 19:49:04 +00:00
Ralph Castain	e7f6be5385	Unused variable This commit was SVN r25301.	2011-10-17 18:59:22 +00:00
Rainer Keller	ec6ac33b75	- On Linux x86-64 with intel compiler v12.1, any ompi-app fails before calling main(): >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> which ompi_info ~/openmpi-1.5.4/COMPILE-intel-12.1.0/usr/bin/ompi_info xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> ompi_info Segmentation fault xxx:~/openmpi-1.5.4/COMPILE-intel-12.1.0> gdb usr/bin/ompi_info ... (gdb) run Starting program: ... Program received signal SIGSEGV, Segmentation fault. opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at ../../../../../opal/mca/memory/linux/malloc.c:4080 4080 /* remove from unsorted list */ (gdb) where #0 opal_memory_ptmalloc2_int_malloc (av=0x7ffff7fe83d8, bytes=4102) at ../../../../../opal/mca/memory/linux/malloc.c:4080 #1 0x00007ffff7c232b9 in opal_memory_linux_malloc_hook (sz=140737354040280, caller=0x1006) at ../../../../../opal/mca/memory/linux/hooks.c:687 #2 0x0000003dd96a6871 in __alloc_dir () from /lib64/libc.so.6 #3 0x0000003ddfa053cd in ?? () from /usr/lib64/libnuma.so.1 #4 0x0000003dd8e0e445 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2 ... >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A lot of combinations and trials have been done, yet to no avail. Intel v11.0 worked... Thanks to Hubert Haberstock (Intel) providing the hint in: http://software.intel.com/en-us/forums/showthread.php?t=87132 This was tested on openmpi-1.5.4 and therefore should cmr:v1.5 This commit was SVN r25290.	2011-10-14 20:47:08 +00:00
Ralph Castain	f1a5a26ba0	Minor cleanups This commit was SVN r25289.	2011-10-14 18:46:03 +00:00
Ralph Castain	89a20de474	Remove unused includes. Ensure that the error log is at least always available as we otherwise segfault when reporting errors that occur prior to opening the errmgr framework This commit was SVN r25288.	2011-10-14 18:45:11 +00:00
Ralph Castain	07dbbc6513	Sorry for mid-day correction - but folks are trying to test this, and we didn't realize it was still ignored :-( This commit was SVN r25287.	2011-10-14 16:19:20 +00:00
Ralph Castain	7bb294f917	Fix debug flags - thanks Terry! This commit was SVN r25286.	2011-10-14 16:10:21 +00:00
Ralph Castain	054c485dcf	Cleanup a race condition and an unreliable method that caused us to not properly handle procs that trapped sigterm for cleanup purposes while ORTE was trying to kill them. Thanks to Rick Payne and Ian Wells of Cisco for spending weeks chasing this down. Fix a termination issue that caused procs local to mpirun to not be killed if they weren't calling into the library. Thanks to Terry Dontje for spending countless hours chasing his tail on this one! :-( This commit was SVN r25285.	2011-10-14 15:39:54 +00:00
Ralph Castain	2eaadcfab9	Remove unused variable This commit was SVN r25284.	2011-10-14 15:32:18 +00:00
Vishwanath Venkatesan	8dd07bdceb	Removing .ompi_ignore and .ompi_unignore from fs/pvfs2 and fbtl/pvfs2 This commit was SVN r25283.	2011-10-14 00:40:11 +00:00
Ralph Castain	08fa9e1c6a	Correct include path This commit was SVN r25282.	2011-10-13 23:46:52 +00:00
Vishwanath Venkatesan	8f6b29e95b	Fixing the default file view issue and merging contiguous lengths and offsets for explicit offset case. This commit was SVN r25281.	2011-10-13 19:50:45 +00:00
Jeff Squyres	2c6254b70d	Second change from Intel. This commit was SVN r25279.	2011-10-12 23:26:34 +00:00
Jeff Squyres	28118d0611	Update the parameters for the Intel iWARP devices, per request from Faisal Latif <faisal.latif@intel.com>. This commit was SVN r25278.	2011-10-12 22:58:30 +00:00
Ralph Castain	69a0882207	Correctly setup hwloc when passing a topology from an external source This commit was SVN r25277.	2011-10-12 21:34:46 +00:00
Ralph Castain	fe661eb76f	Update platform file to use pmi This commit was SVN r25276.	2011-10-12 20:59:45 +00:00
Ralph Castain	b96ef2161d	Complete the PMI support. Generalize PMI operations to support both slurm and non-slurm environments. Correct some configuration issues - we really only want the PMI integration at the individual component level. Ensure that the pmi grpcomm component doesn't get selected when launching via mpirun by setting its priority below the bad component. Only verified in a slurm environment as that's all I have access to... This commit was SVN r25275.	2011-10-12 20:59:25 +00:00
Ralph Castain	634f83fc52	Fix the routed components. All had errors, some completely broken. You cannot test 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID) when epoch is not configured as this will always return true. This caused get_route to return an error in all non-binomial routed modules, and caused all components to return an error when delete_route was called. So protect the checks with ORTE_ENABLE_EPOCH so we get the correct behavior. This commit was SVN r25274.	2011-10-12 20:18:57 +00:00
Brian Barrett	d8b5b544ad	Update list name to match change in spec This commit was SVN r25273.	2011-10-12 20:09:39 +00:00
Ralph Castain	24a46f2acb	These were missed by prior commit - need to remove lingering references to OPAL_HWLOC_HAVE_XML This commit was SVN r25272.	2011-10-12 16:54:03 +00:00
Jeff Squyres	ff97b57c90	Change the names to be slightly more descriptive. This commit was SVN r25271.	2011-10-12 16:07:09 +00:00
Rainer Keller	4e6a6fc146	- Check, whether the compiler supports __builtin_clz (count leading zeroes); if so, use it for bit-operations like opal_cube_dim and opal_hibit. Implement two versions of power-of-two. In case of opal_next_poweroftwo, this reduces the average execution time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlining, measured rdtsc, with loop over 2^27 values). Numbers for other functions are similar (but of course heavily depend on the usage, e.g. opal_hibit() with a start of 4 does not save much). The bsr instruction on AMD Opteron is also not as fast. - Replace various places where the next power-of-two is computed. Tested on Intel Nehalem Cluster with openib, compilers GNU-4.6.1 and Intel-12.0.4 using mpi_testsuite -t "Collective" with 128 processes. This commit was SVN r25270.	2011-10-11 22:49:01 +00:00
George Bosilca	74c88a9e48	This was never used (sm_ctl_header). This commit was SVN r25267.	2011-10-11 20:37:00 +00:00
George Bosilca	ca6c282f23	Small cleanups in the SM BTL. This commit was SVN r25266.	2011-10-11 20:32:10 +00:00
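
Note on r25270 (4e6a6fc146): the commit message describes replacing loop-based power-of-two computation with `__builtin_clz` where the compiler supports it. The following is a minimal sketch of that idea, not the Open MPI code. The function names `next_pow2_loop`, `next_pow2_clz`, and `next_pow2_check` are hypothetical, and the "smallest power of two strictly greater than the input" semantics is an assumption made for illustration. `__builtin_clz` is a GCC/Clang builtin, so the sketch falls back to the loop on other compilers.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names; this is an illustration, not the Open MPI source.
 * Assumed semantics: smallest power of two strictly greater than value,
 * or 0 when that result does not fit in an unsigned int. */

/* Portable version: shift until the value is exhausted. */
static unsigned int next_pow2_loop(unsigned int value)
{
    unsigned int power2 = 1;
    while (value > 0) {
        value >>= 1;
        power2 <<= 1;   /* unsigned shift wraps to 0 on overflow */
    }
    return power2;
}

/* Builtin version: the position of the highest set bit gives the answer. */
static unsigned int next_pow2_clz(unsigned int value)
{
#if defined(__GNUC__) || defined(__clang__)
    const unsigned int width = (unsigned int)(sizeof(unsigned int) * CHAR_BIT);
    unsigned int bits;

    if (value == 0) {
        return 1;       /* __builtin_clz(0) is undefined */
    }
    bits = width - (unsigned int)__builtin_clz(value);
    if (bits >= width) {
        return 0;       /* result not representable; avoid an undefined shift */
    }
    return 1u << bits;
#else
    return next_pow2_loop(value);
#endif
}

/* Check that both versions agree for every value below limit. */
static void next_pow2_check(unsigned int limit)
{
    unsigned int v;
    for (v = 0; v < limit; v++) {
        assert(next_pow2_clz(v) == next_pow2_loop(v));
    }
}
```

Calling `next_pow2_check(4096)` from a test harness compares the two versions over a small range. The cycle counts quoted in the commit message come from the author's own measurements and are not reproduced by this sketch.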


16078 Commits