Jeff Squyres
88cbe9c780
.ompi_ignore this component until it can be fixed.
...
This commit was SVN r26930.
2012-07-31 21:02:06 +00:00
Nathan Hjelm
980692804d
oob/ud: don't start listening for ud requests unless we have one usable port
...
This commit was SVN r26929.
2012-07-31 19:00:18 +00:00
Ralph Castain
23c2a315a9
Add missing line to set flag indicating at least one port found
...
This commit was SVN r26914.
2012-07-30 17:54:38 +00:00
Ralph Castain
6285f7d8c0
Per request of Shiqing, restore the ccp components
...
This commit was SVN r26904.
2012-07-29 23:49:59 +00:00
Ralph Castain
c7f9a0fa34
Check for recursive use of mpirun - issue error message and abort if detected
...
This commit was SVN r26903.
2012-07-28 21:50:56 +00:00
Ralph Castain
94d11e04fd
Add an intermediate state when the VM is ready so that third party tools can take action prior to mapping/launching apps
...
This commit was SVN r26902.
2012-07-28 15:33:09 +00:00
Shiqing Fan
660188307c
fix an export declaration name
...
This commit was SVN r26895.
2012-07-27 13:26:24 +00:00
Shiqing Fan
42dfbc7d2f
Another CMake scripts update to:
* correctly generate the hwloc library
* automatically define OMPI/OPAL/ORTE_IMPORTS for user applications
* update the f77 bindings
This commit was SVN r26893.
2012-07-27 11:49:09 +00:00
Ralph Castain
8bc6694a62
Ensure the daemons don't incorrectly declare a failed launch
...
This commit was SVN r26875.
2012-07-26 19:05:06 +00:00
Ralph Castain
07846f12ae
Reconnect the rsh/ssh error reporting code for remote spawns to report failure to launch. Ensure the HNP correctly reports non-zero exit status when ssh encounters a problem.
...
Thanks to Terry for spotting it!
This commit was SVN r26868.
2012-07-25 21:46:45 +00:00
Jeff Squyres
e5cfad0c1a
This variable is only used in FT builds.
...
This commit was SVN r26854.
2012-07-24 12:48:47 +00:00
Shiqing Fan
8c4a3e1269
correct the symbol dllexports for windows build
...
This commit was SVN r26827.
2012-07-22 08:54:50 +00:00
Shiqing Fan
12d99a9ebb
Update the hwloc build on Windows and related files.
...
This commit was SVN r26818.
2012-07-20 12:14:28 +00:00
Abhishek Kulkarni
1ce378b5c6
Make C/R work with nodes > 1. This fix makes sure that the app coordinators send
the "ready-to-checkpoint" signal to the global coordinator only after ORTE has
initialized.
This commit was SVN r26795.
2012-07-13 23:37:29 +00:00
Abhishek Kulkarni
1878f276cd
Replace the pattern while(flag) { opal_progress() }; in the C/R code
with the ORTE_WAIT_FOR_COMPLETION macro.
This commit was SVN r26794.
2012-07-13 23:31:56 +00:00
George Bosilca
772ec212eb
Fix another compiler warning.
...
This commit was SVN r26775.
2012-07-10 15:57:42 +00:00
Abhishek Kulkarni
eec5a28aa4
More C/R fixes.
...
* Fix a typo introduced by the removal of the notifier framework
* Fix to flush the modex cached data correctly using the orte DB API.
This commit was SVN r26773.
2012-07-10 01:19:46 +00:00
Abhishek Kulkarni
5c58a1c9c1
Fix C/R support in the trunk.
...
Among other things, this patch deals with the following issues:
* fix ompi-checkpoint argument parsing
* ompi-restart -showme prints an extraneous "Restarted child with PID"
message. Move around the debug statement to avoid this.
* fixes for the state machine changes
This commit was SVN r26770.
2012-07-09 23:34:13 +00:00
George Bosilca
ec760454a6
Cleaning ...
...
This commit was SVN r26747.
2012-07-04 21:22:13 +00:00
Ralph Castain
cf4606cdd5
Add debug of nidmap subsystem
...
This commit was SVN r26739.
2012-07-04 00:04:16 +00:00
Ralph Castain
6ae5776904
Cleanup IPV6 build
...
This commit was SVN r26738.
2012-07-04 00:03:50 +00:00
Ralph Castain
1a90471374
Drat - missed the other one
...
This commit was SVN r26718.
2012-07-02 22:18:31 +00:00
Ralph Castain
9a6a969f60
Remove debug
...
This commit was SVN r26717.
2012-07-02 22:18:08 +00:00
Ralph Castain
b83fc41d54
Add a state that allows mpirun or other tools to be notified of a job completion prior to terminating so that alternative actions can be performed.
...
This commit was SVN r26716.
2012-07-02 22:16:32 +00:00
Ralph Castain
e335de3564
Refactor ompi_info, splitting it into parts according to the layer involved. Thus, we call down to the opal layer to get those frameworks and components, and down to the orte layer to get those. Still some abstraction breaks, but they mostly involve renaming of OMPI_foo labels that have been around since before we split the build system by layer.
...
This commit was SVN r26695.
2012-06-28 18:23:34 +00:00
Ralph Castain
8bebf2fa47
Ensure we don't build the MR iof components unless hadoop support is enabled
...
This commit was SVN r26694.
2012-06-28 18:20:15 +00:00
Ralph Castain
9aa821d8b4
Add missing file to tarball
...
This commit was SVN r26688.
2012-06-28 02:57:10 +00:00
Ralph Castain
0dfe29b1a6
Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
...
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.
Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.
This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Brian Barrett
b22faedd9d
Remove the Portals4 SHMEM reference implementation runtime support, as we're
no longer using the runtime provided by the reference implementation.
Remove the Catamount support from ORTE, since we're no longer supporting
Catamount. Left the Catamount timer component, because I'm not sure whether
it's used on the XTs running CNL.
This commit was SVN r26677.
2012-06-27 14:17:43 +00:00
Josh Hursey
28681deffa
Backout the ORCA commit. :(
...
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.
This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7
Commit of ORCA: Open MPI Runtime Collaborative Abstraction
...
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.
The project is described on the wiki:
https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition
And on this email thread:
http://www.open-mpi.org/community/lists/devel/2012/06/11109.php
This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Ralph Castain
a34f09e67a
Ensure common port is off when not being used
...
This commit was SVN r26666.
2012-06-26 16:09:58 +00:00
Ralph Castain
92527da4e3
Remove unused component
...
This commit was SVN r26660.
2012-06-26 00:49:28 +00:00
Ralph Castain
0103f82918
Turn off the common port for slurm for now
...
This commit was SVN r26656.
2012-06-25 21:55:51 +00:00
Ralph Castain
b990c65a53
Remove another antiquated dss function - the 'size' API isn't used anywhere since the GPR went away
...
This commit was SVN r26646.
2012-06-25 13:33:45 +00:00
Shiqing Fan
6f746cdb33
remove an unused file.
...
This commit was SVN r26645.
2012-06-25 10:17:21 +00:00
Ralph Castain
abe7dd8274
Cleanup the dss by removing unused functions
...
This commit was SVN r26644.
2012-06-23 21:20:09 +00:00
Ralph Castain
9680c52f5e
Add mrplus examples to tarball
...
This commit was SVN r26643.
2012-06-23 02:40:46 +00:00
Jeff Squyres
148ae6d6e3
This commit unifies the configury of some verbs-lovin' components.
...
* Add new configure command line options and deprecate some old ones:
  * --with-verbs replaces --with-openib
  * --with-verbs-libdir replaces --with-openib-libdir
* If you specify --with-openib[-libdir] without
  --with-verbs[-libdir], you'll get a "these options have been
  deprecated!" warning, but then they'll act just like
  --with-verbs[-libdir].
'''Sidenote:''' Note that we are not renaming any components at this
time, nor are we renaming the top-level OMPI_CHECK_OPENIB m4 macro
(which is pretty strongly tied to the openib BTL and is bastardized
by the ofud BTL). Note that there will likely be more changes in
this area coming soon (next week?) when some long-standing changes
move to the SVN trunk: some openib BTL infrastructure will move to
ompi/mca/common, and its configury gets split up / refactored.
We extend the philosophy of our other --with-<foo> configure options to
--with-verbs, applying it to ''all'' verbs-lovin' components:
* If you specify --with-verbs, then all verbs-lovin' components must
configure successfully (or abort). This currently means: OOB ud,
BTL ofud, BTL openib.
* If you specify --with-verbs=DIR, then all verbs-lovin' components
must configure successfully (or abort), and will use DIR to find
verbs headers and libraries.
* If you specify --without-verbs, then all verbs-lovin' components
will be ignored.
This commit also fixes a problem where the --with-openib=DIR form
would not use DIR for ''all'' verbs-lovin' components (I think only
BTL openib and BTL ofud used that DIR). Now all of them do, as does
hwloc (because hwloc has some OpenFabrics helper functions that
require ibv types from verbs.h).
There's a little new m4 infrastructure worth mentioning:
* If you create a new verbs-lovin' component (i.e., a component that
needs verbs), your configure.m4 should
AC_REQUIRE([OPAL_CHECK_VERBS_DIR]).
* You can then use three global shell variables: $opal_want_verbs,
  $opal_verbs_dir, $opal_verbs_libdir, which will be set as follows:
  * opal_want_verbs will be "yes" and opal_verbs_dir and
    opal_verbs_libdir will both be set to directory values, '''OR'''
  * opal_want_verbs will be "no" and opal_verbs_dir and
    opal_verbs_libdir will both be set empty
This commit was SVN r26640.
2012-06-22 19:53:56 +00:00
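The --with-verbs rules described in r26640 can be sketched as a small decision function. This is only an illustration in Python, not the actual m4 configury: the function names, the string return values, and the "option not given" behaviour are assumptions, and the commit message does not say what happens when both the old and new options are passed.

```python
from typing import List, Optional, Tuple


def resolve_alias(with_verbs: Optional[str],
                  with_openib: Optional[str]) -> Tuple[Optional[str], List[str]]:
    """Map the deprecated --with-openib onto --with-verbs.

    Returns (effective value, warnings). If both options are given, the
    new one is used silently; that case is an assumption.
    """
    if with_verbs is None and with_openib is not None:
        return with_openib, ["--with-openib is deprecated; use --with-verbs"]
    return with_verbs, []


def verbs_component_action(with_verbs: Optional[str], probe_ok: bool) -> str:
    """Decide what happens to one verbs-dependent component.

    with_verbs models the configure option:
      None  -> option not given (behaviour below is an assumption)
      "no"  -> --without-verbs
      "yes" -> --with-verbs
      other -> --with-verbs=DIR
    probe_ok says whether the verbs headers and libraries were usable.
    """
    if with_verbs == "no":
        return "ignore"                      # --without-verbs: component is ignored
    if with_verbs is None:
        # Assumption: with no explicit request, a failed probe is not fatal.
        return "build" if probe_ok else "ignore"
    # Explicit --with-verbs[=DIR]: configure successfully or abort.
    return "build" if probe_ok else "abort"


if __name__ == "__main__":
    value, warnings = resolve_alias(None, "/opt/ofed")
    print(value, warnings, verbs_component_action(value, probe_ok=False))
```

The point of the sketch is the asymmetry: an explicit request turns a failed probe into a hard error, while --without-verbs never probes at all.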
Ralph Castain
e6f3586415
Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
...
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Ralph Castain
60758faa55
Fix data type
...
This commit was SVN r26633.
2012-06-21 23:48:55 +00:00
Ralph Castain
e9591f2563
Fix tree spawn in the rsh/qrsh environment
...
This commit was SVN r26631.
2012-06-21 21:29:28 +00:00
Brian Barrett
9af72072a3
Use MKDIR_P instead of mkdir_p in Makefiles, as MKDIR_P is the only one
defined in recent versions of AC/AM.
This commit was SVN r26625.
2012-06-21 16:52:37 +00:00
Ralph Castain
019857b616
Ensure that we don't attempt to use common ports if --disable-static was specified.
...
This commit was SVN r26620.
2012-06-20 03:14:11 +00:00
Ralph Castain
0a713cd27e
Add database framework to ORTE and refactor modex code to utilize it. Create the "hash" db component from the prior modex db code. Leave the other components ignored for now - will activate them later.
...
Modex is still a blocking operation at this point.
This commit was SVN r26618.
2012-06-19 13:38:42 +00:00
Ralph Castain
9e0bb6ae28
Revert r26600 and r26601 for a couple of reasons:
...
1. they modified the OMPI-ORTE interface, which is something I promised to avoid doing unless absolutely necessary, and
2. the framework ident is already in the component name key provided to the modex db. What is missing is the project ident, but as Jeff and I discussed last week, we really need to add that field to the component struct anyway to avoid multi-project collisions on framework names. That will be done over the next couple of weeks as a separate effort.
This commit was SVN r26613.
The following SVN revision numbers were found above:
r26600 --> open-mpi/ompi@5ba4deff07
r26601 --> open-mpi/ompi@0e3094c318
2012-06-16 09:11:03 +00:00
Ralph Castain
9b026c6695
For now, run MTT with the use_common_port option enabled. This would be the desirable scenario for users, especially at scale, so let's see if it creates any issues.
...
This commit was SVN r26609.
2012-06-15 15:46:38 +00:00
Ralph Castain
3c2a03b16d
Update the other routed components to use common ports. Per conversation with Josh, remove the "cm" component.
...
This commit was SVN r26608.
2012-06-15 15:36:08 +00:00
Ralph Castain
96c778656a
Improve launch performance on clusters that use dedicated nodes by instructing the orteds to use the same port as the HNP, thus allowing them to "rollup" their initial callback via the routed network. This substantially reduces the HNP bottleneck and the number of ports opened by the HNP.
...
Restore enable-static-ports option by default - the Cray will have to disable it to get around their library issues, but that's just a warning problem as opposed to blocking the build.
This commit was SVN r26606.
2012-06-15 10:15:07 +00:00
Ralph Castain
0e3094c318
Update the other grpcomm modules to new API
...
This commit was SVN r26601.
2012-06-14 03:28:48 +00:00
Ralph Castain
5ba4deff07
Extend the modex database to support multiple projects and frameworks that might have duplicate component names. No visible API change in the BTLs, as it was executed solely in the ompi modex code.
...
This commit was SVN r26600.
2012-06-14 02:55:06 +00:00
Ralph Castain
ecc51d8583
Add missing endif
...
This commit was SVN r26596.
2012-06-12 15:07:09 +00:00
Ralph Castain
078a4667e4
Some more cleanup on direct routed when daemons are involved
...
This commit was SVN r26594.
2012-06-11 23:46:22 +00:00
Ralph Castain
cee5a75d19
Revert the default configuration to no orte progress thread and no libevent thread support until we can get more of the kinks ironed out.
...
This commit was SVN r26593.
2012-06-11 20:52:28 +00:00
Ralph Castain
9506ac1617
Remove debug
...
This commit was SVN r26592.
2012-06-11 20:02:53 +00:00
Ralph Castain
269cb2b8d9
Some cleanup to remove calls to opal_progress when running with orte progress threads, and to ensure that all orte-related events are in the orte event base.
...
This commit was SVN r26591.
2012-06-11 19:59:53 +00:00
Ralph Castain
75e66ad51e
Restore the direct routed component
...
This commit was SVN r26590.
2012-06-11 17:16:02 +00:00
Brian Barrett
7406ef1241
Make all the PMI components depend on the common pmi library and properly
install the common pmi library
This commit was SVN r26588.
2012-06-11 15:58:09 +00:00
Ralph Castain
2812579246
Just because we find an IB device does not mean we can get a QP on it. Check to see if we can before we select the UD OOB module for use.
...
This commit was SVN r26587.
2012-06-10 01:42:51 +00:00
Ralph Castain
0442a807c0
Default the OOB to the "ud" component IFF the HNP finds itself on a node with a supported InfiniBand device. Ensure that the daemons all pick the matching component by dictating the selection via mca param on the orted cmd line.
...
This commit was SVN r26582.
2012-06-08 01:23:08 +00:00
Ralph Castain
05122a2f93
Make debruijn the default routed component. Update the radix component to "short-circuit" the tree when the job size permits
...
This commit was SVN r26580.
2012-06-08 00:35:36 +00:00
Ralph Castain
ffcca0185a
Remove no longer needed component
...
This commit was SVN r26578.
2012-06-08 00:18:59 +00:00
Ralph Castain
980768965f
Remove unused and unsupported component
...
This commit was SVN r26577.
2012-06-07 23:48:06 +00:00
Ralph Castain
350900f70e
Remove unused and unsupported component
...
This commit was SVN r26576.
2012-06-07 23:47:35 +00:00
Nathan Hjelm
625c8078c3
oob/ud: fix typo
...
This commit was SVN r26569.
2012-06-07 19:21:23 +00:00
Ralph Castain
7a94a52420
No reason not to build this
...
This commit was SVN r26568.
2012-06-07 19:11:44 +00:00
Ralph Castain
5876496f4c
Enable orte progress threads and libevent thread support by default
...
This commit was SVN r26565.
2012-06-07 04:25:00 +00:00
Shiqing Fan
2abf783fa0
Remove an unnecessary definition before the real one.
...
This commit was SVN r26562.
2012-06-06 14:15:39 +00:00
Ralph Castain
166d254d4e
Add new routed component
...
This commit was SVN r26557.
2012-06-06 11:53:12 +00:00
Ralph Castain
d6279fc971
Fix the debugger daemon launch support to fit the new state machine. Treat debugger daemons just like any other job, except that we map them only to nodes where an app process currently exists (as opposed to every node in the system). Trigger breakpoint and rank0 release only after the debugger daemons are in position.
...
This commit was SVN r26556.
2012-06-06 02:01:23 +00:00
Jeff Squyres
0b8849e2c4
Make "mpirun --report-bindings" have a user-friendly output (i.e.,
readable by normal human beings, vs. having a bitmap of physical
PU's). Use the new hwloc base prettyprint functions to generate the
output.
This commit was SVN r26533.
2012-06-01 16:35:31 +00:00
Jeff Squyres
99c5afb397
Remove clang compiler warnings.
...
This commit was SVN r26523.
2012-05-29 23:36:06 +00:00
Ralph Castain
b0938a254e
Don't use a mutex where it isn't needed
...
This commit was SVN r26521.
2012-05-29 20:21:11 +00:00
Ralph Castain
32b66c166b
Missed one blasted spot
...
This commit was SVN r26520.
2012-05-29 20:20:10 +00:00
Ralph Castain
9bedb25dda
Cleanup some compiler warnings, some of which are actual logic errors
...
This commit was SVN r26519.
2012-05-29 20:11:51 +00:00
Ralph Castain
d7ac424d8d
Silence optimized build warnings
...
This commit was SVN r26518.
2012-05-29 19:55:47 +00:00
Ralph Castain
bf5ec1ac0c
Silence optimized build warnings
...
This commit was SVN r26517.
2012-05-29 19:55:31 +00:00
Shiqing Fan
08d553d7bf
Add a file to the installation list.
...
This commit was SVN r26507.
2012-05-29 13:58:23 +00:00
Ralph Castain
9883f42caf
Add missing commit
...
This commit was SVN r26501.
2012-05-28 02:20:20 +00:00
Ralph Castain
e705de1ce6
Complete nidmap cleanup - we don't know our node until we have unpacked all the jobs since our job is always the last one, so wait until all jobs are unpacked before assigning locality
...
This commit was SVN r26500.
2012-05-27 18:37:57 +00:00
Ralph Castain
be6ed9c2df
Allow partial use of allocations by specifying the max number of daemons (i.e., max VM size) for the job
...
This commit was SVN r26499.
2012-05-27 16:48:19 +00:00
Ralph Castain
c69a04e16b
Cleanup the pidmap decoding for apps to avoid confusion
...
This commit was SVN r26498.
2012-05-27 16:21:38 +00:00
Ralph Castain
31beff6362
Oops - if we don't want the Java bindings, then we really shouldn't be building them :-/
...
Also ensure we don't try to build them if no Java support was found, and error out if the user requests the bindings and we didn't find Java support.
Add a configure flag to skip the Java tests and just force-set the Java support to "disabled"
This commit was SVN r26484.
2012-05-23 19:51:27 +00:00
Ralph Castain
7fb49b1559
Silence warning
...
This commit was SVN r26480.
2012-05-23 13:59:41 +00:00
Ralph Castain
da28a4b0e6
Silence warning
...
This commit was SVN r26479.
2012-05-23 13:59:22 +00:00
Jeff Squyres
7969faf372
Fixes trac:3057: minor update to the man page to state that slot locations
in rankfiles use ''physical'' device indexes (vs. logical indexes).
This commit was SVN r26478.
The following Trac tickets were found above:
Ticket 3057 --> https://svn.open-mpi.org/trac/ompi/ticket/3057
2012-05-23 11:43:33 +00:00
Nathan Hjelm
b9959a95cd
ack! one more
...
This commit was SVN r26472.
2012-05-22 20:52:52 +00:00
Nathan Hjelm
f2d4e95429
doh! add missing include
...
This commit was SVN r26471.
2012-05-22 20:49:13 +00:00
Nathan Hjelm
cdc3c87ba6
move pmi init/finalize into a common component
...
This commit was SVN r26470.
2012-05-22 15:15:39 +00:00
Nathan Hjelm
78b8b3cf76
bug fix: actually close ess components
...
This commit was SVN r26469.
2012-05-22 15:09:18 +00:00
Ralph Castain
b217124bd8
Symlink instead of copy
...
This commit was SVN r26464.
2012-05-21 23:07:48 +00:00
Ralph Castain
da3873af6f
Rename the mapreduce tool to "mr+" per the marketing types
...
This commit was SVN r26463.
2012-05-21 21:17:44 +00:00
Nathan Hjelm
6eeca66475
add an option to enable static ports; disabled by default
...
This commit was SVN r26462.
2012-05-21 19:56:15 +00:00
Ralph Castain
83d69b6c95
Enable the ORTE progress thread for apps (not needed in the tools as they already continuously loop in the event lib). This appears to be working, at least for MPI apps that only use shared memory (a simple "hello"). More testing is required to identify where problems will occur - this is only intended to allow further development.
...
In order to use the progress thread, you must configure with:
--enable-orte-progress-threads --enable-event-thread-support
This commit was SVN r26457.
2012-05-20 15:14:43 +00:00
Ralph Castain
c4f8043064
Per Nathan, with a little cleanup by me: update the PMI support to aggregate modex info, thus reducing the number of keys required so it fits within Cray default constraints
...
This commit was SVN r26456.
2012-05-19 16:12:52 +00:00
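The idea behind r26456, publishing fewer and larger keys rather than one key per modex entry, can be illustrated with a generic sketch. Nothing below is Open MPI or PMI code: the function names, the "modex-N" key naming, the base64 encoding, and the length limit are all hypothetical, chosen only to show how aggregation reduces the key count while respecting a per-value size cap.

```python
import base64
from typing import Dict


def aggregate_modex(entries: Dict[str, bytes], max_value_len: int) -> Dict[str, str]:
    """Pack many (name, value) entries into as few published keys as possible.

    Entry names must not contain ':' or ','. Values are base64-encoded so
    the packed blob only contains characters safe for a text key-value store.
    """
    parts = [f"{name}:{base64.b64encode(value).decode('ascii')}"
             for name, value in entries.items()]
    blob = ",".join(parts)
    # Split the blob into chunks that each respect the per-value limit.
    chunks = [blob[i:i + max_value_len] for i in range(0, len(blob), max_value_len)]
    return {f"modex-{n}": chunk for n, chunk in enumerate(chunks)}


def split_modex(published: Dict[str, str]) -> Dict[str, bytes]:
    """Reassemble the chunks in order and recover the original entries."""
    blob = "".join(published[f"modex-{n}"] for n in range(len(published)))
    if not blob:
        return {}
    entries = {}
    for part in blob.split(","):
        name, encoded = part.split(":", 1)
        entries[name] = base64.b64decode(encoded)
    return entries


if __name__ == "__main__":
    data = {"btl.tcp": b"\x01\x02", "btl.sm": b"abc", "oob.ud": b"\x00" * 40}
    published = aggregate_modex(data, max_value_len=64)
    print(len(data), "entries published under", len(published), "keys")
```

With a larger value limit the number of published keys drops further, which is the property that matters when the key-value store caps the number of keys per job.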
Ralph Castain
a526afae92
Ensure we always cleanup local procs, no matter how we exited.
...
This commit was SVN r26454.
2012-05-18 23:37:40 +00:00
Ralph Castain
12ebc0e269
Don't need this to be a bin program as the class is captured in the jar
...
This commit was SVN r26453.
2012-05-18 23:37:18 +00:00
Ralph Castain
b16e43f489
Silence a warning on Mac
...
This commit was SVN r26449.
2012-05-18 15:27:04 +00:00
Ralph Castain
ca1b325738
Tweak the java setup so it works better on Mac. Only build mapreduce and allocators if hadoop support was requested.
...
This commit was SVN r26448.
2012-05-18 01:02:01 +00:00
Jeff Squyres
cab31eafce
Revert r26413: it was causing too much confusion. When an MPI proc
exits with status 77, the whole job will be killed, but mpirun will
still return an exit status of 77, so MTT will report it as a skip
anyway.
This commit was SVN r26445.
The following SVN revision numbers were found above:
r26413 --> open-mpi/ompi@02aa36f2e5
2012-05-16 14:45:58 +00:00
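The exit-status convention that r26445 relies on can be sketched as follows. Only the "77 means skip" rule comes from the commit message; treating 0 as a pass and every other status as a failure is an assumption about how a test harness would classify results, and the function name is hypothetical.

```python
import subprocess
import sys


def classify_exit_status(status: int) -> str:
    """Classify a launcher's exit status the way a test harness might."""
    if status == 77:
        return "skip"    # from the commit message: 77 is reported as a skip
    if status == 0:
        return "pass"    # assumption: conventional success status
    return "fail"        # assumption: anything else counts as a failure


if __name__ == "__main__":
    # Stand-in for an MPI job: a child process that exits with status 77.
    child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(77)"])
    print(child.returncode, classify_exit_status(child.returncode))
```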