Ralph Castain
05c0464dcb
Add missing protections
This commit was SVN r27183.
2012-08-30 12:17:29 +00:00
Ralph Castain
1b659de132
Get staged execution working on multi-node setups. Improve efficiency by only remapping if some procs in the job are not yet mapped.
This commit was SVN r27181.
2012-08-29 20:35:52 +00:00
Ralph Castain
a3b08f5800
Fix a few things relating to comm_spawn calls that cause new daemons to be launched. Ensure that all new daemons receive a full pidmap. Properly mark the daemon job as "updated" when daemons are added.
This commit was SVN r27177.
2012-08-29 03:11:37 +00:00
Ralph Castain
f0077820f2
Silence warning
This commit was SVN r27175.
2012-08-28 22:27:41 +00:00
Ralph Castain
a414ffdf4c
Remove debug
This commit was SVN r27174.
2012-08-28 22:18:00 +00:00
Ralph Castain
30fd9d7abc
MPI procs should definitely not be trapping SIGCHLD - only ORTE tools need to do so
This commit was SVN r27166.
2012-08-28 21:39:06 +00:00
Ralph Castain
98580c117b
Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine.
Remove some stale configure.m4's we no longer need.
Optimize the nidmaps a bit by only sending info that has changed each time, instead of sending a complete copy of everything. Makes no difference for the typical MPI job - only impacts things like staged execution where we are sending multiple (possibly many) launch messages.
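The staged execution idea described in this commit can be illustrated with a small simulation. This is a minimal Python sketch of the scheduling policy only, not the ORTE mapper or state machine; the function name `staged_run` and the notion of a per-proc duration are assumptions made for illustration.

```python
import heapq
from collections import deque


def staged_run(durations, slots):
    """Simulate staged execution with at most `slots` procs running at once.

    durations: hypothetical run time of each proc, in launch order.
    Returns a list of (start_time, end_time) tuples, one per proc.
    """
    if slots < 1:
        raise ValueError("need at least one slot")
    pending = deque(enumerate(durations))
    running = []                      # min-heap of (end_time, proc_index)
    schedule = [None] * len(durations)
    now = 0
    while pending or running:
        # (a) launch as many procs as there are free slots
        while pending and len(running) < slots:
            idx, dur = pending.popleft()
            schedule[idx] = (now, now + dur)
            heapq.heappush(running, (now + dur, idx))
        # (b) wait for the next proc to complete, which frees its slot
        now, _ = heapq.heappop(running)
    return schedule
```

With two slots and four procs, the third proc starts as soon as the shortest of the first two finishes, so the resources are never oversubscribed.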
This commit was SVN r27165.
2012-08-28 21:20:17 +00:00
Ralph Castain
aadfe1b61e
Fix a missing test that breaks novm operation.
CMR:v1.7
This commit was SVN r27163.
2012-08-28 21:13:57 +00:00
Ralph Castain
d310dd8c58
Fix a strange race condition by creating a separate buffer for each send - apparently, just a retain isn't enough protection on some systems
This commit was SVN r27161.
2012-08-28 17:17:34 +00:00
Ralph Castain
11c68e2299
Correct the count in the pmi key
This commit was SVN r27156.
2012-08-28 15:05:02 +00:00
Ralph Castain
6e8c97c77c
Per Sam's eagle-eyed review, free the malloc'd memory if getcwd fails for some strange reason.
This commit was SVN r27150.
2012-08-27 19:15:16 +00:00
Ralph Castain
bccc20d13e
Deal with one last corner case of positioning a dot-file
This commit was SVN r27144.
2012-08-26 03:49:31 +00:00
Ralph Castain
63d41c643d
Minor cleanup
This commit was SVN r27143.
2012-08-25 14:24:45 +00:00
Ralph Castain
0e1dbe8711
Remove non-existent files
This commit was SVN r27136.
2012-08-25 01:29:17 +00:00
Ralph Castain
05f0b4c653
Couple of minor cleanups
This commit was SVN r27135.
2012-08-24 21:14:40 +00:00
Ralph Castain
d6cbff6d4e
Since the preload flags are at the app_context level, we need to link only those files/exe's that pertain to each app_context to the corresponding procs. Also, gain a little optimization by checking to ensure we only send files once - this probably won't work when daemons are created on-the-fly, but that's for some other day
This commit was SVN r27134.
2012-08-24 16:16:30 +00:00
Jeff Squyres
20612c4194
Don't close the IOF stdin if we happen to read less than a full
buffer's worth of data -- interactive stdin will have that behavior
frequently.
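The short-read behavior this commit addresses can be sketched as follows. This is an illustrative Python loop, not the IOF code itself; `forward_stdin` and its parameters are hypothetical names. The point is that only a zero-byte read signals end-of-file, while a read that returns less than the buffer size is normal for interactive input.

```python
import os


def forward_stdin(src_fd, dst_fd, bufsize=4096):
    """Copy src_fd to dst_fd until end-of-file; return total bytes copied."""
    total = 0
    while True:
        chunk = os.read(src_fd, bufsize)
        if not chunk:
            # Zero bytes read: this is the end-of-file signal.
            break
        # A short read (len(chunk) < bufsize) is NOT end-of-file, so the
        # stream stays open and the loop continues.
        view = memoryview(chunk)
        while view:
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(chunk)
    return total
```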
This commit was SVN r27131.
2012-08-24 14:29:19 +00:00
Ralph Castain
e0c39c94e8
Complete the cleanup of the preload files system. Remove the dest_dir option as moving things to arbitrary locations - especially absolute paths - can prove disastrous. Remove the preload_libs option as these can be treated as just files. Clean up some of the pack/unpack code as the dss handles NULL strings just fine. Deal a little better with absolute paths, noting that tar now strips the leading '/' for us (showing my age as it didn't used to do so).
Remove the odls_base_state.c file as that code is now covered by the new broadcast form of preload_files.
This commit was SVN r27127.
2012-08-24 02:28:29 +00:00
Ralph Castain
c8b511d18a
Remove stale tests
This commit was SVN r27126.
2012-08-24 02:22:11 +00:00
Ralph Castain
b4a544ad2a
Per discussion with Josh, use the --preload-xxx cmd line options to broadcast files to all nodes. Add --set-cwd-to-session-dir option to start procs in their session directories. Add OMPI_FILE_LOCATION envar to tell procs where their prepositioned files went.
This commit was SVN r27125.
2012-08-23 21:28:05 +00:00
Ralph Castain
855c9ae6cf
Support archives .tar, .bz[2,zip], and .gz[ip]
This commit was SVN r27123.
2012-08-23 15:38:39 +00:00
Ralph Castain
286c610712
Protect us against the scenario where filem is included in enable-mca-no-build
This commit was SVN r27122.
2012-08-23 13:52:06 +00:00
Shiqing Fan
d141d94bd7
Include the new .windows files into the tarball.
This commit was SVN r27121.
2012-08-23 12:50:51 +00:00
Ralph Castain
7237a938bf
Extend the filem interface to support prepositioning and linking required local files for execution. Create a new "raw" module that uses xcast to send the files to all nodes as this is faster than doing an scp in a linear pattern
This commit was SVN r27118.
2012-08-22 21:43:20 +00:00
Ralph Castain
ed4b354846
Ensure we pass along user-specified mca params from the cmd line when doing a tree spawn, but don't extend the cmd line with duplicates or things that shouldn't be there
This commit was SVN r27117.
2012-08-22 21:41:50 +00:00
Ralph Castain
5d7872fd68
Clean up the tag list
This commit was SVN r27115.
2012-08-22 21:37:58 +00:00
Ralph Castain
3c13176aa7
Remove test code
This commit was SVN r27114.
2012-08-22 21:36:54 +00:00
Ralph Castain
7bcf2f8b5c
Stop leaving droppings behind us
This commit was SVN r27111.
2012-08-22 17:39:22 +00:00
Shiqing Fan
95b9552546
include several components for Windows build.
This commit was SVN r27108.
2012-08-22 14:46:49 +00:00
Jeff Squyres
c8cee23ee7
Priorities really shouldn't be less than 0.
This commit was SVN r27098.
2012-08-21 15:47:15 +00:00
Ralph Castain
dacb07000d
Turn udcm and ud oob off by default, but allow them to build and be used if someone wants to test them
cmr:v1.7
This commit was SVN r27097.
2012-08-21 15:18:34 +00:00
Nathan Hjelm
0061ac066b
orte/alps: add support for --with-alps=yes on CLE 5.0 and clean out tabs
This commit was SVN r27096.
2012-08-20 15:26:58 +00:00
Ralph Castain
64cf75cec5
Add some debug
This commit was SVN r27087.
2012-08-17 02:19:26 +00:00
Ralph Castain
a572b6fa9f
Pick the right place
This commit was SVN r27085.
2012-08-17 00:28:28 +00:00
Ralph Castain
b2cd2b1289
Allow developers to enable OMPI progress threads for debugging purposes. Warn and error out if ORTE progress threads are enabled, but they forgot to enable the libevent thread support.
This commit was SVN r27071.
2012-08-16 17:50:52 +00:00
Ralph Castain
335c0eafcf
Add a filem test program and set ignores
This commit was SVN r27069.
2012-08-16 17:46:46 +00:00
Jeff Squyres
96f640a762
Add new "opal_hotel" class. Abstractly speaking, this class does the
following:
* Provides a fixed number of resource slots (i.e., "hotel rooms").
* Allows one thing to occupy a resource slot at a time (i.e., each
hotel room can have an occupant check in to that room).
* Resource slots can be vacated at any time (i.e., occupants can
voluntarily check out of their hotel room).
* Resource slots can be occupied for a specific maximum amount of
time. If that time expires, the occupant is forcibly evicted and
the upper layer is notified via (libevent) callback (i.e., the maid
will kick an occupant out of their room when their reservation
is over).
This class can be used for things like retransmission schemes
for unreliable transports. For example, a message sent on an
unreliable transport can be checked in to a hotel room. If an ACK for
that message is received, the message can be checked out. But if the
ACK is never received, the message will eventually be evicted from its
room and the upper layer will be notified that the message failed to
check out in time (i.e., that an ACK for that message was not received
in time).
Code using this class is currently being developed off-trunk, but will
be coming to SVN soon.
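The hotel abstraction described above can be sketched in a few lines. This is a minimal Python illustration of the concept, not the opal_hotel C API: the class name `Hotel`, its method names, and the polling `evict_expired` call are assumptions. The real class is described as using a libevent callback for evictions, whereas this sketch uses an injectable clock and an explicit poll.

```python
import time


class Hotel:
    """Fixed number of rooms; each occupant has a maximum stay."""

    def __init__(self, num_rooms, max_stay, on_evict, clock=time.monotonic):
        self._rooms = [None] * num_rooms     # (occupant, deadline) or None
        self._free = list(range(num_rooms))  # rooms available for check-in
        self._max_stay = max_stay
        self._on_evict = on_evict            # upper-layer eviction callback
        self._clock = clock

    def checkin(self, occupant):
        """Return the assigned room number, or None if the hotel is full."""
        if not self._free:
            return None
        room = self._free.pop()
        self._rooms[room] = (occupant, self._clock() + self._max_stay)
        return room

    def checkout(self, room):
        """Vacate a room voluntarily; return its occupant (None if empty)."""
        entry = self._rooms[room]
        if entry is None:
            return None
        self._rooms[room] = None
        self._free.append(room)
        return entry[0]

    def evict_expired(self):
        """Evict occupants whose stay has expired and notify the upper layer."""
        now = self._clock()
        for room, entry in enumerate(self._rooms):
            if entry is not None and entry[1] <= now:
                self._rooms[room] = None
                self._free.append(room)
                self._on_evict(room, entry[0])
```

In the retransmission example, a sent message is checked in, checked out when its ACK arrives, and otherwise passed to `on_evict` once its deadline passes, at which point the upper layer can retransmit it.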
This commit was SVN r27067.
2012-08-16 17:29:55 +00:00
Ralph Castain
e4d82b8912
Turn off the common port by default for now until we get rollup working properly on ALL platforms
This commit was SVN r27060.
2012-08-15 22:13:04 +00:00
Ralph Castain
35fef87202
Make the "no virtual machine" selection more intuitive by providing a --novm option to mpirun.
This commit was SVN r27048.
2012-08-15 14:55:03 +00:00
Ralph Castain
229e3f9b2a
This will break systems like orcm, but we aren't trying to support those any more - so put the nodes back in their daemon-indexed position. Will continue working to reduce search requirements in other parts of the code
This commit was SVN r27038.
2012-08-14 22:26:40 +00:00
Ralph Castain
481ed4e292
Only one equal sign, if you please...
This commit was SVN r27037.
2012-08-14 22:08:19 +00:00
Ralph Castain
8c890b1c46
Fix the alps configury so it doesn't attempt to build alps by default, even if --with-alps wasn't given.
This commit was SVN r27036.
2012-08-14 22:04:39 +00:00
Ralph Castain
3cb8d55c8b
We can't just look up the node in the node pool by daemon vpid, as the daemons aren't stored that way. This was done because, when holes exist in daemon vpids, we can generate huge orte_node_pool arrays even when only a few daemons actually exist. So we have to search for the vpid in the array.
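The trade-off behind this change can be shown with a small sketch. It is a Python illustration under assumed names, not the ORTE code: the dictionary fields `name` and `daemon_vpid` and the function `find_node_by_daemon` are hypothetical. When nodes are stored compactly rather than at the index equal to their daemon vpid, a lookup has to scan the array.

```python
def find_node_by_daemon(node_pool, vpid):
    """Linear search for the node whose daemon has the given vpid.

    Works when vpids are sparse and nodes are packed, at the cost of an
    O(n) scan instead of a direct index.
    """
    for node in node_pool:
        if node is not None and node["daemon_vpid"] == vpid:
            return node
    return None


# Three daemons with a large hole in their vpids, stored compactly.
# Indexing by vpid instead would need an array of 5001 slots.
node_pool = [
    {"name": "node-a", "daemon_vpid": 0},
    {"name": "node-b", "daemon_vpid": 1},
    {"name": "node-c", "daemon_vpid": 5000},
]
```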
This commit was SVN r27035.
2012-08-14 18:17:59 +00:00
Nathan Hjelm
d5824f7800
add missing test
This commit was SVN r27028.
2012-08-14 03:14:07 +00:00
Nathan Hjelm
3d03d8f08b
fix typo in orte_check_alps.m4
This commit was SVN r27027.
2012-08-13 23:00:06 +00:00
Nathan Hjelm
8e03f77004
update alps configure scripts
This commit was SVN r27026.
2012-08-13 22:57:55 +00:00
Ralph Castain
589acf550c
Improve the new MPI_INFO_ENV to better handle Java applications and to correctly report the info for singletons.
This commit was SVN r27025.
2012-08-13 22:13:49 +00:00
Ralph Castain
3938ec5361
Remove debug
This commit was SVN r27024.
2012-08-13 21:35:21 +00:00
Ralph Castain
49a757e0bd
Silly me - now that all daemons are stripping their prefix on the backend, we no longer need to do it as they report
This commit was SVN r27023.
2012-08-13 20:48:13 +00:00
Ralph Castain
b9b41d8662
For cases where the alpha+non-zero prefix must be removed from a node name, be sure to do it everywhere we access node names - otherwise, modex methods such as pmi will fail to correctly identify procs on the same node
This commit was SVN r27022.
2012-08-13 20:44:56 +00:00