Ralph Castain
855c9ae6cf
Support archives .tar, .bz[2,zip], and .gz[ip]
...
This commit was SVN r27123.
2012-08-23 15:38:39 +00:00
Ralph Castain
286c610712
Protect us against the scenario where filem is included in enable-mca-no-build
...
This commit was SVN r27122.
2012-08-23 13:52:06 +00:00
Shiqing Fan
d141d94bd7
Include the new .windows files into the tarball.
...
This commit was SVN r27121.
2012-08-23 12:50:51 +00:00
Ralph Castain
7237a938bf
Extend the filem interface to support prepositioning and linking required local files for execution. Create a new "raw" module that uses xcast to send the files to all nodes as this is faster than doing an scp in a linear pattern
...
This commit was SVN r27118.
2012-08-22 21:43:20 +00:00
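Note on 7237a938bf: the claim that a broadcast beats a linear scp loop can be illustrated by counting transfer rounds. This is only a sketch. It assumes a binomial relay tree in which every node that already holds the file forwards it once per round; the commit does not say which routing pattern xcast actually uses, and the function names are hypothetical.

```python
def linear_rounds(num_nodes: int) -> int:
    """Sequential copies: the source sends to one node at a time."""
    return max(num_nodes, 0)


def tree_rounds(num_nodes: int) -> int:
    """Relay tree (assumed binomial): holders of the file double each round."""
    holders = 1  # the source starts with the file
    rounds = 0
    while holders < num_nodes + 1:  # source plus all target nodes
        holders *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (7, 1000):
        print(n, "nodes:", linear_rounds(n), "linear rounds vs", tree_rounds(n), "tree rounds")
```

Under these assumptions the round count grows with the logarithm of the node count instead of linearly.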
Ralph Castain
ed4b354846
Ensure we pass along user-specified mca params from the cmd line when doing a tree spawn, but don't extend the cmd line with duplicates or things that shouldn't be there
...
This commit was SVN r27117.
2012-08-22 21:41:50 +00:00
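Note on ed4b354846: one way to forward user-specified MCA params without duplicating them is to scan the command line that is already built and append only the missing keys. The helper below is a hypothetical sketch, not the ORTE code; the `excluded` set stands in for "things that shouldn't be there" and is an assumption.

```python
def merge_mca_params(base_argv, user_params, excluded=frozenset()):
    """Return base_argv plus `-mca key value` triples for params not yet present."""
    argv = list(base_argv)
    present = set()
    i = 0
    # Collect the keys that already appear as "-mca key value" on the line.
    while i < len(argv):
        if argv[i] in ("-mca", "--mca") and i + 2 < len(argv):
            present.add(argv[i + 1])
            i += 3
        else:
            i += 1
    # Append each user param once, skipping duplicates and excluded keys.
    for key, value in user_params:
        if key in present or key in excluded:
            continue
        argv += ["-mca", key, value]
        present.add(key)
    return argv


if __name__ == "__main__":
    print(merge_mca_params(["orted", "-mca", "plm", "rsh"],
                           [("plm", "rsh"), ("btl", "tcp,self")]))
```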
Ralph Castain
5d7872fd68
Cleanup the tag list
...
This commit was SVN r27115.
2012-08-22 21:37:58 +00:00
Ralph Castain
3c13176aa7
Remove test code
...
This commit was SVN r27114.
2012-08-22 21:36:54 +00:00
Ralph Castain
7bcf2f8b5c
Stop leaving droppings behind us
...
This commit was SVN r27111.
2012-08-22 17:39:22 +00:00
Shiqing Fan
95b9552546
include several components for Windows build.
...
This commit was SVN r27108.
2012-08-22 14:46:49 +00:00
Jeff Squyres
c8cee23ee7
Priorities really shouldn't be less than 0.
...
This commit was SVN r27098.
2012-08-21 15:47:15 +00:00
Ralph Castain
dacb07000d
Turn udcm and ud oob off by default, but allow them to build and be used if someone wants to test them
...
cmr:v1.7
This commit was SVN r27097.
2012-08-21 15:18:34 +00:00
Nathan Hjelm
0061ac066b
orte/alps: add support for --with-alps=yes on CLE 5.0 and clean out tabs
...
This commit was SVN r27096.
2012-08-20 15:26:58 +00:00
Ralph Castain
64cf75cec5
Add some debug
...
This commit was SVN r27087.
2012-08-17 02:19:26 +00:00
Ralph Castain
a572b6fa9f
Pick the right place
...
This commit was SVN r27085.
2012-08-17 00:28:28 +00:00
Ralph Castain
b2cd2b1289
Allow developers to enable OMPI progress threads for debugging purposes. Warn and error out if ORTE progress threads are enabled, but they forgot to enable the libevent thread support.
...
This commit was SVN r27071.
2012-08-16 17:50:52 +00:00
Ralph Castain
335c0eafcf
Add a filem test program and set ignores
...
This commit was SVN r27069.
2012-08-16 17:46:46 +00:00
Jeff Squyres
96f640a762
Add new "opal_hotel" class. Abstractly speaking, this class does the
...
following:
* Provides a fixed number of resource slots (i.e., "hotel rooms").
* Allows one thing to occupy a resource slot at a time (i.e., each
hotel room can have an occupant check in to that room).
* Resource slots can be vacated at any time (i.e., occupants can
voluntarily check out of their hotel room).
* Resource slots can be occupied for a specific maximum amount of
time. If that time expires, the occupant is forcibly evicted and
the upper layer is notified via (libevent) callback (i.e., the maid
will kick an occupant out of their room when their reservation
is over).
This class can be used for things like retransmission schemes
for unreliable transports. For example, a message sent on an
unreliable transport can be checked in to a hotel room. If an ACK for
that message is received, the message can be checked out. But if the
ACK is never received, the message will eventually be evicted from its
room and the upper layer will be notified that the message failed to
check out in time (i.e., that an ACK for that message was not received
in time).
Code using this class is currently being developed off-trunk, but will
be coming to SVN soon.
This commit was SVN r27067.
2012-08-16 17:29:55 +00:00
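Note on 96f640a762: the check-in / check-out / eviction behavior described for "opal_hotel" can be sketched roughly as follows. This is not the opal_hotel API. The class and method names are hypothetical, and an explicit `tick(now)` call with a caller-supplied clock stands in for the libevent timer callbacks mentioned in the commit.

```python
import heapq


class Hotel:
    """Fixed number of rooms; occupants are evicted after max_stay time units."""

    def __init__(self, num_rooms, max_stay, on_evict):
        self.max_stay = max_stay
        self.on_evict = on_evict              # called as on_evict(room, occupant)
        self.occupants = [None] * num_rooms
        self.free_rooms = list(range(num_rooms - 1, -1, -1))
        self.stay_ids = [0] * num_rooms       # bumped on every check-in and check-out
        self.deadlines = []                   # heap of (deadline, room, stay_id)

    def checkin(self, occupant, now):
        """Occupy a free room; return its number, or None if the hotel is full."""
        if not self.free_rooms:
            return None
        room = self.free_rooms.pop()
        self.occupants[room] = occupant
        self.stay_ids[room] += 1
        heapq.heappush(self.deadlines,
                       (now + self.max_stay, room, self.stay_ids[room]))
        return room

    def checkout(self, room):
        """Voluntarily vacate a room (e.g. the ACK arrived)."""
        occupant = self.occupants[room]
        if occupant is None:
            raise ValueError(f"room {room} is not occupied")
        self.occupants[room] = None
        self.stay_ids[room] += 1              # invalidates the pending deadline
        self.free_rooms.append(room)
        return occupant

    def tick(self, now):
        """Evict every occupant whose deadline has passed and notify the caller."""
        while self.deadlines and self.deadlines[0][0] <= now:
            _, room, stay_id = heapq.heappop(self.deadlines)
            if self.stay_ids[room] != stay_id:
                continue                      # stale entry: occupant already left
            occupant = self.checkout(room)
            self.on_evict(room, occupant)
```

In the retransmission use case from the commit message, a sender would check a message in when it is sent, check it out when the ACK arrives, and treat the eviction callback as the retransmit timeout.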
Ralph Castain
e4d82b8912
Turn off the common port by default for now until we get rollup working properly on ALL platforms
...
This commit was SVN r27060.
2012-08-15 22:13:04 +00:00
Ralph Castain
35fef87202
Make the "no virtual machine" selection more intuitive by providing a --novm option to mpirun.
...
This commit was SVN r27048.
2012-08-15 14:55:03 +00:00
Ralph Castain
229e3f9b2a
This will break systems like orcm, but we aren't trying to support those any more - so put the nodes back in their daemon-indexed position. Will continue working to reduce search requirements in other parts of the code
...
This commit was SVN r27038.
2012-08-14 22:26:40 +00:00
Ralph Castain
481ed4e292
Only one equal sign, if you please...
...
This commit was SVN r27037.
2012-08-14 22:08:19 +00:00
Ralph Castain
8c890b1c46
Fix the alps configury so it doesn't attempt to build alps by default, even if --with-alps wasn't given.
...
This commit was SVN r27036.
2012-08-14 22:04:39 +00:00
Ralph Castain
3cb8d55c8b
We can't just lookup the node in the node pool by daemon vpid as the daemons aren't stored that way - this was done because when holes exist in daemon vpids, we can generate huge orte_node_pool arrays even when only a few daemons actually exist. So we have to search for the vpid in the array
...
This commit was SVN r27035.
2012-08-14 18:17:59 +00:00
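Note on 3cb8d55c8b: the trade-off described here is between a compact node pool that must be searched and a pool indexed directly by daemon vpid, which becomes very sparse when vpids have holes. A minimal sketch of the search, with hypothetical names (`Node`, `find_node_by_vpid`) rather than the real ORTE structures:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    name: str
    daemon_vpid: int


def find_node_by_vpid(node_pool: Sequence[Optional[Node]], vpid: int) -> Optional[Node]:
    """Linear search: the pool is stored compactly, not indexed by daemon vpid."""
    for node in node_pool:
        if node is not None and node.daemon_vpid == vpid:
            return node
    return None


if __name__ == "__main__":
    # Three daemons with widely spaced vpids need only three pool entries.
    pool = [Node("a", 0), Node("b", 1000), Node("c", 500000)]
    print(find_node_by_vpid(pool, 500000))
```

The search costs O(n) per lookup, while direct indexing is O(1) but needs an array as large as the highest vpid. The later commit 229e3f9b2a returns to daemon-indexed storage.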
Nathan Hjelm
d5824f7800
add missing test
...
This commit was SVN r27028.
2012-08-14 03:14:07 +00:00
Nathan Hjelm
3d03d8f08b
fix typo in orte_check_alps.m4
...
This commit was SVN r27027.
2012-08-13 23:00:06 +00:00
Nathan Hjelm
8e03f77004
update alps configure scripts
...
This commit was SVN r27026.
2012-08-13 22:57:55 +00:00
Ralph Castain
589acf550c
Improve the new MPI_INFO_ENV to better handle Java applications and to correctly report the info for singletons.
...
This commit was SVN r27025.
2012-08-13 22:13:49 +00:00
Ralph Castain
3938ec5361
Remove debug
...
This commit was SVN r27024.
2012-08-13 21:35:21 +00:00
Ralph Castain
49a757e0bd
Silly me - now that all daemons are stripping their prefix on the backend, we no longer need to do it as they report
...
This commit was SVN r27023.
2012-08-13 20:48:13 +00:00
Ralph Castain
b9b41d8662
For cases where the alpha+non-zero prefix must be removed from a node name, be sure to do it everywhere we access node names - otherwise, modex methods such as pmi will fail to correctly identify procs on the same node
...
This commit was SVN r27022.
2012-08-13 20:44:56 +00:00
Ralph Castain
c90b7380c1
Sigh - of course, they changed the name of the silly MPI_Info object in the final standard, but not in the proposal. So change to the new MPI_INFO_ENV name. Also, don't set unknown values to "N/A", but just leave them unset.
...
This commit was SVN r27012.
2012-08-12 05:00:57 +00:00
Ralph Castain
cb48fd52d4
Implement the MPI_Info part of MPI-3 Ticket 313. Add an MPI_Info object MPI_INFO_GET_ENV that contains a number of run-time related pieces of info. This includes all the required ones in the ticket, plus a few that specifically address recent user questions:
...
"num_app_ctx" - the number of app_contexts in the job
"first_rank" - the MPI rank of the first process in each app_context
"np" - the number of procs in each app_context
Still need clarification on the MPI_Init portion of the ticket. Specifically, does the ticket call for returning an error if someone calls MPI_Init more than once in a program? We set a flag to tell us that we have been initialized, but currently never check it.
This commit was SVN r27005.
2012-08-12 01:28:23 +00:00
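Note on cb48fd52d4: the three extra keys relate to each other in a simple way if ranks are assigned contiguously in app_context order. That ordering is an assumption here, as are the function name and the use of Python lists; the commit does not say how the values are encoded inside the info object.

```python
def info_env_summary(procs_per_app):
    """Derive num_app_ctx, first_rank and np from per-app_context proc counts."""
    first_ranks = []
    next_rank = 0
    for count in procs_per_app:
        first_ranks.append(next_rank)  # first MPI rank of this app_context
        next_rank += count
    return {
        "num_app_ctx": len(procs_per_app),
        "first_rank": first_ranks,
        "np": list(procs_per_app),
    }


if __name__ == "__main__":
    # Three app_contexts with 2, 3 and 4 procs respectively.
    print(info_env_summary([2, 3, 4]))
```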
Ralph Castain
c4ee297a60
Cleanup the pmi grpcomm module so it passes non-btl modex data correctly.
...
This commit was SVN r26992.
2012-08-10 20:35:50 +00:00
Ralph Castain
e3e9b7345d
First cut at updating the ccp launcher to use the state machine
...
This commit was SVN r26986.
2012-08-10 17:09:33 +00:00
Jeff Squyres
3719b6c68b
After some further discussion between Jeff, Ralph, and Josh, revert
...
r26951. The feeling is that fixing the actual problem of the command
line parser not always identifying when invalid command line options
were specified (i.e., r26953) was a better solution.
This commit was SVN r26979.
The following SVN revision numbers were found above:
r26951 --> open-mpi/ompi@1f8df92c3c
r26953 --> open-mpi/ompi@0b7b3feba9
2012-08-09 20:56:01 +00:00
Shiqing Fan
e304c19920
This is also used on Windows.
...
This commit was SVN r26975.
2012-08-08 16:44:00 +00:00
George Bosilca
ba879c2c51
Remove the unused map.
...
This commit was SVN r26960.
2012-08-07 12:06:13 +00:00
Shiqing Fan
2f442799f8
fix several typecasts
...
This commit was SVN r26957.
2012-08-07 10:41:53 +00:00
Ralph Castain
1f8df92c3c
Remove the confusion over which options are "to" and which are "by" by creating synonyms so that either spelling works.
...
This commit was SVN r26951.
2012-08-05 14:40:38 +00:00
Ralph Castain
53b1a1c976
Cleanly error out when someone asks to map-to <object> if that object doesn't exist on a node.
...
This commit was SVN r26950.
2012-08-04 21:52:36 +00:00
Ralph Castain
61b09a132b
Fix bynode mapping of multiple app-contexts
...
This commit was SVN r26949.
2012-08-03 21:45:40 +00:00
Ralph Castain
96f6f94c24
Ensure we don't get trapped in an infinite loop when ranking bynode if something isn't right
...
This commit was SVN r26948.
2012-08-03 21:45:10 +00:00
Ralph Castain
0d878937fe
If a callback is set in the state machine, and the state doesn't yet exist, create it
...
This commit was SVN r26947.
2012-08-03 21:43:36 +00:00
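Note on 0d878937fe: "create the state if it doesn't yet exist" can be sketched as a lookup followed by an insert on miss. The class below is a hypothetical illustration with made-up names, not the ORTE state machine code.

```python
class StateMachine:
    """Ordered list of [state name, callback] entries."""

    def __init__(self):
        self.states = []

    def _find(self, name):
        for entry in self.states:
            if entry[0] == name:
                return entry
        return None

    def set_callback(self, name, callback):
        """Set the callback for a state, creating the state on demand."""
        entry = self._find(name)
        if entry is None:
            entry = [name, None]
            self.states.append(entry)
        entry[1] = callback

    def activate(self, name, *args):
        """Run the callback registered for a state, if there is one."""
        entry = self._find(name)
        if entry is None or entry[1] is None:
            return None
        return entry[1](*args)
```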
Ralph Castain
431d5361ed
For those who really preferred our prior mode of operation that mapped procs and only launched daemons on the nodes that had procs on them, introduce the "novm" state machine component. This recreates the old mode of operation by re-ordering the launch sequence so that we allocate, then map, and then launch daemons only on the reqd nodes (instead of across the entire allocation).
...
This commit was SVN r26946.
2012-08-03 16:30:05 +00:00
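Note on 431d5361ed: the difference between the two modes comes down to which nodes receive a daemon. A small sketch, assuming the process map is a plain rank-to-node dictionary (the function and parameter names are hypothetical):

```python
def daemon_nodes(allocation, proc_map, use_vm=True):
    """Return the nodes that should get a daemon.

    use_vm=True  : daemons on every node in the allocation.
    use_vm=False : "novm" ordering, where mapping happens first and daemons
                   are launched only on nodes that host at least one proc.
    """
    if use_vm:
        return list(allocation)
    used = set(proc_map.values())
    return [node for node in allocation if node in used]


if __name__ == "__main__":
    allocation = ["n0", "n1", "n2", "n3"]
    proc_map = {0: "n0", 1: "n0", 2: "n2"}
    print(daemon_nodes(allocation, proc_map, use_vm=True))
    print(daemon_nodes(allocation, proc_map, use_vm=False))
```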
Ralph Castain
dc22ea5cde
A little cleaner on the message about repeated ctrl-c, and re-enable the event so we can abort if we see multiple ctrl-c's that don't meet the time requirement
...
This commit was SVN r26945.
2012-08-03 01:26:18 +00:00
Ralph Castain
e6c72bfd53
Ensure we can forcibly exit even when we are stuck inside of an event by replacing the libevent signal handler with a POSIX one that (a) attempts to trip a libevent termination event and (b) if another ctrl-c hits within 5 seconds, just calls exit.
...
This commit was SVN r26943.
2012-08-02 21:15:35 +00:00
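Note on e6c72bfd53: the decision logic of the two-stage ctrl-c handler can be sketched on its own, separate from signal installation. The class name and return values below are hypothetical, and the clock is passed in explicitly so the logic can be exercised without real signals. Only the 5-second window comes from the commit message.

```python
class InterruptPolicy:
    """First ctrl-c requests orderly termination; a second one inside the window forces exit."""

    def __init__(self, window=5.0):
        self.window = window
        self.last_interrupt = None

    def on_interrupt(self, now):
        if self.last_interrupt is not None and now - self.last_interrupt <= self.window:
            return "force_exit"           # stuck in an event: just call exit
        self.last_interrupt = now
        return "request_termination"      # try to trip the termination event


if __name__ == "__main__":
    policy = InterruptPolicy()
    print(policy.on_interrupt(0.0))
    print(policy.on_interrupt(2.0))
```

A real handler would be registered through the platform's signal API and would act on the returned decision.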
Ralph Castain
d818c9d407
Includes a patch from Jeff and Josh: update the simulator module to allow specification of multiple slot and max_slot counts for each node group (but don't require it). Remove the requirement that each node group provide its own topology. Adjust verbosities to allow showing some light debug output to see what nodes have been added without getting a bunch of other stuff.
...
This commit was SVN r26936.
2012-08-02 04:57:13 +00:00
Jeff Squyres
62c2ff7ee7
It's actually ''not'' an error to exit if all routes and children are
...
gone. So exit with 0, not ORTE_ERROR_DEFAULT_EXIT_CODE (which is 1).
This fixes a race condition in the rsh launcher upon termination,
where ORTE would sometimes think that a daemon failed to launch.
This commit was SVN r26935.
2012-08-01 19:49:19 +00:00
Nathan Hjelm
4557e15c18
oob/ud fix compile error
...
This commit was SVN r26933.
2012-07-31 21:50:34 +00:00
Ralph Castain
6ee35e4977
Add num_local_peers to orte_process_info so we don't keep re-computing it, ensure it is available for direct launch via pmi as well
...
This commit was SVN r26931.
2012-07-31 21:21:50 +00:00