3626 Commits

Author SHA1 Message Date
Jeff Squyres
3719b6c68b After some further discussion between Jeff, Ralph, and Josh, revert
r26951.  The feeling is that fixing the actual problem of the command
line parser not always identifying when invalid command line options
were specified (i.e., r26953) was a better solution.

This commit was SVN r26979.

The following SVN revision numbers were found above:
  r26951 --> open-mpi/ompi@1f8df92c3c
  r26953 --> open-mpi/ompi@0b7b3feba9
2012-08-09 20:56:01 +00:00
Shiqing Fan
e304c19920 This is also used on Windows.
This commit was SVN r26975.
2012-08-08 16:44:00 +00:00
George Bosilca
ba879c2c51 Remove the unused map.
This commit was SVN r26960.
2012-08-07 12:06:13 +00:00
Shiqing Fan
2f442799f8 fix several typecasts
This commit was SVN r26957.
2012-08-07 10:41:53 +00:00
Ralph Castain
1f8df92c3c Remove the confusion over which options are "to" and which are "by" by creating synonyms so that either spelling works.
This commit was SVN r26951.
2012-08-05 14:40:38 +00:00
Ralph Castain
53b1a1c976 Cleanly error out when someone asks to map-to <object> if that object doesn't exist on a node.
This commit was SVN r26950.
2012-08-04 21:52:36 +00:00
Ralph Castain
61b09a132b Fix bynode mapping of multiple app-contexts
This commit was SVN r26949.
2012-08-03 21:45:40 +00:00
Ralph Castain
96f6f94c24 Ensure we don't get trapped in an infinite loop when ranking bynode if something isn't right
This commit was SVN r26948.
2012-08-03 21:45:10 +00:00
Ralph Castain
0d878937fe If a callback is set in the state machine, and the state doesn't yet exist, create it
This commit was SVN r26947.
2012-08-03 21:43:36 +00:00
Ralph Castain
431d5361ed For those who really preferred our prior mode of operation that mapped procs and only launched daemons on the nodes that had procs on them, introduce the "novm" state machine component. This recreates the old mode of operation by re-ordering the launch sequence so that we allocate, then map, and then launch daemons only on the reqd nodes (instead of across the entire allocation).
This commit was SVN r26946.
2012-08-03 16:30:05 +00:00
Ralph Castain
dc22ea5cde A little cleaner on the message about repeated ctrl-c, and re-enable the event so we can abort if we see multiple ctrl-c's that don't meet the time requirement
This commit was SVN r26945.
2012-08-03 01:26:18 +00:00
Ralph Castain
e6c72bfd53 Ensure we can forcibly exit even when we are stuck inside of an event by replacing the libevent signal handler with a POSIX one that (a) attempts to trip a libevent termination event and (b) if another ctrl-c hits within 5 seconds, just calls exit.
This commit was SVN r26943.
2012-08-02 21:15:35 +00:00
Ralph Castain
d818c9d407 Includes a patch from Jeff and Josh: update the simulator module to allow specification of multiple slot and max_slot counts for each node group (but don't require it). Remove the requirement that each node group provide its own topology. Adjust verbosities to allow showing some light debug output to see what nodes have been added without getting a bunch of other stuff.
This commit was SVN r26936.
2012-08-02 04:57:13 +00:00
Jeff Squyres
62c2ff7ee7 It's actually ''not'' an error to exit if all routes and children are
gone.  So exit with 0, not ORTE_ERROR_DEFAULT_EXIT_CODE (which is 1).

This fixes a race condition in the rsh launcher upon termination,
where ORTE would sometimes think that a daemon failed to launch.

This commit was SVN r26935.
2012-08-01 19:49:19 +00:00
Nathan Hjelm
4557e15c18 oob/ud fix compile error
This commit was SVN r26933.
2012-07-31 21:50:34 +00:00
Ralph Castain
6ee35e4977 Add num_local_peers to orte_process_info so we don't keep re-computing it, ensure it is available for direct launch via pmi as well
This commit was SVN r26931.
2012-07-31 21:21:50 +00:00
Jeff Squyres
88cbe9c780 .ompi_ignore this component until it can be fixed.
This commit was SVN r26930.
2012-07-31 21:02:06 +00:00
Nathan Hjelm
980692804d oob/ud: don't start listening for ud requests unless we have one usable port
This commit was SVN r26929.
2012-07-31 19:00:18 +00:00
Ralph Castain
23c2a315a9 Add missing line to set flag indicating at least one port found
This commit was SVN r26914.
2012-07-30 17:54:38 +00:00
Ralph Castain
6285f7d8c0 Per request of Shiqing, restore the ccp components
This commit was SVN r26904.
2012-07-29 23:49:59 +00:00
Ralph Castain
c7f9a0fa34 Check for recursive use of mpirun - issue error message and abort if detected
This commit was SVN r26903.
2012-07-28 21:50:56 +00:00
Ralph Castain
94d11e04fd Add an intermediate state when the VM is ready so that third party tools can take action prior to mapping/launching apps
This commit was SVN r26902.
2012-07-28 15:33:09 +00:00
Shiqing Fan
660188307c fix an export declaration name
This commit was SVN r26895.
2012-07-27 13:26:24 +00:00
Shiqing Fan
42dfbc7d2f Another CMake scripts update for:
correctly generate hwloc library
automatically define OMPI/OPAL/ORTE_IMPORTS for user applications
update the f77 bindings

This commit was SVN r26893.
2012-07-27 11:49:09 +00:00
Ralph Castain
8bc6694a62 Ensure the daemons don't incorrectly declare a failed launch
This commit was SVN r26875.
2012-07-26 19:05:06 +00:00
Ralph Castain
07846f12ae Reconnect the rsh/ssh error reporting code for remote spawns to report failure to launch. Ensure the HNP correctly reports non-zero exit status when ssh encounters a problem.
Thanks to Terry for spotting it!

This commit was SVN r26868.
2012-07-25 21:46:45 +00:00
Jeff Squyres
e5cfad0c1a This variable is only used in FT builds.
This commit was SVN r26854.
2012-07-24 12:48:47 +00:00
Shiqing Fan
8c4a3e1269 correct the symbol dllexports for windows build
This commit was SVN r26827.
2012-07-22 08:54:50 +00:00
Shiqing Fan
12d99a9ebb Update the hwloc build on Windows and related files.
This commit was SVN r26818.
2012-07-20 12:14:28 +00:00
Abhishek Kulkarni
1ce378b5c6 Make C/R work with nodes > 1. This fix makes sure that the app coordinators send
the "ready-to-checkpoint" signal to the global coordinator only after ORTE has
initialized.

This commit was SVN r26795.
2012-07-13 23:37:29 +00:00
Abhishek Kulkarni
1878f276cd Replace the pattern while(flag) { opal_progress() }; in the C/R code
with the ORTE_WAIT_FOR_COMPLETION macro.

This commit was SVN r26794.
2012-07-13 23:31:56 +00:00
George Bosilca
772ec212eb Fix another compiler warning.
This commit was SVN r26775.
2012-07-10 15:57:42 +00:00
Abhishek Kulkarni
eec5a28aa4 More C/R fixes.
* Fix a typo introduced by the removal of the notifier framework
* Fix to flush the modex cached data correctly using the orte DB API.

This commit was SVN r26773.
2012-07-10 01:19:46 +00:00
Abhishek Kulkarni
5c58a1c9c1 Fix C/R support in the trunk.
Among other things, this patch deals with the following issues:
* fix ompi-checkpoint argument parsing
* ompi-restart -showme prints an extraneous "Restarted child with PID" 
  message. Move around the debug statement to avoid this.
* fixes for the state machine changes

This commit was SVN r26770.
2012-07-09 23:34:13 +00:00
George Bosilca
ec760454a6 Cleaning ...
This commit was SVN r26747.
2012-07-04 21:22:13 +00:00
Ralph Castain
cf4606cdd5 Add debug of nidmap subsystem
This commit was SVN r26739.
2012-07-04 00:04:16 +00:00
Ralph Castain
6ae5776904 Cleanup IPV6 build
This commit was SVN r26738.
2012-07-04 00:03:50 +00:00
Ralph Castain
1a90471374 Drat - missed the other one
This commit was SVN r26718.
2012-07-02 22:18:31 +00:00
Ralph Castain
9a6a969f60 Remove debug
This commit was SVN r26717.
2012-07-02 22:18:08 +00:00
Ralph Castain
b83fc41d54 Add a state that allows mpirun or other tools to be notified of a job completion prior to terminating so that alternative actions can be performed.
This commit was SVN r26716.
2012-07-02 22:16:32 +00:00
Ralph Castain
e335de3564 Refactor ompi_info, splitting it into parts according to the layer involved. Thus, we call down to the opal layer to get those frameworks and components, and down to the orte layer to get those. Still some abstraction breaks, but they mostly involve renaming of OMPI_foo labels that have been around since before we split the build system by layer.
This commit was SVN r26695.
2012-06-28 18:23:34 +00:00
Ralph Castain
8bebf2fa47 Ensure we don't build the MR iof components unless hadoop support is enabled
This commit was SVN r26694.
2012-06-28 18:20:15 +00:00
Ralph Castain
9aa821d8b4 Add missing file to tarball
This commit was SVN r26688.
2012-06-28 02:57:10 +00:00
Ralph Castain
0dfe29b1a6 Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.

Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.

This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Brian Barrett
b22faedd9d Remove the Portals4 SHMEM reference implementation runtime support, as we're
no longer using the runtime provided by the reference implementation.

Remove the Catamount support from ORTE, since we're no longer supporting
Catamount.  Left the Catamount timer component, because I'm not sure whether
it's used on the XTs running CNL.

This commit was SVN r26677.
2012-06-27 14:17:43 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OS X that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Ralph Castain
a34f09e67a Ensure common port is off when not being used
This commit was SVN r26666.
2012-06-26 16:09:58 +00:00
Ralph Castain
92527da4e3 Remove unused component
This commit was SVN r26660.
2012-06-26 00:49:28 +00:00
Ralph Castain
0103f82918 Turn off the common port for slurm for now
This commit was SVN r26656.
2012-06-25 21:55:51 +00:00