
2537 Commits

Author SHA1 Message Date
Ralph Castain
4ec9c4b532 Do a better job of ensuring session directories are removed when procs abnormally terminate and/or we order "kill local procs"
This commit was SVN r22258.
2009-12-03 04:46:17 +00:00
Ralph Castain
93ebed48b1 Update the multicast test. Some cleanups to the basic rmcast module
This commit was SVN r22257.
2009-12-03 04:30:58 +00:00
Ralph Castain
66efa05a53 Don't cancel the recv unless it was issued or else we generate an error whenever we launch an app without having to launch daemons (e.g., a completely local launch to mpirun)
This commit was SVN r22256.
2009-12-03 04:28:43 +00:00
Ralph Castain
3a72ee9dca Fix a bug reported by Rainer whereby we could free and reuse an address if the user specified the tmp dir base. After discussing with Josh, we also removed the code that had us retry creation of the session dir (using default values) if the user-specified value didn't work for some reason. Adhering to OMPI standard practices, we abort if the user-specified value doesn't work.
This commit was SVN r22255.
2009-12-03 01:57:35 +00:00
George Bosilca
7bf1d7a1c4 A more asynchronous startup over rsh/ssh.
This commit was SVN r22253.
2009-12-02 20:29:32 +00:00
Ralph Castain
a0d5c80ce0 Add a new framework for discovering local resource information such as cpu type/model, #cpus, available physical memory, etc. Two initial components (darwin and linux) are provided. This is needed to support bootstrap operations where daemons are started at node boot, and applications where initial knowledge of cpu identification is needed to guide framework component selection.
Add orte configuration option to control the use of the framework in the system. Although the code will build, it will not be active unless configured with --enable-bootstrap.

If bootstrap is enabled and the new opal_sysinfo framework can successfully determine the cpu model, pass that info to the application as an MCA param to support some work at Sun.

Also, have daemons report back the resources they find to guide process mapping in bootstrap operations (i.e., where the daemon starts at node boot as opposed to being launched at application start).

Adjust some platform files to enable these capabilities.

This commit was SVN r22244.
2009-11-30 23:11:25 +00:00
Ralph Castain
e38a0eab9f Remove the fddp and sensor frameworks - relocated to new cluster mgr project
This commit was SVN r22240.
2009-11-27 22:14:47 +00:00
Rainer Keller
70a69e796f - Get rid of a small nuisance: after installation of the
alps-resid script, mark it executable, to allow:

   export OMPI_ALPS_RESID=`$OMPI/share/openmpi/ras-alps-command.sh`

This commit was SVN r22234.
2009-11-25 19:01:33 +00:00
Ralph Castain
9a6d5697a8 Protect against NULL input - I'm -sure- no one will do it, but...well, actually, they did. :-/
This commit was SVN r22232.
2009-11-25 15:13:21 +00:00
Ralph Castain
c1206139dd Ensure the thread-safe data buffers are initialized prior to use
This commit was SVN r22231.
2009-11-25 15:12:45 +00:00
Ralph Castain
92733b13d9 Add a couple of new tests to the orte system.
Modify the job_complete check so we don't kill jobs when a single proc was terminated by ORTE command via plm.terminate_procs

Still dies gracefully with a ctrl-c, and behaves as before when using plm.terminate_job

This commit was SVN r22227.
2009-11-20 01:47:49 +00:00
Ralph Castain
5e031d9ded Let a restarted process have access to all known nodes instead of only those already in its prior job map
This commit was SVN r22225.
2009-11-19 19:45:11 +00:00
Ralph Castain
852e5d9ee0 Add some diag output
This commit was SVN r22224.
2009-11-19 19:43:36 +00:00
Ralph Castain
a401f05ea3 Add some diagnostics to chase down forced termination of procs. Ensure that procs are removed from the local data list upon termination
This commit was SVN r22223.
2009-11-19 19:43:10 +00:00
Ralph Castain
3921069230 Ensure we completely clean out the old nidmap info
This commit was SVN r22222.
2009-11-19 19:42:15 +00:00
Ralph Castain
8dc08e304f No longer require name passed separately
This commit was SVN r22221.
2009-11-19 19:41:41 +00:00
Ralph Castain
1a44b84b25 If a process is in certain states (e.g., polling for messages in the event lib), then it can blissfully ignore SIGTERM when we try to order it to die. Unfortunately, the OS thinks the process actually did die, leading us to leave orphaned procs around.
The only sure way to kill the thing is with SIGKILL. After hours spent trying to debug this bizarre situation with a reliable reproducer, I finally tracked it down and fixed it.

Go figure...I sure can't.

This commit was SVN r22220.
2009-11-19 17:25:15 +00:00
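
The SIGTERM-then-SIGKILL behavior described in the entry above (r22220) can be illustrated with a small sketch. This is not the ORTE code: it is a generic illustration that assumes a POSIX system, and the names `spawn_stubborn_child` and `terminate_with_escalation` are hypothetical. The child here ignores SIGTERM on purpose, standing in for a process that does not react to the polite request.

```python
import signal
import subprocess
import sys


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM (hypothetical stand-in for a stuck proc)."""
    code = (
        "import signal, time; "
        "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
        "print('ready', flush=True); "
        "time.sleep(60)"
    )
    child = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    child.stdout.readline()  # block until the child has installed its handler
    return child


def terminate_with_escalation(proc, grace_seconds=2.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns True if SIGKILL was needed, False if SIGTERM was enough.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return False
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the child so no orphan or zombie is left behind
        return True
```

Waiting on the child after the SIGKILL is the part that matters for the orphaned-process problem the commit describes: the parent only treats the process as gone once it has actually been reaped.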
Shiqing Fan
11ad25fa77 A few windows fixes:
Add a missing value for the configure file. 
Fix the bug that generated a wrong svn version number.
Correct the wrong string length of the headnode name.

cmr:v1.5
cmr:v1.3.4

This commit was SVN r22219.
2009-11-18 09:43:47 +00:00
Ralph Castain
840766a894 Update the rmcast APIs to include tag params and reorder them to look like their rml cousins
This commit was SVN r22218.
2009-11-17 15:58:59 +00:00
Ralph Castain
aea1ab3bd6 Remove diagnostic
This commit was SVN r22216.
2009-11-11 22:16:15 +00:00
Ralph Castain
a2f3a47b92 Update the orte_mcast test
This commit was SVN r22214.
2009-11-11 22:11:19 +00:00
Ralph Castain
6496ce7212 Expand the reliable multicast APIs to support sending/recving of iovecs
This commit was SVN r22213.
2009-11-11 22:10:35 +00:00
Rainer Keller
366bd96c88 - Allow working without the xt-catamount module on Jaguar,
reducing the number of components that until now needed to be
   deselected.

This commit was SVN r22205.
2009-11-09 14:26:24 +00:00
Shiqing Fan
6f8d0a1ab8 Update a few CMake scripts.
Add Program Database (pdb) files for installation for debug build.

This commit was SVN r22188.
2009-11-03 10:40:58 +00:00
Rainer Keller
f121e46db1 - Finalize ornl_configure
This commit was SVN r22178.
2009-11-01 03:25:57 +00:00
Rainer Keller
7dfe709ac1 - Initialize n before usage.
This commit was SVN r22169.
2009-10-29 15:52:53 +00:00
Terry Dontje
c6ebc7c341 rename macros ompi_check_optflags and ompi_make_stripped_flags based on comments in #2072
This commit was SVN r22151.
2009-10-28 10:51:59 +00:00
Terry Dontje
6df802424d remove duplicate setting of CFLAGS_WITHOUT_OPTFLAGS and special case DEBUGGER_FLAGS for intel compiler
This commit was SVN r22143.
2009-10-26 18:41:53 +00:00
Ralph Castain
13d86e100b Courtesy of Ralph and Jeff:
Continue the reorganization of the configure system. Move files from the main config directory to their appropriate level-specific config directories. Modify the configure system to correctly handle compiler detection, test, and setup so that all things pertaining to opal and orte are done at the lower level, with the ompi configure system only looking at mpi-specific options.

Ensure the wrapper compilers for orte and ompi only get built when appropriate. Add support for c++ to the orte wrapper compilers, both script and non-script versions.

This commit was SVN r22138.
2009-10-24 01:04:35 +00:00
Ralph Castain
7afd65d631 Add a couple of test programs
This commit was SVN r22137.
2009-10-24 01:00:38 +00:00
Jeff Squyres
02db4f5146 Terry pointed out that ORTE also needs the "totalview" flags, and
therefore the m4 test really belongs in orte/config.  Thanks, Terry!

Additionally, I took the opportunity to rename the variable so that
"TOTALVIEW" is not in the name anymore (because it applies to all
variables, not just Totalview).

This commit was SVN r22134.
2009-10-23 13:00:59 +00:00
Tim Mattox
4acfbe6554 Unfortunately, the typos that r22129 tried to fix were not
as simple as I or Ralph had hoped.  This should be the real fix,
or very close to it.  I can now see both the sensor and rmcast
information from ompi_info when configured
with --enable-monitoring --enable_multicast

This commit was SVN r22131.

The following SVN revision numbers were found above:
  r22129 --> open-mpi/ompi@02ff00dfb5
2009-10-23 02:38:51 +00:00
Jeff Squyres
e0e20870e1 Generated files should not be in SVN.
This commit was SVN r22126.
2009-10-22 17:57:05 +00:00
Pavel Shamis
7425255be5 Fixing compilation failure. Adding missing include.
This commit was SVN r22119.
2009-10-21 16:28:40 +00:00
Ralph Castain
c33866f0df Per Tim Mattox (again, via my branch):
Add a script wrapper compiler version of ortecc for use when in cross-compile scenarios

This commit was SVN r22115.
2009-10-20 23:46:46 +00:00
Ralph Castain
ee82d42a1c Add a new sensor component that pulls data via an external shared memory interface
Only builds when the appropriate library is present

This commit was SVN r22114.
2009-10-20 23:45:35 +00:00
Ralph Castain
214e26b539 Per Jeff (this work was done on a branch of mine, so I will do the commit):
Re-enable "./autogen.sh -no-ompi" again. If you -no-ompi, the entire OMPI
configury is skipped and the entire ompi/ subtree is not built. There are
some simple m4-isms that prune out the relevant parts.

I added ompi/config/, orte/config/, and opal/config/ directories. I moved a
bunch of m4 files from the top-level config/ dir into ompi/config/, and a few
into orte/config/.

Note that all 3 <project>/config directories have a config_files.m4 file. This
file contains the AC_CONFIG_FILES list for that project. The AC_CONFIG_FILES
call cannot be in an AC_DEFUN macro and conditionally called -- if it is
included at all, Autoconf will process it. Hence, these config_files.m4 files
don't AC_DEFUN -- they just have AC_CONFIG_FILES. m4_ifdef() is used to
conditionally include the files or not.

I moved a bunch of obvious OMPI-only m4 files from config/ to ompi/config/,
but I'm sure that there's more that could go. A ticket will be filed with
thoughts on future work in this area.

This commit was SVN r22113.
2009-10-20 23:44:20 +00:00
Ralph Castain
f1f156d57b Make rmaps base open function play nicely with ompi_info
This commit was SVN r22111.
2009-10-20 07:28:23 +00:00
Ralph Castain
ff9d72b3ab Add a new multicast tag for collecting ps data
This commit was SVN r22107.
2009-10-16 04:21:22 +00:00
Terry Dontje
13907781b2 missed adding report-uri option
This commit was SVN r22106.
2009-10-15 18:05:24 +00:00
Ralph Castain
49ce2b4342 Add a new interface to the rmcast framework to query the output channel for the proc
This commit was SVN r22105.
2009-10-15 17:47:42 +00:00
Terry Dontje
c96af5654c correct options and wording that were dropped in the last change due to committing v1.3 manpage to the trunk
This commit was SVN r22104.
2009-10-15 15:03:21 +00:00
Ralph Castain
99c67183d2 Minor cleanups, mainly to ensure we correctly block on blocking sends
This commit was SVN r22102.
2009-10-15 02:39:15 +00:00
Ralph Castain
2f91a4833b Have the trigger event return the event itself in the callback function so it can be reset, if desired
This commit was SVN r22101.
2009-10-15 02:35:53 +00:00
Ralph Castain
2665825693 Correct an error that causes the system to "bounce" when we order a job killed. Previously, we did not discriminate between a process being ordered to die and a process that was aborted by an external signal. Unfortunately, that means the error mgr gets called and told a process abnormally aborted when we order termination, thus causing the errmgr to send out a "kill procs" command again.
Wouldn't be so bad, except...the errmgr orders the termination of ALL procs, which kills any other job that should have been left alone.

Add a new proc and job state indicating "killed_by_cmd" so we can tell the difference between a proc/job that was deliberately terminated by us vs one that is killed by external signal.

This change was tested to ensure it didn't interfere with ctrl-c operation (it doesn't - we order termination of all jobs when we get a ctrl-c).

This commit was SVN r22100.
2009-10-14 22:49:56 +00:00
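
The distinction introduced in the entry above (r22100) can be sketched as a separate state for deliberate termination. This is a minimal illustration, not the ORTE data structures: the enum members other than the idea of "killed by command", and the helper names `needs_error_handling` and `procs_to_report`, are assumptions made for the example.

```python
from enum import Enum, auto


class ProcState(Enum):
    RUNNING = auto()
    TERMINATED_NORMALLY = auto()
    ABORTED_BY_SIGNAL = auto()  # killed by something outside our control
    KILLED_BY_CMD = auto()      # we ordered this termination ourselves


def needs_error_handling(state):
    """Only an externally caused abort should reach the error manager."""
    return state is ProcState.ABORTED_BY_SIGNAL


def procs_to_report(final_states):
    """Return the names of procs whose exit should be reported as an error."""
    return sorted(
        name for name, state in final_states.items() if needs_error_handling(state)
    )
```

With a dedicated state, a proc that was terminated on request is never reported as an abnormal abort, so no second "kill procs" order is triggered for unrelated jobs.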
Ralph Castain
18960a9c5a Refactor the multicast support so the data type objects can be accessed beyond just the one component
Ensure that the local node is included in the allocation prior to bootstrap discovery

This commit was SVN r22099.
2009-10-14 17:43:40 +00:00
Terry Dontje
0a8645a411 This commit fixes trac:2017
This commit was SVN r22098.

The following Trac tickets were found above:
  Ticket 2017 --> https://svn.open-mpi.org/trac/ompi/ticket/2017
2009-10-14 11:40:47 +00:00
Ralph Castain
bc869636be Reset the verbosity levels to suppress debug output
This commit was SVN r22095.
2009-10-13 15:29:38 +00:00
Ralph Castain
e501589b3b Cleanup the bootstrap procedure for multiple daemons starting up
This commit was SVN r22094.
2009-10-13 15:14:54 +00:00
Ralph Castain
c25dd14440 Correctly set the multicast interface, cleanup a comment
This commit was SVN r22093.
2009-10-13 15:14:28 +00:00