1
1
Граф коммитов

13470 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
0ba845fed2 Continue development of regular expression support by implementing it for slurm launches. Works for both initial (cmd line and non-cmd line) and comm_spawn launch.
Additional work required to fully enable static port support when using cmd line regular expression launch system.

This commit was SVN r21502.
2009-06-23 20:25:38 +00:00
George Bosilca
bca8015b94 Add the proc_get_daemon capability to the bproc launcher.
This commit was SVN r21501.
2009-06-23 20:21:55 +00:00
George Bosilca
7339530061 Remove the prototype of a non-existant function.
This commit was SVN r21500.
2009-06-23 19:50:23 +00:00
George Bosilca
225f2b01c9 Don't release uninitialized objects.
This commit was SVN r21499.
2009-06-23 19:47:58 +00:00
Jeff Squyres
ecaa00ba73 Patch from Nadia/Bull from the opal-sos HG branch:
orte_session_dir_finalize doesn't clean the right directories.
orte_session_dir_cleanup neither.

This patch fixes several issues:

 1. orte_session_dir_cleanup():
   1. when jobid is not a wildcard, jobid is used to build the job
      session dir (instead of ORTE_LOCAL_JOBID).
   1. ORTE_SUCCESS is unconditionally returned (instead of rc that
      might have been previously set to another value).
 1. orte_session_dir_finalize():
   1. convert_jobid_to_string is not the right call to get the job
      session dir.
   1. in some places orte_process_info.top_session_dir is directly
      used, without being prefixed with the base directory.

Factorized the code sections that build the job_session_dir into a
single orte_build_job_session_dir() function that is now called by
both orte_session_dir_finalize() and orte_session_dir_cleanup().

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

This commit was SVN r21498.
2009-06-23 16:07:41 +00:00
Nysal Jan
938599cb2d Fix build failure with latest IBM XL C/C++ v10.1 compiler. Also this seems like cleaner code.
This commit was SVN r21497.
2009-06-23 14:08:04 +00:00
Ralph Castain
7a802a9d3a Move the plm designation to the argv from the env to support those systems not setup to pass env via rsh.
This commit was SVN r21495.
2009-06-22 18:08:45 +00:00
George Bosilca
7f24b41051 This function doesn't have to be globally visible.
This commit was SVN r21494.
2009-06-22 17:14:54 +00:00
Jeff Squyres
246caafe06 Correct the logic of the check for the env variable
OMPI_MCA_memory_ptmalloc2_disable and also add an explicit check for
FAKEROOTKEY (see http://bugs.debian.org/531522).

This commit was SVN r21489.
2009-06-20 11:22:06 +00:00
Ralph Castain
c199dbb241 Revert r21480 - we already did open/select the PLM on non-HNP daemons. This commit broke slave launches on Torque and SLURM as it caused the PLM to be open/selected twice.
The open/select of the PLM is done in orte/mca/ess/base/ess_base_std_orted.c. It only is done when the PLM MCA param is set directing a specific PLM be selected. The function

orte_plm_base_orted_append_basic_args

clears the params passed to the daemon of any PLM selection passed to the HNP. Each PLM then adds a PLM directive if-and-only-if backend PLM support is desired. At present, Torque, SLURM, and rsh all specify this support and direct that the backend orted open the "rsh" PLM.

This commit was SVN r21488.

The following SVN revision numbers were found above:
  r21480 --> open-mpi/ompi@ed585bce8a
2009-06-20 03:58:00 +00:00
Camille Coti
ed585bce8a Initialize the PML if we are a non-HNP daemon.
If we do not initialize the PML, non-HNP daemons will not be able to use its functions. For example, RSH needs it when the tree_spawn mode is 
enabled: daemons call orte_pml.remote_spawn() function to spawn their children in the deployment tree. 

This commit was SVN r21480.
2009-06-19 18:50:06 +00:00
Jeff Squyres
f42727707b Per http://bugs.debian.org/531522, add an MCA param/environment
variable to allow the disabling of the ptmalloc2 component at init
time.

This commit was SVN r21479.
2009-06-19 10:50:23 +00:00
Ralph Castain
110c95fb1c Make the match for the "env" option to mpi_show_mca_params to be less strict - match it if the string at least starts with "env" so "enviro" and "environ" all match.
This commit was SVN r21478.
2009-06-19 03:43:53 +00:00
Matthias Jurenz
edce15d08f Changes in VT documentation:
- added note for BFD license issues
- minor cleanups

This commit was SVN r21472.
2009-06-18 14:33:04 +00:00
Jeff Squyres
c39998db17 Also show the "you might not have enough registered memory" warning
message earlier in the openib BTL startup sequence

This commit was SVN r21469.
2009-06-18 12:24:39 +00:00
Ralph Castain
771ce035a5 Complete implementation of regular expression generator and parser - now handles leading zero's and suffix in node names.
This commit was SVN r21468.
2009-06-18 04:36:00 +00:00
Jeff Squyres
2a5813ac2d Silence a compiler warning.
This commit was SVN r21459.
2009-06-17 12:26:38 +00:00
Matthias Jurenz
d7aa9abc4e Added configure option '--with[out]-bfd' to control usage of BFD library to get symbol information for GNU, Intel, and Pathscale compiler instrumentation
This commit was SVN r21458.
2009-06-17 08:51:26 +00:00
Ralph Castain
8db7a9f9a7 Add regular expression generator to encode complete nid/pid maps - decoder to come.
This commit was SVN r21455.
2009-06-17 02:54:20 +00:00
Ralph Castain
85e55c5087 Make the IOF macros match for debug vs optimized builds
This commit was SVN r21453.
2009-06-16 22:30:53 +00:00
Rolf vandeVaart
36a560506c Fix error message to match code default.
This commit was SVN r21452.
2009-06-16 20:59:53 +00:00
Rolf vandeVaart
633b996a0f Add sys/wait.h so we can compile on Solaris.
This commit was SVN r21451.
2009-06-16 19:48:43 +00:00
Ralph Castain
e9fc0a74fb Silence compiler warnings
This commit was SVN r21445.
2009-06-16 13:34:31 +00:00
Ralph Castain
74bd80afd9 Do not preload binaries or files if the app isn't being executed on this node
This commit was SVN r21444.
2009-06-16 03:12:30 +00:00
Rainer Keller
5e6061af02 - Few fixes and comments
This commit was SVN r21443.
2009-06-15 21:12:04 +00:00
Ralph Castain
d1dd8c2653 Ensure we accurately count the number of new daemons to be launched, especially if we are restarting processes.
Have the resilient mapper also setup for new daemons in case the PLM needs them.

This commit was SVN r21437.
2009-06-15 13:55:01 +00:00
Ralph Castain
49d6cefe07 Fix a typo in the lanl platform files
This commit was SVN r21435.
2009-06-15 13:41:43 +00:00
Matthias Jurenz
24afd66352 Reverting previous commit and going back to r21178
This commit was SVN r21433.

The following SVN revision numbers were found above:
  r21178 --> open-mpi/ompi@fb3bbb8021
2009-06-15 11:41:33 +00:00
Abhishek Kulkarni
63845591a0 This fix adds a missing topic (cant-open-logfile) to help-orte-top.txt
and changes another incorrectly titled topic from pid-required
to no-contact-given.

This commit was SVN r21431.
2009-06-13 23:05:44 +00:00
Ralph Castain
b087336c6c Refinements to platform file
This commit was SVN r21429.
2009-06-12 19:48:30 +00:00
Ralph Castain
c0c56e30c9 Add a missing function to the resilient mapper so it defines daemons in case they are needed
This commit was SVN r21428.
2009-06-12 19:48:13 +00:00
Ralph Castain
44bb265a52 Add a new MPI_Info key to preposition OMPI libraries - implementation underway, but this just defines and passes the new key
This commit was SVN r21425.
2009-06-12 17:53:13 +00:00
Ralph Castain
170327e575 Reorg the rmaps components to collect shared code for byslot and bynode mapping in the base so we quit duplicating it in every mapper
This commit was SVN r21424.
2009-06-12 17:52:17 +00:00
Ralph Castain
1ee4acb247 Cleanup how we handle pointer arrays in the odls base fns to avoid potential segfaults
This commit was SVN r21423.
2009-06-12 17:51:23 +00:00
Ralph Castain
89b7e20a8b Add some Cisco-related platform files
This commit was SVN r21422.
2009-06-12 17:50:14 +00:00
Jeff Squyres
814a8f5e0f * Fix #1916: endian problems in iwarp wireup on big endian machines
(now works on both big and little endian machines)
 * Be a little more flexible when looking for active devices in
   btl_openib_component.c
 * Add device name and port number to lots of verbose and help
   messages
 * Add a bunch of verbose messages to give insight into what is
   occurring during all the CPC wireups

This commit was SVN r21418.
2009-06-11 17:30:30 +00:00
Ralph Castain
4881cd0df3 Revert the prior change out from the individual .h files - the problem was in the Makefile.am's, causing the make dist to fail.
This commit was SVN r21414.
2009-06-11 03:15:47 +00:00
Ralph Castain
91ab2b3e4f Specify complete path to included header files so it compiles in all environments
This commit was SVN r21412.
2009-06-11 02:46:30 +00:00
Jeff Squyres
6777f01380 Also look for /dev/ipath
This commit was SVN r21410.
2009-06-11 00:35:21 +00:00
Avneesh Pant
295c0c36bc Add new QLogic adapter to device parameter file.
This commit was SVN r21409.
2009-06-10 23:37:38 +00:00
Ralph Castain
70b8c89b44 Fix slave spawn, which was hanging because the local daemon never saw the slave job report - it doesn't do it in the normal way, and so the slave launch system itself has to "fake it".
Also complete implementation to printout app_context objects so we see all the fields.

This commit was SVN r21408.
2009-06-10 19:01:08 +00:00
Ralph Castain
f24cefe3d2 Shift the check for adequate file descriptors to after we check if this proc is part of the job to be launched - no point in doing the check more often than absolutely required
This commit was SVN r21406.
2009-06-10 15:18:48 +00:00
Ralph Castain
87d7d693f0 Add a notifier call when the oob retries are exceeded so sys admins are aware of the problem
This commit was SVN r21405.
2009-06-10 15:17:16 +00:00
Terry Dontje
fa9f356c8c This commit fixes trac:1931. By separating out structures and defines used by
the debugger plugins into files suffixed by _dbg.h.

This commit was SVN r21404.

The following Trac tickets were found above:
  Ticket 1931 --> https://svn.open-mpi.org/trac/ompi/ticket/1931
2009-06-10 14:57:28 +00:00
Ralph Castain
f966d9f972 Fix visibility issues with opal_graph functions.
Fix the carto test so it can compile - need to update input file so it can run

This commit was SVN r21403.
2009-06-09 15:02:57 +00:00
Ralph Castain
c3c1ab1337 Correct a comment in paffinity.h about what paffinity_get returns - it was inaccurate.
Revamp the affinity detection/set procedure in mpi_init to correctly detect when we have already been bound to processors, given the revised understanding of paffinity_get. Add a new paffinity macro to make checking for already bound a little nicer.

This commit was SVN r21402.
2009-06-09 14:33:35 +00:00
George Bosilca
f9a510fd8a There is no need for an atomic read if we are not in a threaded case.
This commit was SVN r21394.
2009-06-08 23:55:52 +00:00
George Bosilca
77a6f27d44 Update the call to orte_plm_base_create_jobid based on the new interface.
This commit was SVN r21393.
2009-06-08 23:53:53 +00:00
Ralph Castain
86d55d7ebf Fix tight loops over comm_spawn by checking to see if the system has enough child procs and file descriptors available before attempting to launch. If not, introduce a 1sec delay and then test again. This provides a chance for the orted to complete processing of proc terminations from other children, hopefully creating room for the new proc(s).
Update the loop_spawn test to remove a sleep so that it runs at max speed, letting the new code catch when we overrun ourselves and wait for room to be cleared for the next comm_spawn.

This commit was SVN r21390.
2009-06-08 18:28:26 +00:00
Shiqing Fan
5a90b3068e Two type casts.
This commit was SVN r21388.
2009-06-07 12:51:46 +00:00