
2546 Commits

Author SHA1 Message Date
Ralph Castain
e554fc282d Add some diagnostic output when daemons die
This commit was SVN r21960.
2009-09-09 18:16:50 +00:00
Ralph Castain
c20d977a30 Report the allocate event, if requested
This commit was SVN r21959.
2009-09-09 17:47:58 +00:00
Ralph Castain
2688ad2c9f Ensure the odls_types are included when referencing the APIs
This commit was SVN r21958.
2009-09-09 17:47:13 +00:00
Ralph Castain
cb7f608006 Remove debug output
This commit was SVN r21957.
2009-09-09 17:46:28 +00:00
Ralph Castain
51b13b3d5c A few minor cleanups in where threads are unlocked.
Reset mpirun's exit code when we restart failed procs

This commit was SVN r21955.
2009-09-09 05:31:06 +00:00
Ralph Castain
8ae4b55d16 Enable a new command line option, --report-events, that instructs mpirun to RML-report specific events during the job's lifetime to the requestor.
This commit was SVN r21954.
2009-09-09 05:28:45 +00:00
Ralph Castain
c877b1a5f8 Silence a compiler warning about no format
This commit was SVN r21951.
2009-09-08 15:03:14 +00:00
Ralph Castain
81b8bc5b54 Silence a compiler warning about no format
This commit was SVN r21950.
2009-09-08 15:02:48 +00:00
Ralph Castain
142036f2c0 Issue an error message and abort if the user requests a number of processes that conflicts with nperxxx directives when evaluated against available resources
This commit was SVN r21949.
2009-09-07 03:36:10 +00:00
Ralph Castain
ca09e8f604 Minor modification required to allow opal_paffinity_alone to default to bind-to-core
This commit was SVN r21948.
2009-09-05 15:24:26 +00:00
Ralph Castain
17444243f7 Correct the bit mask to properly set the binding policy
This commit was SVN r21934.
2009-09-03 17:58:23 +00:00
Jeff Squyres
e1fe03ad44 Minor grammar fixes, and use "#" for separating lines, not blank lines.
This commit was SVN r21931.
2009-09-03 07:02:21 +00:00
Ralph Castain
0421a49844 Update the xml support to allow -xml-file foo whereby we redirect all xml formatted output (and ONLY xml formatted output) to a specified file
This commit was SVN r21930.
2009-09-02 18:03:10 +00:00
Ralph Castain
d3d34f8f15 Correct a bug in the assignment of node index value. Ensure we set the app number so that MPI attributes get set correctly.
This commit was SVN r21927.
2009-09-02 01:15:44 +00:00
Ralph Castain
50ca27c1c8 Ensure that procs launched natively by slurm do not mistakenly identify themselves as daemons to the system
This commit was SVN r21926.
2009-09-01 17:57:15 +00:00
Lenny Verkhovsky
2a594fec6c Added a help message to the rankfile mapper for failures caused by using an alias instead of the full hostname
This commit was SVN r21919.
2009-09-01 11:17:32 +00:00
Ralph Castain
59645c5c8e Per direction from the slurm team, change the envar we look at to get our allocation
This commit was SVN r21915.
2009-08-30 15:57:27 +00:00
Ralph Castain
0394a4884d Setup cpus-per-proc and cpus-per-rank as synonyms, both in mca params and on mpirun cmd line
This commit was SVN r21914.
2009-08-30 14:30:36 +00:00
Ralph Castain
ef4cdeeb69 Fix round-robin mapping when bind-to-socket in cases where #procs > #sockets and #cores
This commit was SVN r21913.
2009-08-29 03:36:21 +00:00
Ralph Castain
433673c64f Report bindings in all cases, including external bindings and slot lists
This commit was SVN r21911.
2009-08-28 13:58:46 +00:00
Ralph Castain
3ef028ca23 Trap mpirun error messages in xml format
This commit was SVN r21910.
2009-08-28 02:46:15 +00:00
Ralph Castain
59f08dd2ff Support the combination of npersocket and bind-to-core
This commit was SVN r21909.
2009-08-28 02:31:26 +00:00
Shiqing Fan
fb777134cf Adjust the command string length.
This commit was SVN r21905.
2009-08-27 13:42:55 +00:00
Ralph Castain
2d27bc9824 Default npersocket to bind-to-socket unless otherwise directed
This commit was SVN r21904.
2009-08-27 13:21:14 +00:00
Shiqing Fan
bdb45b25c6 Skip a few more signal events on Windows for orted, because signal events don't work with select() on Windows.
This commit was SVN r21903.
2009-08-27 13:17:13 +00:00
Shiqing Fan
ffd55631bc Deal with the case when the prefix is NULL.
This commit was SVN r21902.
2009-08-27 13:11:18 +00:00
Ralph Castain
01ba0eaa47 Correctly handle npersocket corner cases
This commit was SVN r21901.
2009-08-27 11:25:48 +00:00
Ralph Castain
12d6595c41 Do not deprecate the rankfile mca param.
This commit was SVN r21900.
2009-08-27 10:40:51 +00:00
Shiqing Fan
4119497c5a Use select() for Windows events by default.
For historical reasons, we didn't use select() for Windows events. It can now be merged back for better performance on Windows.

This commit was SVN r21899.
2009-08-27 08:11:56 +00:00
Shiqing Fan
6e57772ebd Remove a few redundant security functions, and use USERDOMAIN environment variable for the domain name.
This commit was SVN r21898.
2009-08-27 08:00:49 +00:00
Shiqing Fan
1b6db85988 Complete the support for building on a UNC path.
This commit was SVN r21897.
2009-08-27 07:57:26 +00:00
Ralph Castain
5e710928a5 Revise the new binding system slightly:
1. finalize the logic for properly respecting externally assigned bindings. Thanks to Chris Samuel for his help with this. Still needs some acid testing, but appears to now work.

2. remove the double-logic of requiring opal_paffinity_alone AND bind-to-foo. If the user specifies bind-to-foo, trust her and just do it.

This commit was SVN r21885.
2009-08-26 02:01:49 +00:00
Ralph Castain
2016a3180b Silence compiler warnings about uninitialized variables
This commit was SVN r21883.
2009-08-26 01:56:39 +00:00
Ralph Castain
9ad33a4688 Silence compiler warning about uninitialized variable
This commit was SVN r21882.
2009-08-26 01:56:11 +00:00
Ralph Castain
0e528e994f Revert last commit - went to wrong repo!
Didn't we just have that happen the other day too? :-)

This commit was SVN r21878.
2009-08-25 13:06:14 +00:00
Ralph Castain
15d12b240b Sync to r21876
This commit was SVN r21877.

The following SVN revision numbers were found above:
  r21876 --> open-mpi/ompi@ef970293f0
2009-08-25 13:04:12 +00:00
Ralph Castain
0bd12e99ff Fix typo - we want to detect bindings, not set them in the mask.
Thanks to Chris Samuel for finding it!

This commit was SVN r21875.
2009-08-25 11:53:05 +00:00
Ralph Castain
509cc0553c When directly launched by an RM, flag that a process is operating without daemons - i.e., standalone. Provide an error string for the new socket_not_available error. Use errmgr.abort to exit when we cannot get a socket, and ensure that the slurmd module returns the proper exit status for slurm 2.0
This commit was SVN r21868.
2009-08-22 02:58:20 +00:00
Ralph Castain
7370235c3e Create a more specific error code for when specific sockets are not available. Ensure that slurm 2.0 gets the expected error return if the process can't start for that reason so it can take corrective action.
This commit was SVN r21867.
2009-08-21 21:28:15 +00:00
Ralph Castain
7183179f56 Provide native integration with SLURM 2.0's OMPI support
This commit was SVN r21865.
2009-08-21 18:03:34 +00:00
Ralph Castain
35f8b68de6 Note to self: save all changes before committing
This commit was SVN r21863.
2009-08-21 12:54:29 +00:00
Ralph Castain
535408d6c2 Answer a Jeff-ism and check malloc for NULL return - for all xml formatting errors, revert to at least showing the non-xml formatted message
This commit was SVN r21862.
2009-08-21 12:41:54 +00:00
Shiqing Fan
baa81a6525 Add a missing header to compile on Windows.
This commit was SVN r21861.
2009-08-21 07:20:21 +00:00
Ralph Castain
2e0bd04755 Ensure that show_help messages are properly xml formatted
This commit was SVN r21858.
2009-08-20 19:23:26 +00:00
Rainer Keller
8e1b23779f - Replace combinations of
#if defined (c_plusplus)
          defined (__cplusplus)
   followed by
      extern "C" {
   and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.

   Notable exceptions are:
    - opal/include/opal_config_bottom.h:
      This is our generated code, which itself defines BEGIN_C_DECLS and
      END_C_DECLS
    - ompi/mpi/cxx/mpicxx.h:
      Here we do not include opal_config_bottom.h:
    - Belongs to external code:
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h
    - opal/include/opal/prefetch.h:
      Has C++ specific macros that are protected:

    - Had #if ... } #endif  _and_ END_C_DECLS (aka end up with 2x
      END_C_DECLS)
      ompi/mca/btl/openib/btl_openib.h
    - opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
    - opal/win32/ompi_process.h: had extern "C"\n {...
      opal/win32/ompi_process.h: ditto
    - ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
      ompi/mpi/f90/test/align_c.c: ditto
    - ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
    - ompi/mpi/f90/xml/common-C.xsl: Amend

   Tested on Linux using --with-openib and --with-mx

   The following do not contain either opal_config.h, orte_config.h or
   ompi_config.h
   (but possibly other header files, that include one of the above):
      ompi/mca/bml/r2/bml_r2_ft.h
      ompi/mca/btl/gm/btl_gm_endpoint.h
      ompi/mca/btl/gm/btl_gm_proc.h
      ompi/mca/btl/mx/btl_mx_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_frag.h
      ompi/mca/btl/ofud/btl_ofud_proc.h
      ompi/mca/btl/openib/btl_openib_mca.h
      ompi/mca/btl/portals/btl_portals_endpoint.h
      ompi/mca/btl/portals/btl_portals_frag.h
      ompi/mca/btl/sctp/btl_sctp_endpoint.h
      ompi/mca/btl/sctp/btl_sctp_proc.h
      ompi/mca/btl/tcp/btl_tcp_endpoint.h
      ompi/mca/btl/tcp/btl_tcp_ft.h
      ompi/mca/btl/tcp/btl_tcp_proc.h
      ompi/mca/btl/template/btl_template_endpoint.h
      ompi/mca/btl/template/btl_template_proc.h
      ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
      ompi/mca/btl/udapl/btl_udapl_endpoint.h
      ompi/mca/btl/udapl/btl_udapl_mca.h
      ompi/mca/btl/udapl/btl_udapl_proc.h
      ompi/mca/mtl/mx/mtl_mx_endpoint.h
      ompi/mca/mtl/mx/mtl_mx.h
      ompi/mca/mtl/psm/mtl_psm_endpoint.h
      ompi/mca/mtl/psm/mtl_psm.h
      ompi/mca/pml/cm/pml_cm_component.h
      ompi/mca/pml/csum/pml_csum_comm.h
      ompi/mca/pml/dr/pml_dr_comm.h
      ompi/mca/pml/dr/pml_dr_component.h
      ompi/mca/pml/dr/pml_dr_endpoint.h
      ompi/mca/pml/dr/pml_dr_recvfrag.h
      ompi/mca/pml/example/pml_example.h
      ompi/mca/pml/ob1/pml_ob1_comm.h
      ompi/mca/pml/ob1/pml_ob1_component.h
      ompi/mca/pml/ob1/pml_ob1_endpoint.h
      ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
      ompi/mca/pml/ob1/pml_ob1_recvfrag.h
      ompi/mca/pml/v/pml_v_output.h
      opal/include/opal/prefetch.h
      opal/mca/timer/aix/timer_aix.h
      opal/util/qsort.h
      test/support/components.h

This commit was SVN r21855.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2009-08-20 11:42:18 +00:00
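
For readers unfamiliar with the idiom that commit 8e1b23779f standardizes, here is a minimal sketch of how BEGIN_C_DECLS and END_C_DECLS typically replace a hand-written extern "C" guard. The macro definitions below are an assumption based on the common convention; the project's real definitions live in opal/include/opal_config_bottom.h and may differ. The function example_add is hypothetical and exists only for illustration.

```c
/* Assumed definitions, following the usual convention; the project's own
 * versions are generated into opal_config_bottom.h and may differ. */
#if defined(c_plusplus) || defined(__cplusplus)
#  define BEGIN_C_DECLS extern "C" {
#  define END_C_DECLS   }
#else
#  define BEGIN_C_DECLS /* expands to nothing in C */
#  define END_C_DECLS   /* expands to nothing in C */
#endif

/* Instead of repeating the #if ... extern "C" { ... #endif guard in every
 * header, declarations are wrapped in the two macros. */
BEGIN_C_DECLS

/* Hypothetical declaration used only for illustration. */
static inline int example_add(int a, int b)
{
    return a + b;
}

END_C_DECLS
```

In a C translation unit both macros expand to nothing, while a C++ compiler sees the declarations inside an extern "C" block, so the same header can be included from either language.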
Rainer Keller
3f742fc35b - add missing #include
This commit was SVN r21854.
2009-08-20 11:20:53 +00:00
Rainer Keller
9a0b6ef71e - For now, fix up our headers for a known issue with the Sun Studio 12.1 compiler
"...", line XXX: warning: linker scope was specified more than once:

This commit was SVN r21853.
2009-08-20 11:12:45 +00:00
Ralph Castain
40fc0b6367 Silence compiler warning
This commit was SVN r21850.
2009-08-20 04:57:23 +00:00
Ralph Castain
c3c642aa0d Add two new frameworks for sensing and predicting faults. This is just the bare-bones plumbing for now - will instantiate soon.
No ess modules reference these frameworks yet, so they are completely inactive.

This commit was SVN r21847.
2009-08-20 04:27:16 +00:00
Ralph Castain
646a3500a7 Correctly account for number of procs in the job
This commit was SVN r21843.
2009-08-20 00:07:38 +00:00