1
1

2507 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
7183179f56 Provide native integration with SLURM 2.0's OMPI support
This commit was SVN r21865.
2009-08-21 18:03:34 +00:00
Ralph Castain
35f8b68de6 Note to self: save all changes before committing
This commit was SVN r21863.
2009-08-21 12:54:29 +00:00
Ralph Castain
535408d6c2 Answer a Jeff-ism and check malloc for NULL return - for all xml formatting errors, revert to at least showing the non-xml formatted message
This commit was SVN r21862.
2009-08-21 12:41:54 +00:00
Shiqing Fan
baa81a6525 Add a missing header to compile on Windows.
This commit was SVN r21861.
2009-08-21 07:20:21 +00:00
Ralph Castain
2e0bd04755 Ensure that show_help messages are properly xml formatted
This commit was SVN r21858.
2009-08-20 19:23:26 +00:00
Rainer Keller
8e1b23779f - Replace combinations of
#if defined (c_plusplus)
          defined (__cplusplus)
   followed by
      extern "C" {
   and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.

   Notable exceptions are:
    - opal/include/opal_config_bottom.h:
      This is our generated code, that itself defines BEGIN_C_DECL and
      END_C_DECL
    - ompi/mpi/cxx/mpicxx.h:
      Here we do not include opal_config_bottom.h:                                 
    - Belongs to external code:                                                    
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c        
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h        
    - opal/include/opal/prefetch.h:
      Has C++ specific macros that are protected:                                  

    - Had #if ... } #endif  _and_ END_C_DECLS (aka end up with 2x
      END_C_DECLS)
      ompi/mca/btl/openib/btl_openib.h
    - opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
    - opal/win32/ompi_process.h: had extern "C"\n {...
      opal/win32/ompi_process.h: dito
    - ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
      ompi/mpi/f90/test/align_c.c: dito
    - ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
    - ompi/mpi/f90/xml/common-C.xsl: Amend

   Tested on linux using --with-openib and --with-mx

   The following do not contain either opal_config.h, orte_config.h or
   ompi_config.h
   (but possibly other header files, that include one of the above):
      ompi/mca/bml/r2/bml_r2_ft.h
      ompi/mca/btl/gm/btl_gm_endpoint.h
      ompi/mca/btl/gm/btl_gm_proc.h
      ompi/mca/btl/mx/btl_mx_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_frag.h
      ompi/mca/btl/ofud/btl_ofud_proc.h
      ompi/mca/btl/openib/btl_openib_mca.h
      ompi/mca/btl/portals/btl_portals_endpoint.h
      ompi/mca/btl/portals/btl_portals_frag.h
      ompi/mca/btl/sctp/btl_sctp_endpoint.h
      ompi/mca/btl/sctp/btl_sctp_proc.h
      ompi/mca/btl/tcp/btl_tcp_endpoint.h
      ompi/mca/btl/tcp/btl_tcp_ft.h
      ompi/mca/btl/tcp/btl_tcp_proc.h
      ompi/mca/btl/template/btl_template_endpoint.h
      ompi/mca/btl/template/btl_template_proc.h
      ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
      ompi/mca/btl/udapl/btl_udapl_endpoint.h
      ompi/mca/btl/udapl/btl_udapl_mca.h
      ompi/mca/btl/udapl/btl_udapl_proc.h
      ompi/mca/mtl/mx/mtl_mx_endpoint.h
      ompi/mca/mtl/mx/mtl_mx.h
      ompi/mca/mtl/psm/mtl_psm_endpoint.h
      ompi/mca/mtl/psm/mtl_psm.h
      ompi/mca/pml/cm/pml_cm_component.h
      ompi/mca/pml/csum/pml_csum_comm.h
      ompi/mca/pml/dr/pml_dr_comm.h
      ompi/mca/pml/dr/pml_dr_component.h
      ompi/mca/pml/dr/pml_dr_endpoint.h
      ompi/mca/pml/dr/pml_dr_recvfrag.h
      ompi/mca/pml/example/pml_example.h
      ompi/mca/pml/ob1/pml_ob1_comm.h
      ompi/mca/pml/ob1/pml_ob1_component.h
      ompi/mca/pml/ob1/pml_ob1_endpoint.h
      ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
      ompi/mca/pml/ob1/pml_ob1_recvfrag.h
      ompi/mca/pml/v/pml_v_output.h
      opal/include/opal/prefetch.h
      opal/mca/timer/aix/timer_aix.h
      opal/util/qsort.h
      test/support/components.h

This commit was SVN r21855.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2009-08-20 11:42:18 +00:00
Rainer Keller
3f742fc35b - add missing #include
This commit was SVN r21854.
2009-08-20 11:20:53 +00:00
Rainer Keller
9a0b6ef71e - For now fixup our headers for known issue of Sun Studio 12.1 compiler
"...", line XXX: warning: linker scope was specified more than once:

This commit was SVN r21853.
2009-08-20 11:12:45 +00:00
Ralph Castain
40fc0b6367 Silence compiler warning
This commit was SVN r21850.
2009-08-20 04:57:23 +00:00
Ralph Castain
c3c642aa0d Add two new frameworks for sensing and predicting faults. This is just the bare-bones plumbing for now - will instantiate soon.
No ess modules reference these frameworks yet, so they are completely inactive.

This commit was SVN r21847.
2009-08-20 04:27:16 +00:00
Ralph Castain
646a3500a7 Correctly account for number of procs in the job
This commit was SVN r21843.
2009-08-20 00:07:38 +00:00
Ralph Castain
e66a0be796 First attempt at making OMPI respect external bindings. Detect any external bindings on the daemons, and use that to determine which sockets/cores to bind to.
I have no machine which allows me to do external binding, so I will have to ask others to test the new logic. However, I did verify that these changes don't break the existing logic when no external bindings were present.

This commit was SVN r21842.
2009-08-19 19:29:15 +00:00
Ralph Castain
d3313f732a Flush the mpirun xml tag output to maintain ordering
This commit was SVN r21836.
2009-08-18 21:20:55 +00:00
Ralph Castain
3796cc568a Add a --report-bindings cmd line option to mpirun
This commit was SVN r21829.
2009-08-18 17:10:23 +00:00
Ralph Castain
dbb3cbe3dd Fix a reported problem when specifying orte_launch_agent - if only one word was given, we inadvertently appended a "NULL" to the end of the cmd.
This commit was SVN r21827.
2009-08-18 14:57:34 +00:00
Ralph Castain
a489f3fc6c Surround the entire xml output from an mpirun with <mpirun> tags.
This commit was SVN r21826.
2009-08-18 03:15:29 +00:00
Ralph Castain
511fe5da8b Minor cleanup - get reported iters right in test
This commit was SVN r21819.
2009-08-14 03:33:59 +00:00
Ralph Castain
3f3b46495e Add some error checking to the tm launcher
This commit was SVN r21818.
2009-08-14 03:13:02 +00:00
Ralph Castain
9da6b46e7d Add several options to the sendrecv_blaster to make it more powerful
This commit was SVN r21817.
2009-08-14 03:12:43 +00:00
Ralph Castain
0005e6e834 Correct a couple of bugs in the rank_file mapper that were incorrectly assigning vpids.
Add a capability to parse the rankfile to extract node information in place of requiring both hostfile and rankfile for non-RM managed environments. The rankfile is -only- parsed for this IF the hostfile and -host options are not given. Otherwise, those are used to establish allocation info as we did before this commit.

This commit was SVN r21815.
2009-08-13 16:08:43 +00:00
Rainer Keller
cea3d68ef6 - Fix reference counting of daemons killed.
This commit was SVN r21810.
2009-08-12 14:04:50 +00:00
Shiqing Fan
bce2f44154 Update related .windows files with proper compiling properties, in order to have a successful DSO build.
This commit was SVN r21805.
2009-08-12 08:55:58 +00:00
Josh Hursey
843a61f7eb Add another missing header for FT that got lost in the shuffle yesterday.
This commit was SVN r21794.
2009-08-11 13:32:27 +00:00
Ralph Castain
1dc12046f1 Modify the OMPI paffinity and mapping system to support socket-level mapping and binding. Mostly refactors existing code, with modifications to the odls_default module to support the new capabilities.
Adds several new mpirun options:

* -bysocket - assign ranks on a node by socket. Effectively load balances the procs assigned to a node across the available sockets. Note that ranks can still be bound to a specific core within the socket, or to the entire socket - the mapping is independent of the binding.

* -bind-to-socket - bind each rank to all the cores on the socket to which they are assigned.

* -bind-to-core - currently the default behavior (maintained from prior default)

* -npersocket N - launch N procs for every socket on a node. Note that this implies we know how many sockets are on a node. Mpirun will determine its local values. These can be overridden by provided values, either via MCA param or in a hostfile

Similar features/options are provided at the board level for multi-board nodes.

Documentation to follow...

This commit was SVN r21791.
2009-08-11 02:51:27 +00:00
Ralph Castain
007cbe74f4 Include the sendrecv blaster in the tarball
This commit was SVN r21790.
2009-08-11 02:42:47 +00:00
Rainer Keller
76469ea64a - Change the property of a few files, that obviously
don't need to be svn:executable...

This commit was SVN r21786.
2009-08-11 01:40:00 +00:00
Rainer Keller
784b9b9f5b - Based and updated from Ken's patch: since CLE-2.1 does not offer
the BATCH_PARTITION_ID anymore, use the ras-alps-command.sh script to
   figure out the jobs ID to query from ALPS.

   Gracefully report errors, update the help file and parse the sysconfig file

This commit was SVN r21772.
2009-08-07 01:15:09 +00:00
Ralph Castain
c66a5a9504 Add another test that just blasts the system with MPI_Sendrecv to myself commands of varying sizes
This commit was SVN r21748.
2009-07-31 14:57:03 +00:00
George Bosilca
0bf381e931 This patch try to solve a issue on Leopard. The supposedly global
variables that are not initialized and are declared in a file that
doesn't export any globally visible function are marked as
non-initialized constants, i.e. uninitialized common symbols. For some
obscure reasons, they get removed from the object files on Mac OS X.

So far I found two solution to this problem. One require the addition
of "-c" to the linker command, the second one (corresponding to this
patch) force them to became a common initialized symbol.

This commit was SVN r21739.
2009-07-28 17:06:16 +00:00
Jeff Squyres
c7376ae053 Start using Libtool's shared library versioning scheme. See lengthy
note in VERSION file.

NOTE: the versions will ''always'' be 0:0:0 on the SVN trunk and
developer branches.  They will only have meaningful values (starting
with 0:0:0 in 1.3.4) on release branches.  Only RM's will modify these
values immediately preceeding a release.

This commit was SVN r21729.
2009-07-23 21:35:17 +00:00
Ralph Castain
55e7365e7a Jeff correctly pointed out that char values > 127 also don't print. Adjust the xml output to handle those too.
Thanks Jeff!

This commit was SVN r21727.
2009-07-22 13:28:27 +00:00
Shiqing Fan
3e24d3df70 An ORTE event fix for Windows, i.e. using socket pairs instead of pipes on Windows.
This commit was SVN r21726.
2009-07-22 07:39:52 +00:00
Ralph Castain
6c85d954f3 Use a conditioned wait to serialize launches when they come from multiple sources (e.g., an orte application that spawns multiple jobs).
This commit was SVN r21718.
2009-07-20 01:51:29 +00:00
Ralph Castain
1a5f7245c8 Create a new message handling method for serializing responses. Place recvd messages on a list, using a file descriptor and the event library to trigger processing. This is identical in design to what is used in the IOF.
Use it first in the plm_base_receive to serialize multiple comm_spawn and update_proc requests.

This commit was SVN r21717.
2009-07-19 18:07:04 +00:00
Ralph Castain
1d74ab6e3c Cleanup some pointer array addressing and ensure we always exit the function cleanly
This commit was SVN r21716.
2009-07-19 18:05:04 +00:00
Ralph Castain
c3ce908515 Shift a debug output to come at a better place
This commit was SVN r21715.
2009-07-19 17:56:48 +00:00
Ralph Castain
1a5a591424 Clarify some comments
This commit was SVN r21714.
2009-07-19 17:56:19 +00:00
Ralph Castain
43d532cfd3 Minor tweak to xml ouput
This commit was SVN r21713.
2009-07-19 17:45:01 +00:00
Ralph Castain
2f515c8357 Per request of the Eclipse team, further modify the xml output to "escape" all non-printable characters
This commit was SVN r21712.
2009-07-17 22:45:32 +00:00
Ralph Castain
210f591f1c Cleanup array addressing for opal_pointer_array
This commit was SVN r21710.
2009-07-17 22:20:30 +00:00
Ralph Castain
51a8b89a83 Treat termination of continuously operating processes as an abort
This commit was SVN r21709.
2009-07-17 22:20:05 +00:00
Ralph Castain
08e17b72cf Break a circular logic loop in the cm routed module.
This commit was SVN r21708.
2009-07-17 18:07:35 +00:00
Ralph Castain
ef20e778b3 Ensure that output ends on an appropriate suffix tag when --tag-output or --xml are selected.
When we read the input buffer, we don't always get a complete printf output - we sometimes end mid stream. We still need to add the suffix and a <CR> to keep the output working right.

This commit was SVN r21706.
2009-07-17 05:02:53 +00:00
Ralph Castain
4c1eb040b0 Enable the system to keep functioning even when multiple launches are occurring simultaneously.
This is a bit of a hack, but it does seem to allow the system to work. A better solution is being discussed.

This commit was SVN r21705.
2009-07-17 02:28:47 +00:00
Ralph Castain
c0e85a492c Deleted one too many lines...might be good to set the value of oldnode!
Thanks George.

This commit was SVN r21702.
2009-07-16 18:49:24 +00:00
George Bosilca
3e971e61f3 The system headers are supposed to be protected by #ifdef and not by #if.
This commit was SVN r21700.
2009-07-16 18:27:33 +00:00
George Bosilca
ed93b967f7 Remove some warnings about uninitialized values.
This commit was SVN r21695.
2009-07-16 17:38:09 +00:00
George Bosilca
52d013baae Add a missing header.
This commit was SVN r21694.
2009-07-16 17:21:37 +00:00
Ralph Castain
007d14f238 Add a threshold reporting level to the orte notifier framework. This takes a string value:
"critical" - any error at or above the critical severity will be reported (i.e., only critical errors)
"warning" - any error at or above the warning severity will be reported (i.e., warning and critical errors)
"notice" - pretty much everything will be reported

Default to "critical" to keep down the chatter.

Obviously, only places that call orte_notifier will be affected - all other error reporting (e.g., via opal_output calls) is unaffected.

This commit was SVN r21693.
2009-07-16 13:31:23 +00:00
Ralph Castain
ae6c36ae01 Ensure that jdata->num_procs is correct when the rank_file mapper is mapping more procs than are specified in the rank_file
This commit was SVN r21690.
2009-07-15 22:45:12 +00:00