1
1
Граф коммитов

31 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Josh Hursey
612ebdc2ac Cleanup some symbol visability issues.
This commit was SVN r17733.
2008-03-05 13:59:25 +00:00
Josh Hursey
3b4073e32c This commit fixes the checkpoint/restart functionality on the trunk. Included in this commit are:
* Extension to the ESS framework to support C/R
 * Fixed support for {{{snapc_base_establish_global_snapshot_dir}}}
 * Fixed FileM support
 * Misc. minor code modifications

There are some outstanding visability issues that I want to fix next.

This commit was SVN r17725.
2008-03-05 04:57:23 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Jeff Squyres
213b5d5c6e Per long threads on the mailing list and much confusion discussion
about linkers, have all OPAL, ORTE, and OMPI components '''not'' link
against the OPAL, ORTE, or OMPI libraries.

See ttp://www.open-mpi.org/community/lists/users/2007/10/4220.php for
details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a
better-formatted version of the same info).

This commit was SVN r16968.
2007-12-15 13:32:02 +00:00
Josh Hursey
a287c9cb65 This commit distinguishes the file transfer stage from the finish stage.
This commit also cleans up the checkpoint and terminate case making it more
precise than before. Previously the application could make a small amount of
progress between checkpoint completion and application termination. Now the
application will make no progress at all in this time span.

Additional minor change:
 - Start using OPAL_INT_TO_BOOL instead of if/else logic

This commit was SVN r16952.
2007-12-13 14:37:17 +00:00
Josh Hursey
5fb83a4f10 - Remove an unnecessary barrier
- verbose -> VERBOSE just for the fun of it

This commit was SVN r16811.
2007-11-30 22:26:18 +00:00
Gleb Natapov
e0a3a7e53e Move duplicated code all over the code to a single function ompi_request_wait_completion().
This commit was SVN r16494.
2007-10-18 12:33:21 +00:00
Josh Hursey
1fe1276fd5 Make sure to match on the communicator ID as well.
This commit was SVN r16179.
2007-09-21 18:16:02 +00:00
Josh Hursey
3e51d7bb25 Implement the MPI_Iprobe and MPI_Probe wrappers.
Remove some old, unused code.

This commit was SVN r16178.
2007-09-21 16:28:46 +00:00
Josh Hursey
d2ef0d445a Add some basic timing hooks so I can extract a few more detailed performance
numbers for tuning.

Switch the bookmark_recv to be non-blocking. If this is blocking then for
process counts >= 32 slight process delays were causing cascading performance
delays in the protocol. This lead to checkpoints either taking about 3 sec or
45 sec (or more) for 64 procs due to the cascading delays. With the nonblocking
receive version this is no longer the case we get the speedup we expect for this
part of the protocol.

More tuning to come.

This commit was SVN r16137.
2007-09-16 15:13:23 +00:00
Josh Hursey
729c63cf9d Fix invalid MCA 'base' names so they appear in ompi_info.
A subset of this patch needs to be applied to v1.2

Refs trac:928

This commit was SVN r15918.

The following Trac tickets were found above:
  Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928
2007-08-18 03:05:45 +00:00
Brian Barrett
5b9fa7e998 reapply r15517 and r15520, which were removed in r15527 so that I could get
the RML/OOB merge in slightly easier

This commit was SVN r15530.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
  r15520 --> open-mpi/ompi@9cbc9df1b8
  r15527 --> open-mpi/ompi@2d17dd9516
2007-07-20 02:34:29 +00:00
Brian Barrett
39a6057fc6 A number of improvements / changes to the RML/OOB layers:
* General TCP cleanup for OPAL / ORTE
  * Simplifying the OOB by moving much of the logic into the RML
  * Allowing the OOB RML component to do routing of messages
  * Adding a component framework for handling routing tables
  * Moving the xcast functionality from the OOB base to its own framework

Includes merge from tmp/bwb-oob-rml-merge revisions:

    r15506, r15507, r15508, r15510, r15511, r15512, r15513

This commit was SVN r15528.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15506
  r15507
  r15508
  r15510
  r15511
  r15512
  r15513
2007-07-20 01:34:02 +00:00
Brian Barrett
2d17dd9516 temporarily back our r15517 and 15520 so that I can get the RML / OOB changes
to cleanly apply

This commit was SVN r15527.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
2007-07-20 01:10:34 +00:00
Ralph Castain
41977fcc95 Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but...
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.

Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.

This commit was SVN r15517.
2007-07-19 20:56:46 +00:00
Brian Barrett
8b9e8054fd Move modex from pml base to general ompi runtime, sicne it's used by more
than just the PML/BTLs these days.  Also clean up the code so that it
handles the situation where not all nodes register information for a given
node (rather than just spinning until that node sends information, like
we do today).

Includes r15234 and r15265 from the /tmp/bwb-modex branch.

This commit was SVN r15310.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15234
  r15265
2007-07-09 17:16:34 +00:00
Josh Hursey
af38efd27c Use more of the datatype engine supplied functions
This commit was SVN r14519.
2007-04-26 00:06:22 +00:00
Josh Hursey
2af38229c1 Re-worked the implementation of the LAM-like coord component.
It's a bit longer, but much more clear in it's implementation I believe.

Fundamentally it is the same, but is much more solid in the implementation.
I created quite a few directed tests that this version of the implementation 
now passes.

This commit was SVN r14457.
2007-04-21 20:35:01 +00:00
Josh Hursey
8f119d9063 Closes trac:977
Fix for memory corruption in the restarted process stack. This stemed from 
the brute force method we were previously using. This commit fixes this by
using a lighter weight solution focused in the r2 BML instead of above the PML.
This is a more efficient and flexible solution, and it solves the original
problem.

In the process I pulled out the ft_event function in the tcp BTL and r2 BML
into a set of *_ft.[c|h] files just to keep any updates to these code paths
as isolated as possible to make merging easier on everyone.

This commit was SVN r14371.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855

The following Trac tickets were found above:
  Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977
2007-04-14 02:06:05 +00:00
Jeff Squyres
51f286d737 Just like r14289 on the ORTE trunk:
Per discussions with Brian and Ralph, make a slight correction in
where components are installed. Use $pkglibdir, not $libdir/openmpi,
so that when compiled in the orte trunk, components are installed to
the right directory (because the component search patch is checking
$pkglibdir).

This commit was SVN r14345.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r14289
2007-04-12 11:19:42 +00:00
Josh Hursey
98fb9f26ef Some cleanup.
- Remove an old comment from crcp_base_fns.c
- Let ob1 have its very own ft_event function (which I'll fill in shortly)
- Make sure ob1 finalizes the bsend stuff so we don't leave a bunch of memory sitting around
- PML base - destruct the array upon finalize. Shrink the include search so it stops after finding a match

This commit was SVN r14222.
2007-04-05 13:52:05 +00:00
Josh Hursey
a8918fe3d5 pedantic cleanup. Switch loop to lowest rank sends first
This commit was SVN r14215.
2007-04-04 14:23:45 +00:00
Josh Hursey
51daa15f9c play a bit nicer with references.
This commit was SVN r14201.
2007-04-02 22:27:52 +00:00
Josh Hursey
5ff1c10e70 minor cleanup
This commit was SVN r14199.
2007-04-02 20:39:36 +00:00
Josh Hursey
b0b91a5fde A couple more fixes for async case.
Mostly working again, 1 small bug I'm still tracking.

This commit was SVN r14198.
2007-04-02 20:00:58 +00:00
Josh Hursey
71937c3eaf A bit of cleanup for async case... Still one bug in there.
This commit was SVN r14197.
2007-04-02 19:25:22 +00:00
George Bosilca
f2a6b9394f Deal with the include spree. Protect "environ" on Windows.
Some others minors modifications in order to make it
compile [again] on Windows.

This commit was SVN r14188.
2007-04-01 16:16:54 +00:00
Josh Hursey
3492fdeae3 Fix a couple of compiler warnings (errors?) caught by ICC testing at Cisco.
This commit was SVN r14080.
2007-03-20 14:12:13 +00:00
Josh Hursey
d03073e87d Make sure to protect the finalize call so tools like ompi_info
do not segv.

This commit was SVN r14054.
2007-03-17 19:47:54 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00