
100 commits

Author SHA1 Message Date
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
Brian Barrett
e39f0096a0 * add header file to sources list so make dist works
This commit was SVN r11357.
2006-08-23 13:31:56 +00:00
George Bosilca
c03ef692c1 And the missing header.
This commit was SVN r11348.
2006-08-23 03:33:35 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unnamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
Ralph Castain
c3ba1c1cc1 Fix a pack/unpack mismatch
This commit was SVN r11315.
2006-08-22 13:50:59 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Ralph Castain
8c7f0ed9ae Change the SOH to the new State Monitoring and Reporting (SMR) framework. New APIs will be appearing in the new framework shortly - this just gets the name change into the system.
Other changes:

1. Remove the old xcpu components as they are not functional.

2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.

This will require an autogen/configure, I'm afraid.

This commit was SVN r11228.
2006-08-16 16:35:09 +00:00
Ralph Castain
5dfd54c778 With the branch to 1.2 made....
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).

Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).

I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).

In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...

Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.

This commit was SVN r11204.
2006-08-15 19:54:10 +00:00
Ralph Castain
404acc9f65 It's okay to call index prior to anything being put in the registry...
This commit was SVN r10848.
2006-07-17 14:31:42 +00:00
Ralph Castain
574a6f7896 Fix a bug that caused the system to crash when asked for an index of the segment names. Such a request required passing a NULL value for the segment name, but the find_seg function didn't protect itself from that value.
Thanks to James Kennedy (UCC-Ireland) for finding it.

This commit was SVN r10847.
2006-07-17 13:51:07 +00:00
Ralph Castain
7b3ced80e8 Fix a bug that has been causing inconsistent behavior on a number of platforms. Will explain more on the core-devel list.
Jeff: this needs to be back-patched to our supported prior releases. I'll try to verify how far back we need to go - my initial guess is probably all of them

This commit was SVN r10801.
2006-07-14 14:16:20 +00:00
Brian Barrett
5fed99c2c2 Sending SIZE_MAX from machines with different sizeof(size_t) causes big problems,
as the smaller machine's SIZE_MAX won't be SIZE_MAX on the bigger machine, which
can lead to failures along the way -- in this case, with GPR triggers being
improperly fired.

This commit was SVN r9776.
2006-04-28 21:09:42 +00:00
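Note on r9776: the failure mode this message describes can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code the commit touched: the macro and function names (`SIZE_MAX_32BIT_PEER`, `WIRE_SENTINEL`, `naive_is_sentinel`, and so on) are hypothetical, and the fixed-width sentinel is one possible remedy rather than necessarily the one adopted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SIZE_MAX as seen by a peer whose size_t is 32 bits wide, and by one
 * whose size_t is 64 bits wide. Fixed-width stand-ins are used so the
 * sketch behaves the same on any build host. (Hypothetical names.) */
#define SIZE_MAX_32BIT_PEER UINT32_MAX
#define SIZE_MAX_64BIT_PEER UINT64_MAX

/* Hypothetical width-independent sentinel that both sides agree on. */
#define WIRE_SENTINEL UINT32_MAX

/* Naive check on the 64-bit receiver: compare against its own SIZE_MAX. */
static bool naive_is_sentinel(uint64_t received)
{
    return received == SIZE_MAX_64BIT_PEER;
}

/* Portable check: compare against the agreed fixed-width constant. */
static bool portable_is_sentinel(uint64_t received)
{
    return received == WIRE_SENTINEL;
}

static void size_max_demo(void)
{
    /* The 32-bit sender transmits "its" SIZE_MAX; the receiver widens it. */
    uint64_t from_32bit_peer = (uint64_t)SIZE_MAX_32BIT_PEER;

    /* The widened value is not the receiver's SIZE_MAX, so the naive
     * check misses the sentinel. */
    assert(!naive_is_sentinel(from_32bit_peer));

    /* A constant of fixed width compares equal on both kinds of machine. */
    assert(portable_is_sentinel((uint64_t)WIRE_SENTINEL));
}
```

Calling `size_max_demo()` runs both checks. The point is that a sentinel defined in terms of the local `size_t` width changes meaning when it crosses to a machine with a different width.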
Brian Barrett
e737b0a106 Fix a bunch of warnings the Sun compilers find:
  - The constant 1 is a signed int by default.  Explicitly say that
    it is an unsigned value so we can't overflow
  - Fix unreachable statement warnings in dss_arith by breaking out
    of switch statements instead of returning - this should have
    no impact on performance, since it's a non-conditional jump
  - A couple of the GPR files had carriage returns and were in
    DOS mode - put them in unix mode...

These should all probably go to the v1.1 branch...

This commit was SVN r9664.
2006-04-20 15:35:58 +00:00
Ralph Castain
b9bdb2125e Fix and upgrade the console to support better debugging. Activate "dump" commands to display registry content. Remove the blasted opal_output default prefix that made the dump output illegible. Properly connect to existing daemons and/or start new ones.
This commit was SVN r9528.
2006-04-04 11:05:52 +00:00
Brian Barrett
2c64ab562e More fixes to try to get Red Storm port going again....
* Add a platform spec for using the portals reference implementation's
  RTE instead of our own to make local testing easier.
* Add a cnos rmgr component so that 1) we don't have to build nearly
  as many components (no need for ras,rds,pls,etc.) and 2) calls
  to MPI_ABORT() won't print error messages about not being able to
  contact the daemon.  Still need to fill in some of the terminate
  stuff with calls from cnos, but will come in time.
* Make gpr_null use the base code for creating value and keyval
  structures so that we don't segfault in ompi_mpi_init().

This commit was SVN r9510.
2006-04-01 04:54:46 +00:00
Brian Barrett
6be35fb604 * Use the ORTE_<type> constants instead of internal DSS_TYPE_<type>_T constants
  for the type to be packed / unpacked when dealing with sized types (like
  size_t) so that the dss_unpack code to deal with types of different sizes is
  activated.  Necessary for proper 32/64 interoperability.

This commit was SVN r9475.
2006-03-30 14:33:25 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
  - move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include,
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
George Bosilca
3bb2eadfaa Do not leave them uninitialized.
This commit was SVN r8916.
2006-02-07 06:06:58 +00:00
George Bosilca
dda0e4182f Remove unused variables
Add required include files (stdio.h for NULL definition).
Make it compile on Mac OS 10.3.

This commit was SVN r8914.
2006-02-07 05:41:31 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00
George Bosilca
bf266c6109 Rollback the 8682 commit until we figure out the correct way to do it. It breaks several things
inside (like MPI_Wait* functions).

This commit was SVN r8686.
2006-01-13 22:02:40 +00:00
Rainer Keller
95f886b6ab - Protect callers of opal/ompi_condition_wait from spurious wakeups,
   possible when building with pthreads.
   Compiled on Linux ia32 with and without
   --enable-progress-threads

This commit was SVN r8682.
2006-01-12 17:13:08 +00:00
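Note on r8682: the usual way to protect a condition wait against spurious wakeups is to re-check a predicate in a loop around the wait. The sketch below shows that general pattern with plain POSIX threads. It is not the code from this commit; the `waiter_t` type and the `waiter_*` functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper: a flag protected by a mutex, plus a condition. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;   /* the predicate the waiter cares about */
} waiter_t;

static void waiter_init(waiter_t *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->ready = false;
}

static void waiter_signal(waiter_t *w)
{
    pthread_mutex_lock(&w->lock);
    w->ready = true;                 /* set the predicate under the lock */
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

static void waiter_wait(waiter_t *w)
{
    pthread_mutex_lock(&w->lock);
    /* A "while", not an "if": pthread_cond_wait() may return without a
     * matching signal, so the predicate is re-checked after every wakeup. */
    while (!w->ready) {
        pthread_cond_wait(&w->cond, &w->lock);
    }
    assert(w->ready);
    pthread_mutex_unlock(&w->lock);
}

/* Single-threaded smoke test: the predicate is already true, so the wait
 * returns immediately. Real use would call waiter_signal() from another
 * thread. */
static void waiter_demo(void)
{
    waiter_t w;
    waiter_init(&w);
    waiter_signal(&w);
    waiter_wait(&w);
    assert(w.ready);
}
```

The loop costs one extra predicate check per wakeup and makes the wait correct regardless of how often the underlying primitive wakes the thread.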
George Bosilca
32cecc5798 Change ERROR to subscribe_error because ERROR is predefined on Windows. I didn't spend
too much time tracking that down, I just know that cl.exe will replace it with the
"constant" string ...

This commit was SVN r8449.
2005-12-11 06:23:07 +00:00
Jeff Squyres
3f27e61de6 Fix location of installed header files
This commit was SVN r8403.
2005-12-08 00:04:19 +00:00
Brian Barrett
bc4d3d6fff IRIX compile fixes:
  - Need to make sure that SIZE_MAX exists as a constant if stdint.h
    doesn't exist
  - struct timeval is defined in unistd.h on IRIX, so need to include
    that header file wherever struct timeval is used.

This commit was SVN r8361.
2005-12-01 18:28:20 +00:00
Brian Barrett
20cea60b82 * fix "make distclean" error in PML
* turns out (duh!) that there was a reason that the <projectdir>dir
  variable was set in the AM conditional.  If not, stupid directories
  are created and not needed...  duh.

This commit was SVN r8205.
2005-11-20 07:41:09 +00:00
Brian Barrett
8faa1884f0 * The last of the build system optimizations. Combine the component and
  component/base Makefile.am files, reducing the time configure spends
  stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed

This commit was SVN r8200.
2005-11-20 01:03:01 +00:00
Tim Woodall
7f20198d49 Filter the set of data returned to the daemons during
startup using the new get_conditional command to improve
scalability during launch

This commit was SVN r8097.
2005-11-10 16:44:51 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Jeff Squyres
1b691f8089 Pull NULL checks around releasing of resources to ensure we don't
segv.

This commit was SVN r7971.
2005-11-03 11:27:19 +00:00
Jeff Squyres
653f43cc2b Update to latest prototype
This commit was SVN r7970.
2005-11-03 11:23:23 +00:00
Jeff Squyres
60b0330bc1 Initialize "conditions" to ensure we don't segv
This commit was SVN r7961.
2005-11-01 17:13:18 +00:00
Ralph Castain
399e41d113 Fix a potential memory leak...
This commit was SVN r7960.
2005-11-01 15:17:11 +00:00
Jeff Squyres
a2e507c629 Fix potential segv through uninitialized variable
This commit was SVN r7946.
2005-11-01 13:09:00 +00:00
Jeff Squyres
ce78b76598 Quick fix from Ralph -- this escaped being committed last night.
This commit was SVN r7917.
2005-10-28 14:03:26 +00:00
Ralph Castain
afeeacd76d Complete hookup of the registry proxy for the get_conditional command.
This commit was SVN r7915.
2005-10-28 05:35:07 +00:00
Ralph Castain
ad9de4ca3b Restore the pointer arrays to the registry dictionaries. Revise the system so that the itag is equivalent to the index into the pointer array. It already was, but it wasn't obvious before (several functions relied upon it, but others "hid" the relationship) - now, make it explicitly clear. Set things up so lookups occur at max speed by just indexing into the dictionary array.
This commit was SVN r7912.
2005-10-28 04:56:06 +00:00
Ralph Castain
eebda71a0b Add a new API to the registry for conditional data retrievals. The new API allows you to retrieve data from registry containers that have key-value pairs where the value matches the specified one. The requested keys are then retrieved from that container.
This commit was SVN r7907.
2005-10-28 00:30:58 +00:00
Tim Woodall
c0124fecdd changed segment dictionary to hash table to improve
search time for reverse lookup

This commit was SVN r7893.
2005-10-27 17:00:47 +00:00
Tim Woodall
88c7fd9f8d add support for a "persistent" non-blocking receive
doesn't require a re-registration on every receive

This commit was SVN r7822.
2005-10-20 22:06:11 +00:00
Brian Barrett
1302cb4072 The next in a long line of crazed build system changes from Brian. This was
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install).  Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am).  Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed.  This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)

This commit was SVN r7777.
2005-10-17 00:21:10 +00:00
Ralph Castain
70779fa2ab Cleanup some old logic - nothing major.
This commit was SVN r7712.
2005-10-12 01:12:27 +00:00
Josh Hursey
8ba2900341 fixed a typo, added comments for future work
This commit was SVN r7700.
2005-10-11 20:59:31 +00:00
Ralph Castain
e1244fc160 Fix a few thread-lock things discovered by Josh. The thread locks in the registry's local notify delivery system had not been updated to reflect the design change whereby the xcast uses the notify delivery system. This has now been fixed.
Also revised the callbacks to store and utilize local variables to avoid problems where threads modify the global structures. Not sure this totally fixes the problem, but it's a shot - suggested by Josh (and Jeff, I believe).

This commit was SVN r7694.
2005-10-11 19:35:04 +00:00
Ralph Castain
a47655b3fd Add unlock/lock around the delivery of a local callback to remove thread-lock condition if the callback function attempts to re-enter the registry.
This commit was SVN r7678.
2005-10-10 02:45:50 +00:00
Ralph Castain
6c839048cf Fix a typo that caused valgrind to bark on 64-bit machines. Actually was a potential source of error, so the barking was legit.
This commit was SVN r7677.
2005-10-10 02:34:26 +00:00
Josh Hursey
d39841174d Must release the lock before entering the non blocking recv, since
it is possible that if the message has already arrived the callback will
be called before recv_buffer_nb() returns. This causes deadlock
as we try to acquire the lock, but already hold it.

This was causing orterun and orteds to stall in certain situations.
Became evident when stress testing dynamics with remote nodes.

This commit was SVN r7543.
2005-09-29 14:24:11 +00:00
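Note on r7543: the deadlock described in this message (a callback fires synchronously while the caller still holds the lock the callback needs) can be reproduced in miniature. The sketch below is an illustration only. `fake_recv_nb`, `on_message` and `reg_lock` are hypothetical stand-ins, not ORTE functions, and the callback uses `pthread_mutex_trylock()` so that the bad ordering is reported instead of hanging the program.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static bool callback_got_lock = false;

/* The callback needs the same lock as the code that posts the receive.
 * trylock is used here so a held lock is detected rather than blocking. */
static void on_message(void)
{
    if (pthread_mutex_trylock(&reg_lock) == 0) {
        callback_got_lock = true;
        pthread_mutex_unlock(&reg_lock);
    }
}

/* Hypothetical stand-in for a non-blocking receive whose message has
 * already arrived: the callback runs before the call returns. */
static void fake_recv_nb(void (*cb)(void))
{
    assert(cb != NULL);
    cb();
}

/* Problematic ordering: the receive is posted while the lock is held. */
static bool post_recv_holding_lock(void)
{
    callback_got_lock = false;
    pthread_mutex_lock(&reg_lock);
    fake_recv_nb(on_message);        /* callback cannot take the lock */
    pthread_mutex_unlock(&reg_lock);
    return callback_got_lock;
}

/* Ordering described in the commit message: release first, then post. */
static bool post_recv_after_unlock(void)
{
    callback_got_lock = false;
    pthread_mutex_lock(&reg_lock);
    /* ... update shared state while protected ... */
    pthread_mutex_unlock(&reg_lock);
    fake_recv_nb(on_message);        /* callback is free to take the lock */
    return callback_got_lock;
}
```

With a blocking `pthread_mutex_lock()` in the callback, the first ordering would stall; releasing the lock before posting the receive avoids that.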
Ralph Castain
b589a93e29 Continue to lace the trace functionality into orte...
This commit was SVN r7427.
2005-09-19 15:29:14 +00:00
Josh Hursey
575afef072 Use non blocking sends in orte_gpr_replica_remote_notify.
This fixes one of the race conditions when orterun is sent a kill signal.
Before it would sometimes spin in the OOB waiting for a message to complete
to a peer that was no longer around. Stalling at this level prevented orterun
from noticing that it had received a kill signal.

This commit was SVN r7408.
2005-09-16 15:34:44 +00:00
Jeff Squyres
f4e8fe4817 Arrgh -- stupid mistake on last commit -- accidentally replaced a
LIBADD instead of appending to the existing one.

Also removed some more Makefile.options whitespace, and I think emacs
removed some tabs (i.e., replaced them with whitespace).

This commit was SVN r7399.
2005-09-15 21:37:24 +00:00