1
1

324 Коммитов

Автор SHA1 Сообщение Дата
Gleb Natapov
f15fc4ef2f include signal.h for SIGPIPE definition
This commit was SVN r10863.
2006-07-18 09:07:53 +00:00
Brian Barrett
2185c059e8 * use opal_free_list_item_t as the type of items stored in an opal_free_list_t,
rather than assuing it's an opal_list_item_t.

This commit was SVN r10860.
2006-07-17 21:51:50 +00:00
Jeff Squyres
82161d20ca Catch a SIGPIPE and allow it to be harmless. Register a no-op SIGPIPE
handler before the write() and de-register it afterwards.  Determine
if the write() succeeded or failed by the return of write().

This commit was SVN r10858.
2006-07-17 21:15:56 +00:00
George Bosilca
33a7634009 Silence the compiler.
This commit was SVN r10851.
2006-07-17 17:13:28 +00:00
Brian Barrett
dfa1221c3b * AC_CONFIG_LINKS has a minor problem in that it always uses ln -s, rather
than $(LN_S).  This causes problems with with Windows and probably
  elsewhere (re: #200).  So use a slightly different trick to get the
  right header selected for the MEMCPY and TIMER components.

* Using the same trick used to solve the AC_CONFIG_LINKS problem, 
  stop using a separate header file for direct calling in the
  PML and MTL.  This lets me remove some icky code in ompi_mca.m4
  that was more fragile than I really liked.

This commit was SVN r10841.
2006-07-16 04:23:52 +00:00
Ralph Castain
cef1ce19d6 Restore the "sleep" delay during startup.
Since Jeff and I are going to a branch for T-bird, we have restored the trunk to its prior state to avoid any possibility of disturbing it.

This commit was SVN r10774.
2006-07-12 22:18:53 +00:00
Ralph Castain
9102b5af3b Remove the "sleep" delay in the oob connection procedure. This shouldn't cause any problems, especially for launches of less than 1000 processes.
Please report any abnormal behavior during launch, though, as we would like to understand what (if any) impact is seen. I couldn't see any on small jobs (the modulo functions render this number down pretty low).

This commit was SVN r10763.
2006-07-12 20:31:30 +00:00
Brian Barrett
4b70bb92db * Per ticket #112, localhost checks should check against 127.0.0.1/8, rather
than just 127.0.0.1.

This commit was SVN r10750.
2006-07-11 20:54:49 +00:00
Tim Woodall
0a56067509 Correction to resolve a problem related to partial reads. We were making a
copy of the receive buffer based on the iovec struct that may have been updated 
during partial reads to reflect the current offset. Need to make the copy using 
the base address of the buffer.

Thanks to Sven Stork for finding this.

This should be backported to 1.0.X and 1.1.X branches.

This commit was SVN r9749.
2006-04-27 14:27:02 +00:00
Tim Woodall
3e57a4ec48 remove debug code - not required
This commit was SVN r9715.
2006-04-25 19:05:57 +00:00
Brian Barrett
f37a77dd08 * Fix potential deadlock when mpi threads are enabled and progress threads are
not.  See lengthy comment in the body of commit.

This commit was SVN r9573.
2006-04-07 18:13:35 +00:00
George Bosilca
ca75ff2569 In the case we have support for threads, then the opal library have it's own
thread, which will do progress independently of MPI. So in this case we 
have to call opal_event_loop instead of opal_progress.

This commit was SVN r9551.
2006-04-06 14:31:38 +00:00
Brian Barrett
7408de0bfb When progress threads are enabled, opal_progress() doesn't call the
event library (since the event library has its own thread).  So when
we are using progress threads, we really want to call opal_event_loop()
and not opal_progres(). 

This commit was SVN r9549.
2006-04-06 12:58:09 +00:00
George Bosilca
50b5a02f8b Let the oob to call opal_progress instead of opal_progress_event. Now, the MPI
communications will be advanced in MPI_Finalize.

This commit was SVN r9442.
2006-03-28 22:09:40 +00:00
Brian Barrett
3e2c51dea8 * fix some silly commenting done by a previous developer that are good for
a laugh but probably not good for usability ;)

This commit was SVN r9253.
2006-03-11 03:09:24 +00:00
Brian Barrett
285581dff2 More endian-related cleanups:
- moved hton64 and ntoh64 from the bunch of places it had been copied
    into one header file
  - properly set and use the btl_tcp's nbo option to put things in
    network byte order on the wire if both sides don't have the same
    endianness
  - Put the OB1 PML's headers (with a couple exceptions I need to discuss
    with Tim) in network byte order on the wire if both sides don't have
    the same endianness
  - since it was needed for the TCP BTL, move the orte_process_name_t
    HTON and NTOH macros from the TCP OOB to ns_types.h

This commit was SVN r9145.
2006-02-26 00:45:54 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00
Jeff Squyres
b2de55d72e Back out some debugging stuff from a careless r8643 commit (only
intended to include the OMPI_DEBUG_ZERO call).

These debugging statements should not have affected correcteness
because the value of 78 will be overridden in the read() and the
assert()/abort() stuff will only be triggered on an error which should
never happen (i.e., the error should have been handled by the prior if
conditional).  But still, thise code should not be there.

This commit was SVN r8649.

The following SVN revision numbers were found above:
  r8643 --> open-mpi/ompi@a6b869ed68
2006-01-05 14:44:10 +00:00
Jeff Squyres
a6b869ed68 Avoid a false positive in bcheck
This commit was SVN r8643.
2006-01-04 22:29:09 +00:00
George Bosilca
7d8d516a4a A bunch of fixed for Windows support.
- protection with __WINDOWS__ and not WIN32 or _WIN32
 - protect all the headers

This commit was SVN r8463.
2005-12-12 20:04:00 +00:00
Jeff Squyres
31336e4773 Add some missing headers / correct one installation directory
This commit was SVN r8408.
2005-12-08 04:00:52 +00:00
Jeff Squyres
6fbd321442 Fix a bunch of install locations for header files
This commit was SVN r8406.
2005-12-08 00:54:44 +00:00
Brian Barrett
bc4d3d6fff IRIX compile fixes:
- Need to make sure that SIZE_MAX exists as a constant if stdint.h
    doesn't exist
  - struct timeval is defined in unistd.h on IRIX, so need to include
    that headerfile where ever struct timeval is used.

This commit was SVN r8361.
2005-12-01 18:28:20 +00:00
Brian Barrett
20cea60b82 * fix "make distclean" error in PML
* turns out (duh!) that there was a reason that the <projectdir>dir
  variable was set in the AM conditional.  If not, stupid directories
  are created and not needed...  duh.

This commit was SVN r8205.
2005-11-20 07:41:09 +00:00
Brian Barrett
8faa1884f0 * The last of the build system optimizations. Combine the component and
component/base Makefile.am files, reducing the time configure spends
  stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed

This commit was SVN r8200.
2005-11-20 01:03:01 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
cf5c27c1e3 start all of the sends in parallel (from the same buffer) - wait for
all to complete

This commit was SVN r7935.
2005-10-31 16:21:51 +00:00
Tim Woodall
a891db81e9 set socket options to improve oob performance
This commit was SVN r7934.
2005-10-31 16:21:11 +00:00
Tim Woodall
b60bea9ada dont allow callbacks to processed recursively - appear to be blowing away the stack
This commit was SVN r7862.
2005-10-25 13:48:08 +00:00
Tim Woodall
88c7fd9f8d add support for a "persistent" non-blocking receive
doesn't require a re-registration on every receive

This commit was SVN r7822.
2005-10-20 22:06:11 +00:00
Tim Woodall
cea599a274 back out prior change - investigate an alternate approach
This commit was SVN r7821.
2005-10-20 17:49:13 +00:00
Tim Woodall
56983d3e7f Don't invoke non-blocking recv callbacks when recv is posted. Otherwise,
this can result in recursive callbacks and extremely long call chains

This commit was SVN r7817.
2005-10-20 15:07:06 +00:00
Brian Barrett
1302cb4072 The next in a long line of crazed build system changes from Brian. This was
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install).  Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am).  Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed.  This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)

This commit was SVN r7777.
2005-10-17 00:21:10 +00:00
Tim Woodall
3280f6e655 add facility to receive callback on disconnection from peer
This commit was SVN r7650.
2005-10-06 19:39:20 +00:00
Josh Hursey
c11ba09655 Remove the progress engine stuff from abort. This was causing
some orted's to stall on locks in the MPI Dynamics cases. Since it
is not essentual that we call these functions, they can so away.

Unlock the peer lock when aborting. This causes a potential deadlock
in do_waitall [see comment in code]. This was causing orteds to
deadlock at times when the seed had terminated. With proper interleaving
and timing the orted was deadlocking. This seems to have fixed this in 
my stress testing with MPI 2 Dynamics.

This commit was SVN r7539.
2005-09-29 05:04:43 +00:00
Andrew Friedley
555ae37255 Add lib{opal,orte,mpi}.la to appropriate LIBADD's, some whitespace cleanup as well.
This commit was SVN r7477.
2005-09-22 12:28:54 +00:00
George Bosilca
193120d434 In the case where we we have to subscribe to get information about the peer. As we call this function
with the mutex locked and as this function will call oob_send which will call the lookup again
... we will deadlock as the mutex is already lock. The solution is to release the mutex before
going into the subscription. Then of course the logic to remote the item when something went
wrong with the subscrition is a little bit more complex.

This commit was SVN r7429.
2005-09-19 15:59:46 +00:00
Tim Woodall
09869daf8e from the list of addresses exported by the peer, attempt to
pick an address on the same subnet. if non are found, give
up and try them in order

This commit was SVN r7426.
2005-09-19 14:47:11 +00:00
Josh Hursey
9d5af5f926 As Tim pointed out we don't want to call orte_finalize in orte_abort.
However we do want to do a bit of cleanup on the node before we exit,
specificly clean out the session directory. I also had a couple of the
subsystems that don't depend upon peers (which is key) clean up as well.

Pedantic formatting issue in oob_tcp.h

This commit was SVN r7387.
2005-09-15 17:13:13 +00:00
Tim Woodall
d9c5245269 change subscription to request pre-existing values
for jobids other than ourself - mpi2 dynamic(s)

This commit was SVN r7335.
2005-09-13 03:52:39 +00:00
George Bosilca
361ff6640f Correct the progress thread function name from opal_progress_thread to opal_event_progress_thread.
This commit was SVN r7300.
2005-09-11 20:04:40 +00:00
Brian Barrett
ed56e743b7 * update configure.ac to use the modern version of AC_INIT and
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
  number to be set at autoconf time (instead of at configure time, as
  it was before).  Set the version number, minus the subversion r number,
  at autoconf time.  Override the internal variables to include the r
  number (if needed) at configure time.  Basically, the right thing
  should always happen.  The only place it might not is the version
  reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
  them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
  in the directory containing source files, even if the Makefile.am is
  in another directory.  This should start making it feasible to
  reduce the number of Makefile.am files we have in the tree, which
  will greatly reduce the time to run autogen and configure.

This commit was SVN r7211.
2005-09-07 05:54:53 +00:00
Josh Hursey
78da530fd2 Fix a bug that Tim highlighted in which orted coredumps when an orterun is
CTRL-C'd. 
We were calling orte_finalize recursively which caused a segv when it tried to 
use a freed framework (orte_rmgr in this case).

I added a status flag to orte_universe_info to indicate where we are in the code.
This was needed to determine if we should call orte_abort or not when shutting
down in the tcp oob.

This commit was SVN r7160.
2005-09-02 21:07:21 +00:00
Ralph Castain
76e622a552 Clean up a few memory leaks - more to go...
This commit was SVN r7134.
2005-09-01 17:38:04 +00:00
Jeff Squyres
3962c53e2e - Add to AM_CPPFLAGS $(OPAL_LTDL_CPPFLAGS) where necessary in order to
add a -I to find the included ltdl.h (vs. a system-installed ltdl.h)
- Clean up kruft in a bunch of Makefile.am's to remove now-unnecessary
  AM_CPPFLAGS settings to get static-components.h for each framework
- Move the component_repository API functions out of opal/mca/base/base.h
  and into opal/mca/base/mca_base_component_repository.h in order to
  decrease unnecessary dependencies (e.g., before this, almost
  everything in the tree depended on ltdl.h, which is unnecessary --
  only a small number of files really need ltdl.h)

This commit was SVN r7127.
2005-09-01 12:16:36 +00:00
Ralph Castain
96f4bb7a63 Hey, sports fans!! Guess what??
Here's the huge registry check-in you've all been waiting for with baited breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.

In addition, the commit contains the following:

1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe

2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well.

In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).

This commit was SVN r7118.
2005-09-01 01:07:30 +00:00
Tim Woodall
35f96af472 non-destructive read of buffer
This commit was SVN r7114.
2005-08-31 21:21:54 +00:00
Brian Barrett
bf8a3632bb * bunch more memory leak / block in use fixes
This commit was SVN r7085.
2005-08-29 21:35:01 +00:00
George Bosilca
5b59ffbe4f Handle multiple IP addresses for the OOB TCP module. We check the addresses in order, and we give up if
and only if all of them failed.

This commit was SVN r7067.
2005-08-27 17:03:19 +00:00