Commit graph

3379 commits

Author SHA1 Message Date
Mitch Sukalski
1741006a4e first version of memory registration bug fixes: RDMA write target
memory freed on receipt of FIN packet with memory region information
from the sender (mca_ptl_ib_fin_header_t); RDMA write source
memory freed on completion notification from RDMA write...

This commit was SVN r3776.
2004-12-10 20:14:12 +00:00
Brian Barrett
e5605b3e1b * ignore the non-RSH pcms for a day or so while I get the job cleanup for
RSH figured out

This commit was SVN r3775.
2004-12-10 20:03:33 +00:00
Ralph Castain
72e1161605 Clean up the notify message class - missing the ompi_object_t element in structure.
This commit was SVN r3774.
2004-12-10 17:53:41 +00:00
Craig E Rasmussen
a4847ca677 Removed initial junk that was just for testing things out.
This commit was SVN r3773.
2004-12-10 17:28:41 +00:00
Ralph Castain
65f874c556 Fix some memory problems. This will temporarily break comm_spawn again - Tim needs to grab this to take a look at something in connect_accept.
This commit was SVN r3772.
2004-12-10 16:27:09 +00:00
Rich Graham
0d76949743 add #include <sys/stat.h> for the mkfifo() prototype.
This commit was SVN r3771.
2004-12-10 04:32:49 +00:00
Ralph Castain
2ee47e0708 Some cleanup at the end of the comm_spawn work.
Comm_spawn is now fully functional. I'll send out a separate message about some of the problems encountered, and resulting action items.

This commit was SVN r3770.
2004-12-10 02:34:19 +00:00
Tim Woodall
73a9e95816 test for null set of procs
This commit was SVN r3769.
2004-12-10 00:04:00 +00:00
Tim Woodall
04cb323004 - corrections for persistent requests
- sm/mx/tcp/self now pass all intel p2p tests (single threaded)

This commit was SVN r3768.
2004-12-09 23:58:23 +00:00
Rich Graham
29ebe90676 start to debug the non-symmetric shared memory case. Ping-pong
w/o threads runs correctly at this stage.  fixed errors with using
addresses valid in remote memory scope, not local memory scope.

This commit was SVN r3767.
2004-12-09 23:09:06 +00:00
Tim Woodall
1da70cd6a5 removed debug
This commit was SVN r3766.
2004-12-09 21:19:49 +00:00
Ralph Castain
be53d9d720 Comm_spawn stuff
This commit was SVN r3765.
2004-12-09 20:54:31 +00:00
Brian Barrett
cfe68a959b * add missing headers
This commit was SVN r3764.
2004-12-09 19:46:03 +00:00
Ralph Castain
3c29621a4f A little bit further.....transferring to Tim's box for further work.
This commit was SVN r3763.
2004-12-09 17:04:59 +00:00
Mitch Sukalski
02d5381c25 updated IB memory registry support with new public
interfaces for the other IB PTL code

This commit was SVN r3762.
2004-12-09 16:28:38 +00:00
Brian Barrett
13d23efb65 * ignore the elan ptl to save some build time
* remove ignore on the bjs discovery / mapping code.  It seems to work
  correctly
* Fix up svn:ignore properties in the dynamic-mca directory

This commit was SVN r3761.
2004-12-09 15:15:50 +00:00
Jeff Squyres
ce3d8f3812 Part 2 of the .ompi_unignore patch
This commit was SVN r3760.
2004-12-09 12:49:54 +00:00
Ralph Castain
95510d3684 Remove the diagnostic messages from the xcast. Move one step closer on comm_spawn....
This commit was SVN r3759.
2004-12-09 06:32:44 +00:00
Brian Barrett
017d39a633 * Be nice to users and resolve "localhost" or "127.0.0.1" to the current
node, since the default hostfile shipped with Open MPI contains only
  "localhost".  This is performed after all the other checks, so it should
  only get activated as a last-ditch effort.

This commit was SVN r3758.
2004-12-09 06:01:41 +00:00
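
The last-ditch fallback described in r3758 can be illustrated with a short Python sketch. This is not the Open MPI code (which is C): `resolve_node`, `known_nodes`, and `current_node` are hypothetical names chosen for illustration, and "all the other checks" are reduced here to an exact-match lookup.

```python
import socket

# Names treated as "this machine"; both come from the commit message.
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def resolve_node(entry, known_nodes, current_node=None):
    """Map a hostfile entry to a node name, or None if it cannot be resolved."""
    # Stand-in for the earlier checks (assumption): exact match on known nodes.
    if entry in known_nodes:
        return entry
    # Last resort: treat loopback names as the node we are running on.
    if entry in LOOPBACK_NAMES:
        if current_node is not None:
            return current_node
        return socket.gethostname()
    return None
```

Because the loopback test comes last, an entry that already matches a known node is never rewritten; only an otherwise unresolvable "localhost" falls back to the current node.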
Brian Barrett
c0f7b4580e * Fix dumb mistake in BJS llm where code was using an uninitialized variable
as loop ending condition
* Add resolver code for a couple of different node formats for BProc.  Given
  hostfiles can now be a combination of notations:

0
n1
2
master
self

This commit was SVN r3757.
2004-12-09 05:06:02 +00:00
Ralph Castain
ed197f0186 More minor changes that continue to make progress on comm_spawn. Nothing significant - no impact on other operations.
PLEASE NOTE: there are some diagnostic messages in oob_xcast that will print out. Please don't have a cow about them - they won't hurt or injure anyone, and they're just there for a little while to help Tim and me debug a problem. Just didn't want to create yet another MCA parameter to debug 10 lines of code. :-)

This commit was SVN r3756.
2004-12-09 04:54:37 +00:00
Mitch Sukalski
9b2a2f5247 support for InfiniBand memory registry - registry uses hashed array
for hints, and a ompi_rb_tree_t for all other search operations;
registration occurs only when a different memory range or access
privileges is needed; deregistration is lazy, and only done when
registration fails with VAPI_EAGAIN (no resources...)

This commit was SVN r3755.
2004-12-09 00:45:45 +00:00
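
The lazy-deregistration policy described in r3755 can be sketched in Python as follows. This is only an illustration under stated assumptions: `FakeHca`, `OutOfResources`, and `LazyRegistry` are hypothetical names, a plain dictionary stands in for the hashed hint array and the `ompi_rb_tree_t`, and a cached registration is reused only on an exact (address, length, access) match.

```python
class OutOfResources(Exception):
    """Stand-in for a registration failure such as VAPI_EAGAIN."""


class FakeHca:
    """Hypothetical device that can hold at most `capacity` registrations."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = set()

    def register(self, key):
        if len(self.active) >= self.capacity:
            raise OutOfResources()
        self.active.add(key)

    def deregister(self, key):
        self.active.discard(key)


class LazyRegistry:
    def __init__(self, hca):
        self.hca = hca
        self.refs = {}  # (addr, length, access) -> reference count

    def acquire(self, addr, length, access):
        key = (addr, length, access)
        # Register only when this range/access pair is not already cached.
        if key not in self.refs:
            try:
                self.hca.register(key)
            except OutOfResources:
                # Deregistration happens only now, when resources run out.
                self._reclaim()
                self.hca.register(key)  # single retry; may raise again
            self.refs[key] = 0
        self.refs[key] += 1
        return key

    def release(self, key):
        # Lazy: drop the reference but keep the registration cached.
        self.refs[key] -= 1

    def _reclaim(self):
        # Evict every cached registration that nobody is using.
        for key in [k for k, count in self.refs.items() if count == 0]:
            self.hca.deregister(key)
            del self.refs[key]
```

Releasing a region leaves it registered, so a later transfer from the same buffer skips the registration cost; unused entries are evicted only when a new registration fails.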
Ralph Castain
d21c0027df Well, we are getting closer to resolving the comm_spawn problem. For the benefit of those that haven't been in the midst of this discussion, the problem is that this is the first case where the process starting a set of processes has not been mpirun and is not guaranteed to be alive throughout the lifetime of the spawned processes. This sounds simple, but actually has some profound impacts.
Most of this checkin consists of more debugging stuff. Hopefully, you won't see any printf's that aren't protected by debug flags - if you do, let me know and I'll take them out with my apologies.

Outside of debugging, the biggest change was a revamp of the shutdown process. For several reasons, we had chosen to have all processes "wait" for a shutdown message before exiting. This message is typically generated by mpirun, but in the case of comm_spawn we needed to do something else. We have decided that the best way to solve this problem is to:

(a) replace the shutdown message (which needed to be generated by somebody - usually mpirun) with an oob_barrier call. This still requires that the rank 0 process be alive. However, we terminate all processes if one abnormally terminates anyway, so this isn't a problem (with the standard or our implementation); and

(b) have the state-of-health monitoring subsystem issue the call to clean up the job from the registry. Since the state-of-health subsystem isn't available yet, we have temporarily assigned that responsibility to the rank 0 process.  Once the state-of-health subsystem is available, we will have it monitor the job for all-processes-complete and then it can tell the registry to clean up the job (i.e., remove all data relating to this job).

Hope that helps a little. I'll put all this into the design docs soon.

This commit was SVN r3754.
2004-12-08 21:44:41 +00:00
Tim Woodall
8749d0fe39 changed indexing to use smp index
This commit was SVN r3753.
2004-12-08 21:27:48 +00:00
Jeff Squyres
54a4ec5ae6 Fix some harmless compiler warnings
This commit was SVN r3752.
2004-12-08 21:07:12 +00:00
Brian Barrett
dfa211bad2 * fix syntax for anding. must either use test EXP && test EXP
(slightly more portable) or test EXP -a EXP

This commit was SVN r3750.
2004-12-08 18:54:10 +00:00
Brian Barrett
24ed3c9b6f * let me use the BJS llm code
This commit was SVN r3749.
2004-12-08 18:38:51 +00:00
Tim Woodall
606ed364b6 -correct threading build
-allow mx library to assign endpoint number

This commit was SVN r3748.
2004-12-08 18:25:20 +00:00
Jeff Squyres
9e3d914e54 Small patch to prevent fighting with svn replacing a manually-removed
.ompi_ignore file.

If you have an empty .ompi_unignore file (presumably alongside an
.ompi_ignore file), we'll build the component.

If you have a non-empty .ompi_unignore file (presumably alongside an
.ompi_ignore file), and your username can be found in that file, we'll
build the component.  If your username is *not* in the .ompi_unignore
file, we *won't* build the component.

This commit was SVN r3747.
2004-12-08 17:41:29 +00:00
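
The build decision described in r3747 can be summarized with a small Python sketch. It is not the project's actual build script: `should_build` is a hypothetical helper, and matching the username as a whole whitespace-separated word is an assumption (the commit only says the username "can be found" in the file).

```python
import os


def should_build(component_dir, username):
    """Return True if the component in component_dir should be built."""
    ignore = os.path.join(component_dir, ".ompi_ignore")
    unignore = os.path.join(component_dir, ".ompi_unignore")

    # No .ompi_ignore file: the component is built as usual.
    if not os.path.exists(ignore):
        return True
    # Ignored, and no .ompi_unignore file overrides that.
    if not os.path.exists(unignore):
        return False
    with open(unignore) as f:
        names = f.read().split()
    # Empty .ompi_unignore: build for everyone.
    if not names:
        return True
    # Non-empty .ompi_unignore: build only for the listed users.
    return username in names
```

For example, with both files present and `.ompi_unignore` containing only `bob`, `should_build(path, "bob")` is true and `should_build(path, "alice")` is false.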
Jeff Squyres
fba18a2b3a Clean up some compiler warnings
This commit was SVN r3746.
2004-12-08 13:52:02 +00:00
Tim Woodall
be7ac2233b use separate thread rather than event handler
This commit was SVN r3745.
2004-12-07 23:43:24 +00:00
Tim Woodall
538e4cb510 for now - don't poll
This commit was SVN r3744.
2004-12-07 23:41:29 +00:00
Jeff Squyres
7c5cefd592 Clean up some warnings -- always ensure that macros are defined before
we check their values.

This commit was SVN r3743.
2004-12-07 23:04:51 +00:00
Jeff Squyres
c33dd84b51 - Fix some configure.ac tests for figuring out the back-end C type for
MPI_Offset
- Make the ROMIO IO component use MPI_Offset for the back-end type for
  ADIO_Offset
- Removed some extra verbiage from configure warnings
- Add some logic to configure to deduce an MPI datatype that
  corresponds to MPI_Offset (because ROMIO needs it).  This is a bit
  of an abuse (i.e., ROMIO's configure should figure this out), but
  it's not too gratuitous because a) the ROMIO component is included
  in Open MPI, and b) other io components to be defined in the future
  could also use this information
- Rename MCA: MPI Component Architecture -> Modular Component
  Architecture

This commit was SVN r3742.
2004-12-07 22:40:37 +00:00
Jeff Squyres
aefb119af8 Properly escape some AM_CONDITIONAL tests
This commit was SVN r3741.
2004-12-07 22:36:03 +00:00
Jeff Squyres
7753253084 Eliminate dead code.
This commit was SVN r3740.
2004-12-07 22:35:35 +00:00
Jeff Squyres
b9f1975141 Eliminating warnings: per Edgar's advice, this code is definitely not
being used.  I added some comments about why it's not being used
(because one would naively think that increasing / decreasing the
refcount would be a Good Thing for the group constructor /
destructor).

This commit was SVN r3739.
2004-12-07 22:35:10 +00:00
Ralph Castain
e259750296 Clean-up a few things - mostly adding a lot of diagnostics to chase down the comm_spawn issue. Believe this may fix that problem, but still checking that for sure. At least now it starts processes correctly again!
This commit was SVN r3738.
2004-12-07 19:38:42 +00:00
Jeff Squyres
03b9ed1ff5 Make all the f77 prototypes have proper MPI_Offset parameters (i.e.,
the integer size must be the same between Fortran and C).

This commit was SVN r3737.
2004-12-07 19:12:40 +00:00
Brian Barrett
13bf550a0f * checkpoint some bproc pcm changes to move machines
* add first take of mapper component for BJS

This commit was SVN r3736.
2004-12-07 19:03:20 +00:00
Brian Barrett
947f166282 * abort if no bproc so that we don't try to build
This commit was SVN r3735.
2004-12-07 18:23:07 +00:00
Tim Woodall
ea5991f8fa protect creation of global lists
This commit was SVN r3734.
2004-12-07 18:16:26 +00:00
Tim Woodall
63935c0da4 corrected cleanup logic
This commit was SVN r3733.
2004-12-07 18:14:55 +00:00
Brian Barrett
6dc570c7a1 * Implement kill_job and kill_proc code
This commit was SVN r3732.
2004-12-07 15:41:40 +00:00
Tim Woodall
67a2d47a49 don't attempt to progress events for threaded case as this is handled in a separate thread
This commit was SVN r3731.
2004-12-07 15:39:57 +00:00
Tim Woodall
448c6632ab first cut at support for threaded case - latency is bad
This commit was SVN r3730.
2004-12-07 15:38:01 +00:00
Brian Barrett
d8e1f396a7 * remove the ompi ignore. It's good enough for testing :/
This commit was SVN r3729.
2004-12-07 15:30:29 +00:00
Brian Barrett
88c94f7439 * remove dead code
This commit was SVN r3728.
2004-12-07 15:16:06 +00:00
Brian Barrett
0853d49593 * remove one memory leak and move the setting of BPROC_RANK to where we
know that it is actually useful

This commit was SVN r3727.
2004-12-07 14:34:18 +00:00
Brian Barrett
fceeecd8c5 * process death notification support for the bproc pcm
This commit was SVN r3726.
2004-12-07 06:07:45 +00:00