Commit graph

4612 commits

Author SHA1 Message Date
Ralph Castain
54a481cc14 Fix an incorrect free...
This commit was SVN r5724.
2005-05-16 21:06:09 +00:00
Ralph Castain
89b6a97f0f Bring the resource discovery system's resource file component online so I can find the node I need to launch upon. I removed all reference to the xml library that was causing trouble, and wrote my own limited xml parser instead, so this will now compile just fine anywhere.
Need to do some refining of the component, but it meets basic requirements right now. Nobody else should notice any change - system basically ignores it unless you tell it to do something.

This commit was SVN r5723.
2005-05-16 21:01:09 +00:00
George Bosilca
2c4209f4cb Add the output utilities include.
This commit was SVN r5722.
2005-05-16 19:36:12 +00:00
Ralph Castain
a0393e9cb9 Fix a malloc/free problem when user doesn't specify a name for the universe. No impact on what George is seeing - still looking into that with Tim.
This commit was SVN r5721.
2005-05-16 18:58:22 +00:00
Brian Barrett
09057fe311 * massive cleanup of debugging output to make it much easier to match
messages
* use different event queues for send / recv, part of moving towards dealing
  with dropped fragments

This commit was SVN r5719.
2005-05-15 21:05:00 +00:00
Brian Barrett
ac7b97a0d9 * convert to an array of event handles - still of size 1 - as prep work for
adding event queues for dropped fragments and retransmit requests

This commit was SVN r5718.
2005-05-13 18:36:15 +00:00
Brian Barrett
5d57956a02 * rename event queue flag to prepare for more event queues
This commit was SVN r5717.
2005-05-13 18:27:27 +00:00
Brian Barrett
66a0c49e2b Run the event loop once after adding a new signal event to the event library. Events are only processed at the start of an event loop (not at event add), so there was a window of time between event_add() and event_loop() for the signal event in which the event existed, but was not active. During this window, signals that should have triggered a callback could be lost.
Reviewed by Jeff and Tim.

This commit was SVN r5715.
2005-05-13 17:59:36 +00:00
Jeff Squyres
72f86297c8 Somehow this functionality got lost over time: when a process aborts,
orterun should abort the rest.  Reviewed by Brian.

This commit was SVN r5713.
2005-05-13 17:52:50 +00:00
Josh Hursey
46810fd155 Some fixes to get the subset of mca directories compile under Windows.
Added a special case under the win_makefile for the gpr/replica directory
since it contains multiple dependant layers of directories.

Added a couple of OMPI_DECLSPECs. Change a conflicting variable name in
gpr_replica_dict_tl.c from 'new' to 'new_dict'.

This commit was SVN r5712.
2005-05-13 16:55:14 +00:00
Josh Hursey
f176f85e55 Fix a couple of library checks, and some Windows related code
This commit was SVN r5711.
2005-05-13 15:05:07 +00:00
Jeff Squyres
c12d5c8c88 While waiting for fortran compiles...
Fixes for orterun in handling different MCA params for different
processes (reviewed by Brian):
- By design, if you run the following:
    mpirun --mca foo aaa --mca foo bbb a.out
  a.out will get a single MCA param for foo with value "aaa,bbb".
- However, if you specify multiple apps with different values for the
  same MCA param, you should expect to get the different values for
  each app.  For example:
    mpirun --mca foo aaa a.out : --mca foo bbb b.out
  Should yield a.out with a "foo" param with value "aaa" and b.out
  with a "foo" param with a value "bbb".  
- This did not work -- both a.out and b.out would get a "foo" with
  "aaa,bbb".
- This commit fixes this behavior -- now a.out will get aaa and b.out
  will get bbb.
- Additionally, if you mix --mca and an app file, you can have
  "global" params and per-line-in-the-appfile params.  For example:
    mpirun --mca foo zzzz --app appfile
  where "appfile" contains:
    -np 1 --mca bar aaa a.out
    -np 1 --mca bar bbb b.out
  In this case, a.out will get foo=zzzz and bar=aaa, and b.out will
  get foo=zzzz and bar=bbb.
Spiffy.

Ok, fortran build is done... back to Fortran... sigh...

This commit was SVN r5710.
2005-05-13 14:36:36 +00:00
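The r5710 message describes the MCA param scoping rules only in prose. Below is a minimal sketch of those rules. The types and functions (`param_set_t`, `param_add`, `param_lookup`) are hypothetical and are not orterun's actual code.

The sketch follows two rules from the commit:
- Values repeated for one name in the same scope are comma-joined.
- Each app sees its own params plus the global ones.

The commit does not say what happens when the same name appears in both scopes. Here the per-app value is assumed to win.

```c
#include <string.h>

/* Hypothetical model of the scoping described in r5710; not orterun's code. */
#define MAX_PARAMS 8
#define MAX_LEN 64

typedef struct {
    char name[MAX_LEN];
    char value[MAX_LEN];
} param_t;

typedef struct {
    param_t params[MAX_PARAMS];
    int count;
} param_set_t;

/* Record "--mca name value" in one scope (global or one app). Repeating a
   name in the same scope appends ",value", e.g. "aaa,bbb".
   Returns 0 on success, -1 if it does not fit. */
static int param_add(param_set_t *set, const char *name, const char *value)
{
    for (int i = 0; i < set->count; ++i) {
        if (strcmp(set->params[i].name, name) == 0) {
            size_t used = strlen(set->params[i].value);
            if (used + 1 + strlen(value) >= MAX_LEN) {
                return -1; /* joined value would overflow */
            }
            set->params[i].value[used] = ',';
            strcpy(set->params[i].value + used + 1, value);
            return 0;
        }
    }
    if (set->count == MAX_PARAMS || strlen(name) >= MAX_LEN ||
        strlen(value) >= MAX_LEN) {
        return -1;
    }
    strcpy(set->params[set->count].name, name);
    strcpy(set->params[set->count].value, value);
    set->count++;
    return 0;
}

/* Look up a param for one app: the app's own scope first (assumed to take
   precedence), then the global scope. Returns NULL if unset in both. */
static const char *param_lookup(const param_set_t *app,
                                const param_set_t *global, const char *name)
{
    for (int i = 0; i < app->count; ++i) {
        if (strcmp(app->params[i].name, name) == 0) {
            return app->params[i].value;
        }
    }
    for (int i = 0; i < global->count; ++i) {
        if (strcmp(global->params[i].name, name) == 0) {
            return global->params[i].value;
        }
    }
    return NULL;
}
```

For the appfile example above, `foo=zzzz` goes into the global set and each appfile line gets its own set holding `bar`.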
Jeff Squyres
0fe7168823 Add missing header file.
This commit was SVN r5709.
2005-05-13 12:36:05 +00:00
Brian Barrett
f64e52a28d * more refactoring to reduce duplicate code
This commit was SVN r5708.
2005-05-13 04:04:08 +00:00
Brian Barrett
a242d5ad4f * Add OMPI_OUTPUT_VERBOSE macro that (like OMPI_OUTPUT) is a no-op when
debugging is disabled.
* convert one more output to only happen when debugging is enabled

This commit was SVN r5707.
2005-05-13 03:01:02 +00:00
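r5707 adds a verbose-output macro that compiles to nothing when debugging is disabled. The sketch below shows that general pattern. `ENABLE_DEBUG`, `debug_output` and `OUTPUT_VERBOSE` are hypothetical names, not Open MPI's definitions.

The macro takes its arguments in an extra set of parentheses. This lets it forward a variable argument list without needing C99 variadic macros. When debugging is off, the arguments are never evaluated.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical build switch; a real build would set this from configure. */
#ifndef ENABLE_DEBUG
#define ENABLE_DEBUG 0
#endif

static int debug_lines = 0; /* counts calls that actually produced output */

/* Print only if the message's level is within the requested verbosity. */
static void debug_output(int level, int verbosity, const char *fmt, ...)
{
    va_list ap;
    if (level > verbosity) {
        return;
    }
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    ++debug_lines;
}

#if ENABLE_DEBUG
/* OUTPUT_VERBOSE((level, verbosity, fmt, ...)) becomes a normal call. */
#define OUTPUT_VERBOSE(args) debug_output args
#else
/* Expands to nothing: the arguments are not even evaluated. */
#define OUTPUT_VERBOSE(args)
#endif
```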
Brian Barrett
a7fd494448 * start cleaning up output statements
* start refactoring duplicate code into inline functions (probably will
  have to become macros, but not until debugging is done)
* general code cleanup

This commit was SVN r5706.
2005-05-13 02:54:06 +00:00
Ralph Castain
fdfe457578 Bring in the remote launch changes. This still isn't fully functional, but impacted a few other places that were worth fixing.
1. Added a new function to launch head node processes on remote nodes.

2. Added new tool "orteprobe" that checks to see if a daemon is running on a node. If so, it reports the contact info back to the requestor. If not, it will (eventually - but not now) fork/exec a daemon on the node, report the contact info back to requestor, and then die.

3. Modified orted to handle universe name parameters, and added separate command line flags for debugging the daemon and saving daemon debugging output in a file. The "debug" flag now turns on the runtime debug info instead of the daemon debug - thus, you can now just get daemon debug info if you like.

4. Fix the dps to handle zero length strings correctly.

5. Modify the fork and rsh launchers to pass required environmental variables to the daemons and processes

6. Pulled the redirection of stdin/stdout/stderr for the daemon out of orted and put it into the daemon_init function to simplify orted logic.

7. Modified sys_info to correctly deal with passed mca param

8. Modified univ_info to parse incoming universe location information.

This commit was SVN r5705.
2005-05-12 21:44:23 +00:00
Brian Barrett
0c6eaaebe3 * start cleaning up debugging output (still much to do)
* make buffers really big so that we pass allocmem until we figure out
  why we're not flow controlling as I expected
* set event queue to invalid initially and use that as the enabled test
  rather than a separate bool - shrinks the module a bit
* add dropped count checks, with a panic if one occurs.  Still need to
  implement some type of retransmit logic.

This commit was SVN r5704.
2005-05-12 21:28:48 +00:00
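r5704 replaces a separate "enabled" bool with an invalid event-queue handle that doubles as the enabled test. Below is a minimal sketch of that pattern. The handle type, the sentinel value and the module struct are hypothetical stand-ins, not the Portals types.

```c
#include <stdint.h>

/* Hypothetical handle type and "no queue" sentinel. */
typedef uint32_t eq_handle_t;
#define EQ_HANDLE_INVALID ((eq_handle_t)0xFFFFFFFFu)

typedef struct {
    eq_handle_t eq; /* also serves as the "enabled" flag */
} module_t;

/* Start with no event queue, i.e. disabled. */
static void module_construct(module_t *m)
{
    m->eq = EQ_HANDLE_INVALID;
}

/* Enabled exactly when a valid event queue has been assigned. */
static int module_enabled(const module_t *m)
{
    return m->eq != EQ_HANDLE_INVALID;
}
```

One field carries both facts, so the handle and the flag can never disagree.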
Brian Barrett
e2c2c72b84 Changes to pass allocmem IBM test
- don't free the send buffer unless the converter tells us we need to
  - properly do the math to determine when the receive buffer has been
    fully used and unlinked itself

This commit was SVN r5703.
2005-05-12 19:52:51 +00:00
Jeff Squyres
f5657fb8ee For the rsh pls, if the launch is on the local node, just exec it --
don't bother using the launching agent (typically rsh or ssh).

This commit was SVN r5702.
2005-05-12 19:12:53 +00:00
Jeff Squyres
544f9dd780 Fix silly string error (missing +2 in the len calculation, so just
replace it with asprintf).  Reviewed by Brian.

This commit was SVN r5700.
2005-05-12 18:56:05 +00:00
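The r5700 fix replaces a hand-computed buffer length with `asprintf`. The commit does not show which strings were joined. The sketch below assumes a hypothetical "dir/file" join to show where a "+2" comes from: one byte for the separator and one for the terminating NUL.

`asprintf` is a GNU/BSD extension rather than ISO C, hence the `_GNU_SOURCE` define for glibc.

```c
#define _GNU_SOURCE /* asprintf() is a GNU/BSD extension, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hand-rolled version: the length must count the '/' and the '\0'. */
static char *join_manual(const char *dir, const char *file)
{
    size_t len = strlen(dir) + strlen(file) + 2; /* +1 '/', +1 '\0' */
    char *out = malloc(len);
    if (out == NULL) {
        return NULL;
    }
    snprintf(out, len, "%s/%s", dir, file);
    return out;
}

/* asprintf() sizes and allocates the buffer itself: no length math. */
static char *join_asprintf(const char *dir, const char *file)
{
    char *out = NULL;
    if (asprintf(&out, "%s/%s", dir, file) < 0) {
        return NULL; /* contents of out are unspecified on failure */
    }
    return out;
}
```

Both return heap memory that the caller must `free()`.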
Jeff Squyres
f96d763aa7 /trunk is working towards 1.0
/branches/v0.9 is working towards 0.9

This commit was SVN r5699.
2005-05-12 17:56:42 +00:00
Brian Barrett
189a536685 * Fix incorrect logic in orted so that --no-daemonize works as intended
* Minor formatting fixes in XGrid RAS component
* Code cleanup in XGrid PLS component:
  - If we can't get daemon contact information, kill the job at the XGrid
    level
  - Add MCA parameter pls_xgrid_delete_job that will delete the job from
    XGrid when complete (this seems like standard behavior, so it's the
    default)
  - Remove compiler warning about getting the name of an XGrid object
  - Properly populate the daemon information for the killing code

This commit was SVN r5697.
2005-05-12 16:48:41 +00:00
Josh Hursey
4b60235383 remove unnecessary exclusion for Windows which was killing the Windows nightly build
This commit was SVN r5695.
2005-05-12 14:37:40 +00:00
Brian Barrett
decc74d15c * Enable the XGrid components. Only do anything if the XGrid contact info
variables are set.
* show the RAS priority in ompi_info

This commit was SVN r5694.
2005-05-12 03:33:59 +00:00
Jeff Squyres
722ee2103b Fix to the fix -- Brian and I agree that this is a better fix.
This commit was SVN r5693.
2005-05-12 02:44:20 +00:00
Jeff Squyres
3bd7e72608 Fix some minor typos
This commit was SVN r5692.
2005-05-12 02:22:03 +00:00
Jeff Squyres
a963aebfdb Fix the check for socklen_t.
This commit was SVN r5691.
2005-05-12 01:42:24 +00:00
George Bosilca
4ef1d70034 snprintf does not really do what we expect. In some situations it will write
more than we have asked for (on my G5). Anyway now I hope I have enought memory to printout
the full description of the datatype.

This commit was SVN r5690.
2005-05-11 21:30:56 +00:00
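r5690 works around snprintf by making the buffer bigger, but the diff itself is not shown here. One common alternative is to size the buffer from snprintf's own return value. Under C99, `snprintf(NULL, 0, ...)` returns the length the output needs. Some pre-C99 C libraries returned -1 instead, so this is only a sketch under the C99 assumption.

`describe_datatype` and its format are hypothetical and are not the datatype engine's real output.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary format, shared by both snprintf passes. */
#define DESC_FMT "%s: count=%d extent=%ld"

static char *describe_datatype(const char *name, int count, long extent)
{
    /* Pass 1: with size 0, C99 snprintf writes nothing and returns the
       number of characters needed, not counting the NUL. */
    int needed = snprintf(NULL, 0, DESC_FMT, name, count, extent);
    if (needed < 0) {
        return NULL;
    }

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL) {
        return NULL;
    }

    /* Pass 2: the buffer now holds every character plus the NUL. */
    snprintf(buf, (size_t)needed + 1, DESC_FMT, name, count, extent);
    return buf;
}
```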
Brian Barrett
c477907166 * ignore UNLINK messages earlier in the chian (if Portals supports them)
* process long message fragments properly

This commit was SVN r5689.
2005-05-11 20:22:18 +00:00
Josh Hursey
cc6cb5cac5 Checkpoint on Windows build.
Many changes to headers for OMPI_DECLSPEC, and 
proper placement of c_plusplus defines in those files.

mca/gpr/replica and tools are the two sets of directories
that still need work for the Windows build for this pass.

This commit was SVN r5688.
2005-05-11 20:21:10 +00:00
George Bosilca
6714dfac4e Remove all useless checks that size_t is greater or equal to zero.
This commit was SVN r5687.
2005-05-11 14:19:48 +00:00
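As r5687 notes, `size_t` is unsigned, so `n >= 0` is always true. GCC reports such comparisons under `-Wtype-limits`. The helpers below are hypothetical illustrations, not code from the commit. They show the checks that do carry information: an upper bound, and a count-down loop that tests before decrementing.

```c
#include <stddef.h>

/* "index >= 0 &&" would be redundant: a size_t can never be negative. */
static int index_is_valid(size_t index, size_t count)
{
    return index < count;
}

/* Counting down with an unsigned index: "i >= 0" would never become
   false, so test-then-decrement instead. Sums values last to first. */
static long sum_backwards(const int *values, size_t count)
{
    long sum = 0;
    for (size_t i = count; i-- > 0; ) {
        sum += values[i];
    }
    return sum;
}
```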
George Bosilca
e940ab43b8 Optionally disable the tests.
This commit was SVN r5686.
2005-05-11 14:17:01 +00:00
Thara Angskun
55538e100d - another check point
- able to launch a job (sort of)... but it does not clean up correctly: mpirun hangs, the terminal blows up, etc.

This commit was SVN r5685.
2005-05-11 09:09:55 +00:00
George Bosilca
1a9cef70fb checkpoint
This commit was SVN r5684.
2005-05-11 06:30:10 +00:00
George Bosilca
9bd4110bb5 If we check for errors then let's check all of them.
This commit was SVN r5683.
2005-05-11 04:26:14 +00:00
George Bosilca
f9ae5a282e We are here in a macro. The arguments of the macro should be protected. Otherwise the compiler
will get confused with the precedence of the operators.

This commit was SVN r5682.
2005-05-11 04:23:47 +00:00
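r5682 parenthesizes macro arguments. The affected macro is not shown, so the `SQUARE` macros below are hypothetical. They illustrate the precedence problem: the preprocessor pastes arguments in as text, so an unprotected `x * x` turns `a + 1` into `a + 1 * a + 1`.

An inline function avoids both the precedence issue and the double evaluation that even the protected macro still has.

```c
/* Unprotected: SQUARE_BAD(a + 1) expands to a + 1 * a + 1. */
#define SQUARE_BAD(x) x * x

/* Protected: each use of the argument and the whole expansion are
   parenthesized. Note x is still evaluated twice. */
#define SQUARE(x) ((x) * (x))

/* Function alternative: argument evaluated exactly once. */
static inline int square(int x)
{
    return x * x;
}
```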
George Bosilca
f0adb8b4fd Adapt to the new PML interface.
This commit was SVN r5681.
2005-05-11 03:59:57 +00:00
Jeff Squyres
2dbcf1a1e5 Fix issue raised by Rainer: if we don't find a corresponding C type
for an optional fortran type, it's not an error.  Instead, just
disable support for that fortran optional type.

This commit was SVN r5680.
2005-05-10 23:57:37 +00:00
Jeff Squyres
bc6f5a83c4 Fix a few more header installation directories
This commit was SVN r5679.
2005-05-10 23:56:23 +00:00
Jeff Squyres
6bda1ed699 Fix installation directory for the header files
This commit was SVN r5678.
2005-05-10 23:52:35 +00:00
Jeff Squyres
2b2f2f3c04 Fix a bunch of compiler warnings, mostly on 64 bit:
- some union { void*; int; } fixes for asm tests
- size_t / %lu fixes for a bunch of others

This commit was SVN r5677.
2005-05-10 23:28:31 +00:00
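r5677 fixes `size_t` / `%lu` mismatches. Passing a `size_t` where `%lu` expects `unsigned long` is undefined behavior on platforms where the two differ. The helpers below are hypothetical and show two ways to match the format to the argument:
- C99's `z` length modifier.
- An explicit cast to `unsigned long`, for runtimes without `%zu`.

The cast truncates large values wherever `long` is narrower than `size_t` (e.g. 64-bit Windows).

```c
#include <stddef.h>
#include <stdio.h>

/* C99: %zu always matches size_t. */
static int format_size(char *buf, size_t buflen, size_t n)
{
    return snprintf(buf, buflen, "%zu", n);
}

/* Fallback: make the argument match %lu with an explicit cast. */
static int format_size_lu(char *buf, size_t buflen, size_t n)
{
    return snprintf(buf, buflen, "%lu", (unsigned long)n);
}
```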
Jeff Squyres
6a85d9bf74 No need for this anymore.
This commit was SVN r5676.
2005-05-10 20:55:09 +00:00
Brian Barrett
caf8551001 * checkpoint - long messages are causing segfaults in the PML, but need to
stop for a bit

This commit was SVN r5675.
2005-05-10 20:42:57 +00:00
Brian Barrett
2ec27c0927 * add ability to respond to RNDV packets with ACKs. short MPI_Ssends now
work properly.  Still need to implement second fragment support

This commit was SVN r5674.
2005-05-10 19:39:21 +00:00
George Bosilca
956782670a One more printf with size_t solved.
This commit was SVN r5673.
2005-05-10 17:25:56 +00:00
Jeff Squyres
f8b1e19076 - Add a few help messages
- app->num_procs changed to a size_t, which hosed the initialization
  of its value to -1 (not sure why the compiler didn't complain
  #$%@#$%), which was there to catch the case when the user forgot to
  specify -np (or some other equivalent).  Fixed.

This commit was SVN r5672.
2005-05-10 17:14:53 +00:00
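r5672 deals with `app->num_procs` becoming a `size_t` while still being initialized to -1 as a "user forgot -np" marker. Assigning -1 to a `size_t` silently yields `SIZE_MAX`, and a test such as `num_procs < 0` can never be true.

The commit does not show the exact fix. The sketch below names the sentinel explicitly instead. `app_context_t` and the helpers are hypothetical stand-ins, not ORTE's struct.

```c
#include <stddef.h>
#include <stdint.h>

/* (size_t)-1, spelled explicitly so the intent is visible. */
#define NUM_PROCS_UNSET SIZE_MAX

typedef struct {
    size_t num_procs;
} app_context_t;

static void app_init(app_context_t *app)
{
    app->num_procs = NUM_PROCS_UNSET;
}

/* Returns 1 if -np (or an equivalent) was never supplied. */
static int app_missing_np(const app_context_t *app)
{
    return app->num_procs == NUM_PROCS_UNSET;
}
```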
Josh Hursey
5d1e2c53b0 fix some library path issues
This commit was SVN r5671.
2005-05-10 16:16:36 +00:00
Brian Barrett
0cd4d15824 * Update to match Tim's changes to the PML
* Couple of improvements towards handling ACKs properly

This commit was SVN r5670.
2005-05-10 15:53:41 +00:00
Brian Barrett
eeba1b9a72 * re-enable making DSOs of the TEG PML
This commit was SVN r5669.
2005-05-10 14:56:45 +00:00