Commit graph

1374 commits

Author SHA1 Message Date
Brian Barrett
8294f6de03 The portals_utcp component doesn't actually need the Portals libraries
and only pokes at environment variables.  So don't link in the libraries,
as that causes a whole other set of problems.

This commit was SVN r15899.
2007-08-17 03:48:39 +00:00
Andrew Friedley
2eedcd2539 Fixes trac:1047
Tie stdin to /dev/null to prevent stdin from being closed and thus making stdin not work in slurm allocations.

This commit was SVN r15892.

The following Trac tickets were found above:
  Ticket 1047 --> https://svn.open-mpi.org/trac/ompi/ticket/1047
2007-08-16 20:49:27 +00:00
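The stdin fix in 2eedcd2539 can be illustrated with a minimal sketch. The function name `tie_stdin_to_devnull` is hypothetical and this is not the code from the commit; it only shows the general POSIX technique of pointing descriptor 0 at /dev/null so that reads return end-of-file instead of failing on a closed descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make file descriptor 0 refer to /dev/null.
 * Returns 0 on success, -1 on failure (errno is set by open or dup2). */
int tie_stdin_to_devnull(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    if (fd != STDIN_FILENO) {
        /* dup2 atomically replaces whatever descriptor 0 was. */
        if (dup2(fd, STDIN_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```

After the call, a `read()` on stdin returns 0 (end-of-file) rather than an error.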
Tim Prins
5a795128af Change it so that different components in orte use unique rml tags
This commit was SVN r15881.
2007-08-16 14:02:35 +00:00
Brian Barrett
fe0d1f30d5 need errno.h
This commit was SVN r15862.
2007-08-15 02:15:33 +00:00
Brian Barrett
330003361b * Free memory from asprintf
* need to compare ERANGE to errno

This commit was SVN r15860.
2007-08-14 21:12:00 +00:00
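As a rough illustration of the ERANGE point in 330003361b: range errors from the strto* family are reported through `errno`, not through the return value. The helper `parse_long` below is hypothetical and is not taken from the commit.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long.
 * Returns 0 on success, -1 on overflow/underflow or if no digits were read. */
int parse_long(const char *text, long *out)
{
    char *end = NULL;
    long value;

    errno = 0;                    /* strtol only sets errno on failure */
    value = strtol(text, &end, 10);
    if (errno == ERANGE) {        /* compare ERANGE to errno, not to value */
        return -1;
    }
    if (end == text) {            /* no digits consumed */
        return -1;
    }
    *out = value;
    return 0;
}
```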
Brian Barrett
881dd0654e * Provide a hook so that a PLS can tell the orted it's starting that it
needs to override the default umask.  By default, this is not used
    since most environments do what the user would expect without any
    help.
  * Have TM use the newly added umask hook, so that processes inherit
    the user's umask from mpirun rather than the pbs_mom's umask, which
    the user has no control over.

This commit was SVN r15858.
2007-08-14 18:44:52 +00:00
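One way to picture the umask hook from 881dd0654e is a launcher that passes the user's umask to the started process, which then applies it. The sketch below assumes the value travels in an environment variable as an octal string; the variable name passed in and the function `apply_umask_override` are hypothetical, and the actual ORTE mechanism may differ.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: apply a umask passed as an octal string in the
 * environment variable env_name.
 * Returns 0 if applied, 1 if the variable is unset (inherited umask is
 * kept), -1 if the value is malformed. */
int apply_umask_override(const char *env_name)
{
    const char *text = getenv(env_name);
    char *end = NULL;
    unsigned long mask;

    if (text == NULL) {
        return 1;                 /* default: no override requested */
    }
    mask = strtoul(text, &end, 8);
    if (end == text || *end != '\0' || mask > 0777) {
        return -1;
    }
    umask((mode_t)mask);
    return 0;
}
```

Leaving the variable unset keeps the default behavior, which matches the commit's note that the hook is not used in most environments.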
Shiqing Fan
eea712f9ab - Export those components in the correct way.
This commit was SVN r15804.
2007-08-08 16:20:17 +00:00
Brian Barrett
59524a9009 Fix issue where we set state to SHUTDOWN rather than CONNECTING when we
had to switch socket types.

This commit was SVN r15784.
2007-08-06 22:55:41 +00:00
Ralph Castain
eb3a97f428 Don't overwrite the local rank key
This commit was SVN r15776.
2007-08-06 16:56:23 +00:00
Shiqing Fan
d10570786c - A small fix: add missing flag parameters.
This commit was SVN r15774.
2007-08-06 16:15:38 +00:00
George Bosilca
d658a477af Update the help file to match the real name of the required argument.
This commit was SVN r15762.
2007-08-04 00:35:55 +00:00
Josh Hursey
755658694e Bring in changes to support Cray's Compute Node Linux (CNL) and
Application Level Placement Scheduler (ALPS).

This commit was tested under two Cray machines at ORNL: Jaguar (Catamount)
and Rizzo (CNL Test cage). Both machines performed as they should across
the commit.

It is likely that more changes will follow as the work and environment
stabilize.

Most of the infrastructure works the same for Catamount and CNL
except for a few bits. Below are the highlights:

Default IFACE Change:
 On Catamount we can use PTL_IFACE_DEFAULT, but the CNL system we have access
 to fails on this interface, so there it should be set to:
    IFACE_FROM_BRIDGE_AND_NALID(PTL_BRIDGE_UK,PTL_IFACE_SS).
 So if we detect that we are running with YOD then use the former interface
 and if we detect that we are running with ALPS then use the latter.
 We will want to pursue a more elegant solution if this interface continues to 
 change across machines.

PtlGetId and cnos_register_ptlid:
 The header suggests that these should never be called when launching with YOD.
 But in the ALPS environment the cnos_barrier() will hang forever if these 
 functions are not called after PtlNIInit(). Since these functions only need to
 be called once, and the orte rmgr/cnos component is loaded before the ompi
 common/portals component, just call these functions once in the rmgr/cnos
 component.

cnos_barrier_init():
 This is a noop for YOD, but critical for ALPS. So be sure to call it before
 calling the first barrier in the rmgr/cnos component.

cnos_barrier vs cnos_pm_barrier:
 It is suggested the cnos_pm_barrier only be used during finalization 
 as it will indicate to the launcher (yod or aprun) that the app is about
 to complete. It was suggested that we use the regular cnos_barrier() instead.
 I want to look into this a bit more to make sure there are not adverse
 side effects. A note has been placed in the code to indicate this reasoning.

This commit was SVN r15756.
2007-08-03 19:46:38 +00:00
Jeff Squyres
106beff744 Ahem. Apparently we should be checking for ORTE_EQUAL upon return
from orte_ns.compare_fields(), not 0 (yes, they're the same [today],
but it is much better to check for symbolic names...).

This commit was SVN r15731.
2007-08-01 18:59:37 +00:00
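The point of 106beff744 can be sketched as follows. All names here (`EX_EQUAL`, `ex_compare`, `same_id`) are hypothetical stand-ins, not the ORTE API; the sketch only shows comparing a result against a symbolic constant, so the call site stays correct if the constant's numeric value ever changes.

```c
#include <stdbool.h>

/* Hypothetical result codes; EX_EQUAL happens to be 0 today. */
enum { EX_VALUE2_GREATER = -1, EX_EQUAL = 0, EX_VALUE1_GREATER = 1 };

/* Hypothetical comparison function returning one of the codes above. */
int ex_compare(int a, int b)
{
    if (a == b) {
        return EX_EQUAL;
    }
    return (a > b) ? EX_VALUE1_GREATER : EX_VALUE2_GREATER;
}

/* Check the symbolic name, not a literal 0. */
bool same_id(int a, int b)
{
    return EX_EQUAL == ex_compare(a, b);
}
```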
Jeff Squyres
8d4b6c7b0d The HNP changing into an orted brought a bug in the iof svc component
to light: we weren't ack'ing properly for streams that originated (or
originated via proxy) and terminated within the HNP.  This commit
fixes that.

It also fixes a few style issues, and added some more opal_outputs for
debugging.  Also, fixed a bug where the fact that we forwarded (and
therefore might need to update the ack) was not correctly reported if
there were multiple forwards (which there are not as the system is
currently using IOF, but there could be).

Refs trac:1098 -- want to get another pair of eyes to look at this before
I close the ticket.

This commit was SVN r15730.

The following Trac tickets were found above:
  Ticket 1098 --> https://svn.open-mpi.org/trac/ompi/ticket/1098
2007-08-01 18:38:03 +00:00
Ralph Castain
066ff38d42 Ensure we read all the reported URI contact info when we fork an HNP for singleton support
This commit was SVN r15714.
2007-07-31 18:55:08 +00:00
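For 066ff38d42, a minimal sketch of "read everything that was reported" is a loop that keeps calling `read()` until end-of-file, since a single `read()` may return only part of the data. This assumes the forked process reports over a pipe or similar descriptor, which the commit message does not state; `read_all` is a hypothetical helper.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until EOF.
 * Returns a malloc'd, NUL-terminated buffer (caller frees), or NULL on error.
 * If len_out is non-NULL it receives the number of bytes read. */
char *read_all(int fd, size_t *len_out)
{
    size_t cap = 256, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL) {
        return NULL;
    }
    for (;;) {
        ssize_t n;
        if (len + 1 == cap) {     /* keep one byte free for the NUL */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len - 1);
        if (n > 0) {
            len += (size_t)n;     /* partial read: keep going */
        } else if (n == 0) {
            break;                /* EOF: the writer closed its end */
        } else if (errno != EINTR) {
            free(buf);
            return NULL;
        }
    }
    buf[len] = '\0';
    if (len_out != NULL) {
        *len_out = len;
    }
    return buf;
}
```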
Shiqing Fan
0f468f3668 - Remove the solution and project files, will commit them later.
This commit was SVN r15705.
2007-07-31 17:07:02 +00:00
George Bosilca
2e2bf472ff Mark the orte_abort function as noreturn and change the return value from
int to void. This function calls exit at the end, so there is no way to
return from there. Apply the same thing to the errmsg_abort function and
update all components.

This commit was SVN r15704.
2007-07-31 16:09:52 +00:00
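A minimal sketch of what marking an abort function as noreturn looks like, using the C11 `_Noreturn` specifier (GCC and Clang also accept `__attribute__((noreturn))`). The names `ex_abort` and `checked_divide` are hypothetical; the commit does not show which spelling Open MPI used.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical abort function: returns void and never returns to the caller. */
_Noreturn void ex_abort(int status, const char *msg)
{
    fprintf(stderr, "abort: %s\n", msg);
    exit(status);
}

int checked_divide(int num, int den)
{
    if (den == 0) {
        ex_abort(1, "division by zero");
        /* No return statement needed here: the compiler knows control
         * cannot continue past a noreturn call. */
    }
    return num / den;
}
```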
Sven Stork
855434de59 - fixes several Coverity issues
- add missing initialisation for variables
  - use strncpy instead of strcpy

This commit was SVN r15683.
2007-07-30 14:44:37 +00:00
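The strcpy-to-strncpy change in 855434de59 typically looks like the sketch below. `copy_name` is a hypothetical helper; note that `strncpy` does not NUL-terminate when the source is too long, so the terminator is written explicitly.

```c
#include <string.h>

/* Hypothetical helper: bounded copy that always NUL-terminates dst. */
void copy_name(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return;
    }
    strncpy(dst, src, dst_size - 1);   /* copies at most dst_size - 1 bytes */
    dst[dst_size - 1] = '\0';          /* strncpy may leave dst unterminated */
}
```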
Shiqing Fan
4d7b349cdb - Add VC8 solution and project files.
- If one wants to use this solution, remember to unload the project 'orte-restart' which is currently not working for Windows.

This commit was SVN r15680.
2007-07-30 11:05:34 +00:00
Rainer Keller
2c5d07217d - Coverity: use snprintf, instead of sprintf....
This commit was SVN r15669.
2007-07-29 11:23:23 +00:00
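For the sprintf-to-snprintf change in 2c5d07217d, a short sketch: `snprintf` never writes past the given size and returns the length the full output would have had, which makes truncation detectable. `format_endpoint` is a hypothetical helper, not code from the commit.

```c
#include <stdio.h>

/* Hypothetical helper: format "host:port" into dst.
 * Returns 0 if the result fit, -1 on truncation or an encoding error. */
int format_endpoint(char *dst, size_t dst_size, const char *host, int port)
{
    int n = snprintf(dst, dst_size, "%s:%d", host, port);
    if (n < 0 || (size_t)n >= dst_size) {
        return -1;   /* output was cut short (dst is still NUL-terminated) */
    }
    return 0;
}
```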
Jeff Squyres
3858cf48c0 Stop using the deprecated ORTE_NAME_ARGS() and switch to
ORTE_NAME_PRINT().

This commit was SVN r15665.
2007-07-27 13:33:20 +00:00
Josh Hursey
acbc8ecca3 - On Cray XT systems stop the grpcomm basic component from building.
grpcomm cnos component
  - Remove the .ompi_ignore
  - add a configure.m4 that should keep it from building on any system
    other than Cray XT* (copied from rml/cnos)
  - Fix some mis-named symbols resulting from cut/paste errors.

This patch brings the Cray build back into 'working' order.

This commit was SVN r15651.
2007-07-26 20:42:06 +00:00
Jeff Squyres
188d529beb * We *do* need the LSF task ID as part of our vpid
* Accidentally had the PLS LSF using the env SDS; switch it back to
   the LSF SDS

This commit was SVN r15650.
2007-07-26 20:22:36 +00:00
Josh Hursey
e5a03e7734 - Remove Makefile.in from version control
- Add back support for cnos (copy functionality lost by moving the interface
  from the RML).
- Fix some cut/paste errors.

This commit was SVN r15646.
2007-07-26 18:52:17 +00:00
Jeff Squyres
75192de1fc LSF support is now working. W00t! May be subject to a further tweak
or two.

 * checking lsb_init() is not sufficient to know whether you're in an
   LSF job or not; you also need to check for environment variable
   markers 
 * remove lots of debugging output
 * no need for the sds lsf to call lsb_init()
 * remove some slurm-like dead code and a copy-n-paste error in the
   sds lsf

This commit was SVN r15644.
2007-07-26 18:49:29 +00:00
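The first bullet of 75192de1fc says a successful library init is not enough to detect an LSF job and that environment variable markers must also be checked. The commit does not name the markers, so the sketch below takes the names as a parameter; `all_markers_set` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: true only if every named environment variable
 * is set to a non-empty value. */
bool all_markers_set(const char *const names[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const char *value = getenv(names[i]);
        if (value == NULL || value[0] == '\0') {
            return false;
        }
    }
    return true;
}
```

A caller would combine this with the result of the scheduler library's own init check and treat the process as being inside a job only when both succeed.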
Jeff Squyres
8e9c71282d Add a bunch more [conditional] debugging output.
This commit was SVN r15643.
2007-07-26 18:46:46 +00:00
Rich Graham
60df8be1a7 initial code - does not even compile, but Josh is picking up on this.
This commit was SVN r15641.
2007-07-26 17:55:51 +00:00
Jeff Squyres
d0137acaa4 If --debug-daemons-file is specified, it should also imply
--debug-daemons.

This commit was SVN r15640.
2007-07-26 17:49:13 +00:00
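The option implication in d0137acaa4 can be sketched as a normalization step after argument parsing. The struct and function names are hypothetical, and both options are modeled as plain booleans, which is an assumption about how they are represented.

```c
#include <stdbool.h>

/* Hypothetical parsed-options struct. */
struct debug_opts {
    bool debug_daemons;        /* --debug-daemons */
    bool debug_daemons_file;   /* --debug-daemons-file */
};

/* Apply implied options: the file variant turns on daemon debugging too. */
void normalize_debug_opts(struct debug_opts *opts)
{
    if (opts->debug_daemons_file) {
        opts->debug_daemons = true;
    }
}
```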
Brian Barrett
801fffabff Don't assume things about the contact info string in the general case. There
is no need for the IP address in most cases (filem being one dubious
exception), so just publish and hand around the supposedly opaque contact
info strings

This commit was SVN r15638.
2007-07-26 16:51:41 +00:00
Sven Stork
5fd6c69019 - fix a problem that showed up with the Sun thread tests.
Remove unnecessary locks because the functions that call this
  function properly lock/unlock the orted_comm_mutex. Therefore these
  unlocks cause an imbalance.

This commit was SVN r15630.
2007-07-26 11:30:27 +00:00
Brian Barrett
e537cc0871 * Add documentation for RML base code
* Move function declaration out of base.h as it isn't needed
    outside the base code

This commit was SVN r15616.
2007-07-25 16:19:29 +00:00
Brian Barrett
f06b61cff9 Don't use the OOB TCP key for contact information, remove the need to
include a not so public header file.  Fixes a compile error on the Cray.

This commit was SVN r15613.
2007-07-25 15:12:07 +00:00
George Bosilca
00796cfdab Make sure the oob_tcp_windows_progress_callback is registered
in all cases. This is now done in the oob tcp open function.
As a result, the unregistering has to be done in the close
function.

This commit was SVN r15603.
2007-07-25 05:55:14 +00:00
George Bosilca
5d8a70e434 Update the Windows ODLS.
This commit was SVN r15600.
2007-07-25 03:57:25 +00:00
George Bosilca
c961cb5749 The Windows support is now back in business.
This commit was SVN r15599.
2007-07-25 03:55:34 +00:00
Brian Barrett
4e23c7c5a2 Fixes for case where IPv6 support is disabled. Fixes trac:1102.
This commit was SVN r15584.

The following Trac tickets were found above:
  Ticket 1102 --> https://svn.open-mpi.org/trac/ompi/ticket/1102
2007-07-24 17:01:39 +00:00
Josh Hursey
1b177cd029 This component is checkpointable.
This commit was SVN r15567.
2007-07-23 20:20:28 +00:00
Josh Hursey
a24e530f8e Some C/R fixes (more to come)
r15390 - Changed the paradigm in which the runtime worked by enabling the mpirun
process to become an orted and spawn processes. This broke the C/R for this 
special case as it required that the orted start the process, and that 
the hierarchy remains.
The fix was to allow the global coordinator to be a local coordinator as well
for this case.

r15528 - Changed the selection logic for the RML. This caused the application to
segv if the 'ftrm' wrapper component was selected as it tried to modify a NULL
pointer.
The fix was to move the 'module swap' code into the init() function, and swap
when passed a NULL pointer. It sounds bad, but actually cleans up the code a bit
more.

Still have to fix the 'routed' framework.

This commit was SVN r15566.

The following SVN revision numbers were found above:
  r15390 --> open-mpi/ompi@bd65f8ba88
  r15528 --> open-mpi/ompi@39a6057fc6
2007-07-23 20:13:37 +00:00
Ralph Castain
f219cc1e6e A few changes to the lsf components - mostly cleanup, no major logic changes
This commit was SVN r15563.
2007-07-23 18:38:36 +00:00
Ralph Castain
d99c764e75 Resolve a problem where the orte daemon comm functions were being accessed by mpirun while still retaining occasional reference to the orted_globals. Remove all dependence on orted_globals from the comm functions. Move those functions back into their own file to make it easier to maintain the separation. Ensure that mpirun ignores any "exit" commands being sent to daemons as it will exit on its own.
This commit was SVN r15562.
2007-07-23 18:36:33 +00:00
Ralph Castain
2017d52df0 Cleanup a few compiler warnings
This commit was SVN r15560.
2007-07-23 18:30:40 +00:00
Sven Stork
4c031de1ab - fix typo to export component structure in the case of visibility enabled
- use BEGIN/END_C_DECLS

This commit was SVN r15559.
2007-07-23 17:33:13 +00:00
Ralph Castain
db267899be Setup and use a tsd ring buffer to avoid overwriting process name outputs when print is called multiple times in same output statement.
This commit was SVN r15558.
2007-07-23 17:27:14 +00:00
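The idea in db267899be can be sketched with a small per-thread ring of string buffers: each call to the print helper writes into the next slot, so several calls inside one output statement do not overwrite each other. This sketch uses C11 `_Thread_local` in place of whatever thread-specific-data mechanism the commit uses; `ex_name_print`, the slot count, and the output format are all hypothetical.

```c
#include <stdio.h>

#define EX_RING_SLOTS 8    /* max calls whose results stay valid at once */
#define EX_NAME_LEN   32

/* Hypothetical helper: format a process name into the next ring slot.
 * The returned pointer stays valid until EX_RING_SLOTS further calls
 * are made on the same thread. */
const char *ex_name_print(unsigned job, unsigned rank)
{
    static _Thread_local char ring[EX_RING_SLOTS][EX_NAME_LEN];
    static _Thread_local unsigned next;
    char *slot = ring[next];

    next = (next + 1) % EX_RING_SLOTS;
    snprintf(slot, EX_NAME_LEN, "[%u,%u]", job, rank);
    return slot;
}
```

With a single static buffer, `printf("%s -> %s", ex_name_print(0, 1), ex_name_print(0, 2))` would print the same name twice; with the ring, each argument points at its own slot.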
Sven Stork
baf5e4b596 - add orte_config.h as first file to be included
- export required symbol

This commit was SVN r15556.
2007-07-23 15:50:55 +00:00
Ralph Castain
ef141d1fbc Ensure daemons know contact info for all other daemons. Update binomial xcast to work in revised design. Add debug output to orted so the daemon lets us know it launched (if --debug-daemons set) early on in case it fails during orte_init
This commit was SVN r15555.
2007-07-23 15:00:39 +00:00
Ralph Castain
6c800d452d Bring orte tests up to date with revised rml system.
Make first cut at fixing non-direct xcast modes

This commit was SVN r15553.
2007-07-23 13:05:34 +00:00
Jeff Squyres
78d214fec8 Oops -- didn't mean to commit the test program...
This commit was SVN r15538.
2007-07-20 20:15:51 +00:00
Jeff Squyres
2baa866026 Compiles to the new API, but doesn't quite work yet...
This commit was SVN r15537.
2007-07-20 19:49:27 +00:00
George Bosilca
1751b289ed Avoid a compiler warning about uninitialized variables.
This commit was SVN r15534.
2007-07-20 04:07:19 +00:00
George Bosilca
d1424689ce Always release the buffer (this implies the buffer has to be created
outside the special case).

This commit was SVN r15533.
2007-07-20 04:06:39 +00:00