1
1
Граф коммитов

15654 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
ef56e6d78b Helps to move the pointer
This commit was SVN r24414.
2011-02-18 14:01:25 +00:00
Ralph Castain
7b35ada7fc Fix ricochet effect - move failed procs to next on list instead of loadbalancing
This commit was SVN r24413.
2011-02-18 13:11:55 +00:00
Ralph Castain
b98a2917ff Add an API to the errmgr so that apps can register for a callback to warn them of an impending migration - this gives apps a chance to cleanly terminate prior to being migrated for external reasons (e.g., impending failures). The timeout provided indicates to the daemon how long it should wait before proceeding to kill/migrate the process - if the process fails to exit before that time, the daemon will kill it.
This commit was SVN r24412.
2011-02-18 02:48:12 +00:00
Ralph Castain
a0f6e153c7 Add missing fields to copy of app_context object
This commit was SVN r24411.
2011-02-17 23:55:05 +00:00
Jeff Squyres
99ea34fe94 Missed updating these 2 scripts yesterday
This commit was SVN r24410.
2011-02-17 14:01:10 +00:00
Donald Kerr
995d46344c simplify the way IBV_ACCESS_SO is discovered
This commit was SVN r24409.
2011-02-17 04:28:56 +00:00
Ralph Castain
51cf0a16c3 Some minor cleanups to support VM and CM operations
This commit was SVN r24408.
2011-02-16 23:03:08 +00:00
Ralph Castain
9b48c07599 CM daemons handle their own output
This commit was SVN r24407.
2011-02-16 23:02:23 +00:00
Ralph Castain
65ba6af44d Cleanup our handling of VMs to ensure daemons don't get mapped when operating with a VM.
Have each mapper flag it did the map so we can see who did it later.

Ensure procs are flagged as "ready to launch".

This commit was SVN r24406.
2011-02-16 23:01:57 +00:00
Jeff Squyres
3f4d4886f2 Minor update for something that has been bugging me for quite a while:
OMPI supports multiple different repository systems (SVN, hg, git).
But the VERSION file has listed "want_svn" and "svn_r" as fields, even
though the actual repo system and version may not be SVN.

So search/replace those fields (and derrivative values that come from
those fields) with "want_repo_rev" and "repo_rev", respectively.

This commit was SVN r24405.
2011-02-16 22:53:23 +00:00
Ralph Castain
a32a7d9a82 Update heartbeat system
This commit was SVN r24404.
2011-02-16 18:50:51 +00:00
Jeff Squyres
c26230809b Fix svn:ignore -- ompi_get_version.sh was renamed to
opal_get_version.sh long ago.

This commit was SVN r24403.
2011-02-16 16:21:49 +00:00
Shiqing Fan
d464601e9b Define all the missing SIG* for Windows.
This commit was SVN r24402.
2011-02-16 13:46:27 +00:00
Shiqing Fan
ddc05e05d7 Avoid blocking select on Windows.
This commit was SVN r24396.
2011-02-16 08:48:21 +00:00
Donald Kerr
2b60b165aa on Solaris, when IBV_ACCESS_SO is available, use strong ordered memory region for eager rdma connection
This commit was SVN r24395.
2011-02-16 05:37:22 +00:00
Ralph Castain
9b38525d1e Remove unused include files
This commit was SVN r24394.
2011-02-16 00:32:47 +00:00
Ralph Castain
5120e6aec3 Redefine the rmaps framework to allow multiple mapper modules to be active at the same time. This allows users to map the primary job one way, and map any comm_spawn'd job in a different way. Modules are given the opportunity to map a job in priority order, with the round-robin mapper having the highest default priority. Priority of each module can be defined using mca param.
When called, each mapper checks to see if it can map the job. If npernode is provided, for example, then the loadbalance mapper accepts the assignment and performs the operation - all mappers before it will "pass" as they can't map npernode requests.

Also remove the stale and never completed topo mapper.

This commit was SVN r24393.
2011-02-15 23:24:31 +00:00
Ralph Castain
29785e4ea1 Update platform file
This commit was SVN r24388.
2011-02-15 19:01:59 +00:00
Ralph Castain
c1da94a444 Dead children have no pid
This commit was SVN r24387.
2011-02-15 13:30:51 +00:00
Ralph Castain
a3607ff35d Make it easier to send a kill-local-procs command for an arbitrary number of procs
This commit was SVN r24386.
2011-02-15 13:26:11 +00:00
Ralph Castain
bf1cff3711 Plug a couple of additional memory leaks - try to highlight a little better that strings returned from reg_string_name must be freed by caller
This commit was SVN r24383.
2011-02-14 20:58:22 +00:00
Ralph Castain
a9dca25ca5 Remove the distinction between local and global restarts - leave it up to the error strategy to decide which to do.
Cleanup the heartbeat handling so it is associated with the proc, not a node.

Cleanup handling of recovery options so that defaults do not override user values iff they are provided.

This commit was SVN r24382.
2011-02-14 20:49:12 +00:00
Ralph Castain
172ad649e1 Update platform files
This commit was SVN r24381.
2011-02-14 20:00:58 +00:00
Ralph Castain
d85916c1c2 Plug memory leak
This commit was SVN r24380.
2011-02-14 19:51:33 +00:00
Ralph Castain
6ee69c670e Add a minor utility
This commit was SVN r24379.
2011-02-14 19:45:59 +00:00
Ralph Castain
b5de068533 Clean up an error in r24371 - can't use a const parameter as target in asprintf as it changes the value of the address.
Add some new proc/job states

Rename a constant to reflect coming change - remove the arbitrary difference between restarting a proc locally and relocating it to another node in terms of the number of restarts allowed.

Add pretty-print of signals for "proc aborted due to signal" reports.

This commit was SVN r24378.

The following SVN revision numbers were found above:
  r24371 --> open-mpi/ompi@93d28a5792
2011-02-14 19:29:09 +00:00
Ralph Castain
e8c2519280 Restore thread-supported condition waits when thread support requested
This commit was SVN r24377.
2011-02-14 19:10:38 +00:00
Jeff Squyres
94356e98d4 Fix from Nikolay Piskun at Rogue Wave (TotalView) -- fixes the case
where MPI jobs are launched directly via srun (i.e., there's no HNP).

This commit was SVN r24376.
2011-02-14 19:03:53 +00:00
Abhishek Kulkarni
93d28a5792 Change opal_err2str_fn_t to return the error string as an argument.
This means that the converters (opal_err2str, orte_err2str) can now
return NULL as a "silent error". The return value of opal_err2str_fn_t
is the status of the operation (OPAL_SUCCESS or OPAL_ERROR).

This fixes the "Unknown error" message issues on the trunk.

This commit was SVN r24371.
2011-02-13 16:09:17 +00:00
Ralph Castain
33b68132cc Update the rmcast framework
This commit was SVN r24370.
2011-02-12 16:52:03 +00:00
Mike Dubman
81222e1fe7 * fix PGI compiler support which does not have __BASE_FILE__ macro
This commit was SVN r24369.
2011-02-10 06:42:37 +00:00
Ethan Mallove
c6fd141923 missing include
This commit was SVN r24368.
2011-02-09 17:59:55 +00:00
Josh Hursey
a9335ea423 Make sure to initialize the 'update_state' function for the default module.
This will prevent tools from segfaulting if the mpirun process goes away suddenly while they are trying to communicate with it over the OOB.

This commit was SVN r24365.
2011-02-08 20:42:32 +00:00
Nysal Jan
92e06b0a1f Missed this change suggested by Terry
This commit was SVN r24364.
2011-02-08 04:06:52 +00:00
Nysal Jan
a31025bb48 Fix pty setup code on AIX
This commit was SVN r24363.
2011-02-08 02:54:47 +00:00
Nysal Jan
f0f1d4e311 Older versions of config.guess detect the canonical system name of an AIX 7.1 system to be rs6000-ibm-aix. Add this workaround until AIX 7.1 support is available in the autotools releases
This commit was SVN r24362.
2011-02-08 02:52:10 +00:00
Jeff Squyres
b0ce9bae8e Oops. Also need to remove myriexpress.h from the Makefile.am.
This commit was SVN r24357.
2011-02-04 03:29:49 +00:00
Eugene Loh
cd5c2e794f Some minor changes to help the openib BTL build and run on Solaris:
- poll() can return POLLRDNORM even if not requested (Solaris bug)
- MIN macro not defined in btl_openib.c
  and while we're at it, we clean up the MIN definition in ad_bgl_pset.h
- btl_openib_connect_rdmacm.c was calling rdma_destroy_id() twice
  leading to undefined behavior (a hang on Solaris)

This commit was SVN r24356.
2011-02-03 23:53:21 +00:00
Abhishek Kulkarni
d711c5a4b1 SOS fix for the Studio compilers (Thanks to Terry for spotting this).
This commit was SVN r24355.
2011-02-03 22:36:28 +00:00
Jeff Squyres
6421abecc7 Fixes trac:2690.
Temporarily remove hwloc's internal version of myriexpress.h.  It is
causing a problem when compiling Open MPI with MX support because
hwloc uses AC_CONFIG_HEADER in hwloc's hwloc.m4 to generate
opal/mca/paffinity/hwloc/hwloc/include/hwloc/config.h.
AC_CONFIG_HEADER apparently has the (undocumented) side effect of
adding -I$(top_builddir)/opal/mca/paffinity/hwloc/hwloc/include/hwloc
to OMPI's compilation flags.  Hence, when the OMPI MX components are
compiled and #include "myriexpress.h" (or <myriexpress.h>) they see
hwloc's myriexpress.h before the system one.  Badness ensures.

This removal is temporary because we need to figure out a better
solution.  But for now, OMPI is not using hwloc's myriexpress.h file --
so it's safe to remove.  I'll push this issue upstream to hwloc to
figure out a better solution...

This commit was SVN r24354.

The following Trac tickets were found above:
  Ticket 2690 --> https://svn.open-mpi.org/trac/ompi/ticket/2690
2011-02-03 14:24:32 +00:00
Jeff Squyres
c0acc75ce0 Update copyrights and clarify README.txt.
This commit was SVN r24348.
2011-02-02 17:25:56 +00:00
Jeff Squyres
0de4f4c35a Oops -- forgot to make the README.txt be version-neutral.
This commit was SVN r24347.
2011-02-02 17:14:41 +00:00
Jeff Squyres
2755a9f261 As discussed here:
http://www.open-mpi.org/community/lists/devel/2011/01/8894.php
    http://blogs.cisco.com/performance/building-3rd-party-open-mpi-components/

Contribute a sample of how to build MCA components outside of the Open
MPI source tree.

This commit was SVN r24346.
2011-02-02 17:11:33 +00:00
Jeff Squyres
0d08f636b0 Add 1.5.2 items
This commit was SVN r24344.
2011-02-02 15:19:10 +00:00
Jeff Squyres
e388450e98 Add 1.5.2 items.
This commit was SVN r24343.
2011-02-02 15:17:30 +00:00
Jeff Squyres
9306bf259b Make it a little more friendly towards svn+hg trees
This commit was SVN r24341.
2011-02-02 14:41:36 +00:00
Nysal Jan
ab2f738b0b Recent versions of IBM XL compilers on AIX support GCC inline assembly format
This commit was SVN r24340.
2011-02-02 11:31:30 +00:00
Nysal Jan
3a8d251daa vsyslog is not included in SUSv3. Add a check for platforms that do not have vsyslog
This commit was SVN r24339.
2011-02-02 10:05:57 +00:00
Jeff Squyres
4674e62929 These files are superflouos.
This commit was SVN r24331.
2011-02-01 21:31:35 +00:00
Jeff Squyres
c8badb79df Don't instantiate variables in for loops; we don't assume C99
compilers. 

This commit was SVN r24330.
2011-02-01 19:23:14 +00:00