
17639 commits

Author SHA1 Message Date
Ralph Castain
6b5f9d7767 Some cleanups for staged execution
This commit was SVN r27317.
2012-09-12 09:15:33 +00:00
Matthias Jurenz
e643c09dee Changes to OTF:
- otfaux:
		- fixed build error on Solaris and NetBSD (removed -lm from library dependencies)
Changes to VT:
	- vtunify:
		- disable OpenMP parallelization if PGI compiler version < 9 is used (threadprivate not supported)

This commit was SVN r27316.
2012-09-12 09:03:10 +00:00
Ralph Castain
b22fc54d9b Remove unused variable
This commit was SVN r27312.
2012-09-12 08:41:48 +00:00
Ralph Castain
5f7a5c4793 Update test to include all keys
This commit was SVN r27311.
2012-09-12 05:02:51 +00:00
Jeff Squyres
a7ea880d0a Refs trac:3309
* Minor man page tweaks
 * Use existing ompi_mpi_thread_requested global

This commit was SVN r27308.

The following Trac tickets were found above:
  Ticket 3309 --> https://svn.open-mpi.org/trac/ompi/ticket/3309
2012-09-11 21:12:06 +00:00
Ralph Castain
bc1300f5cc Remove debug
This commit was SVN r27307.
2012-09-11 20:54:04 +00:00
Jeff Squyres
fb2e543a57 Refs trac:3275.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't).  If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!".  Kaboom.

The only problem is that it didn't give you any indication of where
this value was being set.  Quite maddening, from a user perspective.

So we changed how ompi_info handles this case.  If any framework open
function returns OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions returns OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received an error, and then dump out the parameters
that it has received so far in the framework that had a problem.

At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).  

We updated ompi_info to check for O???_ERR_BAD_PARAM from each of
the framework opens.  Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior.  And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes.  Why?  I think we used them in exactly one
place in the code base (mca_base_components_open.c).  So we deleted
those and just used the normal OPAL_* codes instead.

While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).

This commit was SVN r27306.

The following Trac tickets were found above:
  Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
2012-09-11 20:47:24 +00:00
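The reporting behavior described in this commit can be sketched in Python. The sketch makes several assumptions. Each framework's open step records every parameter it reads, together with the place that parameter was set. A bad value is signalled by raising an exception, not by returning OMPI_ERR_BAD_PARAM. The names `BadParam` and `open_frameworks` are hypothetical, and the code does not reproduce ompi_info's real C implementation.

```python
class BadParam(Exception):
    """Hypothetical stand-in for an OMPI_ERR_BAD_PARAM return code."""


def open_frameworks(frameworks):
    """Open each framework in order.

    frameworks: list of (name, open_fn) pairs. Each open_fn appends
    (param, value, source) tuples to the list it is given and raises
    BadParam when it meets an invalid value.

    Returns None if every framework opened. Otherwise it returns a warning
    that lists the parameters the failing framework had read so far; the
    last one listed is usually the culprit.
    """
    for name, open_fn in frameworks:
        seen = []
        try:
            open_fn(seen)
        except BadParam:
            lines = [f"WARNING: framework '{name}' received a bad parameter value"]
            for param, value, source in seen:
                # Showing where each value came from lets the user fix it.
                lines.append(f"  {param} = {value!r} (set in {source})")
            return "\n".join(lines)
    return None
```

The sketch stops at the first failing framework, as the commit describes. It keeps per-framework state, so the dump only contains that framework's parameters.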
Ralph Castain
a0ffeb205a Add an orted component for staged operations and rename the staged component to "staged_hnp".
This commit was SVN r27305.
2012-09-11 20:35:46 +00:00
Ralph Castain
387f657fc2 Nuts - forgot to include this with the MPI Ticket 313 stuff. Set some of the envars needed for MPI_INFO_ENV
This commit was SVN r27304.
2012-09-11 20:35:09 +00:00
Ralph Castain
cd8aff675b Update test
This commit was SVN r27303.
2012-09-11 20:32:43 +00:00
Ralph Castain
e8ecd67d53 Once again, bloody SLURM changes the envars and breaks things. Try and track their changes so we get a correct allocation.
This commit was SVN r27302.
2012-09-11 20:31:33 +00:00
Ralph Castain
a08c23dfdc Actually, do the right thing - leave the test alone, but just turn if "off" for now until someone, someday fixes it to work with bind mounts.
This commit was SVN r27301.
2012-09-11 19:56:58 +00:00
Ralph Castain
3c016d79db Soft mounts are okay
This commit was SVN r27300.
2012-09-11 19:48:24 +00:00
Jeff Squyres
a8f8064d8b Add a missing free(). Refs trac:3292.
This commit was SVN r27298.

The following Trac tickets were found above:
  Ticket 3292 --> https://svn.open-mpi.org/trac/ompi/ticket/3292
2012-09-11 17:59:40 +00:00
Ralph Castain
ffb8c2a2ba Add the MPI_INFO_ENV man page
This commit was SVN r27293.
2012-09-11 17:35:32 +00:00
Ralph Castain
fb4af5e29c Implement the rest of MPI-3 ticket #313 based on side-bar agreement with MPICH2 folks. Fix a bug in the original ompi_info code that put the NULL terminator one position too far if the returned string exceeded MPI_MAX_INFO_VAL in length in ompi_info_get.
This commit was SVN r27292.
2012-09-11 17:03:49 +00:00
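The terminator fix above can be modelled in Python, using a `bytearray` to stand in for a C buffer. In C, MPI_INFO_GET's caller supplies `valuelen + 1` bytes, the extra byte holding the terminator. The safe rule is to copy at most `valuelen` bytes and then write the NUL at the index just past the copied data. A terminator written one position further would land at `valuelen + 1`, which is outside the buffer whenever the value was truncated. The sketch below is illustrative only: `info_get_copy` is a hypothetical name, and the code is not Open MPI's `ompi_info_get`.

```python
def info_get_copy(value, valuelen):
    """Copy `value` into a model C buffer of valuelen + 1 bytes.

    Returns (buffer_bytes, truncated), where truncated is True when the
    value did not fit in valuelen bytes.
    """
    data = value.encode()
    buf = bytearray(valuelen + 1)   # caller-provided storage, +1 for the NUL
    n = min(len(data), valuelen)    # copy at most valuelen bytes
    buf[:n] = data[:n]
    buf[n] = 0                      # terminator at index n <= valuelen: always in bounds
    return bytes(buf), len(data) > valuelen
```

For example, copying `"abcdefgh"` with `valuelen=4` yields `b"abcd\x00"` and a truncation flag of `True`.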
Shiqing Fan
ec4cf39925 Windows doesn't need to exclude any interface by default. This will avoid tcp warnings.
This commit was SVN r27291.
2012-09-11 15:39:37 +00:00
Shiqing Fan
0c4c2a5f5d Revert r27283. A better solution is found. Thanks to Ralph anyway.
This commit was SVN r27290.

The following SVN revision numbers were found above:
  r27283 --> open-mpi/ompi@38bcd86ae4
2012-09-11 15:37:22 +00:00
Josh Hursey
40132f1874 Protect a potentially uninitialized variable (orte_sstore_base_global_snapshot_ref).
If orte_sstore_base_global_snapshot_ref is null, then it will default appropriately when it is used. When prelaunching we always specify this parameter, but if we are not prelaunching it is possible to allow this to be null and it will initialize when used. However, we set up the prelaunching variable in both situations, and in the latter that would result in a NULL reference. This patch protects that code segment.

This commit was SVN r27289.
2012-09-11 15:14:28 +00:00
Ralph Castain
ca40cb5f1c Fix comm_spawn by mpirun
This commit was SVN r27285.
2012-09-10 17:09:25 +00:00
Ralph Castain
38bcd86ae4 Per request by Shiqing, specifically exclude the "lo" interface from the TCP btl. Apparently, Windows sometimes fails to resolve 127.0.0.1 to "lo", causing subsequent failures.
This commit was SVN r27283.
2012-09-10 16:22:46 +00:00
Matthias Jurenz
f5ffb4783c Changes to VT:
- configure: do not build OpenMP support if CFLAGS contains a compiler flag that disables the handling of OpenMP directives (e.g. -fno-openmp, -nomp, -hnoomp)
	  Fixes trac:3117

This commit was SVN r27282.

The following Trac tickets were found above:
  Ticket 3117 --> https://svn.open-mpi.org/trac/ompi/ticket/3117
2012-09-10 14:29:47 +00:00
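The configure check in this commit can be sketched in Python. The real test lives in VampirTrace's configure script, so this is only an illustration. It assumes CFLAGS is split shell-style into separate words, and it matches only the three example flags named in the message. The actual check may cover more compilers, and it may match flags differently.

```python
import shlex

# Only the example flags from the commit message; the real configure
# test may recognise a different or longer list.
OPENMP_DISABLE_FLAGS = {"-fno-openmp", "-nomp", "-hnoomp"}


def build_openmp_support(cflags):
    """Return False if CFLAGS contains a flag that turns off OpenMP directives."""
    return not any(tok in OPENMP_DISABLE_FLAGS for tok in shlex.split(cflags))
```

Matching whole words, not substrings, keeps an unrelated flag that merely contains one of these strings from disabling OpenMP support by accident.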
Matthias Jurenz
ef0e8f859a Changes to OTF:
- general:
		- incremented version number to 1.11.3openmpi
	- otfaux:
		- fixed build error when using the Oracle compiler on Solaris. (removed usage of too recent rint() function)
		  Fixes trac:3257

This commit was SVN r27279.

The following Trac tickets were found above:
  Ticket 3257 --> https://svn.open-mpi.org/trac/ompi/ticket/3257
2012-09-10 12:59:07 +00:00
Jeff Squyres
b793e4ebc6 Now that Absoft has fixed the problem in their compiler, revert the
patch that helped them test (r27184).  Thanks Absoft!

This commit was SVN r27277.

The following SVN revision numbers were found above:
  r27184 --> open-mpi/ompi@a951a5ee99
2012-09-10 12:54:24 +00:00
Vishwanath Venkatesan
6d9d0f2968 Initialize iov_count; leaving it uninitialized crashes static write/read on certain platforms while decoding the datatype
This commit was SVN r27273.
2012-09-08 00:40:21 +00:00
Jeff Squyres
8076cf8089 Abort configure if --enable-memchecker was specified, but then no
memchecker components were able to configure successfully.

This commit was SVN r27267.
2012-09-07 16:08:43 +00:00
Jeff Squyres
8585920e49 Add a note in the default hostfile that it is not used in managed
environments. 

This commit was SVN r27264.
2012-09-07 14:41:19 +00:00
Jeff Squyres
ee7155b633 Sync with v1.6 branch NEWS (still need to add 1 more bullet to v1.6 NEWS)
This commit was SVN r27260.
2012-09-07 13:49:21 +00:00
Ralph Castain
4ca495c7e3 In managed allocations, we need to ensure that all nodes are flagged as having their slots "given" in case the user incorrectly asks us to change them. Also need to update the HNP node's "slots_given" flag as we don't replace the orte_node_t object when inserting the node info for that node.
This commit was SVN r27258.
2012-09-07 04:08:17 +00:00
Ralph Castain
2110fb7f95 Add some debug
This commit was SVN r27257.
2012-09-07 04:06:37 +00:00
Ralph Castain
78ccb097f0 Fix vm setup in unmanaged environments - needs to construct a node list in the same way we now do for mapping
This commit was SVN r27256.
2012-09-07 01:53:19 +00:00
Ralph Castain
876d78f36a If JAVA_HOME is present on a Linux system, use it to find Java support
This commit was SVN r27255.
2012-09-07 00:41:14 +00:00
Ralph Castain
36acbe4ca6 Multiple apps might want the same files, so instead of using the app_idx to determine who gets what, use the actual file names as they are sent anyway
This commit was SVN r27254.
2012-09-06 22:02:05 +00:00
Ralph Castain
e9e52fc78f Gain some efficiency in the staged mapper - if soft locations are in use and get_nodes returns busy, then no need to continue cycling thru the remaining apps as all nodes are occupied
This commit was SVN r27253.
2012-09-06 22:01:18 +00:00
Ralph Castain
67f34c3be6 Record the bind_level received by the daemon for each job so it can be correctly sent to the procs. Add test in get_relative_locality to avoid descending into an infinite loop if the level is NODE (==0).
This commit was SVN r27252.
2012-09-06 20:50:07 +00:00
Jeff Squyres
dd254cc202 OMPI_HAVE_IBV_LINK_LAYER does not exist. Instead, check defined(HAVE_IBV_LINK_LAYER_ETHERNET).
This commit was SVN r27251.
2012-09-06 18:25:36 +00:00
Jeff Squyres
aca005ccc5 Add bullet about MPI_CART_SUB
This commit was SVN r27249.
2012-09-06 14:24:02 +00:00
Jeff Squyres
0d2962ebf0 Fixes trac:3294: space for the periods has already been allocated by
ompi_comm_split(), and the entire set of periods from the old
communicator have already been copied to the new communicator.  But up
here in mca_topo_base_cart_sub(), we need to subset the periods that
are actually stored on the new communicator according to remain_dims
(just like we did for the set of dimensions).

This commit renames a few variables to be a little less misleading,
and then adds a loop to copy over the periods information.  I could
have added this into the first loop (that subset-copies the
dimensions), but this code is already confusing enough and this is not
a performance-critical section: so I made it a new loop.

Note that all the topo code will be revamped a bit when the new
MPI-2.2 topo stuff (currently off in a mercurial branch) finally makes
it back to the SVN trunk.  But that new stuff will only get to v1.7 --
this commit will need to be CMR'ed to v1.6.x.

cmr:v1.7
cmr:v1.6.2

This commit was SVN r27248.

The following Trac tickets were found above:
  Ticket 3294 --> https://svn.open-mpi.org/trac/ompi/ticket/3294
2012-09-06 14:16:29 +00:00
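The cart_sub fix comes down to one rule. The new communicator's periods must be subset by `remain_dims` in exactly the same way as its dimensions. Before the fix, the old communicator's full periods array was carried over unchanged. The Python sketch below illustrates that rule. `cart_sub_topology` is a hypothetical name, and the code does not reproduce `mca_topo_base_cart_sub()`. Like the commit, it handles periods in their own pass, separate from dims.

```python
def cart_sub_topology(dims, periods, remain_dims):
    """Return (new_dims, new_periods) for the sub-grid that keeps only the
    dimensions whose remain_dims entry is true."""
    if not (len(dims) == len(periods) == len(remain_dims)):
        raise ValueError("dims, periods and remain_dims must have the same length")
    # Subset the dimension sizes.
    new_dims = [d for d, keep in zip(dims, remain_dims) if keep]
    # Subset the periodicity flags with the same mask. This is the step
    # that was missing before the fix.
    new_periods = [p for p, keep in zip(periods, remain_dims) if keep]
    return new_dims, new_periods
```

Take a 2x3x4 grid with periods `[True, False, True]` and keep the last two dimensions. The result is dims `[3, 4]` with periods `[False, True]`. Copying the old array unchanged would have left the first period, `True`, in front and produced the wrong flags.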
Vishwanath Venkatesan
b75d877a3f Removing .ompi_ignore for the lustre component.
This commit was SVN r27247.
2012-09-05 22:20:18 +00:00
Vishwanath Venkatesan
640aca6654 Modifying the file view generation to remove the merging of offset-length pairs.
It's no longer needed, as the default file view makes sure the chunks are large enough.

This commit was SVN r27246.
2012-09-05 21:00:47 +00:00
Ralph Castain
efa50346c8 Error out if we are filtering a hostfile and encounter a node that is not in the resource-managed allocation, giving an error message identifying the file and the node. Don't filter managed allocations thru a default hostfile as this can lead to "hidden" errors.
Don't use dash-host info on managed allocations if we are using soft locations.

This commit was SVN r27245.
2012-09-05 19:42:00 +00:00
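The hostfile filtering rule from this commit can be sketched in Python. A hostfile may only narrow a resource-managed allocation. Any node it names that lies outside the allocation is an error, and the error identifies both the file and the node. The function name `filter_hostfile` is hypothetical. Keeping the allocation's node order is also an assumption, since the commit does not say which order survives.

```python
def filter_hostfile(hostfile, hostfile_nodes, allocation):
    """Restrict a managed allocation to the nodes listed in a hostfile.

    hostfile: name of the file (used only in the error message).
    hostfile_nodes: node names read from that file.
    allocation: node names granted by the resource manager.
    """
    allocated = set(allocation)
    for node in hostfile_nodes:
        if node not in allocated:
            # Name both the file and the node so the user can find the typo.
            raise ValueError(
                f"hostfile {hostfile}: node {node} is not in the allocation")
    wanted = set(hostfile_nodes)
    # Assumed: keep the allocation's order; only listed nodes survive.
    return [n for n in allocation if n in wanted]
```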
Brian Barrett
fa4c2af9ed The Portals 4 reference implementation will sometimes return a NI_FLOWCTL for both a
send and an ack.  I'm not sure whether this violates the spec, so work around until
we decide...

This commit was SVN r27244.
2012-09-05 19:36:19 +00:00
Jeff Squyres
38440369a7 Add note about Absoft compiler and the mpi_f08 module.
This commit was SVN r27243.
2012-09-05 18:45:38 +00:00
Ralph Castain
d772e0fc3d Add an option to treat dash-host specifications as "requested, but not required". So-called "soft" location requests can allow an application to execute even if the ideal allocation isn't available.
This commit was SVN r27242.
2012-09-05 18:42:09 +00:00
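The difference between "required" and "requested, but not required" host lists can be sketched in Python. The commit only says that soft requests let an application run when the ideal allocation isn't available. The fallback policy below is therefore an assumption made for illustration: use whichever requested hosts are available, and fall back to every available host only if none of them are. `select_hosts` is a hypothetical name, not an ORTE function.

```python
def select_hosts(requested, available, soft):
    """Choose hosts for a job from a dash-host style request.

    Hard request: every requested host must be available.
    Soft request (assumed policy): keep the requested hosts that are
    available; if none are, fall back to all available hosts.
    """
    avail = set(available)
    usable = [h for h in requested if h in avail]
    missing = [h for h in requested if h not in avail]
    if not missing:
        return usable
    if not soft:
        raise RuntimeError(f"requested hosts unavailable: {', '.join(missing)}")
    return usable if usable else list(available)
```

With a hard request, a missing host stops the launch. With a soft request, the job still starts, on less than the ideal set of hosts.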
Ralph Castain
6d29cecce1 Fix the help message warning of multiple prefixes so it correctly prints out the info, and fix a typo.
cmr:v1.7

This commit was SVN r27241.
2012-09-05 16:28:36 +00:00
Ralph Castain
64ccf789f2 Ensure the final output is printed
cmr:v1.7

This commit was SVN r27240.
2012-09-05 16:25:15 +00:00
Ralph Castain
fde83a44ab This confusion has been around for a while, caused by a long-ago decision to track slots allocated to a specific job as opposed to allocated to the overall mpirun instance. We eliminated that quite a while ago, but never consolidated the "slots_alloc" and "slots" fields in orte_node_t. As a result, confusion has grown in the code base as to which field to look at and/or update.
So (finally) consolidate these two fields into one "slots" field. Add a field in orte_job_t to indicate when all the procs for a job will be launched together, so that staged operations can know when MPI operations are allowed.

This commit was SVN r27239.
2012-09-05 01:30:39 +00:00
Ralph Castain
bae5dab916 If (and only if) a user requests, set the default number of slots on any node to the number of objects of the specified type. This *only* takes effect in an unmanaged environment - i.e., if an external resource manager assigns us a number of slots, then that is what we use. However, if we are using a hostfile, then the user may or may not have given us a value for the number of slots on each node.
For those nodes (and *only* those nodes) where the user does *not* specify a slot count, we will set the number of slots according to their direction: either to the number of cores, numas, sockets, or hwthreads. Otherwise, the slot count is set to 1.

Note that the default behavior remains unchanged: in the absence of any value for #slots, and in the absence of any directive to set #slots, we will set #slots=1.

This commit was SVN r27236.
2012-09-04 20:58:26 +00:00
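The slot-count precedence in this commit can be written as a short Python sketch. A resource manager's value always wins. An explicit hostfile count comes next. A user directive (cores, numas, sockets or hwthreads) applies only to nodes that have neither. Everything else defaults to 1. The function name and its parameters are hypothetical, and the topology counts are simply passed in, not discovered through hwloc.

```python
def slots_for_node(managed_slots=None, hostfile_slots=None,
                   directive=None, topology_counts=None):
    """Return the slot count for one node.

    managed_slots: slots assigned by an external resource manager, if any.
    hostfile_slots: slots given for this node in a hostfile, if any.
    directive: "cores", "numas", "sockets" or "hwthreads" when the user
               asked for slots to follow the hardware; None otherwise.
    topology_counts: mapping from those object types to counts on the node.
    """
    if managed_slots is not None:
        return managed_slots          # managed environment: use what we were given
    if hostfile_slots is not None:
        return hostfile_slots         # the user specified a count for this node
    if directive is not None:
        return topology_counts[directive]
    return 1                          # default behavior is unchanged
```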
Ralph Castain
ee6c7702d2 Ensure the cma.h file is included in the tarball
This commit was SVN r27235.
2012-09-04 19:34:09 +00:00
Ralph Castain
18d2f75b56 Ensure we don't re-link files when staging execution as we may be executing more members of the same app. Allow the user to ask that directory trees be "flattened" so that all files appear in the proc's session directory itself.
This commit was SVN r27232.
2012-09-04 17:52:12 +00:00
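The two staging behaviors in this commit can be sketched in Python. The first is skipping files that an earlier member of the same app already linked. The second is optionally "flattening" paths so every file lands directly in the session directory. The sketch only plans (source, destination) pairs and never touches the filesystem. `plan_links` and its parameters are hypothetical names.

```python
import posixpath


def plan_links(files, session_dir, flatten, already_linked):
    """Return (source, destination) pairs for files that still need linking.

    files: relative paths to stage for one app.
    flatten: if True, drop directory components so each file appears
             directly in session_dir.
    already_linked: set of destinations created for earlier members of the
                    same app; updated in place.
    """
    plan = []
    for src in files:
        rel = posixpath.basename(src) if flatten else src
        dest = posixpath.join(session_dir, rel)
        if dest in already_linked:
            continue  # already linked for an earlier member; do not re-link
        already_linked.add(dest)
        plan.append((src, dest))
    return plan
```

With `flatten=True`, two files that share a basename map to the same destination, and this sketch links only the first. The commit does not say how Open MPI resolves such collisions.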