George Bosilca
3a2f071018
If the user asked for dynamic rules but "forgot" to provide them, nicely
...
complain and switch back to the default behavior (fixed rules).
This commit was SVN r22109.
2009-10-19 17:58:47 +00:00
Ralph Castain
941b8722b4
Remove an errant entry in the platform files
...
This commit was SVN r22108.
2009-10-16 16:12:37 +00:00
Ralph Castain
ff9d72b3ab
Add a new multicast tag for collecting ps data
...
This commit was SVN r22107.
2009-10-16 04:21:22 +00:00
Terry Dontje
13907781b2
missed adding report-uri option
...
This commit was SVN r22106.
2009-10-15 18:05:24 +00:00
Ralph Castain
49ce2b4342
Add a new interface to the rmcast framework to query the output channel for the proc
...
This commit was SVN r22105.
2009-10-15 17:47:42 +00:00
Terry Dontje
c96af5654c
correct options and wording that were dropped in the last change due to committing v1.3 manpage to the trunk
...
This commit was SVN r22104.
2009-10-15 15:03:21 +00:00
Ralph Castain
99c67183d2
Minor cleanups, mainly to ensure we correctly block on blocking sends
...
This commit was SVN r22102.
2009-10-15 02:39:15 +00:00
Ralph Castain
2f91a4833b
Have the trigger event return the event itself in the callback function so it can be reset, if desired
...
This commit was SVN r22101.
2009-10-15 02:35:53 +00:00
Ralph Castain
2665825693
Correct an error that causes the system to "bounce" when we order a job killed. We previously did not discriminate between a process being ordered to die and a process that was aborted by an external signal. Unfortunately, that means the error mgr gets called and told a process abnormally aborted when we order termination, thus causing the errmgr to send out a "kill procs" command again.
...
Wouldn't be so bad, except...the errmgr orders the termination of ALL procs, which kills any other job that should have been left alone.
Add a new proc and job state indicating "killed_by_cmd" so we can tell the difference between a proc/job that was deliberately terminated by us vs one that is killed by external signal.
This change was tested to ensure it didn't interfere with ctrl-c operation (it doesn't - we order termination of all jobs when we get a ctrl-c).
This commit was SVN r22100.
2009-10-14 22:49:56 +00:00
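The distinction introduced in r22100 above can be sketched as follows. This is only an illustration under stated assumptions: the enum values and the function name are hypothetical and are not ORTE's actual state constants or error-manager API. The idea is that a process terminated on our own order gets a state of its own, so the error manager is only told about aborts it did not request.

```c
#include <stdbool.h>

/* Hypothetical state values for illustration only;
 * these are NOT ORTE's real constants. */
typedef enum {
    PROC_STATE_RUNNING,
    PROC_STATE_TERMINATED,      /* exited normally */
    PROC_STATE_ABORTED_BY_SIG,  /* killed by an external signal */
    PROC_STATE_KILLED_BY_CMD    /* terminated because we ordered it */
} proc_state_t;

/* Hypothetical helper: only an abort we did not order is reported to
 * the error manager. Reporting a kill we ordered ourselves would
 * trigger a second "kill procs" command, which is the "bounce"
 * described in the commit message. */
static bool should_notify_errmgr(proc_state_t state)
{
    return PROC_STATE_ABORTED_BY_SIG == state;
}
```

With a separate killed-by-command state, a deliberate termination of one job no longer looks like an abnormal abort, so other jobs are left alone.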
Ralph Castain
18960a9c5a
Refactor the multicast support so the data type objects can be accessed beyond just the one component
...
Ensure that the local node is included in the allocation prior to bootstrap discovery
This commit was SVN r22099.
2009-10-14 17:43:40 +00:00
Terry Dontje
0a8645a411
This commit fixes trac:2017
...
This commit was SVN r22098.
The following Trac tickets were found above:
Ticket 2017 --> https://svn.open-mpi.org/trac/ompi/ticket/2017
2009-10-14 11:40:47 +00:00
Ralph Castain
60c4ebab45
Update makefile so it finds new platform file directory
...
This commit was SVN r22097.
2009-10-14 01:48:30 +00:00
Ralph Castain
3a046b7262
Update platform files for cisco clusters
...
This commit was SVN r22096.
2009-10-14 01:47:32 +00:00
Ralph Castain
bc869636be
Reset the verbosity levels to suppress debug output
...
This commit was SVN r22095.
2009-10-13 15:29:38 +00:00
Ralph Castain
e501589b3b
Cleanup the bootstrap procedure for multiple daemons starting up
...
This commit was SVN r22094.
2009-10-13 15:14:54 +00:00
Ralph Castain
c25dd14440
Correctly set the multicast interface, cleanup a comment
...
This commit was SVN r22093.
2009-10-13 15:14:28 +00:00
Ralph Castain
d8d80d6f1a
Closes trac:2054. Check if a user specifies more cpus-per-rank than there are cpus in a socket - if so, politely tell them "you are stupid" and abort.
...
This commit was SVN r22091.
The following Trac tickets were found above:
Ticket 2054 --> https://svn.open-mpi.org/trac/ompi/ticket/2054
2009-10-13 04:19:07 +00:00
Ralph Castain
1475d34c13
Ensure we default to byslot mapping
...
This commit was SVN r22090.
2009-10-11 23:50:42 +00:00
Ralph Castain
67625d5df6
Ensure that mca params have a chance to be used in the standard hierarchy - in this case, allowing a choice of mapping policy to be made at the rmaps mca level.
...
This commit was SVN r22089.
2009-10-11 03:44:39 +00:00
Jeff Squyres
ad62148bc2
Trunk will now eventually become v1.7
...
This commit was SVN r22088.
2009-10-10 19:26:55 +00:00
Ralph Castain
84cc847be8
Next phase of auto-wireup using multicast. Enable use of multicast groups to separate comm from different application groups. Have the orted bootstrap message go to a different rml tag so the node can be added to the pool.
...
This commit was SVN r22083.
2009-10-10 01:19:56 +00:00
Ralph Castain
40e2299fa7
Test to ensure that num_procs was provided for the resilient mapper - it cannot be used with options like npernode.
...
Cleanup the show_help text file
This commit was SVN r22082.
2009-10-09 15:26:23 +00:00
Ralph Castain
b7a0125bb7
Add a test for the new opal if.c functions. Modify the multicast test
...
This commit was SVN r22081.
2009-10-09 15:25:18 +00:00
Ralph Castain
c58a30ea10
Add two new functions:
...
1. check for loopback interface
2. convert tuple addresses to ip addrs + mask
This commit was SVN r22080.
2009-10-09 15:24:41 +00:00
Jeff Squyres
c4f2db926f
Add missing semicolons. Wow.
...
This commit was SVN r22079.
2009-10-08 19:50:19 +00:00
Terry Dontje
0828945eea
Fix an issue with the #2048 fix that did not go to the error case.
...
This commit was SVN r22076.
2009-10-08 13:27:32 +00:00
Terry Dontje
58c864699c
This commit fixes trac:2048
...
This commit was SVN r22075.
The following Trac tickets were found above:
Ticket 2048 --> https://svn.open-mpi.org/trac/ompi/ticket/2048
2009-10-08 12:54:53 +00:00
Jeff Squyres
3dc84e9d0b
Change the default value of shell_scripts_basename to not include the
...
version because they're installed in bindir by default, where you can
only have one Open MPI installation at a time. Plus, names without
version numbers are what mpi-selector expects.
Thanks to Bill Johnstone for pointing out the problem.
This commit was SVN r22074.
2009-10-08 11:47:53 +00:00
Jeff Squyres
9afe50d886
Update Cisco copyrights for consistency
...
This commit was SVN r22072.
2009-10-07 22:02:32 +00:00
Jeff Squyres
0d1e177453
Remove 2 extraneous ORTE_ERROR_LOGs and 1 extraneous opal_output.
...
This commit was SVN r22071.
2009-10-07 20:12:37 +00:00
Jeff Squyres
d317ce0367
Fix CID 1381: don't bother checking for (NULL == p); it's overkill.
...
posix_memalign() will either return 0 or not, indicating success. And
if posix_memalign() fails, it's not always going to be due to
out-of-memory -- just return ERR_IN_ERRNO.
This commit was SVN r22070.
2009-10-07 20:01:50 +00:00
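The point made in r22070 above, that the return value of posix_memalign() alone indicates success, can be shown with a minimal sketch. The wrapper name alloc_aligned is hypothetical and this is not the Open MPI code; it only relies on the documented behavior that posix_memalign() returns 0 on success and an error number (such as EINVAL or ENOMEM) otherwise.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign() */
#endif
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocate `size` bytes aligned to `alignment`.
 * Returns 0 on success, or the error number from posix_memalign().
 * No separate NULL check on the pointer is needed. */
static int alloc_aligned(void **out, size_t alignment, size_t size)
{
    int rc = posix_memalign(out, alignment, size);
    if (0 != rc) {
        *out = NULL;  /* leave the caller with a well-defined value */
        return rc;    /* not necessarily out-of-memory, e.g. EINVAL */
    }
    return 0;
}
```

A caller would pass an alignment that is a power of two and a multiple of sizeof(void *), and release the memory with free().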
Jeff Squyres
7900451e4e
Fix CID 1326: for the (unlikely) case where
...
opal_paffinity_base_get_processor_info() returns failure.
This commit was SVN r22069.
2009-10-07 19:52:08 +00:00
Jeff Squyres
5c1af9c2ba
Fix CID 1355: ensure that mca_base_param_reg_int() actually
...
succeeded.
This commit was SVN r22068.
2009-10-07 19:43:35 +00:00
Jeff Squyres
d56b8d9183
Fix CID 1369: minor memory leak.
...
This commit was SVN r22067.
2009-10-07 19:40:00 +00:00
Jeff Squyres
de59a24593
Fix CID 1384. Also remove some opal_output(0,...)'s in favor of
...
ORTE_ERROR_LOG.
This commit was SVN r22066.
2009-10-07 18:58:58 +00:00
Jeff Squyres
ec71acf7ca
Fix CID 1385: fix an over-aggressive use of close, munmap, etc. in the
...
error case. Also check for MAP_FAILED (instead of -1) from mmap().
This commit was SVN r22065.
2009-10-07 18:43:37 +00:00
Jeff Squyres
5ec86e5fe5
Fix CID 1386: fd can't be valid here, so don't bother to close/unlink.
...
This commit was SVN r22064.
2009-10-07 18:30:26 +00:00
Jeff Squyres
3b4f695009
MAP_FAILED is more POSIX-ly correct than ((void*)-1).
...
This commit was SVN r22063.
2009-10-07 14:20:18 +00:00
Jeff Squyres
d7db5f4c32
mmap(2) says that you must call mmap() with either MAP_SHARED or
...
MAP_PRIVATE. We didn't catch this because we checked for a NULL
return, not a -1 return. Doh! Thanks again to Julian Seward for
continuing to track this down.
This commit was SVN r22062.
2009-10-07 12:39:01 +00:00
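The mmap() pitfalls described in r22062 above (and in r22063 and r22065) can be sketched as follows. This is a minimal illustration, not the Open MPI code: the helper name map_anon is hypothetical, and it assumes a platform that provides MAP_ANONYMOUS. The two points it shows are that the flags must include MAP_SHARED or MAP_PRIVATE, and that failure is reported as MAP_FAILED rather than NULL.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of anonymous, zero-filled memory.
 * Returns the mapping, or NULL on failure (errno is set by mmap). */
static void *map_anon(size_t len)
{
    /* One of MAP_SHARED or MAP_PRIVATE is mandatory. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* mmap() signals failure with MAP_FAILED, not NULL,
     * so a NULL check would miss the error. */
    if (MAP_FAILED == p) {
        return NULL;
    }
    return p;
}
```

The caller releases the region with munmap(p, len) using the same length.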
Jeff Squyres
977574bd45
Fix a problem noted by Julian Seward: MAKE_MEM_UNDEFINED is not the
...
opposite of MAKE_MEM_DEFINED. Also add in a call to NOACCESS to
(mostly) reverse the effects of MAKE_MEM_DEFINED (technically, page 0
was accessible before this, even though it's a Bad Idea to access it).
This commit was SVN r22056.
2009-10-06 17:55:49 +00:00
Jeff Squyres
932b43be04
Check to ensure that the mmap succeeded. Thanks to Julian Seward for
...
pointing out the problem and suggesting the fix.
This commit was SVN r22055.
2009-10-06 17:44:14 +00:00
Shiqing Fan
14e6952482
Update two CMake find modules.
...
This commit was SVN r22054.
2009-10-06 08:01:37 +00:00
Shiqing Fan
7dff65cbc9
Clean up a little bit.
...
Add an option for setting up the job name.
This commit was SVN r22053.
2009-10-06 07:52:43 +00:00
George Bosilca
01bb4dafe0
Add a comment.
...
This commit was SVN r22052.
2009-10-05 17:36:11 +00:00
Jeff Squyres
0f8ac9223f
Refs trac:2023, #2027.
...
This commit does a bunch of things:
* Address all remaining code review items from CMR #2023:
* Defer mmap setup to be lazy; only set it up the first time we
  invoke a collective. In this way, we don't penalize apps that
  make lots of communicators but don't invoke collectives on them
  (per #2027).
* Remove the extra assignments of mca_coll_sm_one (fixing a
  convertor count setup that was the real problem).
* Remove another extra/unnecessary assignment.
* Increase libevent polling frequency when using the RML to
  bootstrap mmap'ed memory.
* Fix a minor procs-related memory leak in btl_sm.
* Commit a datatype fix that George and I discovered along the way to
  fixing the coll sm.
* Improve error messages when mmap fails, potentially trying to
  de-alloc any allocated memory when that happens.
* Fix a previously-unnoticed confusion between extent and true_extent
  in coll sm reduce.
This commit was SVN r22049.
The following Trac tickets were found above:
Ticket 2023 --> https://svn.open-mpi.org/trac/ompi/ticket/2023
2009-10-02 17:13:56 +00:00
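The lazy setup described in r22049 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the coll sm implementation: the type coll_ctx_t and both function names are hypothetical, and a plain malloc() stands in for the expensive mmap bootstrap. The point is that creating the context costs nothing, and the setup runs only on the first collective call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-communicator context for illustration only. */
typedef struct {
    bool  ready;    /* has the expensive setup run yet? */
    void *scratch;  /* stands in for the mmap'ed region */
} coll_ctx_t;

/* Run the expensive setup once, on first use. */
static int lazy_enable(coll_ctx_t *ctx)
{
    if (ctx->ready) {
        return 0;  /* already set up, nothing to do */
    }
    ctx->scratch = malloc(4096);  /* placeholder for the mmap setup */
    if (NULL == ctx->scratch) {
        return -1;
    }
    ctx->ready = true;
    return 0;
}

/* Every collective entry point calls lazy_enable() first, so contexts
 * that never invoke a collective never pay for the setup. */
static int do_collective(coll_ctx_t *ctx)
{
    int rc = lazy_enable(ctx);
    if (0 != rc) {
        return rc;
    }
    /* ... the collective operation itself would go here ... */
    return 0;
}
```

A context starts as { false, NULL }; the first do_collective() call performs the setup and later calls reuse it.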
Ralph Castain
dcab61ad83
Restore the prior default rank assignment scheme for round-robin mappers. Ensure that each app_context has sequential vpids.
...
This commit was SVN r22048.
2009-10-02 03:16:18 +00:00
Jeff Squyres
c8c3132605
Also check for posix_memalign.
...
This commit was SVN r22045.
2009-10-01 23:51:48 +00:00
George Bosilca
cf9f38eb56
Instead of just complaining about a version mismatch, clearly list the versions
...
available locally.
This commit was SVN r22044.
2009-10-01 14:06:41 +00:00
George Bosilca
16c6370b73
A little bit of cleanup, the main logic is still the same.
...
This commit was SVN r22043.
2009-10-01 14:05:25 +00:00
Ralph Castain
a15c58c583
Fix the proc assignment into the job data object during assignment of vpids as comm_spawned procs were being overwritten by their parents with the same vpid.
...
Add a little debug output when updating proc state
This commit was SVN r22042.
2009-10-01 13:44:34 +00:00