Ralph Castain
c3ddf66445
Move the display-allocation code to where it is always seen
...
This commit was SVN r18227.
2008-04-21 20:28:59 +00:00
Ralph Castain
16c9100633
Add --display-allocation option to orterun that will display the node-by-node information regarding your allocation.
...
This commit was SVN r18216.
2008-04-20 02:25:45 +00:00
Ralph Castain
07f0a71faa
Clean up the show_help entries on the seq mapper
...
This commit was SVN r18191.
2008-04-17 14:43:15 +00:00
Ralph Castain
e7487ad533
Implement the seq rmaps module that sequentially maps process ranks to a list of hosts in a hostfile.
...
Restore the "do-not-launch" functionality so users can test a mapping without launching it.
Add a "do-not-resolve" cmd line flag to mpirun so the opal/util/if.c code does not attempt to resolve network addresses, thus enabling a user to test a hostfile mapping without hanging on network resolve requests.
Add a function to hostfile to generate an ordered list of host names from a hostfile.
This commit was SVN r18190.
2008-04-17 13:50:59 +00:00
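The sequential mapping described in r18190 can be illustrated with a minimal sketch. This is not the seq module's code: the function names `parse_hostfile` and `map_sequential` are hypothetical, and the sketch assumes one process per hostfile line, with `#` comments and trailing fields such as `slots=2` ignored.

```python
def parse_hostfile(text):
    """Return host names in file order (hypothetical helper).

    Blank lines and '#' comments are skipped; only the first
    whitespace-separated token of each line is kept.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def map_sequential(hosts):
    """Assign rank i to the i-th host in the ordered list (assumed policy)."""
    return {rank: host for rank, host in enumerate(hosts)}


if __name__ == "__main__":
    sample = "node1\nnode2 slots=2\n# spare nodes below\n\nnode1\n"
    for rank, host in map_sequential(parse_hostfile(sample)).items():
        print(rank, host)
```

Because the mapping is computed purely from the file contents, a mapping like this can be inspected without launching anything, which is the kind of dry run the "do-not-launch" flag is meant to allow.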
Ralph Castain
66e532669a
Remove some dead code
...
This commit was SVN r18182.
2008-04-16 20:33:53 +00:00
Ralph Castain
3413191e52
Fix singleton and singleton comm_spawn
...
This commit was SVN r18177.
2008-04-16 14:38:10 +00:00
Ralph Castain
7b91f8baff
Clean up and fix bugs in the MPI dynamics section. Modify the dpm API so it properly takes ports instead of process names (as correctly identified by Aurelien). Fix race conditions in the use of ompi-server. Fix incompatibilities between the mpi bindings and the dpm implementation that could cause segfaults due to uninitialized memory.
...
Fix the ompi-server -h cmd line option so it actually tells you something!
Add two new testing codes to the orte/test/mpi area: accept and connect.
This commit was SVN r18176.
2008-04-16 14:27:42 +00:00
Adrian Knoth
84e4013530
Always declare oob_tcp_disable_family, no matter if --disable-ipv6 is set.
...
This commit was SVN r18164.
2008-04-16 09:31:15 +00:00
Adrian Knoth
0ddfff4ffe
Added new oob-tcp parameter oob_tcp_disable_family.
...
Like btl_tcp_disable_family, this parameter more or less disables
a whole address family. Though the sockets are still created, the
corresponding information isn't added to the connection strings.
Likewise, we don't try to connect to addresses matching the disabled
address family.
This is particularly important for multidomain clusters, where IPv4 is
often filtered (firewalled), sometimes by simply dropping the packets
instead of rejecting them (thus causing a connection timeout instead of
a quick "no route to host").
This commit was SVN r18163.
2008-04-16 09:22:00 +00:00
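The filtering idea in r18163 can be sketched as follows. This is an illustration only, not the oob-tcp code: the function name `filter_contact_addresses` is hypothetical, and the values 4 and 6 used to select a family are an assumption of this sketch rather than a statement about the parameter's accepted values.

```python
import ipaddress


def filter_contact_addresses(addresses, disabled_family=None):
    """Drop addresses of the disabled family (hypothetical helper).

    disabled_family is None (keep everything), 4, or 6. The same
    filter would apply both when publishing our own addresses and
    when choosing which peer addresses to connect to.
    """
    kept = []
    for addr in addresses:
        # ip_address() parses the string; .version is 4 or 6.
        if ipaddress.ip_address(addr).version != disabled_family:
            kept.append(addr)
    return kept


if __name__ == "__main__":
    addrs = ["192.168.0.1", "fe80::1", "10.0.0.2"]
    print(filter_contact_addresses(addrs, disabled_family=4))
```

Filtering before any connection attempt avoids waiting for a timeout on a family whose packets are silently dropped.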
Ralph Castain
a4ea756a76
Ensure the node loop cntr gets incremented if the daemon already exists
...
This commit was SVN r18150.
2008-04-15 14:20:03 +00:00
Ralph Castain
35c260a14f
Fix the plm modules to accommodate the new remote_spawn entry - set that entry to NULL for all but rsh as only that module supports it at this time
...
This commit was SVN r18145.
2008-04-14 19:36:13 +00:00
Ralph Castain
84156c422f
Egad! Typo snuck in there...nasty vi!
...
This commit was SVN r18144.
2008-04-14 18:29:11 +00:00
Ralph Castain
7c7304466c
Add a binomial tree-based launch to ssh, turned "on" only when the plm_rsh_tree_spawned mca param is set to a non-zero value. This probably isn't a very optimized capability, but it does execute a tree-based launch that may scale better than linear at high node counts.
...
Add the daemon map capability to the ODLS to create and save a map of daemon vpid vs nodename from the launch message.
Clean up a few places in the base plm launch support where we didn't adequately protect rml recv's from potentially executing sends.
This commit was SVN r18143.
2008-04-14 18:26:08 +00:00
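The binomial tree launch mentioned in r18143 can be pictured with a small sketch. It uses one common binomial-tree numbering rooted at vpid 0; the numbering actually used by the rsh launcher may differ, and the function name `binomial_children` is hypothetical.

```python
def binomial_children(rank, size):
    """Return the children of `rank` in a binomial tree of `size` nodes.

    Assumed numbering: the children of a node are rank + 2**k for every
    power of two larger than rank, as long as the result is below size.
    """
    children = []
    mask = 1
    # Find the smallest power of two strictly greater than rank.
    while mask <= rank:
        mask <<= 1
    # Each higher power of two yields one child, if it exists.
    while rank + mask < size:
        children.append(rank + mask)
        mask <<= 1
    return children


if __name__ == "__main__":
    for daemon in range(8):
        print(daemon, binomial_children(daemon, 8))
```

With this layout each daemon starts only its own children, so the number of launch rounds grows roughly with the logarithm of the node count instead of linearly.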
Ralph Castain
e050f37578
Clean up a few warnings about initializing variables.
...
Remove an obsolete data value.
This commit was SVN r18129.
2008-04-10 19:15:16 +00:00
Ralph Castain
851279fc9f
Consolidate the daemon wireup message into the launch message. The daemons don't need their contact info prior to the launch message anyway. This not only eliminates a job-wide communication from the startup procedure, but it also resolves a race condition reported when operating across highly distributed (i.e., cross-country) networks. In such scenarios, it proved possible for a daemon to receive its launch message -before- it had received the contact info message, even though the latter had been sent first!
...
This eliminates that problem...
This commit was SVN r18126.
2008-04-10 15:35:11 +00:00
Ralph Castain
57e3e86cda
Use the proper exit code for mpirun to indicate an error when something goes wrong during launch (in scenarios where the procs don't report the problem directly themselves)
...
This commit was SVN r18121.
2008-04-10 09:15:08 +00:00
Ralph Castain
e7d0dae89d
Ensure we update the daemon collective trees if num_procs changes, but only if it changes
...
This commit was SVN r18120.
2008-04-10 03:44:18 +00:00
Ralph Castain
22343e6e0b
Given the total lack of interest/support from the folks behind these environments, and the fact that we can now scale so well with our own daemons, it seems unlikely that we will be able to pursue direct and/or standalone launch in these environments. If that situation ever changes, it is easy enough to revive the effort since little had really been done to date.
...
Meantime, no reason to continue dragging these around.
This commit was SVN r18119.
2008-04-10 02:54:13 +00:00
Ralph Castain
dc2f88b9f0
Now that we have the daemon collectives, the unity routed module no longer needs the "hack" we inserted a week ago to tell the daemons how to talk directly to all the application procs. The modex and barrier messages flow cleanly across the daemons and are "dropped" into the procs where required.
...
Add some insurance to make certain that the daemons' number of procs only gets updated when it absolutely is intended.
This commit was SVN r18118.
2008-04-10 02:45:42 +00:00
Ralph Castain
0b3122ee2f
Update the cnos module - should (hopefully) compile and work...
...
This commit was SVN r18117.
2008-04-09 22:33:00 +00:00
Ralph Castain
3a0d09300b
Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations.
...
Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study.
This commit was SVN r18115.
2008-04-09 22:10:53 +00:00
Ralph Castain
11c6773c83
Commit a patch from Brian that fixes potential segfaults in systems where IPv6 include files are found, but the kernel doesn't actually support IPv6.
...
This commit was SVN r18106.
2008-04-09 12:53:24 +00:00
Lenny Verkhovsky
2be4e32c79
1. Fixing Possible strdup of NULL
...
2. Fixing num_alloc when mapping policies are combined (rankfile & byslot or bynode)
This commit was SVN r18073.
2008-04-02 14:12:38 +00:00
Ralph Castain
f115b4aed2
Checkpoint the revised gather algorithm
...
This commit was SVN r18072.
2008-04-02 13:35:06 +00:00
Adrian Knoth
a56b9b1df1
Fix broken build with --disable-ipv6.
...
This commit was SVN r18071.
2008-04-02 10:53:48 +00:00
Ralph Castain
50433bf833
Turn off the new fqdn behavior pending resolution of hostfile issue
...
This commit was SVN r18064.
2008-04-01 20:52:22 +00:00
Ralph Castain
51533c9340
Add a new mapper component that sequentially maps ranks-to-hosts according to the ordering in the hostfile.
...
Not functional yet - still under development. Just placeholding for now to clear a backlog
This commit was SVN r18062.
2008-04-01 20:03:49 +00:00
Ralph Castain
ee5b96269e
The RML is comfortable with zero-byte payloads, so don't pack something we don't need
...
This commit was SVN r18061.
2008-04-01 19:24:46 +00:00
Ralph Castain
3a4c10efd6
Delete obsolete file, cleanup obsolete cruft in another file
...
This commit was SVN r18060.
2008-04-01 18:36:23 +00:00
Ralph Castain
39c2680e9a
Silence warning
...
This commit was SVN r18057.
2008-04-01 13:42:16 +00:00
Ralph Castain
524ed5d515
Don't have singletons wireup the iof. Instead, we let the fork'd orted handle io forwarding. This prevents an issue with the event library and pty's on singletons
...
This commit was SVN r18056.
2008-04-01 12:40:00 +00:00
Ralph Castain
3e8846d685
Some code cleanups from Brian to clarify port selection and opening logic
...
This commit was SVN r18055.
2008-04-01 12:39:02 +00:00
Ralph Castain
fe88956080
Fix singleton modex - ensure singletons know that a daemon is now in the system
...
This commit was SVN r18047.
2008-03-31 20:36:27 +00:00
Ralph Castain
f3936ff9bc
Record the daemon's state so that we don't attempt to send "die" messages to a daemon that is known to have failed to start.
...
This commit was SVN r18044.
2008-03-31 18:15:24 +00:00
George Bosilca
ee784b601e
For consistency reasons always use opal_home_directory and
...
opal_tmp_directory.
This commit was SVN r18043.
2008-03-31 18:13:41 +00:00
Ralph Castain
d8eb0eeec3
Correct the debug output
...
This commit was SVN r18042.
2008-03-31 18:09:37 +00:00
Ralph Castain
2b399a3563
Suppress a warning message - relegate it to only show up when verbosity is set as it is okay for this condition to be true
...
This commit was SVN r18041.
2008-03-31 17:48:07 +00:00
Ralph Castain
f327ebce31
Get the jobid correct - doh!
...
This commit was SVN r18040.
2008-03-31 17:42:50 +00:00
Ralph Castain
e396b9ee9a
Fix unity routed component by adding xcast of proc data to the daemons. This enables daemons to complete the revised modex procedure by forwarding their collected modex info to the rank=0 proc.
...
This commit was SVN r18039.
2008-03-31 17:35:29 +00:00
George Bosilca
493677426d
Use the OPAL function to retrieve the HOME and TMP environment values.
...
This commit was SVN r18037.
2008-03-31 17:10:08 +00:00
Ralph Castain
379b8a3e2f
Fix singleton operations that have no data in the modex.
...
Note: this also allows -any- modex operation to have zero data in it, not just singletons.
This commit was SVN r18034.
2008-03-31 13:53:23 +00:00
Ralph Castain
1889bbd119
Quiet some warnings about uninitialized variables
...
This commit was SVN r18032.
2008-03-31 13:52:10 +00:00
Ralph Castain
8506be755d
Clean up the mess. Repair static builds. Remove unused and empty C-decl braces. Add missing prototype for function.
...
This commit was SVN r18031.
2008-03-31 13:02:33 +00:00
Ralph Castain
81a83dabc6
Setup sandbox for testing new orte collectives
...
This commit was SVN r18026.
2008-03-31 04:21:37 +00:00
George Bosilca
594884b613
The return is an int, not a pointer.
...
This commit was SVN r18024.
2008-03-30 19:06:25 +00:00
George Bosilca
a6d5c15249
There is no need to force opal_progress down there. It will get called a few
...
steps further up.
This commit was SVN r18022.
2008-03-30 19:05:09 +00:00
Lenny Verkhovsky
7e45d7e134
A few updates due to RMAPS rank_file component changes
...
1. applied prefix rule to functions and variables of RMAPS rank_file component
2. removed paffinity code from ompi_mpi_init.c
3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file
4. added opal_paffinity_slot_list mca parameter
This commit was SVN r18019.
2008-03-30 11:52:11 +00:00
Lenny Verkhovsky
cb83a1287d
Really deleted old files now
...
This commit was SVN r18018.
2008-03-30 11:50:19 +00:00
Lenny Verkhovsky
f734ba51a4
Added files with names according to prefix rule
...
This commit was SVN r18017.
2008-03-30 11:42:09 +00:00
Lenny Verkhovsky
b43f4a2dc9
Deleted and added files after prefix rule changes
...
This commit was SVN r18016.
2008-03-30 11:41:01 +00:00