Adrian Knoth
e981a259bb
btl_tcp_disable_family=4 and btl_tcp_disable_family=6 are mutually
...
exclusive, so this should result in "unreachable" when set differently
between peers.
This commit was SVN r18168.
2008-04-16 10:14:58 +00:00
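
A minimal sketch of the reachability rule described in r18168, not the actual BTL code. It assumes that a value of 0 means "disable nothing" and that a peer is reachable only over a family that neither side has disabled; `usable_families` and `reachable_addrs` are hypothetical names.

```python
import ipaddress

def usable_families(disable_family):
    """Address families left after applying a disable_family value.

    Assumption: 0 disables nothing, 4 disables IPv4, 6 disables IPv6.
    """
    return {4, 6} - {disable_family}

def reachable_addrs(local_disable, remote_disable, remote_addrs):
    """Return the remote addresses both peers are still willing to use."""
    allowed = usable_families(local_disable) & usable_families(remote_disable)
    return [a for a in remote_addrs
            if ipaddress.ip_address(a).version in allowed]
```

With one peer set to 4 and the other to 6, the intersection is empty and the result is an empty list, which corresponds to the "unreachable" outcome.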
Adrian Knoth
84e4013530
Always declare oob_tcp_disable_family, no matter if --disable-ipv6 is set.
...
This commit was SVN r18164.
2008-04-16 09:31:15 +00:00
Adrian Knoth
0ddfff4ffe
Added new oob-tcp parameter oob_tcp_disable_family.
...
Like btl_tcp_disable_family, this parameter more or less disables
a whole address family. Though the sockets are still created, the
corresponding information isn't added to the connection strings.
Likewise, we don't try to connect to addresses matching the disabled
address family.
This is particularly important for multidomain clusters, where IPv4 is
often filtered (firewalled), sometimes by simply dropping the packets
instead of rejecting them (thus causing a connection timeout instead of
a quick "no route to host").
This commit was SVN r18163.
2008-04-16 09:22:00 +00:00
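
The behaviour described in r18163 can be sketched roughly as follows: addresses of the disabled family are left out of the advertised contact information even though a socket for them may exist. This is an illustration only; `build_contact_uris` and the URI layout are assumptions, not the OOB's real format.

```python
import ipaddress

def build_contact_uris(listen_addrs, port, disable_family=0):
    """Build contact URIs, skipping addresses of the disabled family.

    Assumption: disable_family is 0 (nothing disabled), 4, or 6.
    """
    uris = []
    for addr in listen_addrs:
        ip = ipaddress.ip_address(addr)
        if ip.version == disable_family:
            continue  # socket may still be open, but it is not advertised
        if ip.version == 6:
            uris.append(f"tcp6://[{ip}]:{port}")
        else:
            uris.append(f"tcp://{ip}:{port}")
    return uris
```

Because peers never learn the filtered addresses, they do not attempt connections that a packet-dropping firewall would leave hanging until a timeout.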
Adrian Knoth
75c54616c7
renamed opal_sockaddr2str to opal_net_get_hostname for WANT_PEER_DUMP=1
...
This commit was SVN r18154.
2008-04-15 19:23:47 +00:00
Tim Mattox
55b2546026
Update the NEWS file for a 1.2.7 change.
...
This commit was SVN r18153.
2008-04-15 17:31:57 +00:00
Jeff Squyres
72af302360
Remove unused variable.
...
This commit was SVN r18151.
2008-04-15 14:58:32 +00:00
Ralph Castain
a4ea756a76
Ensure the node loop cntr gets incremented if the daemon already exists
...
This commit was SVN r18150.
2008-04-15 14:20:03 +00:00
Ralph Castain
73e4cfe58a
Add platform files for LANL's RRZ cluster. Update LANL platform files to not build libnbc, vt to save time
...
This commit was SVN r18146.
2008-04-15 02:19:54 +00:00
Ralph Castain
35c260a14f
Fix the plm modules to accommodate the new remote_spawn entry - set that entry to NULL for all but rsh as only that module supports it at this time
...
This commit was SVN r18145.
2008-04-14 19:36:13 +00:00
Ralph Castain
84156c422f
Egad! Typo snuck in there...nasty vi!
...
This commit was SVN r18144.
2008-04-14 18:29:11 +00:00
Ralph Castain
7c7304466c
Add a binomial tree-based launch to ssh, turned "on" only when the plm_rsh_tree_spawned mca param is set to a non-zero value. This probably isn't a very optimized capability, but it does execute a tree-based launch that may scale better than linear at high node counts.
...
Add the daemon map capability to the ODLS to create and save a map of daemon vpid vs nodename from the launch message.
Cleanup a few places in the base plm launch support where we didn't adequately protect rml recv's from potentially executing sends.
This commit was SVN r18143.
2008-04-14 18:26:08 +00:00
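
For readers unfamiliar with binomial-tree launches, here is a generic sketch of how each daemon could work out which daemons it is responsible for starting. It is a textbook binomial tree rooted at vpid 0, not the code from r18143; `binomial_children` is a hypothetical name and the real tree orientation may differ.

```python
def binomial_children(vpid, num_daemons):
    """Children of `vpid` in a binomial tree rooted at 0."""
    # Find the lowest set bit of vpid (or the first power of two
    # >= num_daemons for the root); that bit links vpid to its parent.
    mask = 1
    while mask < num_daemons and not (vpid & mask):
        mask <<= 1
    # Every smaller power of two gives one potential child.
    mask >>= 1
    children = []
    while mask > 0:
        if vpid + mask < num_daemons:
            children.append(vpid + mask)
        mask >>= 1
    return children
```

With 8 daemons the root starts daemons 4, 2 and 1, daemon 4 starts 6 and 5, and so on, so the launch completes in about log2(N) rounds instead of N sequential steps.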
Aurelien Bouteiller
0f311ed824
Make sure the function returns NULL when no elan adapter is available instead of a random value.
...
This commit was SVN r18136.
2008-04-11 21:03:01 +00:00
Aurelien Bouteiller
20592cbcbf
Fixes a warning about mallocing 0 bytes when no elan adapter is available.
...
This commit was SVN r18135.
2008-04-11 20:59:12 +00:00
Aurelien Bouteiller
921a6ce3d4
Processes with different jobids can now connect/accept to each other.
...
This commit was SVN r18134.
2008-04-11 15:40:59 +00:00
Rich Graham
249445d61f
added reduce-scatter followed by gather to root.
...
This commit was SVN r18133.
2008-04-11 13:49:08 +00:00
Rich Graham
a6bdbfab97
implement allreduce as reduce-scatter, followed by an allgather.
...
This commit was SVN r18132.
2008-04-11 04:06:29 +00:00
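
The decomposition named in r18132 can be illustrated with a small single-process simulation. It only shows the data flow (each rank ends up owning one fully reduced block, then all blocks are exchanged); it assumes a sum reduction, and `allreduce_rs_ag` is a hypothetical name, not the collective module's implementation.

```python
def allreduce_rs_ag(vectors):
    """Simulate a sum-allreduce as reduce-scatter followed by allgather.

    `vectors[r]` is the input buffer of rank r; all have equal length.
    """
    p = len(vectors)
    n = len(vectors[0])
    # Split the index range into p contiguous blocks, one per rank.
    bounds = [(r * n // p, (r + 1) * n // p) for r in range(p)]
    # Reduce-scatter: rank r holds the reduced values of block r only.
    owned = [[sum(v[i] for v in vectors) for i in range(lo, hi)]
             for lo, hi in bounds]
    # Allgather: every rank collects all reduced blocks in order.
    full = [x for block in owned for x in block]
    return [list(full) for _ in range(p)]
```

Replacing the final allgather with a gather to a single rank gives the reduce variant mentioned in r18133.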
Jon Mason
08ead87604
Potential double free of locks
...
mca_btl_openib_endpoint_post_rr_nolock is freeing the endpoint lock on
the error case, but most/all of the functions calling this free the lock
regardless of its error case, resulting in a double free of the
lock.
This commit was SVN r18131.
2008-04-10 21:15:01 +00:00
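
The ownership problem in r18131 can be shown with a toy example. It assumes that "freeing" the lock means releasing (unlocking) it; the function names are hypothetical stand-ins and do not reproduce the openib BTL code.

```python
import threading

class Endpoint:
    def __init__(self):
        self.lock = threading.Lock()

def post_rr_nolock_buggy(ep, fail):
    """Callee releases the caller's lock on the error path (the bug)."""
    if fail:
        ep.lock.release()
        return False
    return True

def post_rr_nolock_fixed(ep, fail):
    """Callee never touches the lock; the caller owns it throughout."""
    return not fail

def caller(ep, post, fail):
    """Caller takes the lock and always releases it, success or error."""
    ep.lock.acquire()
    try:
        return post(ep, fail)
    finally:
        ep.lock.release()
```

In Python the second release raises `RuntimeError`; in C the equivalent double unlock is undefined behaviour, which is why a "nolock" helper should leave the lock to its caller.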
Ralph Castain
e050f37578
Cleanup a few warnings about initializing variables.
...
Remove an obsolete data value.
This commit was SVN r18129.
2008-04-10 19:15:16 +00:00
Rich Graham
70f3aab5f2
remove some code that is not needed.
...
This commit was SVN r18128.
2008-04-10 17:32:04 +00:00
Rich Graham
5c7db1e315
remove 2 race conditions in the buffer recycling logic.
...
This commit was SVN r18127.
2008-04-10 17:20:52 +00:00
Ralph Castain
851279fc9f
Consolidate the daemon wireup message into the launch message. The daemons don't need their contact info prior to the launch message anyway. This not only eliminates a job-wide communication from the startup procedure, but it also resolves a race condition reported when operating across highly distributed (i.e., cross-country) networks. In such scenarios, it proved possible for a daemon to receive its launch message -before- it had received the contact info message, even though the latter had been sent first!
...
This eliminates that problem...
This commit was SVN r18126.
2008-04-10 15:35:11 +00:00
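
A toy model of the race fixed in r18126, under the assumption that a daemon cannot act on a launch message until it knows its peers' contact info. The `Daemon` class and its methods are hypothetical and only illustrate why one combined message removes the ordering dependency.

```python
class Daemon:
    def __init__(self):
        self.contacts = None
        self.launched = []

    def on_contact_info(self, contacts):
        """Old protocol: contact info arrives as its own message."""
        self.contacts = dict(contacts)

    def on_launch(self, procs, contacts=None):
        """Launch message; in the new protocol it carries the contacts too."""
        if contacts is not None:
            self.contacts = dict(contacts)
        if self.contacts is None:
            # Old protocol, messages reordered in transit.
            raise RuntimeError("launch arrived before contact info")
        self.launched.extend(procs)
```

With two separate messages, delivery order matters; with the contact info embedded in the launch message, a single delivery is always self-sufficient.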
Ralph Castain
4b798cf29a
Massage these platform files a little...
...
This commit was SVN r18125.
2008-04-10 15:32:41 +00:00
Edgar Gabriel
4964434205
reverting commit 18122, since the commit was executed accidentally in the
...
wrong directory. The UH copyrights do belong in this file (i.e. because of
the fix which is in the 1.2 branch, the UH copyright notes are in the header
there already), but I want to have the proper log for that.
This commit was SVN r18124.
2008-04-10 15:09:31 +00:00
Edgar Gabriel
5989fa570c
Sorry, previous commit was in the wrong directory. This is the real fix (have
...
to undo 18122).
The verification of recvcount==0 and rank = root was breaking
inter-communicator scatter, since the root (root==MPI_ROOT) might very well
have recvcount=0. The same fix has been applied to gather.c just the other way
round.
Fixes the bug reported on the mailing list by Martin Audet. If there is a
1.2.7, this fix might be worth porting over.
Please note that while the test works now for basic and for inter, we get a
0byte malloc warning from the inter module, which we still have to fix in a
separate patch.
This commit was SVN r18123.
2008-04-10 15:03:14 +00:00
Edgar Gabriel
f87830767a
the verification of recvcount==0 and rank = root was breaking
...
inter-communicator scatter, since the root (root==MPI_ROOT) might very well
have recvcount=0. The same fix has been applied to gather.c just the other way
round.
Fixes the bug reported on the mailing list by Martin Audet. If there is a
1.2.7, this fix might be worth porting over.
Please note that while the test works now for basic and for inter, we get a
0byte malloc warning from the inter module, which we still have to fix in a
separate patch.
This commit was SVN r18122.
2008-04-10 14:58:51 +00:00
Ralph Castain
57e3e86cda
Use the proper exit code for mpirun to indicate an error when something goes wrong during launch (in scenarios where the procs don't report the problem directly themselves)
...
This commit was SVN r18121.
2008-04-10 09:15:08 +00:00
Ralph Castain
e7d0dae89d
Ensure we update the daemon collective trees if num_procs changes, but only if it changes
...
This commit was SVN r18120.
2008-04-10 03:44:18 +00:00
Ralph Castain
22343e6e0b
Given total lack of interest/support from the folks behind these environments, and the fact that we can now scale so well with our own daemons, it seems unlikely that we will be able to pursue direct and/or standalone launch in these environments. If that situation ever changes, it is easy enough to revive the effort since little had really been done to-date.
...
Meantime, no reason to continue dragging these around.
This commit was SVN r18119.
2008-04-10 02:54:13 +00:00
Ralph Castain
dc2f88b9f0
Now that we have the daemon collectives, the unity routed module no longer needs the "hack" we inserted a week ago to tell the daemons how to talk directly to all the application procs. The modex and barrier messages flow cleanly across the daemons and are "dropped" into the procs where required.
...
Add some insurance to make certain that the daemons' number of procs only gets updated when it absolutely is intended.
This commit was SVN r18118.
2008-04-10 02:45:42 +00:00
Ralph Castain
0b3122ee2f
Update the cnos module - should (hopefully) compile and work...
...
This commit was SVN r18117.
2008-04-09 22:33:00 +00:00
Ralph Castain
86b4ae5970
Remove a generated file from the repository - shouldn't have been there
...
This commit was SVN r18116.
2008-04-09 22:13:51 +00:00
Ralph Castain
3a0d09300b
Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations.
...
Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study.
This commit was SVN r18115.
2008-04-09 22:10:53 +00:00
Ralph Castain
7cb1e72f76
Okay, let's really get those libz references out of there!
...
This commit was SVN r18114.
2008-04-09 22:05:09 +00:00
Ralph Castain
95d7e177c6
Not really a test, but a useful tool for testing computation of binomial trees
...
This commit was SVN r18113.
2008-04-09 21:58:42 +00:00
Ralph Castain
3120428f0f
Update several platform files to remove the libz dependency, add a couple for the Mac
...
This commit was SVN r18112.
2008-04-09 21:57:59 +00:00
Rich Graham
c6783549ef
getting old
...
This commit was SVN r18110.
2008-04-09 16:55:16 +00:00
Rich Graham
1a20c3ce51
more debug.
...
This commit was SVN r18109.
2008-04-09 16:19:52 +00:00
Rich Graham
e7e18303f6
more debug.
...
This commit was SVN r18108.
2008-04-09 15:10:58 +00:00
Rich Graham
b14c6b17d5
adding debug output.
...
This commit was SVN r18107.
2008-04-09 13:32:01 +00:00
Ralph Castain
11c6773c83
Commit a patch from Brian that fixes potential segfaults in systems where IPv6 include files are found, but the kernel doesn't actually support IPv6.
...
This commit was SVN r18106.
2008-04-09 12:53:24 +00:00
Rich Graham
10434fb2f1
add barrier synchronization at the end of the module init, to
...
avoid initializing shared memory variables in use.
This commit was SVN r18105.
2008-04-09 03:44:40 +00:00
Rich Graham
19bb1a2e86
fix initialization bug.
...
This commit was SVN r18104.
2008-04-08 23:34:06 +00:00
Donald Kerr
38e298cc9a
report error message in all libs, not just debug
...
This commit was SVN r18103.
2008-04-08 22:58:28 +00:00
Rich Graham
a69a8d9626
initialize the flags.
...
This commit was SVN r18102.
2008-04-08 22:16:39 +00:00
Rich Graham
8765a2bbdd
more debug code.
...
This commit was SVN r18101.
2008-04-08 20:38:20 +00:00
Rich Graham
08becf33b5
add more debugging.
...
This commit was SVN r18100.
2008-04-08 18:44:50 +00:00
Rich Graham
aa1b7dd406
more debug
...
This commit was SVN r18099.
2008-04-08 03:56:47 +00:00
Rich Graham
0c18bdeff7
more debug code.
...
This commit was SVN r18098.
2008-04-08 03:04:20 +00:00
Rich Graham
9d5a7238df
Add some debugging code.
...
This commit was SVN r18097.
2008-04-07 23:20:15 +00:00
Rich Graham
fa696734d5
add some debug code.
...
This commit was SVN r18096.
2008-04-07 21:03:23 +00:00