Ralph Castain
dc2f88b9f0
Now that we have the daemon collectives, the unity routed module no longer needs the "hack" we inserted a week ago to tell the daemons how to talk directly to all the application procs. The modex and barrier messages flow cleanly across the daemons and are "dropped" into the procs where required.
Add some insurance to make certain that the daemons' number of procs only gets updated when it absolutely is intended.
This commit was SVN r18118.
2008-04-10 02:45:42 +00:00
Ralph Castain
0b3122ee2f
Update the cnos module - should (hopefully) compile and work...
This commit was SVN r18117.
2008-04-09 22:33:00 +00:00
Ralph Castain
86b4ae5970
Remove a generated file from the repository - shouldn't have been there
This commit was SVN r18116.
2008-04-09 22:13:51 +00:00
Ralph Castain
3a0d09300b
Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations.
Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study.
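As a rough illustration of the inbound (leaf-to-root) phase, the C sketch below simulates a binomial gather over n daemons rooted at daemon 0. It is not the ORTE code: the function name is hypothetical, each daemon's payload is reduced to an item count, and the outbound redistribution that completes an allgather is omitted.

```c
/* Simulate an inbound (leaf-to-root) binomial gather over n daemons.
 * contrib[r] starts as the number of items daemon r holds locally.
 * In round k, every rank whose lowest set bit is 2^k forwards everything
 * it has accumulated to its parent, r - 2^k. Returns the number of rounds. */
int binomial_gather(int n, int *contrib)
{
    int rounds = 0;
    for (int mask = 1; mask < n; mask <<= 1) {
        /* ranks mask, 3*mask, 5*mask, ... have lowest set bit == mask */
        for (int r = mask; r < n; r += 2 * mask) {
            contrib[r - mask] += contrib[r];
            contrib[r] = 0;
        }
        rounds++;
    }
    return rounds;
}
```

With 8 daemons holding 3 items each, daemon 0 ends up with all 24 items after 3 rounds, i.e. ceil(log2 n) rounds rather than n - 1 sequential sends to the root.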
This commit was SVN r18115.
2008-04-09 22:10:53 +00:00
Ralph Castain
95d7e177c6
Not really a test, but a useful tool for testing computation of binomial trees
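The tool itself is not shown in this log. As a hedged sketch of the kind of computation it exercises, the C functions below compute a rank's parent and children in a binomial tree rooted at rank 0, using the common bit-manipulation formulation. The function names are hypothetical, and ORTE's own tree layout may differ.

```c
/* Parent of rank r in a binomial tree rooted at rank 0: clear the
 * lowest set bit of r. The root has no parent, signalled by -1. */
int binomial_parent(int r)
{
    return (r == 0) ? -1 : (r & (r - 1));
}

/* Write up to max children of rank r (tree of n ranks, rooted at 0)
 * into children[] and return how many were written. The children are
 * r + 2^k for every power of two below r's lowest set bit; for the
 * root, every power of two below n. */
int binomial_children(int r, int n, int *children, int max)
{
    int count = 0;
    int limit = (r == 0) ? n : (r & -r);
    for (int mask = 1; mask < limit && r + mask < n && count < max; mask <<= 1) {
        children[count++] = r + mask;
    }
    return count;
}
```

A quick self-check is to confirm, for every non-root rank r, that r appears among the children of binomial_parent(r).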
This commit was SVN r18113.
2008-04-09 21:58:42 +00:00
Ralph Castain
11c6773c83
Commit a patch from Brian that fixes potential segfaults in systems where IPv6 include files are found, but the kernel doesn't actually support IPv6.
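The failure mode described here (IPv6 headers present at build time, IPv6 unsupported by the running kernel) is usually handled by probing at run time. The sketch below is not Brian's patch: the helper name is hypothetical, and it simply treats any failure of socket(AF_INET6, ...) as "IPv6 unusable".

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return true only if the running kernel actually provides AF_INET6
 * sockets. Finding the IPv6 headers at build time proves nothing about
 * run-time support. Any failure of socket() is treated as "unusable";
 * on a kernel without IPv6 it typically fails with EAFNOSUPPORT. */
bool ipv6_usable(void)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0) {
        return false;
    }
    close(fd);
    return true;
}
```

Code that would otherwise open IPv6 listeners can call such a probe once at startup and fall back to IPv4 only when it returns false.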
This commit was SVN r18106.
2008-04-09 12:53:24 +00:00
Ralph Castain
5e6dc24e62
Fix ompi-server so it works with unity routed module - still not working with tree routing.
Cleanup debug flag so it activates debugging on the data server code itself
This commit was SVN r18080.
2008-04-04 19:17:28 +00:00
Tim Prins
313edd8955
- Fix a problem reported on the users list where we would segfault in finalize after calling spawn if the user did not call MPI_Comm_disconnect
- Fix the app context constructor so it initializes all the fields.
This commit was SVN r18079.
2008-04-04 15:07:39 +00:00
Ralph Castain
537395b924
Make two important MCA params "visible" to ompi_info
This commit was SVN r18074.
2008-04-02 14:54:57 +00:00
Lenny Verkhovsky
2be4e32c79
1. Fixing a possible strdup of NULL
2. Fixing num_alloc when mapping policies are combined (rankfile & byslot or bynode)
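Item 1 refers to guarding a string copy whose source may be NULL. A minimal sketch of that guard, using a hypothetical helper name (not the function touched by the commit), written with malloc/memcpy so it does not depend on POSIX strdup being declared:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical NULL-tolerant string copy: strdup(NULL) is undefined
 * behavior, so a NULL input yields NULL instead of a crash. A NULL
 * result can also mean malloc failed. The caller frees a non-NULL result. */
char *strdup_or_null(const char *s)
{
    if (s == NULL) {
        return NULL;
    }
    size_t len = strlen(s) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```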
This commit was SVN r18073.
2008-04-02 14:12:38 +00:00
Ralph Castain
f115b4aed2
Checkpoint the revised gather algorithm
This commit was SVN r18072.
2008-04-02 13:35:06 +00:00
Adrian Knoth
a56b9b1df1
Fix broken build with --disable-ipv6.
This commit was SVN r18071.
2008-04-02 10:53:48 +00:00
Ralph Castain
50433bf833
Turn off the new fqdn behavior pending resolution of hostfile issue
This commit was SVN r18064.
2008-04-01 20:52:22 +00:00
Ralph Castain
8dca132604
Cleanup some ignores
Add missing variables!
This commit was SVN r18063.
2008-04-01 20:32:17 +00:00
Ralph Castain
51533c9340
Add a new mapper component that sequentially maps ranks-to-hosts according to the ordering in the hostfile.
Not functional yet - still under development. Just a placeholder for now, committed to clear a backlog.
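A minimal sketch of the sequential mapping idea, assuming one rank per hostfile line (so a host listed twice receives two ranks). The function name and the error convention are hypothetical; this is not the component's code.

```c
/* Hypothetical sequential mapping: rank i is placed on the host named
 * on line i of the hostfile, so a host listed twice receives two ranks.
 * Fails with -1 if there are more ranks than hostfile lines. */
int seq_map(const char *const *hostfile_lines, int nlines,
            int nranks, const char **placement)
{
    if (nranks > nlines) {
        return -1;
    }
    for (int rank = 0; rank < nranks; rank++) {
        placement[rank] = hostfile_lines[rank];
    }
    return 0;
}
```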
This commit was SVN r18062.
2008-04-01 20:03:49 +00:00
Ralph Castain
ee5b96269e
The RML is comfortable with zero-byte payloads, so don't pack something we don't need
This commit was SVN r18061.
2008-04-01 19:24:46 +00:00
Ralph Castain
3a4c10efd6
Delete obsolete file, cleanup obsolete cruft in another file
This commit was SVN r18060.
2008-04-01 18:36:23 +00:00
Ralph Castain
39c2680e9a
Silence warning
This commit was SVN r18057.
2008-04-01 13:42:16 +00:00
Ralph Castain
524ed5d515
Don't have singletons wire up the IOF. Instead, we let the fork'd orted handle I/O forwarding. This prevents an issue with the event library and ptys on singletons.
This commit was SVN r18056.
2008-04-01 12:40:00 +00:00
Ralph Castain
3e8846d685
Some code cleanups from Brian to clarify port selection and opening logic
This commit was SVN r18055.
2008-04-01 12:39:02 +00:00
Ralph Castain
fe88956080
Fix singleton modex - ensure singletons know that a daemon is now in the system
This commit was SVN r18047.
2008-03-31 20:36:27 +00:00
Ralph Castain
f3936ff9bc
Record the daemon's state so that we don't attempt to send "die" messages to a daemon that is known to have failed to start.
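A minimal sketch of the idea, assuming each daemon's last known state is kept in a table. The enum, struct, and function names are hypothetical, not ORTE's; the sketch only selects the recipients of a "die" message and leaves the actual send to the caller.

```c
/* Hypothetical daemon bookkeeping. */
typedef enum {
    DAEMON_LAUNCHING,
    DAEMON_RUNNING,
    DAEMON_FAILED_TO_START
} daemon_state_t;

typedef struct {
    int vpid;               /* daemon's rank within the daemon job */
    daemon_state_t state;   /* last recorded state */
} daemon_t;

/* Collect the vpids that should receive a "die" message, skipping any
 * daemon recorded as having failed to start (nobody is there to hear it).
 * Returns the number of targets written to targets[]. */
int select_die_targets(const daemon_t *daemons, int n, int *targets)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (daemons[i].state != DAEMON_FAILED_TO_START) {
            targets[count++] = daemons[i].vpid;
        }
    }
    return count;
}
```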
This commit was SVN r18044.
2008-03-31 18:15:24 +00:00
George Bosilca
ee784b601e
For consistency reasons, always use opal_home_directory and opal_tmp_directory.
This commit was SVN r18043.
2008-03-31 18:13:41 +00:00
Ralph Castain
d8eb0eeec3
Correct the debug output
This commit was SVN r18042.
2008-03-31 18:09:37 +00:00
Ralph Castain
2b399a3563
Suppress a warning message - relegate it to only show up when verbosity is set as it is okay for this condition to be true
This commit was SVN r18041.
2008-03-31 17:48:07 +00:00
Ralph Castain
f327ebce31
Get the jobid correct - doh!
This commit was SVN r18040.
2008-03-31 17:42:50 +00:00
Ralph Castain
e396b9ee9a
Fix unity routed component by adding xcast of proc data to the daemons. This enables daemons to complete the revised modex procedure by forwarding their collected modex info to the rank=0 proc.
This commit was SVN r18039.
2008-03-31 17:35:29 +00:00
George Bosilca
493677426d
Use the OPAL function to retrieve the HOME and TMP environment values.
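For readers unfamiliar with these helpers, the sketch below shows what HOME/TMP lookup functions of this kind typically do. It is a standalone stand-in, not OPAL's implementation: the function names are hypothetical, and the lookup order (TMPDIR, then TMP, then TEMP, then "/tmp") is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not OPAL's code. Assumed lookup order:
 * TMPDIR, then TMP, then TEMP, falling back to "/tmp". */
const char *tmp_directory(void)
{
    static const char *const vars[] = { "TMPDIR", "TMP", "TEMP" };
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0') {
            return value;
        }
    }
    return "/tmp";
}

/* Returns $HOME, or NULL when it is unset or empty. */
const char *home_directory(void)
{
    const char *home = getenv("HOME");
    return (home != NULL && home[0] != '\0') ? home : NULL;
}
```

Routing every caller through one pair of helpers is what gives the consistency the neighboring commits aim for: a change to the lookup order happens in one place.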
This commit was SVN r18037.
2008-03-31 17:10:08 +00:00
Ralph Castain
379b8a3e2f
Fix singleton operations that have no data in the modex.
Note: this also allows -any- modex operation to have zero data in it, not just singletons.
This commit was SVN r18034.
2008-03-31 13:53:23 +00:00
Ralph Castain
ce96cb4800
Quiet warning about uninitialized variable
This commit was SVN r18033.
2008-03-31 13:52:27 +00:00
Ralph Castain
1889bbd119
Quiet some warnings about uninitialized variables
This commit was SVN r18032.
2008-03-31 13:52:10 +00:00
Ralph Castain
8506be755d
Clean-up the mess. Repair static builds. Remove unused and empty C-decl braces. Add missing prototype for function.
This commit was SVN r18031.
2008-03-31 13:02:33 +00:00
Ralph Castain
81a83dabc6
Setup sandbox for testing new orte collectives
This commit was SVN r18026.
2008-03-31 04:21:37 +00:00
George Bosilca
60111ce66d
A few fewer warnings.
This commit was SVN r18025.
2008-03-30 19:06:49 +00:00
George Bosilca
594884b613
The return is an int not a pointer.
This commit was SVN r18024.
2008-03-30 19:06:25 +00:00
George Bosilca
a6d5c15249
There is no need to force opal_progress down there. It will get called a few steps higher up.
This commit was SVN r18022.
2008-03-30 19:05:09 +00:00
Lenny Verkhovsky
7e45d7e134
A few updates due to RMAPS rank_file component changes
1. applied prefix rule to functions and variables of RMAPS rank_file component
2. removed paffinity code from ompi_mpi_init.c
3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file
4. added opal_paffinity_slot_list mca parameter
This commit was SVN r18019.
2008-03-30 11:52:11 +00:00
Lenny Verkhovsky
cb83a1287d
Really deleted old files now
This commit was SVN r18018.
2008-03-30 11:50:19 +00:00
Lenny Verkhovsky
f734ba51a4
Added files with names according to prefix rule
This commit was SVN r18017.
2008-03-30 11:42:09 +00:00
Lenny Verkhovsky
b43f4a2dc9
Deleted and added files after prefix rule changes
This commit was SVN r18016.
2008-03-30 11:41:01 +00:00
Jeff Squyres
8d79bfe860
Fix for CID 937. All we really care about is being able to chdir; the extra checks were unnecessary.
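The log does not say what the removed checks were. Assuming they were existence or permission tests of the kind often done with stat() or access(), the general pattern is simply to attempt the operation and report its error. The helper name below is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Just attempt the chdir and report the real error if it fails.
 * Separate "does it exist / is it a directory / may I enter it" checks
 * add nothing: chdir() already fails for each of those cases, and a
 * pre-check can go stale before the chdir runs. Returns 0 or -1. */
int enter_directory(const char *dir)
{
    if (chdir(dir) != 0) {
        fprintf(stderr, "cannot chdir to %s: %s\n", dir, strerror(errno));
        return -1;
    }
    return 0;
}
```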
This commit was SVN r18015.
2008-03-29 13:15:22 +00:00
Ralph Castain
6fcaa8df39
Remove stale define. Add global variable to be used soon.
This commit was SVN r18005.
2008-03-28 02:20:37 +00:00
Ralph Castain
9f1001a6f8
Ensure that the procs know how many daemons will be participating in collective operations.
This commit was SVN r17992.
2008-03-27 17:31:54 +00:00
Ralph Castain
6166278e18
Improve the scalability of the modex operation and fix a bug reported by Tim P
The bug was a race condition in the barrier operation that caused the barrier in MPI_Finalize to fail on very short programs.
Scalability was improved by using the daemons to aggregate modex and barrier messages before sending them to the rank=0 proc. The improvement is proportional to ppn (procs per node), of course, but there really wasn't a scaling problem at low ppn anyway. This modification also paves the way for better allgather operations, since all the data for each node now sits at the daemon level and the daemons are aware that a collective operation on the OOB is underway (so they -can- participate in a collective of their own to support it).
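A minimal sketch of the aggregation idea, assuming one daemon per node that forwards a single message once all of its local procs have reported. The structure and function names are hypothetical, and the payload is reduced to a byte count.

```c
/* Hypothetical per-daemon state for one OOB collective (modex or barrier). */
typedef struct {
    int num_local_procs;   /* application procs hosted by this daemon */
    int num_reported;      /* local contributions received so far */
    int bytes_collected;   /* size of the aggregated payload */
} daemon_collective_t;

/* Record one local proc's contribution. Returns 1 exactly when the last
 * local proc has reported, i.e. when the daemon should forward a single
 * aggregated message; returns 0 otherwise. */
int daemon_record(daemon_collective_t *c, int nbytes)
{
    c->num_reported++;
    c->bytes_collected += nbytes;
    return c->num_reported == c->num_local_procs;
}

/* Messages arriving at the rank=0 proc: one per proc without aggregation,
 * one per daemon with it, so the reduction factor is ppn. */
int messages_at_root(int num_daemons, int ppn, int aggregated)
{
    return aggregated ? num_daemons : num_daemons * ppn;
}
```

For example, with 16 nodes at 8 ppn the rank=0 proc receives 16 messages instead of 128, while at 1 ppn there is no reduction at all, which matches the note that low-ppn runs never had a scaling problem.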
Also added better diagnostics to map out the timing associated with MPI_Init - turned on by -mca orte_timing 1.
This commit was SVN r17988.
2008-03-27 15:17:53 +00:00
Ralph Castain
8e6da2ee76
Maintain the mapping bookmark across multiple comm_spawns
This commit was SVN r17984.
2008-03-27 00:19:13 +00:00
Ralph Castain
abfb3577c1
Ensure that the bookmark of the parent job is applied to the child in a comm_spawn so we start mapping from the right place
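These two commits keep a mapping "bookmark" alive across comm_spawns. A minimal sketch of the idea, assuming a one-proc-per-node round-robin mapper whose state persists between jobs. The type and function names are hypothetical, not the RMAPS API.

```c
/* Hypothetical mapper state shared by every job launched in a session. */
typedef struct {
    int num_nodes;
    int bookmark;    /* node on which the next proc will be placed */
} mapper_t;

/* Place nprocs procs one per node, round-robin, starting at the
 * bookmark and advancing it, so a child job spawned later continues
 * where its parent left off instead of restarting at node 0. */
void map_job(mapper_t *m, int nprocs, int *node_of_rank)
{
    for (int r = 0; r < nprocs; r++) {
        node_of_rank[r] = m->bookmark;
        m->bookmark = (m->bookmark + 1) % m->num_nodes;
    }
}
```

On 3 nodes, a 2-proc parent lands on nodes 0 and 1, and a 2-proc child spawned afterwards lands on nodes 2 and 0 rather than piling onto node 0 again.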
This commit was SVN r17982.
2008-03-26 21:18:16 +00:00
Josh Hursey
55044c3c4f
A fix resulting from r17944. Need to make sure we go through orte_proc_info_finalize properly so the 'init' flag is set on restart.
This is a bit cleaner anyway, especially since the GPR is gone.
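The underlying pattern is an init/finalize pair guarded by a module-level flag. A minimal sketch, with hypothetical names standing in for orte_proc_info and its 'init' flag, showing why skipping finalize breaks a later re-init:

```c
#include <stdbool.h>

/* Hypothetical module-level guard, standing in for the 'init' flag. */
static bool proc_info_initialized = false;

/* Returns 1 if setup ran, 0 if it was skipped as already done. */
int proc_info_init(void)
{
    if (proc_info_initialized) {
        return 0;
    }
    /* ... populate process information here ... */
    proc_info_initialized = true;
    return 1;
}

/* Must run on every shutdown path: if it is skipped, the flag stays set
 * and a later init (for example after a checkpoint restart) does nothing. */
void proc_info_finalize(void)
{
    proc_info_initialized = false;
}
```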
This commit was SVN r17978.
The following SVN revision numbers were found above:
r17944 --> open-mpi/ompi@ec76fe4fe4
2008-03-26 14:13:33 +00:00
Ralph Castain
7ad6db207c
Cover some timing-related output
This commit was SVN r17977.
2008-03-26 12:54:50 +00:00
Rainer Keller
ce8154eb3e
- Coverity issue CID 945:
Event uninit_use: Using uninitialized value "rc"
Instead of initializing rc at the beginning, use the return value
of opal_hash_table_set_value_uint32.
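The fix pattern, sketched with a hypothetical setter standing in for opal_hash_table_set_value_uint32 (the real function's signature is not reproduced here): rc takes its value from the call itself, so there is no path on which it is read uninitialized and no placeholder initial value to maintain.

```c
#include <stdio.h>

#define STATUS_SUCCESS     0
#define STATUS_ERR_RANGE (-1)

/* Hypothetical setter standing in for opal_hash_table_set_value_uint32. */
int set_value(int *slots, int nslots, int key, int value)
{
    if (key < 0 || key >= nslots) {
        return STATUS_ERR_RANGE;
    }
    slots[key] = value;
    return STATUS_SUCCESS;
}

/* rc is assigned directly from the call, so no path can return it
 * uninitialized and no placeholder initial value is needed. */
int store_pair(int *slots, int nslots, int key, int value)
{
    int rc = set_value(slots, nslots, key, value);
    if (rc != STATUS_SUCCESS) {
        fprintf(stderr, "storing key %d failed (rc=%d)\n", key, rc);
    }
    return rc;
}
```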
This commit was SVN r17976.
2008-03-26 11:39:25 +00:00
Brad Benton
0b84dfd2a6
POE is not currently working or supported, so removing from the trunk.
...
This commit was SVN r17970.
2008-03-26 02:06:40 +00:00