
63 Commits

Author SHA1 Message Date
Rainer Keller
37c1b6a67e - As with rev16656, value is not modified.
Get rid of compiler warning from g++ - trunk

This commit was SVN r16670.
2007-11-06 10:56:06 +00:00
Rainer Keller
9045c5a6f1 - Value pointed to is not modified (file-name / FILE-macro),
getting rid of compiler-warning when compiled with trunk of g++:
   when doing --enable-debug:
  ../../../../orte/class/orte_pointer_array.h:128: warning: deprecated
  conversion from string constant to 'char*'

This commit was SVN r16656.
2007-11-05 13:03:35 +00:00
Ralph Castain
54b2cf747e These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC.
The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component.

This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done:

As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in.

In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in.

The incoming changes revamp these procedures in three ways:

1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step.

The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic.

Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure.
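As a rough illustration of the allgather that replaces the modex stage gate: every process contributes its own modex entry and ends up holding the entries of all processes. The sketch below simulates that exchange inside a single Python process. The function name `allgather` and the entry layout are hypothetical and are not the orte_grpcomm API.

```python
def allgather(contributions):
    """Simulate an allgather: every rank receives every rank's contribution.

    `contributions` maps rank -> that rank's local data. The return value maps
    rank -> the full list of data, ordered by rank, as seen by that rank.
    """
    ordered = [contributions[rank] for rank in sorted(contributions)]
    # Each rank gets its own copy of the complete, rank-ordered list.
    return {rank: list(ordered) for rank in contributions}


# Hypothetical modex entries: (vpid, contact string) per process.
local_entries = {vpid: (vpid, f"contact-info-{vpid}") for vpid in range(4)}
gathered = allgather(local_entries)
```

After the call, every rank holds the same four entries, which is the property the modex relies on. A real implementation would exchange the data over the network rather than through a shared dictionary.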


2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed.

The size of this data has been reduced in three ways:

(a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. Accordingly, the default size of these fields has been reduced to 16 bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes.

To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32 bits. Someday, if necessary, someone can add yet another option to increase them to 64 bits, I suppose.

(b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction.

(c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request the FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using.

While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly.
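The per-entry savings in (a) through (c) can be checked with a short sketch. It assumes the fields are signed integers (which matches the "32k" limit quoted above) and uses a made-up hostname. The struct layouts and the `strip_domain` helper are illustrative only and are not taken from the ORTE source.

```python
import struct

# Hypothetical layouts for a process name made of (jobid, vpid).
# "<" disables padding so the size is just the sum of the field sizes.
OLD_NAME = struct.Struct("<ii")   # two 32-bit fields
NEW_NAME = struct.Struct("<hh")   # two 16-bit fields (the new default)


def strip_domain(nodename):
    """Drop the domain part of a hostname, keeping only the short name."""
    return nodename.split(".", 1)[0]


old_size = OLD_NAME.size          # bytes per name with 32-bit fields
new_size = NEW_NAME.size          # bytes per name with 16-bit fields
max_signed_16 = 2**15 - 1         # largest value a signed 16-bit field holds

# A string nodename versus a 2-byte integer nodeid (hostname is made up).
fqdn = "node0042.cluster.example.org"
short_name = strip_domain(fqdn)
nodeid_size = struct.calcsize("<h")
```

Under these assumptions the name shrinks from 8 to 4 bytes, and the node identifier from a string of a few dozen characters to 2 bytes.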


3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup.

It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging.

Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future.
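The difference between the two messaging modes can be sketched as the path a message takes. This is a simplified model, not the routed framework itself: the function names, the `("daemon", node)` naming, and the handling of two procs on the same node are assumptions made for illustration.

```python
def direct_route(sender, recipient):
    """Point-to-point: the sender connects straight to the recipient."""
    return [sender, recipient]


def tree_route(sender, recipient, node_of):
    """Daemon routing: proc -> local daemon -> remote daemon -> proc.

    `node_of` maps a proc name to the node it runs on; each node has exactly
    one daemon, identified here by ("daemon", node).
    """
    src_daemon = ("daemon", node_of[sender])
    dst_daemon = ("daemon", node_of[recipient])
    if src_daemon == dst_daemon:
        # Assumed behavior: procs on one node share a single daemon hop.
        return [sender, src_daemon, recipient]
    return [sender, src_daemon, dst_daemon, recipient]


# Hypothetical placement of four procs on two nodes.
node_of = {"p0": "n0", "p1": "n0", "p2": "n1", "p3": "n1"}
path = tree_route("p0", "p3", node_of)
```

In the routed mode each proc only ever talks to its local daemon, so it needs one socket and one piece of contact info regardless of how many peers it messages.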


There are a few minor additional changes in the commit that I'll just note in passing:

* propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details.

* requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details.

* cleanup of some stale header files

This commit was SVN r16364.
2007-10-05 19:48:23 +00:00
Gleb Natapov
febdade113 Make non threaded OPAL_ATOMIC_CMPSET macros work correctly.
This commit was SVN r16071.
2007-09-09 08:00:16 +00:00
Gleb Natapov
dd8b0c925f Add OPAL_ATOMIC_CMPSET macros that become non-atomic when only one thread is used.
This commit was SVN r15720.
2007-08-01 12:13:34 +00:00
Shiqing Fan
0f468f3668 - Remove the solution and project files, will commit them later.
This commit was SVN r15705.
2007-07-31 17:07:02 +00:00
Shiqing Fan
4d7b349cdb - Add VC8 solution and project files.
- If one wants to use this solution, remember to unload the project 'orte-restart' which is currently not working for Windows.

This commit was SVN r15680.
2007-07-30 11:05:34 +00:00
George Bosilca
0158806e4c Add the missing return.
This commit was SVN r15596.
2007-07-25 03:48:04 +00:00
Josh Hursey
6026929490 Fix compiler error on Cray by adding in the std io/lib headers.
This commit was SVN r15515.
2007-07-19 18:26:10 +00:00
Brian Barrett
6427c9f92a oops, need a return statement there...
This commit was SVN r15509.
2007-07-19 16:21:11 +00:00
Brian Barrett
52ee1cb5da fix missing ; in solaris functions
This commit was SVN r15505.
2007-07-19 15:15:41 +00:00
Ralph Castain
ccdb834574 Fix a couple of compile errors. Also, we need to ensure that we only attempt to call destructors on tsd keys that were defined.
This commit was SVN r15501.
2007-07-19 12:56:41 +00:00
Brian Barrett
c5d0066c27 add ability to have thread-specific data on windows, pthreads, solaris threads,
and non-threaded builds

This commit was SVN r15492.
2007-07-18 20:23:45 +00:00
Brian Barrett
9687f70aea Add solaris condition variables
This commit was SVN r15225.
2007-06-27 16:48:30 +00:00
Brian Barrett
249ddf8fff Only print warning message about condition variable usage if the MCA
param says we should.  Also, check for != 0, rather than == 1, as there
are way too many double locks, but they'll get warned when we do the
double lock.  No need to warn again, in a meaningless way.

Originally part of r15167, reverted with r15172.

This commit was SVN r15173.

The following SVN revision numbers were found above:
  r15167 --> open-mpi/ompi@faa401dc47
  r15172 --> open-mpi/ompi@5f16251808
2007-06-22 15:28:12 +00:00
Brian Barrett
5f16251808 revert r15167. I don't know what I was thinking, but it was most definitely
"not right".

This commit was SVN r15172.

The following SVN revision numbers were found above:
  r15167 --> open-mpi/ompi@faa401dc47
2007-06-22 15:25:39 +00:00
Brian Barrett
faa401dc47 * Need to OBJ_RELEASE, not OBJ_DESTRUCT things that were created with
OBJ_NEW
  * Need to signal when the passive unlock has left an expose epoch for
    the win_free case
  * Clean up some debugging output
  * fix missing variable initialization

This commit was SVN r15167.
2007-06-21 22:08:30 +00:00
Jeff Squyres
57486f0b69 Re-enable threaded builds. Need to protect usage of mutex->m_*
variables to only use them a) when debugging, and b) when we don't
have real thread support.

This commit was SVN r15099.
2007-06-15 14:26:23 +00:00
George Bosilca
8c5b63768c A working version of the mutex_trylock for Windows.
This commit was SVN r15084.
2007-06-14 22:28:28 +00:00
Ethan Mallove
a09aebc30b Fix typo. Change mutex_destory to mutex_destroy.
This commit was SVN r15083.
2007-06-14 19:02:37 +00:00
George Bosilca
76ff70e672 Allow a threaded build on Windows.
This commit was SVN r15070.
2007-06-14 04:52:37 +00:00
Brian Barrett
84d1512fba Add the potential for doing some basic error checking on mutexes during
single threaded builds.  In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked).  There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews megabytes of errors on a
simple test.  So we have some work to do on our path towards
thread support.

Also removed the macro versions of the non-conditional thread locks,
as the only places they were used, the author of the code intended
to use the conditional thread locks.  So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks.  Simple, right? :).

This commit was SVN r15011.
2007-06-12 16:25:26 +00:00
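The double-lock check described in r15011 above can be pictured with a small sketch. It assumes a single-threaded build where the "mutex" is only a piece of bookkeeping. The class and attribute names are hypothetical and do not mirror the opal_mutex_t fields.

```python
class DebugMutex:
    """Stand-in 'mutex' for a single-threaded build that only tracks state."""

    def __init__(self, check_double_lock=False):
        self.locked = False
        self.check_double_lock = check_double_lock  # off by default
        self.warnings = []

    def lock(self):
        if self.check_double_lock and self.locked:
            # With one thread, locking an already-held mutex suggests that a
            # threaded build could block forever at this point.
            self.warnings.append("double lock detected")
        self.locked = True

    def unlock(self):
        self.locked = False


m = DebugMutex(check_double_lock=True)
m.lock()
m.lock()      # second lock without an unlock is recorded as a warning
m.unlock()
```

Keeping the check behind a flag matches the commit's note that it is off by default because of the volume of warnings it currently produces.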
Rainer Keller
0e2a335297 - When not debugging, do not initialize an unused mutexattr.
This commit was SVN r14761.
2007-05-24 18:54:22 +00:00
Brian Barrett
0e9e0c518a Fix a couple more progress thread related issues...
This commit was SVN r14708.
2007-05-21 16:06:14 +00:00
George Bosilca
633ee3c2ce Small optimizations in order to force the compiler to inline some critical functions.
This commit was SVN r14317.
2007-04-12 04:29:43 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Rainer Keller
3669e8921e - Fix further compiler warnings regarding initialization
and shadowing variables.

This commit was SVN r13358.
2007-01-30 06:34:38 +00:00
Rainer Keller
125ba1acfa - Reduce the amount of warnings with -Wshadow -- mainly due to
usage of index and abs in inline-fcts in header files.

This commit was SVN r13217.
2007-01-19 19:48:06 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
George Bosilca
413e54638d The usual cast problem on Windows.
This commit was SVN r12008.
2006-10-05 05:43:46 +00:00
Brian Barrett
b56c8c3a66 * Default the value of opal_uses_threads (which is used by the macro
versions of the thread lock functions to determine at runtime if a lock
  is needed) to OMPI_ENABLE_PROGRESS_THREADS instead of 
  OMPI_HAVE_THREAD_SUPPORT.  Opal only starts a thread when
  OMPI_ENABLE_PROGRESS_THREADS is enabled, and ORTE never really starts
  a thread that requires special locking considerations.

  MPI_INIT would set opal_uses_threads to true if thread level was 
  greater than MPI_THREAD_SINGLE, but it would never decrease the
  value of opal_uses_threads, meaning that we always enabled all that
  locking if we did a threaded build, which isn't necessary.  Now
  we do locking iff progress threads are enabled OR thread level
  is above MPI_THREAD_SINGLE.

This commit was SVN r11390.
2006-08-24 13:57:26 +00:00
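The rule stated in r11390 above (lock iff progress threads are enabled OR the thread level is above MPI_THREAD_SINGLE) can be sketched as a predicate feeding a conditional lock. The helper names are hypothetical, and the numeric values of the thread levels are assumed only to preserve their ordering.

```python
import threading

# Thread levels in increasing order; only the ordering matters here.
(MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED,
 MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE) = range(4)


def uses_threads(progress_threads_enabled, thread_level):
    """Locking is needed iff progress threads are on OR level > SINGLE."""
    return progress_threads_enabled or thread_level > MPI_THREAD_SINGLE


def conditional_lock(lock, needed):
    """Take the lock only when locking is needed; report whether we did."""
    if needed:
        lock.acquire()
        return True
    return False


lock = threading.Lock()
# No progress threads and MPI_THREAD_SINGLE: the lock is skipped entirely.
took_lock = conditional_lock(lock, uses_threads(False, MPI_THREAD_SINGLE))
```

This is the same idea as the upper-case conditional lock macros: the decision is made at runtime from a single flag, so a threaded build pays for locking only when it is actually needed.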
George Bosilca
5e280cda19 Latest and greatest. Now OPAL is ready for the Windows prime-time.
The same treatment will happen on all sub-projects. The .h files
have to be C++ compatible and all symbols with an external visibility
have to get the {PROJECT}_DECLSPEC in front of the prototype.

This commit was SVN r11340.
2006-08-23 00:29:35 +00:00
George Bosilca
b20cdbc651 Don't call opal_mutex_unlock if there is no progress thread. Now we are
a few tens of microseconds faster.

This commit was SVN r11339.
2006-08-22 23:55:33 +00:00
George Bosilca
ee5d0828e6 Correctly handle the basic operations on threads.
This commit was SVN r11325.
2006-08-22 17:55:36 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Brian Barrett
e76c0ceadb mutex_trylock returns 0 (success) if locked and non-zero (error) if not
locked.  Make the non-threaded case always return "locked", similar to
the non-threaded case for mutex_lock.

This commit was SVN r9956.
2006-05-17 15:50:21 +00:00
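The return convention from r9956 above (0 means the lock was taken, non-zero means it was not) can be sketched as follows. This uses Python's `threading.Lock` as a stand-in; the `threaded` parameter is a hypothetical way of modeling the non-threaded build.

```python
import threading


def mutex_trylock(lock, threaded=True):
    """Return 0 if the lock was acquired, non-zero otherwise.

    In the non-threaded case nothing can contend for the lock, so the call
    always reports success without touching the lock.
    """
    if not threaded:
        return 0
    return 0 if lock.acquire(blocking=False) else 1


lock = threading.Lock()
first = mutex_trylock(lock)    # lock is free, so this acquires it
second = mutex_trylock(lock)   # lock is already held, so this fails
unthreaded = mutex_trylock(lock, threaded=False)
```

Callers can then treat the result like an error code: a zero result means they hold the lock and must release it later.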
Gleb Natapov
2d9757e81b Fix deadlock in opal_condition. If broadcast is called on a condition with more than one thread waiting on it, the first awakened thread may consume all signals so the other threads will never run.
This commit was SVN r9451.
2006-03-29 14:54:58 +00:00
Brian Barrett
68f5ad074a * need synch.h as well as threads.h for Solaris threads.
This commit was SVN r9149.
2006-02-26 17:08:47 +00:00
Jeff Squyres
bd04d8f09e Add missing <thread.h> for Solaris
This commit was SVN r9148.
2006-02-26 16:56:38 +00:00
George Bosilca
ecc3e00362 Various cleanups.
This commit was SVN r9002.
2006-02-12 21:36:07 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
  - move files out of toplevel include/ and etc/, moving them into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h,
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
George Bosilca
897751bb8d Update the windows threads.
This commit was SVN r8901.
2006-02-06 04:17:13 +00:00
George Bosilca
f4f8abe3bd Add the OPAL_MUTEX_TRYLOCK macro.
This commit was SVN r8789.
2006-01-23 18:35:40 +00:00
Brian Barrett
759bfc91a3 * George pointed out this should be OMPI_HAVE_SOLARIS_THREADS, not
OPAL_HAVE_SOLARIS_THREADS...

This commit was SVN r8713.
2006-01-17 17:17:25 +00:00
George Bosilca
698b9b52fe There is no need for the ret variable when DEBUG is not enabled.
This commit was SVN r8675.
2006-01-11 20:56:22 +00:00
Brian Barrett
9310fd6f85 When debugging code is turned on with --enable-debug, try to use error checking
mutexes instead of fast mutexes.  If an error occurs, a message will be
printed, and abort() will be called.

This commit was SVN r8671.
2006-01-11 04:34:29 +00:00
George Bosilca
c107d02eee Missed this one in the last commit.
This commit was SVN r8469.
2005-12-12 22:02:42 +00:00
George Bosilca
bd0ee62e62 Protect headers and use __WINDOWS__ for Windows code.
This commit was SVN r8468.
2005-12-12 22:01:51 +00:00
Brian Barrett
1358781bd8 * remove unused condition variable files (everything is in condition.{h,c}
these days).  Always in SVN history if we want them back.

This commit was SVN r8199.
2005-11-20 00:49:37 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00