openmpi

Author	SHA1	Message	Date
Jeff Squyres	52ca6cf86c	The mpi_leave_pinned and mpi_leave_pinned_pipeline MCA parameters were needlessly registered in multiple different places, and none of them had a good help string. There was also an inconsistent check for setting both mpi_leave_pinned and mpi_leave_pinned_pipeline (i.e., it was only in ob1). This commit moves the registration of these params to one central place (ompi/runtime/ompi_mpi_params.c, with all other mpi_* MCA params) and uses globals to propagate the values as relevant. The error check was also moved to the central location to ensure consistency everywhere. This commit was SVN r13226.	2007-01-21 14:02:06 +00:00
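The central consistency check described in r13226 comes down to rejecting one combination of the two parameters. The following is a minimal sketch that assumes both values have already been read into plain int globals. The names `leave_pinned`, `leave_pinned_pipeline` and `check_leave_pinned_params()` are hypothetical stand-ins, not the actual symbols in ompi_mpi_params.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical globals standing in for the values read from the
   mpi_leave_pinned and mpi_leave_pinned_pipeline MCA parameters. */
static int leave_pinned = 0;
static int leave_pinned_pipeline = 0;

/* Return 0 if the combination is valid, -1 if both are enabled.
   Running this once, centrally, means every PML sees the same rule
   instead of only ob1 enforcing it. */
static int check_leave_pinned_params(void)
{
    if (leave_pinned && leave_pinned_pipeline) {
        fprintf(stderr, "mpi_leave_pinned and mpi_leave_pinned_pipeline "
                        "cannot both be set\n");
        return -1;
    }
    return 0;
}
```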
Jeff Squyres	32bfbfc735	Correct a filename that would prevent show_help messages from appearing properly. This commit was SVN r13225.	2007-01-21 13:56:16 +00:00
Jeff Squyres	75df4ca602	Minor fixes for MPI-level aborting: - Fix some fprintf's in ompi_mpi_abort() that incorrectly assumed that we were always being invoked from MPI_ABORT (ompi_mpi_abort() may be invoked from a bunch of different places) - Also try to opal_backtrace_print() if opal_backtrace_buffer() is not supported. - Print a message in MPI_ABORT if we're aborting. This commit was SVN r12998.	2007-01-04 22:30:28 +00:00
Brian Barrett	4e157380bf	Heterogeneous support changes: * Add line about heterogeneous support to ompi_info output * Print warning and abort if heterogeneous detected and no heterogeneous support available. Refs trac:587 This commit was SVN r12943. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2006-12-30 17:13:18 +00:00
Galen Shipman	faa0bafa96	modify preconnect to use a rotating ring algorithm, OOB connections are brought up lazily so we want to be a bit less aggressive. This commit was SVN r12906.	2006-12-21 01:36:57 +00:00
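The r12906 message names a rotating ring but does not spell out the schedule. This sketch assumes a common form of it. In step k, rank r sends to (r + k) mod size and receives from (r - k) mod size. Each rank therefore has exactly one outgoing and one incoming connection per step, so no single peer gets flooded. `ring_step()` and `ring_step_t` are hypothetical names. Because connections are bidirectional, an implementation might stop after about size/2 steps; that refinement is not shown.

```c
#include <assert.h>

/* Peers for one step of a hypothetical rotating-ring preconnect. */
typedef struct {
    int send_to;
    int recv_from;
} ring_step_t;

/* In step k (1 <= k < size), rank sends to (rank + k) % size and
   receives from (rank - k + size) % size. Over all steps, every rank
   is paired with every other rank exactly once in each direction. */
static ring_step_t ring_step(int rank, int size, int k)
{
    ring_step_t s;
    assert(size > 0 && rank >= 0 && rank < size && k > 0 && k < size);
    s.send_to = (rank + k) % size;
    s.recv_from = (rank - k + size) % size;
    return s;
}
```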
Brian Barrett	bdf0b231b2	Undo r12871, as it contained some code in ompi/runtime that shouldn't have been committed Refs trac:669 This commit was SVN r12872. The following SVN revision numbers were found above: r12871 --> open-mpi/ompi@597598b712 The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:52:13 +00:00
Brian Barrett	597598b712	Move the req_mtl structure back to the end of each of the structures in the CM PML. The req_mtl structure is cast into a mtl__request_structure for each MTL, which is larger than the req_mtl itself. The cast will cause the _request to overwrite parts of the heavy requests if the req_mtl isn't the LAST thing on each structure (hence the comment). This was moved as an optimization at some point, which caused buffer sends to fail... Refs trac:669 This commit was SVN r12871. The following Trac tickets were found above: Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669	2006-12-15 17:46:53 +00:00
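The layout constraint in r12871 is the classic "extensible trailing member" pattern. The PML embeds a small generic MTL header. The MTL then treats that header as the start of its own, larger structure, so the header must be the last member, with extra room allocated after it. All type and function names below are hypothetical; this illustrates the pattern, not the actual CM PML structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>   /* memset, used in the usage example */

/* Hypothetical generic MTL request header, as seen by the PML. */
typedef struct {
    int kind;
} mtl_request_t;

/* Hypothetical component-specific request that extends the header. */
typedef struct {
    mtl_request_t base;
    char private_state[64];
} my_mtl_request_t;

/* PML request: req_mtl must stay the LAST member, because the MTL
   treats &req->req_mtl as a my_mtl_request_t and writes past it. */
typedef struct {
    int tag;
    size_t count;
    mtl_request_t req_mtl;
} pml_request_t;

/* Allocate a PML request with enough trailing room for the MTL. */
static pml_request_t *pml_request_alloc(size_t mtl_request_size)
{
    size_t size = offsetof(pml_request_t, req_mtl) + mtl_request_size;
    if (size < sizeof(pml_request_t)) {
        size = sizeof(pml_request_t);
    }
    return calloc(1, size);
}
```

If `req_mtl` were moved ahead of `tag` and `count`, the MTL's writes into `private_state` would overwrite those fields. That is the corruption the commit describes.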
Edgar Gabriel	1359ba9b13	Rewriting much of the errorcode and errorclass code, since - we have to be able to attach a string to an error class, not just to an error code - according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated every time you add a new code or a new class. Thus, you have to have a single list for both. Thus, we got rid of the error_class structure. In the error-code structure, we can distinguish whether we are dealing with an error code or an error class by looking at the err->code element of the structure. If its value is MPI_UNDEFINED, the corresponding entry is a class, else it is an error code. All predefined error codes have the code and the class field set to the same value. The test MPI_Add_error_class1 passes now. Fixes trac:418 This commit was SVN r12764. The following Trac tickets were found above: Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418	2006-12-05 19:07:02 +00:00
Ralph Castain	79bd8a842e	Add xcast timing in mpi_init since we cannot directly measure it (tree-based comm). This commit was SVN r12706.	2006-11-30 15:06:40 +00:00
Ralph Castain	bc4e97a435	First stage in the move to a faster startup. Change the ORTE stage gate xcast into a binary tree broadcast (away from a linear broadcast). Also, removed the timing report in the gpr_proxy component that printed out the number of bytes in the compound command message as the answer was "not much" - reduces the clutter in the data. This commit was SVN r12679.	2006-11-28 00:06:25 +00:00
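The r12679 message does not give the tree shape. This sketch assumes the common heap numbering rooted at rank 0: the parent of r is (r - 1) / 2 and its children are 2r + 1 and 2r + 2. A linear broadcast needs n - 1 sends from the root. Here each rank forwards to at most two children, so the message reaches everyone in about log2(n) rounds. The function names are hypothetical.

```c
#include <assert.h>

/* Children of rank r in a binary tree over ranks 0..n-1 (heap layout).
   Writes up to two children to kids[] and returns how many there are. */
static int tree_children(int r, int n, int kids[2])
{
    int count = 0;
    if (2 * r + 1 < n) kids[count++] = 2 * r + 1;
    if (2 * r + 2 < n) kids[count++] = 2 * r + 2;
    return count;
}

/* Parent of rank r, or -1 for the root. */
static int tree_parent(int r)
{
    return r == 0 ? -1 : (r - 1) / 2;
}

/* Forwarding rounds needed to reach all n ranks (the tree depth). */
static int tree_rounds(int n)
{
    int rounds = 0;
    int reached = 1;   /* ranks holding the message so far */
    int frontier = 1;  /* ranks that received it in the last round */
    while (reached < n) {
        frontier *= 2;
        reached += frontier;
        rounds++;
    }
    return rounds;
}
```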
Brian Barrett	33320b7165	Rework the opal_progress interface to better support dynamic processes and at the same time, remove some of the MPI-related options from OPAL: - provide mechanism to change at runtime whether sched_yield() should be called when the progress engine is idle - provide mechanism for changing the rate at which the event engine is called when there are "no" users of the event engine (ie, when using MPI but not TCP) - fix some function names in the progress engine to better match their intended use (and remove MPI naming scheme) - remove progress_mpi_enable / progress_mpi_disable because we can now use the functions to set the sched_yield and tick rate interfaces - rename opal_progress_events() to opal_progress_set_event_flag() because the first really isn't descriptive of what the function does and I always got confused by it This commit was SVN r12645.	2006-11-22 02:06:52 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
Ralph Castain	4e50cdae52	This commit accomplishes two things: 1. Fix the "hang" condition when an application isn't found. It turned out that the ODLS had some difficulty with the process actually not having been started - hence, it never called the waitpid callback. As a result, the "terminated" trigger didn't fire, and so mpirun didn't wake up. With this change, the HNP's errmgr forces the issue by causing the trigger to fire itself when an abort condition occurs. 2. Shift the recording of the pid and the nodename from mpi_init to the orted launcher. This allows programs such as Eclipse PTP to get the pids even for non-MPI applications. In the case of bproc, the pls handles this chore since we don't use orteds in that system. This commit was SVN r12558.	2006-11-11 04:03:45 +00:00
Ralph Castain	c77f6c605e	Update timing reports: 1. Remove timing of xcast from mpi_init 2. Add timing report from oob_xcast on how long it took to send the message This commit was SVN r12428.	2006-11-03 18:55:05 +00:00
Ralph Castain	60e27c77e7	Add some additional timing reporting: 1. Added reporting points around the xcasts in MPI_Init. Note that these times will include time spent waiting for a trigger to fire, which is why the times between stage gates did NOT include these times initially. The inter-stage-gate times still do NOT include the xcast time - the xcast time is reported separately. 2. Added the process vpid on the MPI_Init timing reports for clarity. 3. Added a report from the xcast function on the HNP that outputs the number of bytes in the message being sent to the processes. This commit was SVN r12422.	2006-11-03 16:04:40 +00:00
Ralph Castain	36d4511143	Bring the timing instrumentation to the trunk. If you want to look at our launch and MPI process startup times, you can do so with two MCA params: OMPI_MCA_orte_timing: set it to anything non-zero and you will get the launch time for different steps in the job launch procedure. The degree of detail depends on the launch environment. rsh will provide you with the average, min, and max launch time for the daemons. SLURM block launches the daemon, so you only get the time to launch the daemons and the total time to launch the job. Ditto for bproc. TM looks more like rsh. Only those four environments are currently supported - anyone interested in extending this capability to other environs is welcome to do so. In all cases, you also get the time to setup the job for launch. OMPI_MCA_ompi_timing: set it to anything non-zero and you will get the time for mpi_init to reach the compound registry command, the time to execute that command, the time to go from our stage1 barrier to the stage2 barrier, and the time to go from the stage2 barrier to the end of mpi_init. This will be output for each process, so you'll have to compile any statistics on your own. Note: if someone develops a nice parser to do so, it would be really appreciated if you could/would share! This commit was SVN r12302.	2006-10-25 15:27:47 +00:00
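With OMPI_MCA_ompi_timing set to a non-zero value, each process reports its own times, and the statistics are left to the user. This sketch summarizes one such measurement across processes into minimum, maximum and mean. It assumes the values have already been extracted into an array of seconds. Parsing the actual output is not shown, because its format is not described here. `summarize()` and `time_stats_t` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double min, max, mean;
} time_stats_t;

/* Summarize per-process timings (seconds) for one measurement,
   e.g. "stage1 barrier to stage2 barrier". Requires n > 0. */
static time_stats_t summarize(const double *t, size_t n)
{
    time_stats_t s;
    double sum = 0.0;
    assert(n > 0);
    s.min = s.max = t[0];
    for (size_t i = 0; i < n; i++) {
        if (t[i] < s.min) s.min = t[i];
        if (t[i] > s.max) s.max = t[i];
        sum += t[i];
    }
    s.mean = sum / (double) n;
    return s;
}
```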
George Bosilca	4689c56210	Always cast the return of malloc. This commit was SVN r11990.	2006-10-05 05:07:43 +00:00
Ralph Castain	0411f9772e	Begin instrumenting for scalability tests. I have added a new MCA param (hey, you can't have too many!) called OMPI_MCA_orte_timing. If set to anything other than zero, the system will report out critical timing loops. At the moment, this includes three measurements: 1. Time spent going through the RDS->RAS->RMAPS, setting up triggers, etc. prior to calling the actual PLS launch function. This is reported out as time to setup job. 2. Time spent in MPI_Init from start of that function (well, right after opal_init) to the place where we send all of our info to the registry. Reported out as time from start to exec_compound_cmd 3. Time actually spent executing the compound cmd. Reported out as time to exec_compound_cmd. A few additional timing points will be added shortly. These may eventually be removed or (better) setup with a conditional compile flag. This commit was SVN r11892.	2006-09-29 13:19:44 +00:00
Brian Barrett	ab6cbb2359	* update ompi_mpi_abort to call abort_procs_request on the processes that should die, according to the MPI standard. It's possible that the ORTE layer may kill additional processes, but that's beyond our control and seems to be allowed by the standard (ie, it might also end up killing all the procs in all the jobs covered by the communicator). * update the stack trace printing code to use the framework rather than calling execinfo directly, so that we should be able to get stack traces on all the platforms we support stack tracing on (if the user wants stack traces on abort, of course) This commit was SVN r11753.	2006-09-22 15:04:04 +00:00
Rainer Keller	80166a9516	- fix typos This commit was SVN r11703.	2006-09-19 07:55:41 +00:00
Ralph Castain	37dfdb76eb	Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. This commit was SVN r11661.	2006-09-14 21:29:51 +00:00
George Bosilca	e33c35112b	Correct the conversion between int and bool. Apply it on all files except the one that will be modified by Ralph for the ORTE 2.0. The missing ones are in the rsh PLS. This commit was SVN r11476.	2006-08-28 18:59:16 +00:00
George Bosilca	3f0a7cad9e	The last patch for Windows support. Mostly casting and conversion to C++ friendly headers. This commit was SVN r11400.	2006-08-24 16:38:08 +00:00
Ralph Castain	8c7f0ed9ae	Change the SOH to the new State Monitoring and Reporting (SMR) framework. New API's will be appearing in the new framework shortly - this just gets the name change into the system. Other changes: 1. Remove the old xcpu components as they are not functional. 2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one. This will require an autogen/configure, I'm afraid. This commit was SVN r11228.	2006-08-16 16:35:09 +00:00
Brian Barrett	cdffc3158d	* only set threads if not running at thread single This commit was SVN r11193.	2006-08-15 15:55:53 +00:00
Jeff Squyres	c198fd2fd5	Remove some unused variables / compiler warnings. This commit was SVN r11118.	2006-08-05 10:43:54 +00:00
Jeff Squyres	b6c6d9a2b7	Bring over r10877 and r10881 from the /tmp/tbird branch: r10877: add warm up connection option.. of course this only warms up the first eager btl but this should be adequate for now.. r10881: Consulted with Galen and did a few things: - Fix the algorithm to actually make the connections that we want - Rename the MCA param to mpi_preconnect_all - Cleanup the code a bit: - move the logic to a separate .c file - check return codes properly This commit was SVN r11114. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r10877 r10877 r10881 r10881	2006-08-04 14:41:31 +00:00
Brian Barrett	a84e557815	Add new loop mode OPAL_EVLOOP_ONELOOP that behaves like OPAL_EVLOOP_ONCE did before the libevent update. The problem is that the behavior of OPAL_EVLOOP_ONCE was changed by the OMPI team, which then broke things during the update, so it had to be reverted to the old meaning of loop until one event occurs. OPAL_EVLOOP_ONELOOP will go through the event loop once (like EVLOOP_NONBLOCK) but will pause in the event library for a bit (like EVLOOP_ONCE). fixes trac:234 This commit was SVN r11081. The following Trac tickets were found above: Ticket 234 --> https://svn.open-mpi.org/trac/ompi/ticket/234	2006-08-01 22:23:57 +00:00
Jeff Squyres	520147f209	Clean up the Fortran MPI sentinel values per problem reported on the users mailing list: http://www.open-mpi.org/community/lists/users/2006/07/1680.php Warning: this log message is not for the weak. Read at your own risk. The problem was that we had several variables in Fortran common blocks of various types, but their C counterparts were all of a type equivalent to a fortran double complex. This didn't seem to matter for the compilers that we tested, but we never tested static builds (which is where this problem seems to occur, at least with the Intel compiler: the linker complains that the variable in the common block in the user's .o file was of one size/alignment but the one in the C library was a different size/alignment). So this patch fixes the sizes/types of the Fortran common block variables and their corresponding C instantiations to be of the same sizes/types. But wait, there's more. We recently introduced a fix for the OSX linker where some C versions of the fortran common block variables (e.g., _ompi_fortran_status_ignore) were not being found when linking ompi_info (!). Further research shows that the code path for ompi_info to require ompi_fortran_status_ignore is, unfortunately, necessary (a quirk of how various components pull in different portions of the code base -- nothing in ompi_info itself requires fortran or MPI knowledge, of course). Hence, the real problem was that there was no code path from ompi_info to the portion of the code base where the C globals corresponding to the Fortran common block variables were instantiated. This is because the OSX linker does not automatically pull in .o files that only contain uninitialized global variables; the OSX linker typically only pulls in a .o file from a library if it either has a function that is used or has a global variable that is initialized (that's the short version; lots of details and corner cases omitted). 
Hence, we changed the global C variables corresponding to the fortran common blocks to be initialized, thereby causing the OSX linker to pull them in automatically -- problem solved. At the same time, we moved the constants to another .c file with a function, just for good measure. However, this didn't really solve the problem: 1. The function in the file with the C versions of the fortran common block variables (ompi/mpi/f77/test_constants_f.c) did not have a code path that was reachable from ompi_info, so the only reason that the constants were found (on OSX) was because they were initialized in the global scope (i.e., causing the OSX compiler to pull in that .o file). 2. Initializing these variables in the global scope causes problems for some linkers where -- once all the size/type problems mentioned above were fixed -- the alignments of fortran common blocks and C global variables do not match (even though the types of the Fortran and C variables match -- wow!). Hence, initializing the C variables would not necessarily match the alignment of what Fortran expected, and the linker would issue a warning (i.e., the alignment warnings referenced in the original post). The solution is two-fold: 1. Move the Fortran variables from test_constants_f.c to ompi/mpi/runtime/ompi_mpi_init.c where there are other global constants that are initialized (that had nothing to do with fortran, so the alignment issues described above are not a factor), and therefore all linkers (including the OSX linker) will pull in this .o file and find all the symbols that it needs. 2. Do not initialize the C variables corresponding to the Fortran common blocks in the global scope. Indeed, never initialize them at all (because we never need their values - we only check for their locations). 
Since nothing is ever written to these variables (particularly in the global scope), the linker does not see any alignment differences during initialization, but does make both the C and Fortran variables have the same addresses (this method has been working in LAM/MPI for over a decade). There were some comments here in the OMPI code base and in the LAM code base that stated/implied that C variables corresponding to Fortran common blocks had to have the same alignment as the Fortran common blocks (i.e., 16). There were attempts in both code bases to ensure that this was true. However, the attempts were wrong (in both code bases), and I have now read enough Fortran compiler documentation to convince myself that matching alignments is not required (indeed, it's beyond our control). As long as C variables corresponding to Fortran common blocks are not initialized in the global scope, the linker will "figure it out" and adjust the alignment to whatever is required (i.e., the greater of the alignments). Specifically (to counter comments that no longer exist in the OMPI code base but still exist in the LAM code base): - there is no need to make attempts to specially align C variables corresponding to Fortran common blocks - the types and sizes of C variables corresponding to Fortran common blocks should match, but do not need to be on any particular alignment Finally, as a side effect of this effort, I found a bunch of inconsistencies with the intent of status/array_of_statuses parameters. For all the functions that I modified they should be "out" (not inout). This commit was SVN r11057.	2006-07-31 15:07:09 +00:00
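This sketch covers only the C side of the r11057 approach. The C counterpart of a Fortran sentinel is declared without an initializer, and wrappers recognize the sentinel purely by its address, never by its contents. Fortran passes arguments by reference, so a comparison of pointers is enough. The Fortran common-block linkage and compiler name mangling are not shown. `example_fortran_status_ignore`, its array length and `is_status_ignore()` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical C counterpart of a Fortran common-block sentinel such
   as MPI_STATUS_IGNORE. It is declared WITHOUT an initializer: only
   its address matters, and leaving it uninitialized lets the linker
   reconcile its size/alignment with the Fortran common block. */
int example_fortran_status_ignore[6];

/* The wrapper detects the sentinel by address; the contents of the
   sentinel are never read or written. */
static int is_status_ignore(const int *f_status)
{
    return f_status == example_fortran_status_ignore;
}
```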
Jeff Squyres	51c5516815	Add a new MCA parameter: mpi_keep_peer_hostnames. If this is nonzero, (which is currently the default, although we may argue over this later :-) ), a new field in the ompi_proc_t named proc_hostname will have the string hostname of that peer. If 0, this field will be NULL. This allows for printing nicer error messages in environments where peer hostnames are not otherwise easily obtainable, such as the mvapi BTL (requested by Sandia, who has both a huge number of nodes and 6GB of RAM per node, so they don't care about the extra memory usage ;-) ). This commit was SVN r9902.	2006-05-11 19:46:21 +00:00
Jeff Squyres	82d590629d	After extensive conversations about this... - My original patch stands: MPI_FINALIZE directly invokes the attribute callbacks on MPI_COMM_SELF - We added some user-level checks to ensure that they don't call MPI_FINALIZE twice (this isn't really required, but it will prevent whacky segv's -- they'll at least get a nice error message) - Removed the attribute callbacks on MPI_COMM_SELF from ompi_mpi_comm_finalize (i.e., we just moved them from ompi_mpi_comm_finalize to ompi_mpi_finalize -- we just moved this process up earlier in the MPI_FINALIZE sequence of events) - Because there were so many conversations about this, here's the rationale: - MPI-2:4.8 says that we have to MPI_COMM_FREE MPI_COMM_SELF so that the attribute callbacks are invoked. - After considerable discussion, we came to the conclusion that FREE'ing COMM_SELF is not the issue -- calling the callbacks is the issue. - So it is sufficient for MPI_FINALIZE to directly invoke these attribute callbacks - The attribute callbacks are not invoked on other communicators because said communicators are not MPI_COMM_FREE'ed This commit was SVN r9628.	2006-04-13 17:00:36 +00:00
Jeff Squyres	201f8bb602	Properly delete attributes on MPI_COMM_SELF as the very first thing in MPI_FINALIZE, per MPI-2:4.8. This commit was SVN r9618.	2006-04-12 01:16:45 +00:00
Jeff Squyres	e371aff9f5	Fix minor compiler warning This commit was SVN r9514.	2006-04-01 12:41:48 +00:00
George Bosilca	f09a6f50df	The real name is ompi_mpi_abort_print_stack. This commit was SVN r9495.	2006-03-31 04:21:09 +00:00
Brian Barrett	becc55abf6	* add missing extern in header file This commit was SVN r9493.	2006-03-31 02:45:06 +00:00
Jeff Squyres	fd61d78599	Add two MCA parameters to the MPI level to control behavior during MPI_ABORT. From the ompi_info output: MCA mpi: parameter "mpi_abort_delay" (current value: "0") If nonzero, print out an identifying message when MPI_ABORT is invoked (hostname, PID of the process that called MPI_ABORT) and delay for that many seconds before exiting (a negative delay value means to never abort). This allows attaching of a debugger before quitting the job. MCA mpi: parameter "mpi_abort_print_stack" (current value: "0") If nonzero, print out a stack trace when MPI_ABORT is invoked This commit was SVN r9487.	2006-03-31 00:31:15 +00:00
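The mpi_abort_delay help string in r9487 defines three behaviors. A value of 0 exits immediately. A positive value prints an identifying message (hostname, PID), then waits that many seconds before exiting. A negative value prints the message and never exits, which leaves time to attach a debugger. This is a minimal POSIX sketch of that mapping. The function names and message text are hypothetical, not the ones in ompi_mpi_abort().

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

typedef enum { ABORT_NOW, ABORT_AFTER_DELAY, ABORT_NEVER } abort_action_t;

/* Map the mpi_abort_delay value to the documented behavior. */
static abort_action_t abort_action(int delay)
{
    if (delay == 0) return ABORT_NOW;
    return delay > 0 ? ABORT_AFTER_DELAY : ABORT_NEVER;
}

/* Print the identifying message (hostname and PID). */
static void print_abort_banner(int delay)
{
    char host[256];
    if (gethostname(host, sizeof host) != 0) {
        snprintf(host, sizeof host, "unknown");
    }
    host[sizeof host - 1] = '\0';  /* gethostname may not terminate */
    fprintf(stderr, "[%s:%ld] MPI_ABORT invoked; delay=%d\n",
            host, (long) getpid(), delay);
}

/* Apply the delay before the caller actually exits. */
void abort_delay_wait(int delay)
{
    switch (abort_action(delay)) {
    case ABORT_NOW:
        return;
    case ABORT_AFTER_DELAY:
        print_abort_banner(delay);
        sleep((unsigned) delay);
        return;
    case ABORT_NEVER:
        print_abort_banner(delay);
        for (;;) {
            sleep(5);  /* wait indefinitely for a debugger to attach */
        }
    }
}
```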
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
Brian Barrett	b1d2424013	Merge in present work on the MPI-2 onesided chapter. The current code is not complete, but stable enough that it will have no impact on general development, so into the trunk it goes. Changes in this commit include: - Remove the --with option for disabling MPI-2 onesided support. It complicated code, and has no real reason for existing - add a framework osc (OneSided Communication) for encapsulating all the MPI-2 onesided functionality - Modify the MPI interface functions for the MPI-2 onesided chapter to properly call the underlying framework and do the required error checking - Created an osc component pt2pt, which is layered over the BML/BTL for communication (although it also uses the PML for long message transfers). Currently, all support functions, all communication functions (Put, Get, Accumulate), and the Fence synchronization function are implemented. The PWSC active synchronization functions and Lock/Unlock passive synchronization functions are still not implemented This commit was SVN r8836.	2006-01-28 15:38:37 +00:00
Brian Barrett	60ac1cb5f4	print stack traces (when available) for opal and orte processes, as well as ompi processes. Also add SIGABRT to the list of signals that are intercepted to print out pretty messages. This commit was SVN r8672.	2006-01-11 04:36:39 +00:00
Tim Woodall	7eade5b856	added SIGFPE to default list of signals that generate a backtrace (where supported) This commit was SVN r8632.	2006-01-04 16:02:45 +00:00
Brian Barrett	45a3eccec6	re-enable the stack-trace functionality (it had been inadvertently disabled in a merge a long time ago). Set default signals to print a stack trace from none to SIGBUS,SIGSEGV. This commit was SVN r8603.	2005-12-22 19:39:50 +00:00
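Together, r8603, r8632 and r8672 make SIGBUS, SIGSEGV, SIGFPE and SIGABRT the signals that produce a stack trace by default. This is a generic POSIX sketch of installing one handler for that set with sigaction(). The real code goes through the OPAL backtrace framework; here the handler only writes a fixed message using async-signal-safe calls, then re-raises the signal so the process still terminates. The handler and function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Default set of signals that trigger a stack trace. */
static const int trace_signals[] = { SIGBUS, SIGSEGV, SIGFPE, SIGABRT };
#define N_TRACE_SIGNALS (sizeof trace_signals / sizeof trace_signals[0])

/* Hypothetical handler: a real one would print a backtrace here.
   SA_RESETHAND has already restored the default action on entry,
   so re-raising makes the process die as it normally would. */
static void trace_handler(int sig)
{
    static const char msg[] = "*** fatal signal caught, stack trace would follow ***\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void) rc;
    raise(sig);
}

/* Install trace_handler for every signal in trace_signals.
   Returns 0 on success, -1 if any sigaction() call fails. */
static int install_trace_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = trace_handler;
    sa.sa_flags = SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < N_TRACE_SIGNALS; i++) {
        if (sigaction(trace_signals[i], &sa, NULL) != 0) return -1;
    }
    return 0;
}
```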
George Bosilca	53235eb34d	The Windows protection is called __WINDOWS__ (and it's a mix between WIN32 and _WIN32). This commit was SVN r8440.	2005-12-10 22:10:39 +00:00
George Bosilca	58ba8f033d	Handle the case when there is no F77 support available for us. This commit was SVN r8238.	2005-11-22 21:52:14 +00:00
Jeff Squyres	9812694227	Ensure to instantiate MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE. Thanks to Anthony Chan for pointing this out. Note that these will only work for the Fortran compiler that Open MPI was configured with -- since these values are, by definition, single-value, they cannot support all 4 values that Open MPI may generate for the different Fortran name-mangling schemes. A lengthy comment in ompi_mpi_init.c explains this in more detail. Added to the README to explain this situation, as well as the forthcoming .TRUE. Fortran fixes. This commit was SVN r8231.	2005-11-22 15:24:39 +00:00
Jeff Squyres	ebd97afdac	- Make the types of the MCA param variables be "int", not "int32_t" - Separate out the registration of the MCA params into a standalone function that is invoked by ompi_mpi_init() (so that ompi_info can see these params) - Rename the params to "mpi_ddt_" instead of "datatype_" so that they fit into the common naming scheme This commit was SVN r8196.	2005-11-18 22:51:11 +00:00
George Bosilca	932c67aeb3	MPI_COMM_WORLD should be the first communicator to be created, even before MPI_COMM_SELF and MPI_COMM_NULL. This commit was SVN r8132.	2005-11-12 03:47:17 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Edgar Gabriel	3633605010	moving op_init further up in ompi_mpi_init, since it is required when querying some of the collective components. Up to now, it just worked somehow, but now with correct reference counting for ops in place, it refused :-) This commit was SVN r7866.	2005-10-25 18:33:48 +00:00
Jeff Squyres	a459659a33	Print the string name of the return code This commit was SVN r7789.	2005-10-17 20:47:44 +00:00
Brian Barrett	1302cb4072	The next in a long line of crazed build system changes from Brian. This was originally suggested by Ralf Wildenhues, to try to speed autogen, configure, and make (and possibly even make install). Use automake's include directive to drastically reduce the number of Makefile files (although the number of Makefile.am files is the same - most are just included in a top-level Makefile.am). Also use an Automake SUBDIRs feature to eliminate the dynamic-mca tree, which was no longer really needed. This makes adding a framework easier (since you don't have to remember the dynamic-mca tree) and makes building faster (as make doesn't have to recurse through the dynamic-mca tree) This commit was SVN r7777.	2005-10-17 00:21:10 +00:00
Jeff Squyres	fcef1774d5	Per advice from Ralf W., change the pkgdata declarations in Makefile.am's to be a slightly more correct (and, more importantly, less error-prone) construct. This commit was SVN r7554.	2005-09-30 13:32:39 +00:00
Jeff Squyres	285ded5655	- Ensure to have !initialized || finalized test first - If we have an NS error, don't return an error -- this function's purpose is to abort :-) - s/abort()/exit(1)/ so that we don't drop massive corefiles This commit was SVN r7524.	2005-09-27 20:26:38 +00:00
David Daniel	e4985c2a07	Moving totalview spin to the very end of mpi_init This commit was SVN r7444.	2005-09-20 15:22:15 +00:00
Galen Shipman	d932cfd342	merge of rcache work into the trunk.. lotsa fun ;-).. I regression tested before the merge, I will regression test tonight and correct issues that might have crept in. This commit was SVN r7329.	2005-09-12 22:28:23 +00:00
George Bosilca	948683215b	And more fixes ... This commit was SVN r7321.	2005-09-12 20:25:01 +00:00
Brian Barrett	ed56e743b7	* update configure.ac to use the modern version of AC_INIT and AM_INIT_AUTOMAKE, instead of the deprecated version. * Work around dumbness in modern AC_INIT that requires the version number to be set at autoconf time (instead of at configure time, as it was before). Set the version number, minus the subversion r number, at autoconf time. Override the internal variables to include the r number (if needed) at configure time. Basically, the right thing should always happen. The only place it might not is the version reported as part of configure --help will not have an r number. * Since AM_INIT_AUTOMAKE takes a list of options, no need to specify them in all the Makefile.am files. * Adds support for subdir-objects, meaning that object files are put in the directory containing source files, even if the Makefile.am is in another directory. This should start making it feasible to reduce the number of Makefile.am files we have in the tree, which will greatly reduce the time to run autogen and configure. This commit was SVN r7211.	2005-09-07 05:54:53 +00:00
Jeff Squyres	3bf265eeee	Minor fix This commit was SVN r7124.	2005-09-01 09:50:19 +00:00
Ralph Castain	96f4bb7a63	Hey, sports fans!! Guess what?? Here's the huge registry check-in you've all been waiting for with bated breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears. In addition, the commit contains the following: 1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe 2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well. In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first). This commit was SVN r7118.	2005-09-01 01:07:30 +00:00
David Daniel	227947fc51	Moving MPI-side TotalView support into a separate directory ompi/debuggers/ so that compilation options can be more easily controlled. This commit was SVN r7113.	2005-08-31 20:35:15 +00:00
David Daniel	995641c1e6	Don't initialize proctable more than once (since the stage gate 1 trigger seems to get fired at least twice). This commit was SVN r7101.	2005-08-31 00:21:55 +00:00
David Daniel	a5d9199e7f	Adding a simple hook for TotalView that is activated if a particular MCA parameter is set. orterun/MPI integration still not quite working. This commit was SVN r7097.	2005-08-30 17:34:23 +00:00
Jeff Squyres	c9cdb36b0b	Finally get this right: move orte_sys_info.[ch] back into the orte tree. - fix up #include's throughout the tree (yay contrib/search_replace.pl!) - remove a few extraneous #include's - remove orte_sys_info*() from opal_init()/opal_finalize() (it's already in orte_init_stage1() and orte_system_finalize()) - remove dependencies in opal on orte_system_info -- util/os_path.c and util/os_create_dirpath.c (they only used path_sep, anyway -- easily changed to #defines) This commit was SVN r7059.	2005-08-26 21:03:41 +00:00
Josh Hursey	4eefb33182	Some param changes: - Change orte_base_infrastructre to orte_infrastructre to conform with ompi_info's needs - Move MCA Param registration in ORTE to a centralized function that is called first in orte_init_stage1 - Set the infrastructure flag as an argument to orte_init - Adjust initialization functions to properly pass down the infrastructure flag. This commit was SVN r7053.	2005-08-26 20:13:35 +00:00
Jeff Squyres	900631e9f9	- Add a first cut of the memory affinity (maffinity) framework. Its API is still a bit unstable and may change. - Add a primitive "first use" component that simply has each process "touch" the pages that they want to use, thereby [hopefully] locking them locally to a specific processor - Add hooks in ompi_mpi_init to enable memory affinity when processor affinity is used. - Added hooks in ompi_mpi_finalize to shut down memory affinity when it was initialized during ompi_mpi_init. - Added right hooks in ompi_info to display maffinity components. This commit was SVN r7044.	2005-08-26 10:56:39 +00:00
Jeff Squyres	626389254e	Fix typo This commit was SVN r7040.	2005-08-25 20:24:22 +00:00
Brian Barrett	dfdb5dc12a	* high resolution, low latency timers for a number of platforms, plus mods to opal_progress() to use the timers instead of a tick count for deciding whether to call the event loop or not. Currently supported platforms are: - solaris (x86 / sparc) - Linux (x86 / x86_64 / IA64) - Mac OS X (x86 / Power PC) This commit was SVN r6922.	2005-08-18 05:34:22 +00:00
Jeff Squyres	a7058bbe43	Be friendly -- if paffinity is requested and we are unable to set it (for any reason), print a friendly warning message. This commit was SVN r6907.	2005-08-16 17:18:56 +00:00
Jeff Squyres	409b9e73b2	Remove debugging printf. Doh! This commit was SVN r6905.	2005-08-16 16:20:55 +00:00
Jeff Squyres	f5cc86fa07	- Add MCA param mpi_paffinity_alone, which, when nonzero, will assume that this ORTE job is the only one on the nodes involved, and if told what processors to assign the processes to, will bind MPI processes to specific processors. - Convert #include's to new style - Convert some <tab>'s to spaces This commit was SVN r6904.	2005-08-16 16:17:52 +00:00
Jeff Squyres	c465eb8567	Rename opal/threads/thread.h -> opal/threads/threads.h to avoid a naming conflict with Solaris' <thread.h> This commit was SVN r6879.	2005-08-15 11:02:01 +00:00
Jeff Squyres	cf16a521c8	Ensure to get ompi/include/constants.h This commit was SVN r6845.	2005-08-12 21:42:07 +00:00
George Bosilca	8b93cb7661	Rename all the functions starting with mca_base_modex to mca_pml_base_modex. Change all the places where they are used to fit the new name. Remove the code to check the remote arch from the PML. We will have a GPR mechanism in ompi_mpi_initialize to do that. This commit was SVN r6750.	2005-08-05 18:03:30 +00:00
Jeff Squyres	4c1dd716c7	Change and add new features to the MCA parameter system: - new preferred API calls for registering MCA parameters are mca_base_param_reg_{int|string} and mca_base_param_reg_{int|string}_name. - See opal/mca/base/mca_base_param.h for docs on new calls. - Can now register and lookup a value at the same time. - Can now mark a parameter "read only" at registration time - Can now mark a parameter "internal" at registration time - Can now associate a help message with the parameter at registration time; displayed in the ompi_info output. The old API calls are still available for backwards compatibility (mca_base_param_register_{int|string}). They will eventually be removed -- all developers are encouraged to use the new APIs from here on out and replace any old calls with the new API. Some params were also renamed -- the previous convention of using "base_" as a prefix for any param that was not associated with a component is henceforth deprecated. Instead, use one of the following prefixes: mca: for anything in the MCA base itself opal: for anything in OPAL orte: for anything in ORTE mpi: for anything in OMPI This commit was SVN r6698.	2005-08-01 22:38:17 +00:00
Ralph Castain	4e79a51395	Add a job_info segment to the system that holds a container for each job. Within each container is a keyval indicating the job state (i.e., all procs at stage1, finalized, etc.). This provides a rough state-of-health for the job. This required a little fiddling with a number of areas. Biggest problem was that it uncovered a potential for an infinite loop to be created in the registry. If a callback function modified the registry, the registry checked the triggers to see if anything had fired. Well, if the original callback was due to a trigger firing, that condition hadn't changed - so the trigger fired again....which caused the callback to be called, which modified the registry, which checked the triggers, etc. etc. Triggers are now checked and then "flagged" as being "in process" so that the registry will NOT recheck that trigger until all callbacks have been processed. Tried doing this with subscriptions as well, but that caused a problem - when we release processes from a stagegate, they (at the moment) immediately place data on the registry that should cause a subscription to fire. Unfortunately, the system will just hang if that subscription doesn't get processed. So, I have left the subscription system alone - any callback function that modifies the registry in a fashion that will fire a subscription will indeed fire that subscription. We'll have to see if this causes problems - it shouldn't, but a careless user could lock things up if the callback generates a callback to itself. Also fixed the code that placed a process' RML contact info on the registry to eliminate the leading '/' from the string. This commit was SVN r6684.	2005-07-29 14:11:19 +00:00
George Bosilca	8e0d8a0e99	The datatype should be initialized as soon as possible. Inside we detect the local architecture and create the local convertor. They will get used on the ompi_proc_init. This commit was SVN r6680.	2005-07-29 00:15:26 +00:00
George Bosilca	3b52a31e1f	Make some compilers quiet. Otherwise they complain about uninitialized variables even though the logic prevents any execution path in which they could be used uninitialized. This commit was SVN r6560.	2005-07-20 06:47:10 +00:00
George Bosilca	6aa956241f	Solve the issues when several PMLs are available. The main problem here comes from the fact that a PML is a lot more complex than a PTL, and it can adapt its behavior to the level of threading required by the user. In this case the behavior is the priority of the PML. Therefore this information is never available before the init function (of the PML) is called. So I try to keep nearly the same structure as before, with one change. When a PML gets initialized, it does not necessarily mean it has been selected, so it does not mean it has to create all its internal structures (and select the PTL and all this stuff). They can all be done later, when a PML knows that it has definitively been selected (when the enable function is called with the argument set to true). Thus, in the case of a PML close, one has to check whether the PML has been selected before trying to clean up the internals. I had to change the MPI_Init function to allow the PML to be enabled before we start adding procs inside. This commit was SVN r6434.	2005-07-12 05:40:56 +00:00
Josh Hursey	de5e0d4f2c	(Re-)Added two MCA Parameters that must have been lost in the merge way back when: * mpi_show_mca_params If set to true, this turns on the dumping of all MCA parameters when MPI_INIT is called. Only the 'rank 0' processes will print the parameters. * mpi_show_mca_params_file (This value is only used if the first argument is set to true) If this value is non-NULL it specifies the file to put the dump into. This file can then be used as input to mpirun for debugging purposes. If this value is not set (and mpi_show_mca_params is set) then the parameters are dumped to stdout. This commit was SVN r6401.	2005-07-08 21:01:37 +00:00
Brian Barrett	170ef8af1f	* rename ompi_show_help to opal_show_help * rename ompi_stacktrace to opal_stacktrace * rename ompi_strncpy to opal_strncpy This commit was SVN r6336.	2005-07-04 02:38:44 +00:00
Brian Barrett	a13166b500	* rename ompi_output to opal_output This commit was SVN r6329.	2005-07-03 23:31:27 +00:00
Brian Barrett	23b687b0f4	* rename ompi_event to opal_event This commit was SVN r6328.	2005-07-03 23:09:55 +00:00
Brian Barrett	39dbeeedfb	* rename locking code from ompi to opal This commit was SVN r6327.	2005-07-03 22:45:48 +00:00
Brian Barrett	ccd2624e3f	* rename ompi_progress to opal_progress This commit was SVN r6326.	2005-07-03 21:57:43 +00:00
Jeff Squyres	35c141aef6	While we're moving directories around, move ompi/mpi/runtime -> ompi/runtime, for consistency and parallelism with orte/runtime. Also remove a few useless #includes along the way. This commit was SVN r6317.	2005-07-03 12:07:29 +00:00

284 commits