Ralph Castain
c1282d5b99
The opal_buffer type also generates its own alloc, so we need to let it pass through the check
2015-02-17 21:06:19 -08:00
Ralph Castain
207cc74f87
Correct name of help file
2015-02-17 16:03:20 -08:00
Ralph Castain
624b16e070
Protect the unload attribute function
2015-02-17 14:21:23 -08:00
Ralph Castain
78245e8a33
Continue massaging of the notifier framework. Convert it to an event-driven interface. Add the ability to report job state if requested. Cleanup object declarations.
2015-02-17 12:51:11 -08:00
Gilles Gouaillardet
8dc4f30fae
orte/tools: fix NULL pointer dereference
as reported by Coverity with CIDs 1196671 and 1196824
2015-02-17 15:45:06 +09:00
Gilles Gouaillardet
b762766969
orte/util: fix misc memory leaks
as reported by Coverity with CIDs 70314, 710653-710657 and 1196741-1196744
2015-02-17 12:27:23 +09:00
Ralph Castain
22f1d29b82
Re-introduce the ORTE notifier framework for logging errors that would otherwise result in abort for persistent systems. Thanks to L. Rajeshnarayanan of Intel for the contribution
Subsequent commits will integrate this capability with the state and errmgr frameworks.
2015-02-16 12:46:58 -08:00
Gilles Gouaillardet
8fe8079080
Fix a build failure when configure'd with --without-hwloc
see http://mtt.open-mpi.org/index.php?do_redir=2235
2015-02-16 10:31:09 +09:00
Jeff Squyres
3ac1d0dae5
*-info: add "lt_dladvise support" lines
2015-02-11 12:25:20 -08:00
Ralph Castain
2a83d2613a
Cleanup the orte/test/system directory
2015-02-11 10:42:38 -08:00
Ralph Castain
d5775bf9de
Cleanup orte MPI test directory so it all builds again
2015-02-11 10:14:06 -08:00
Ralph Castain
ce56c0a2cf
Oops - remove debug/exit
2015-02-11 10:14:06 -08:00
Jeff Squyres
c9e3f22933
orte mpi tests: fix a bunch of compiler warnings
2015-02-11 12:28:10 -05:00
Jeff Squyres
07179ef669
orte mpi tests: don't use deprecated MPI functions
Change MPI_Errhandler_set -> MPI_Comm_set_errhandler
2015-02-11 12:28:10 -05:00
Jeff Squyres
cc7f433c0f
Makefile: this file should not be executable
2015-02-11 07:33:56 -08:00
Ralph Castain
3de8c5c7c6
Cleanup the munge support - the credential cannot be reused for multiple connections
2015-02-10 20:34:35 -08:00
Ralph Castain
46fb850bb0
Continue adding support for options on orte-submit - still need to shift some of the MCA params to job object attributes
2015-02-10 13:56:14 -08:00
Ralph Castain
116fcaff2c
Start adding support for cmd line options to orte-submit
2015-02-10 12:13:21 -08:00
rhc54
cf3f4def48
Merge pull request #386 from marksantcroos/master
Add debug option to orte-dvm.
Looks fine - thanks
2015-02-10 11:38:52 -08:00
Ralph Castain
df2cd96772
Display the local/global attribute flag more prominently. Mark the attributes as global in orte-submit so they will be communicated
2015-02-10 10:47:32 -08:00
Mark Santcroos
ff6a69a68d
Add debug option to orte-dvm.
2015-02-10 13:02:23 -05:00
Ralph Castain
063e4c9989
Cleanup the pretty-print of odls cmds as some were missing. Add a new cmd to terminate the DVM, which the HNP will use to turn around and issue an xcast to the DVM.
2015-02-10 08:27:13 -08:00
Ralph Castain
3ae3b96c17
Fix master compilation - a buried header dependency must have been removed.
2015-02-10 07:22:10 -08:00
Elena
948c20d862
added pmix unit test to tarball
2015-02-10 13:41:15 +02:00
Howard Pritchard
b62d9c2c70
ess/alps: fix compile issue for pgi
Remove the -fi-noident cflag option. It wasn't helping anyway
and caused pgi compiles to break.
2015-02-09 20:49:04 -08:00
Ralph Castain
3478def791
Ensure that nodes get included in the nidmap when spawning a new DVM job - we really only need to do this once, but for now we do it for every job until we work out how to avoid the duplication. Remove debug from orte-dvm tool
2015-02-09 23:47:46 -05:00
Ralph Castain
ef13ba7db3
Add debug-daemons option to orte-dvm
2015-02-09 11:08:45 -05:00
Ralph Castain
a3275aa867
Once again, fix the blasted singleton comm_spawn
2015-02-05 17:34:25 -08:00
Ralph Castain
f28238af59
Fix a race condition seen by Absoft during finalize. Stop the orte progress thread without cleaning it up, thus allowing the frameworks to still cancel their posted recv's. Then cleanup the memory footprint afterwards.
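The finalize ordering described here (stop the progress thread first, let the frameworks cancel their posted receives, and only then free the memory) can be sketched with plain POSIX threads. This is a minimal illustration under stated assumptions: `progress_thread_t`, `progress_start`, `progress_stop`, `framework_cancel_recv`, and `progress_cleanup` are hypothetical names, not ORTE's real progress-thread API, and the `resources` pointer merely stands in for the event base that frameworks still need while cancelling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical progress-thread state; not ORTE's actual structures. */
typedef struct {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool running;
    int *resources;  /* stand-in for the event base used by posted recvs */
} progress_thread_t;

static void *progress_loop(void *arg)
{
    progress_thread_t *pt = arg;
    pthread_mutex_lock(&pt->lock);
    while (pt->running) {
        /* A real loop would dispatch events; this one just waits for stop. */
        pthread_cond_wait(&pt->cond, &pt->lock);
    }
    pthread_mutex_unlock(&pt->lock);
    return NULL;
}

static int progress_start(progress_thread_t *pt)
{
    pt->resources = calloc(1, sizeof(int));
    if (NULL == pt->resources) {
        return -1;
    }
    pthread_mutex_init(&pt->lock, NULL);
    pthread_cond_init(&pt->cond, NULL);
    pt->running = true;
    int rc = pthread_create(&pt->thread, NULL, progress_loop, pt);
    if (0 != rc) {  /* undo the partial setup if the thread never started */
        pt->running = false;
        pthread_cond_destroy(&pt->cond);
        pthread_mutex_destroy(&pt->lock);
        free(pt->resources);
        pt->resources = NULL;
    }
    return rc;
}

/* Step 1: stop and join the thread, but do NOT free anything yet. */
static void progress_stop(progress_thread_t *pt)
{
    pthread_mutex_lock(&pt->lock);
    pt->running = false;
    pthread_cond_signal(&pt->cond);
    pthread_mutex_unlock(&pt->lock);
    pthread_join(pt->thread, NULL);
}

/* Step 2: a framework cancels its posted recv; resources must still exist. */
static bool framework_cancel_recv(progress_thread_t *pt)
{
    return NULL != pt->resources;
}

/* Step 3: only after the frameworks are done, release the memory. */
static void progress_cleanup(progress_thread_t *pt)
{
    assert(!pt->running);  /* cleanup is only legal after progress_stop() */
    free(pt->resources);
    pt->resources = NULL;
    pthread_cond_destroy(&pt->cond);
    pthread_mutex_destroy(&pt->lock);
}
```

Splitting stop from cleanup is the point of this design: any framework that runs between the two calls still sees valid memory, whereas a combined stop-and-free would leave it cancelling against state that has already been released.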
2015-02-05 11:41:37 -08:00
Jeff Squyres
938b8e1dad
schizo: fix free of uninitialized value
The "param" value is not assigned before this free() statement, so
remove the free().
(yay clang compiler warnings)
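The commit fixed this by deleting the free() call. A common defensive alternative, shown here only for illustration and not what the commit did, is to initialize the pointer to NULL so that a single free() at the end is well defined on every path, since free(NULL) is a no-op. `handle_param` is a hypothetical function, not the schizo code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper; it only illustrates the bug class, not schizo itself.
 * Returns the length of the copied argument, or 0 if there was none. */
static size_t handle_param(const char *arg)
{
    char *param = NULL;  /* without "= NULL", the free() below could see garbage */
    size_t len = 0;

    if (NULL != arg) {
        len = strlen(arg);
        param = malloc(len + 1);
        if (NULL == param) {
            return 0;
        }
        memcpy(param, arg, len + 1);
        /* ... work with param ... */
    }

    /* When arg is NULL, param is still NULL here and free(NULL) does nothing.
     * Had param been left uninitialized, this call would be undefined behavior. */
    free(param);
    return len;
}
```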
2015-02-04 15:50:24 -05:00
Ralph Castain
251084a2da
When a tool requests the spawn of a new job, then exclusively forward output to that tool - the DVM should not output its own copy as well.
2015-02-04 07:59:47 -08:00
Ralph Castain
2b0b012460
Continue refinement of the DVM operations. Send the spawn request to the right place (it helps) as it isn't a comm_spawn request and has to be treated a little differently. Ensure IO gets forwarded back to the tool. Ensure the tool outputs show_help locally as there is no place to send it.
2015-02-04 06:21:54 -08:00
Ralph Castain
7299cc3ab9
Cleanup the communications handshake so that orte-submit properly terminates upon job completion, and properly sends the terminate command to orte-dvm
2015-02-03 07:25:43 -08:00
Elena
5919b636e1
changed output format in pmix unit test
2015-02-02 14:22:51 +02:00
Ralph Castain
4dba298e6e
Update orte-submit manpage, add the ompi-* versions of orte-dvm and orte-submit manpages
2015-02-01 15:46:40 -08:00
Ralph Castain
e303a9b1d6
Provide an orte-dvm man page. Provide an option to orte-submit for terminating the DVM
2015-02-01 12:14:44 -08:00
Ralph Castain
ec5ccb76cf
Enable persistent ORTE DVM so users can execute multiple OMPI jobs within an allocation without restarting the DVM every time.
2015-01-30 11:00:43 -08:00
rhc54
e7fa600d85
Merge pull request #360 from elenash/master
added unit test for pmix functionality
2015-01-28 06:18:57 -06:00
Elena
472baa1284
added unit test for pmix functionality
2015-01-28 13:18:26 +02:00
Ralph Castain
b838df9eb8
Get slurm to stay out of the way on singletons
2015-01-27 09:29:43 -06:00
Ralph Castain
294ebc907a
Fix singleton operations so they can work inside a slurm environment
2015-01-27 09:29:42 -06:00
Ralph Castain
3eca55caec
Continue fixing singletons in slurm environments
2015-01-27 09:29:42 -06:00
Ralph Castain
fcec24b2a4
Minor cleanups to handle comm_spawn and singletons
2015-01-27 09:29:42 -06:00
Ralph Castain
74385302c0
Add the personality to the orte_job_t datatype support
2015-01-27 09:29:42 -06:00
Ralph Castain
88c38f87d2
Get the orteds to use schizo as well
2015-01-27 09:29:42 -06:00
Ralph Castain
028b00154d
Complete implementation of the schizo framework to support OMPI component
2015-01-27 09:29:42 -06:00
Ralph Castain
11c92eefe6
ckpt
2015-01-27 09:29:42 -06:00
rhc54
a1707326bf
Merge pull request #359 from hppritcha/topic/better_help
orte/util: minor improvement to show_help
2015-01-25 08:13:49 -08:00
Howard Pritchard
1e94d84ae6
orte/util: minor improvement to show_help
Make sure show_help makes a real attempt to
print the error message locally if the
send_buffer_nb method returns an error.
2015-01-23 13:54:03 -08:00
Howard Pritchard
2809c21e0f
rml/oob: check peer param in send methods
The rml/oob component was not sanity-checking the input peer
parameter in orte_rml_oob_send_nb and orte_rml_oob_send_buffer_nb.
Because there are places in the ompi/orte stack where functions
like orte_show_help_norender are called well before
ORTE_PROC_MY_HNP is set up properly, all kinds of odd
startup failures can occur as the rml/oob tries to process send
requests whose peer is junk.
Rather than try to expand this kind of thing:
    /* if we are the HNP, or the RML has not yet been setup,
     * or ROUTED has not been setup,
     * or we weren't given an HNP, or we are running in standalone
     * mode, then all we can do is process this locally
     */
    if (ORTE_PROC_IS_HNP || orte_standalone_operation ||
        NULL == orte_rml.send_buffer_nb ||
        NULL == orte_routed.get_route ||
        NULL == orte_process_info.my_hnp_uri) {
        rc = show_help(filename, topic, output, ORTE_PROC_MY_NAME);
    }
do the right thing at the rml level: return an error rather than
eventually failing in the send because the peer is not valid.
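A minimal sketch of the approach this commit describes, validating the peer at the top of the send path and returning an error instead of failing later, might look like the following. Everything here is hypothetical: `proc_name_t`, `INVALID_ID`, `RC_SUCCESS`/`RC_BAD_PARAM`, `send_buffer_nb`, and `report_error` stand in for the real `orte_rml_oob_send_buffer_nb`, ORTE name types, and error codes, whose exact definitions are not reproduced here. `report_error` shows the caller-side counterpart, printing the message locally when the send is rejected, in the spirit of the show_help change above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: these are not the real ORTE types or codes. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} proc_name_t;

#define INVALID_ID   UINT32_MAX  /* marks a name that was never set up */
#define RC_SUCCESS   0
#define RC_BAD_PARAM (-1)

/* Reject a peer that is NULL or still carries the "not set up" marker. */
static int check_peer(const proc_name_t *peer)
{
    if (NULL == peer) {
        return RC_BAD_PARAM;
    }
    if (INVALID_ID == peer->jobid || INVALID_ID == peer->vpid) {
        return RC_BAD_PARAM;
    }
    return RC_SUCCESS;
}

/* Hypothetical non-blocking send: validate first and fail fast, instead of
 * queueing a message to a junk peer and failing later in the send path. */
static int send_buffer_nb(const proc_name_t *peer, const void *buf, size_t len)
{
    int rc = check_peer(peer);
    if (RC_SUCCESS != rc) {
        return rc;
    }
    if (NULL == buf && 0 != len) {
        return RC_BAD_PARAM;
    }
    /* ... a real implementation would queue the message here ... */
    return RC_SUCCESS;
}

/* Caller side: fall back to local output when the send is rejected.
 * Returns 1 if the message was printed locally, 0 if it was "sent". */
static int report_error(const proc_name_t *hnp, const char *msg)
{
    if (RC_SUCCESS != send_buffer_nb(hnp, msg, strlen(msg) + 1)) {
        fprintf(stderr, "%s\n", msg);
        return 1;
    }
    return 0;
}
```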
2015-01-22 06:12:39 -08:00