openmpi/contrib/dist
Dave Goodell 33da7d6f23 gkcommit.pl: fix UTF-8 and other encoding issues
The gatekeeper script did not correctly respect the locale specified in
the user's environment, so the following scenario could (and did)
easily happen:

1. A committer writes a valid message in UTF-8 and runs `svn commit` with
   a correct locale setting of `LANG=en_US.UTF-8`.

2. SVN transcodes that to UTF-8 for internal storage (a no-op in this
   case).

3. The gatekeeper, also with `LANG=en_US.UTF-8` set, runs
   `gkcommit.pl ...`.  This breaks down into the following steps:

   A. run `svn log --xml ...`, whose output SVN correctly transcodes from
      UTF-8 into the current locale's encoding (here, also UTF-8).

   B. Perl reads this output and assumes it is a sequence of raw 8-bit
      bytes in a "native" latin1-type encoding.

   C. Perl's XML::Parser module spots the XML declaration stating the
      content is UTF-8 encoded: `<?xml version="1.0" encoding="UTF-8"?>`.
      Perl internally stores the parsed strings as proper Unicode
      strings (UTF-8 encoded internally, but that's irrelevant here).

   D. Perl writes out the commit message file in the _latin1_ encoding,
      transcoding characters from internal UTF-8.  This causes
      characters like "ä" (Unicode code point: 0xe4, UTF-8 encoding:
      0xc3 0xa4) to be encoded as a single byte: 0xe4.
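The byte-level effect of steps 3A, 3C and 3D for the character "ä" can be reproduced in a few lines. This is a minimal Python illustration under the assumption that it mirrors what the Perl code did. It is not gkcommit.pl itself, and the variable names are hypothetical.

```python
# Python illustration (not the Perl in gkcommit.pl) of steps 3A, 3C and 3D.
text = "\u00e4"  # "ä", Unicode code point U+00E4

# Step 3A: `svn log --xml` in a UTF-8 locale emits UTF-8 bytes.
svn_output = text.encode("utf-8")
print(svn_output.hex(" "))  # c3 a4

# Step 3C: honoring encoding="UTF-8" from the XML declaration recovers
# the original character as a proper Unicode string.
parsed = svn_output.decode("utf-8")
assert parsed == text

# Step 3D (the bug): writing the string out as latin1 collapses it
# to a single byte, which is not valid UTF-8 on its own.
latin1_bytes = parsed.encode("latin-1")
print(latin1_bytes.hex(" "))  # e4
```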

This fix changes the behavior at steps 3A and 3D to transparently treat
the incoming/outgoing data as UTF-8 (assuming a UTF-8 locale is set in
the user's environment).
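The idea behind the fix is that output is encoded according to the user's locale rather than a hard-coded latin1. The Python sketch below illustrates that idea. `write_commit_message` is a hypothetical helper and does not correspond to the actual Perl change. The example assumes that `locale.getpreferredencoding(False)` reflects `LANG`/`LC_*`, which yields UTF-8 under `en_US.UTF-8`.

```python
import locale
import os
import tempfile

def write_commit_message(path, message, encoding=None):
    """Hypothetical helper: write `message` encoded for the user's locale.

    Mirrors the idea of the fix (encode per the locale instead of a
    hard-coded latin1); it is not the code used by gkcommit.pl.
    """
    if encoding is None:
        # Encoding implied by the current locale, e.g. "UTF-8".
        encoding = locale.getpreferredencoding(False)
    with open(path, "w", encoding=encoding) as f:
        f.write(message)

# Usage: with a UTF-8 encoding, "ä" is written as the two bytes c3 a4.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "commit-msg.txt")
    write_commit_message(path, "Umlaut: \u00e4", encoding="utf-8")
    with open(path, "rb") as f:
        data = f.read()
```

Passing `encoding` explicitly keeps the example deterministic. Omitting it falls back to whatever the environment's locale specifies, which is the behaviour the fix relies on.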

There can still be problems if either the committer or the gatekeeper
has locale settings that do not agree with the encoding that their
editor is producing, but such is i18n :(
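The remaining failure modes when the locale and the editor disagree look like this. This is again a Python illustration rather than anything in gkcommit.pl. Latin1 bytes read under a UTF-8 locale fail to decode, while UTF-8 bytes read under a latin1 locale silently turn into mojibake.

```python
# Python illustration of a locale/editor encoding mismatch.
editor_bytes_latin1 = "\u00e4".encode("latin-1")  # editor saved latin1: e4
editor_bytes_utf8 = "\u00e4".encode("utf-8")      # editor saved UTF-8: c3 a4

# Editor wrote latin1, locale says UTF-8: a lone 0xe4 is invalid UTF-8.
latin1_is_valid_utf8 = True
try:
    editor_bytes_latin1.decode("utf-8")
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# Editor wrote UTF-8, locale says latin1: each byte becomes its own
# character, so "ä" turns into the two characters "Ã¤".
mojibake = editor_bytes_utf8.decode("latin-1")
```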

Helpful references for anyone debugging this sort of issue in the
future:

* http://perldoc.perl.org/perllocale.html#Unicode-and-UTF-8
* http://perldoc.perl.org/perluniintro.html#Unicode-I%2fO

Refs trac:4217

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30709.

The following Trac tickets were found above:
  Ticket 4217 --> https://svn.open-mpi.org/trac/ompi/ticket/4217
2014-02-13 03:56:01 +00:00
build-server It occurs to me that the scripts used to build nightly and official 2012-08-08 19:31:51 +00:00
linux Implemening Jeffs comments 2013-11-09 15:41:31 +00:00
macosx-pkg This is a very large change to rename several #define values from 2009-05-06 20:11:28 +00:00
mofed Implemening Jeffs comments 2013-11-09 15:41:31 +00:00
find-copyrights.pl Look for README.WINDOWS.txt now (the .txt suffix is new). 2012-02-17 11:40:15 +00:00
gkcommit.pl gkcommit.pl: fix UTF-8 and other encoding issues 2014-02-13 03:56:01 +00:00
make_dist_tarball add "--distdir /path/to/move/openmpi-*" param to select where to store build products 2013-09-30 17:19:47 +00:00
make_tarball Add sym link from make_dist_tarball to make_tarball. 2009-06-25 16:51:21 +00:00
make-authors.pl Update AUTHORS file with all the IDs that have committed so far on the 2008-09-23 19:38:53 +00:00
make-html-man-pages.pl Update the script to make PHP-ized man pages to be a bit more 2012-09-21 06:43:53 +00:00