latin-1 encoding sometimes mojibake #88

Open
nihen opened this issue Aug 5, 2013 · 1 comment

Comments

nihen (Contributor) commented Aug 5, 2013

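#!/usr/bin/env perl
use Test::More;

use utf8;   # non-ASCII string literals below become character strings (SvUTF8 on)
use Text::Xslate;
my $xslate = Text::Xslate->new();

# ASCII-only template: both forms of the variable render correctly.
is $xslate->render_string('<: $string :>', {string => "Ä"})      => 'Ä';
is $xslate->render_string('<: $string :>', {string => "\x{c4}"}) => 'Ä';

# Template with SvUTF8 on. "\x{c4}" is a byte string without SvUTF8;
# this combination is the failing case (test 4).
is $xslate->render_string('あ<: $string :>', {string => "Ä"})      => 'あÄ';
is $xslate->render_string('あ<: $string :>', {string => "\x{c4}"}) => 'あÄ';

done_testing;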

result

ok 1
ok 2
ok 3
not ok 4

pepl commented Nov 22, 2015

The issue description is not quite accurate - the problem is malformed
UTF-8, not double-encoding.

Commit f261fc2 codified the behaviour for handling variables without
SvUTF8() on in templates with SvUTF8() on - the variable is assumed to be a
sequence of UTF-8 octets, and converted to characters before interpolation.
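
To make the SvUTF8() distinction concrete, here is a minimal sketch in plain Perl (it does not use Xslate). The variable names are illustrative only; `utf8::upgrade` and `utf8::is_utf8` are core functions. It shows that the same character can be held with the flag off or on while still comparing equal.

```perl
use strict;
use warnings;

# One byte 0xC4, stored without the UTF8 flag (like "\x{c4}" in the test).
my $bytes = "\xC4";

# The same character, but upgraded so it is stored as UTF-8 internally
# with the UTF8 flag on (like the literal "Ä" under `use utf8`).
my $chars = "\xC4";
utf8::upgrade($chars);

printf "bytes flagged: %d\n", utf8::is_utf8($bytes) ? 1 : 0;
printf "chars flagged: %d\n", utf8::is_utf8($chars) ? 1 : 0;
printf "equal:         %d\n", $bytes eq $chars      ? 1 : 0;
```

Perl treats the two strings as equal, so the difference only becomes visible when code such as the template engine inspects the flag and reinterprets the unflagged bytes.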

However, neither the PP nor the XS code was made robust against the
possibility that the variable to be interpolated is not a valid UTF-8
sequence. The PP code uses Encode to convert to characters, and Encode
substitutes the replacement character (U+FFFD) in these cases, so the
rendered template contains replacement characters in place of the original
text.
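
The replacement-character behaviour can be reproduced outside Xslate. The sketch below assumes the PP path amounts to a plain `Encode::decode` call with the default CHECK argument; the exact call Xslate makes is not shown in this thread, so treat this as an illustration of Encode's behaviour rather than of the PP code itself.

```perl
use strict;
use warnings;
use Encode ();

# A lone 0xC4 byte is Latin-1 for "Ä" but is not a valid UTF-8 sequence
# (it is a lead byte with no continuation byte).
my $octets = "\xC4";

# With the default CHECK value, Encode does not die on malformed input;
# it substitutes U+FFFD for the bad byte.
my $decoded = Encode::decode('UTF-8', $octets);

printf "length: %d, first code point: U+%04X\n",
    length($decoded), ord($decoded);
```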

Worse, the XS code performed no validation, so such variables were
interpolated verbatim, resulting in malformed UTF-8 in the rendered
template.

Neither of these is good. This commit avoids generating malformed UTF-8 and
replacement characters by interpolating the variable as-is (i.e. treating
each byte as a character) if it is not a valid UTF-8 sequence. All existing
tests pass, and the test supplied with the issue now also passes.
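
The policy described above can be sketched in pure Perl as follows. `chars_for_interpolation` is a hypothetical helper written for this illustration, not a function from Text::Xslate or from the attached patch; it relies only on the core `utf8::is_utf8` and `utf8::decode` functions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the character string that would be
# interpolated into a template that has the UTF8 flag on.
sub chars_for_interpolation {
    my ($value) = @_;

    # Already a character string: use it unchanged.
    return $value if utf8::is_utf8($value);

    # Try to read the bytes as UTF-8. utf8::decode works in place and
    # returns false if the bytes are not well-formed UTF-8.
    my $copy = $value;
    return $copy if utf8::decode($copy);

    # Not valid UTF-8: fall back to the original, so each byte is
    # treated as one character (code points 0x00..0xFF).
    return $value;
}

for my $input ("\xC4", "\xC3\x84", "abc") {
    my $out = chars_for_interpolation($input);
    printf "%d byte(s) in -> %d character(s) out, first U+%04X\n",
        length($input), length($out), ord($out);
}
```

One consequence of this design is that a byte string which happens to be valid UTF-8, such as `"\xC3\x84"`, is still read as UTF-8 rather than as two Latin-1 characters, which matches the behaviour that commit f261fc2 codified.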

0001-Fix-for-issue-88-Latin-1-text-could-end-up-as-malfor.patch.txt

pepl pushed a commit to pepl/p5-Text-Xslate that referenced this issue Feb 15, 2016
…olating the variable as-is if it is not a valid UTF-8 sequence (nc), Github issue xslate#88
pepl pushed a commit to pepl/p5-Text-Xslate that referenced this issue Apr 15, 2016
pepl pushed a commit to pepl/p5-Text-Xslate that referenced this issue Apr 20, 2016
…hanged between Perl 5.20 onwards, Github issue xslate#88"

This reverts commit d207e1035c384a9c43c2d2710d41e93a5188560d.
pepl pushed a commit to pepl/p5-Text-Xslate that referenced this issue Apr 20, 2016
…y interpolating the variable as-is if it is not a valid UTF-8 sequence (nc), Github issue xslate#88"

This reverts commit 723f10dfc38202c863c1851a73228b03b8604aba.
pepl pushed a commit to pepl/p5-Text-Xslate that referenced this issue Apr 20, 2016
…s malformed UTF-8 on output" (nc)

Now using U8 casts
-            *(d++) = UTF8_EIGHT_BIT_HI(c);
-            *(d++) = UTF8_EIGHT_BIT_LO(c);
+            *(d++) = UTF8_EIGHT_BIT_HI((U8) c);
+            *(d++) = UTF8_EIGHT_BIT_LO((U8) c);
pepl pushed a commit to pepl/p5-Text-Xslate that referenced this issue Apr 20, 2016
…s malformed UTF-8 on output" (nc)

Now using U8 casts
-            *(d++) = UTF8_EIGHT_BIT_HI(c);
-            *(d++) = UTF8_EIGHT_BIT_LO(c);
+            *(d++) = UTF8_EIGHT_BIT_HI((U8) c);
+            *(d++) = UTF8_EIGHT_BIT_LO((U8) c);

Minor POD improvement
syohex pushed a commit that referenced this issue Apr 20, 2016
…ormed UTF-8 on output" (nc)

Now using U8 casts
-            *(d++) = UTF8_EIGHT_BIT_HI(c);
-            *(d++) = UTF8_EIGHT_BIT_LO(c);
+            *(d++) = UTF8_EIGHT_BIT_HI((U8) c);
+            *(d++) = UTF8_EIGHT_BIT_LO((U8) c);

Minor POD improvement