Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RMail and malformed UTF-8 headers #205

Closed
jorrizza opened this issue Jan 10, 2014 · 27 comments
Closed

RMail and malformed UTF-8 headers #205

jorrizza opened this issue Jan 10, 2014 · 27 comments
Milestone

Comments

@jorrizza
Copy link

In broken email messages (mostly spam) the headers aren't always encoded properly (I think?), causing sup to crash. It feeds the UTF-8 string into RMail::Header::Field.parse, which in turn defines an ASCII-8bit regex and matches the string against that. For now I've fixed it like this in rmail-sup's header.rb:

def parse(field)
  field = field.dup.to_str
  field.force_encoding('binary')
  if field =~ EXTRACT_FIELD_NAME_RE

But I've got no idea if it's sup who should send it an ASCII string, or RMail who should coerce it into an ASCII string.

And here's the traceback. The line number in header.rb is off a bit, but you get the idea:

/var/lib/gems/1.9.1/gems/rmail-sup-1.0.1/lib/rmail/header.rb:81:in `parse'
/var/lib/gems/1.9.1/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:233:in `block in parse_header'
/var/lib/gems/1.9.1/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:229:in `each'
/var/lib/gems/1.9.1/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:229:in `parse_header'
/var/lib/gems/1.9.1/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:194:in `parse_low'
/var/lib/gems/1.9.1/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:183:in `parse'
/var/lib/gems/1.9.1/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:331:in `parse'
/var/lib/gems/1.9.1/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:345:in `read'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/mbox.rb:84:in `block in load_message'
<internal:prelude>:10:in `synchronize'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/mbox.rb:74:in `load_message'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/message.rb:772:in `parsed_message'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/message.rb:261:in `load_from_source!'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/message.rb:360:in `build_from_source'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/poll.rb:204:in `block in poll_from'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/mbox.rb:143:in `poll'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/poll.rb:201:in `poll_from'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/poll.rb:146:in `block (2 levels) in do_poll'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/poll.rb:136:in `each'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/poll.rb:136:in `block in do_poll'
<internal:prelude>:10:in `synchronize'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/poll.rb:135:in `do_poll'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/util.rb:649:in `method_missing'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/modes/poll_mode.rb:15:in `poll'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/poll.rb:58:in `poll_with_sources'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/poll.rb:94:in `poll'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup/util.rb:649:in `method_missing'
/var/lib/gems/1.9.1/gems/sup-0.15.2/bin/sup:220:in `block (2 levels) in <module:Redwood>'
/var/lib/gems/1.9.1/gems/sup-0.15.2/lib/sup.rb:87:in `block in reporting_thread'
@gauteh
Copy link
Member

gauteh commented Jan 10, 2014

Perhaps it is better to use the fix_encoding! method on the Sup string before it is passed to parse, then use the /u modifier on the EXTRACT_FIELD_NAME_RE regexp (sup-heliotrope/rmail-sup@b5300ac) in place of /n [0].

[0] http://www.ruby-doc.org/core-2.1.0/Regexp.html#class-Regexp-label-Encoding

@gauteh
Copy link
Member

gauteh commented Jan 10, 2014

The proper fix would be of course to migrate to Mail.. #127.

@gauteh
Copy link
Member

gauteh commented Jan 26, 2014

I think you are missing the actual error from the trace, could you post the full output from the crash?

@jorrizza
Copy link
Author

I don't have the full log anymore. I should have saved it. I've tried finding the message that triggered it, but failed. It's easy to recreate the error in irb, though:

"õ" =~ EXTRACT_FIELD_NAME_RE
Encoding::CompatibilityError: incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)

So the full error log would be this error message, followed by the traceback in the original report.

@gauteh
Copy link
Member

gauteh commented Jan 28, 2014

Could you try the changes in pull-request #214 with a manually installed version of rmail with sup-heliotrope/rmail-sup#3 applied?

@jorrizza
Copy link
Author

I set up a separate user account on the same machine the error occurred on and I added the pull requests you mentioned. I did some Bundler stuff to get it all to work. I added my own inbox mbox file to this sup's sources to make sure the malformed headers are present. I ran sup-sync right after the initial setup. This is the result.

$ bin/sup-sync
Scanning mbox:/home/testsup/bunchofmail...
[2014-01-29 16:17:56 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:17:57 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:17:58 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:17:59 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:17:59 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:18:00 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:18:01 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:18:01 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:18:01 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:18:01 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
[2014-01-29 16:18:01 +0100] WARNING: content_type ��} is not safe, changed to application/octet-stream
/home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:706:in `===': incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string) (Encoding::CompatibilityError)
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:706:in `block in get_tokenize'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:650:in `loop'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:650:in `get_tokenize'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:643:in `get'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:402:in `address_lookahead'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:426:in `address'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:366:in `address_list'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:302:in `parse'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/address.rb:789:in `parse'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/header.rb:859:in `block in address_list_fetch'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/header.rb:855:in `each'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/header.rb:855:in `address_list_fetch'
    from /home/testsup/sup/vendor/rmail-sup-1.0.1.666/lib/rmail/header.rb:688:in `from'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/message.rb:482:in `message_to_chunks'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/message.rb:462:in `block in message_to_chunks'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/message.rb:462:in `map'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/message.rb:462:in `message_to_chunks'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/message.rb:265:in `load_from_source!'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/message.rb:362:in `build_from_source'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/poll.rb:204:in `block in poll_from'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/mbox.rb:143:in `poll'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/poll.rb:201:in `poll_from'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/lib/sup/util.rb:632:in `method_missing'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/bin/sup-sync:125:in `block in <top (required)>'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/bin/sup-sync:120:in `each'
    from /home/testsup/sup/vendor/ruby/1.9.1/bundler/gems/sup-6da5139571a5/bin/sup-sync:120:in `<top (required)>'
    from bin/sup-sync:16:in `load'
    from bin/sup-sync:16:in `<main>'

address.rb:706 defines the following regex:

/\A[\200-\377\w!$%&\'*+\/=?^_\`{\}|~#-]+/nm

This goes beyond my poor knowledge of some email RFCs.

@gauteh
Copy link
Member

gauteh commented Jan 29, 2014

Hm, this is a different, but similar problem..

do you know what address this fails on?

@jorrizza
Copy link
Author

Here's the hexdump of the address. Or the contents of @string anyway.

00000000  ef bf bd 20 4f 66 66 69  63 69 61 6c 20 53 69 74  |... Official Sit|
00000010  65 20 3c 68 62 76 76 40  79 61 68 6f 6f 2e 63 6f  |e <[email protected]|
00000020  6d 3e                                             |m>|
00000022

Edit: EF BF BD is UTF-8 for U+FFFD, the replacement character. So that first character might have been something different. I'm looking into it.

Edit 2: Found it!

$ grep -m 1 'Official Site <[email protected]>' ../bunchofmail | hexdump -C
00000000  46 72 6f 6d 3a 20 56 49  41 47 52 41 20 ae 20 4f  |From: VIAGRA . O|
00000010  66 66 69 63 69 61 6c 20  53 69 74 65 20 3c 68 62  |fficial Site <hb|
00000020  76 76 40 79 61 68 6f 6f  2e 63 6f 6d 3e 0a        |[email protected]>.|
0000002e

Yes, that's most definitely spam. The byte that made sup crash is AE.

@gauteh
Copy link
Member

gauteh commented Jan 29, 2014

Hm, what version of ruby and rmail are you running?

@jorrizza
Copy link
Author

I'm using the rmail from sup-heliotrope/rmail-sup#3 and ruby 1.9.3p484.

@gauteh
Copy link
Member

gauteh commented Jan 29, 2014

Could you post the entire message somewhere, I'm not able to crash sup with just those bits.

@gauteh
Copy link
Member

gauteh commented Jan 29, 2014

Preferably as a binary upload

@gauteh
Copy link
Member

gauteh commented Jan 29, 2014

By the way, you can use the environment variable SUP_BASE to make sup use something else than ~/.sup for home folder

@jorrizza
Copy link
Author

This is the message. The latest sup release (without the two pull requests) does not crash on this message. Thanks for the SUP_BASE tip. I didn't know about that.

@gauteh
Copy link
Member

gauteh commented Jan 30, 2014

Hi, could you try the latest tips of both pull-requests. Managed to crash sup again, but the test case crashes anyway as long as rubymail isn't patched for me. Anyway, I don't think I can merge this yet since i'm not sure the regexp's in rubymail make much sense..

@gauteh
Copy link
Member

gauteh commented Jan 30, 2014

By the way, I added your message to the test base - but removed any references I could find to your address or servers.. hope that is ok.

@jorrizza
Copy link
Author

Well, sup-sync has been chugging along happily for the past ten minutes. It seems that fixes the crash.

And I don't mind my email and server addresses used in a test case. The addresses are plastered all over the Internet anyway. Hence all the spam. ;)

@gauteh
Copy link
Member

gauteh commented Jan 30, 2014

Yup, but the two changed regexps in sup-heliotrope/rmail-sup#3 has to be fixed, before it can be merged.. (again, the real fix is to move to Mail.)

@gauteh gauteh added the bug label Mar 5, 2014
@gauteh
Copy link
Member

gauteh commented May 13, 2014

As reported in #310 setting the locale to C: LANG/LC_ALL=C might bypass this problem.

@rapha8l
Copy link

rapha8l commented May 21, 2014

Hi,
I opened a bug that may be related : #311

@seydar
Copy link

seydar commented Oct 13, 2015

I can't get sup to run with 2.2.1, so i'm using ruby 2.0.0-p195. i'm getting this error as well (sup v0.22.1).

Was this bug supposed to be fixed elsewhere?

@maxmeyer
Copy link

@gauteh I'm seeing this bug as well:

--- Encoding::CompatibilityError from thread: main
incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/header.rb:82:in `=~'
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/header.rb:82:in `parse'
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:233:in `block in parse_header'
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:229:in `each'
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:229:in `parse_header'
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:194:in `parse_low'
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:183:in `parse'
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:331:in `parse'
/home/user/.gem/2.2.4/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:345:in `read'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/message.rb:495:in `message_to_chunks'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/message.rb:476:in `block in message_to_chunks'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/message.rb:476:in `map'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/message.rb:476:in `message_to_chunks'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/message.rb:265:in `load_from_source!'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/message.rb:376:in `build_from_source'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/poll.rb:204:in `block in poll_from'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/maildir.rb:161:in `block in poll'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/maildir.rb:160:in `each'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/maildir.rb:160:in `each_with_index'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/maildir.rb:160:in `poll'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/poll.rb:201:in `poll_from'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/util.rb:610:in `method_missing'
/home/user/.gem/2.2.4/gems/sup-0.22.1/lib/sup/sent.rb:33:in `block in write_sent_message'

This is quite annoying as all mails which are sent out trigger this bug and those mails are not saved into -mailbox. Is there a fix for that bug?

@maxmeyer
Copy link

The work around does not work for me. Is the move to the non-rmail fork of sup the problem?

@phretor
Copy link

phretor commented Feb 17, 2017

Bump. What's the status of this? I'm keep getting it even on 0.22.1 with rbenv-installed Ruby 2.1.2 on macOS Sierra.

I see that the proposed solution seemed to be #127, but the last update on that is on Aug, 2015.

Here is my traceback, for what is worth.

--- Encoding::CompatibilityError from thread: poll after loading inbox
incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/header.rb:81:in `=~'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/header.rb:81:in `parse'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:233:in `block in parse_header'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:229:in `each'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:229:in `parse_header'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:194:in `parse_low'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:183:in `parse'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:331:in `parse'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/rmail-sup-1.0.1/lib/rmail/parser.rb:345:in `read'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/message.rb:495:in `message_to_chunks'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/message.rb:476:in `block in message_to_chunks'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/message.rb:476:in `map'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/message.rb:476:in `message_to_chunks'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/message.rb:265:in `load_from_source!'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/message.rb:376:in `build_from_source'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:204:in `block in poll_from'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/maildir.rb:161:in `block in poll'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/maildir.rb:160:in `each'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/maildir.rb:160:in `each_with_index'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/maildir.rb:160:in `poll'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:201:in `poll_from'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:146:in `block (2 levels) in do_poll'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:136:in `each'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:136:in `block in do_poll'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:135:in `synchronize'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:135:in `do_poll'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/util.rb:610:in `method_missing'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/modes/poll_mode.rb:15:in `poll'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:58:in `poll_with_sources'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/poll.rb:94:in `poll'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup/util.rb:610:in `method_missing'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/bin/sup:207:in `block (2 levels) in <module:Redwood>'
/Users/phretor/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/sup-0.22.1/lib/sup.rb:87:in `block in reporting_thread'

@danc86
Copy link
Contributor

danc86 commented Jul 12, 2020

With sup 0.23 and rmail 1.1.4 this bug still exists although the traceback is a little different:

ArgumentError: invalid byte sequence in UTF-8
    /usr/share/gems/gems/rmail-1.1.4/lib/rmail/parser.rb:218:in `split'
    /usr/share/gems/gems/rmail-1.1.4/lib/rmail/parser.rb:218:in `parse_header'
    /usr/share/gems/gems/rmail-1.1.4/lib/rmail/parser.rb:194:in `parse_low'
    /usr/share/gems/gems/rmail-1.1.4/lib/rmail/parser.rb:183:in `parse'
    /usr/share/gems/gems/rmail-1.1.4/lib/rmail/parser.rb:332:in `parse'
    /usr/share/gems/gems/rmail-1.1.4/lib/rmail/parser.rb:346:in `read'
    /home/dan/sup/test/dummy_source.rb:33:in `load_message'
    /home/dan/sup/lib/sup/message.rb:807:in `parsed_message'
    /home/dan/sup/lib/sup/message.rb:268:in `load_from_source!'
    /home/dan/sup/lib/sup/message.rb:381:in `build_from_source'
    /home/dan/sup/test/test_message.rb:259:in `test_nonascii_header'

@danc86
Copy link
Contributor

danc86 commented Jul 12, 2020

Ahh nope ignore the previous comment. That's a different error which only happens in the test suite, because it uses a different code path for loading test messages.

The actual traceback with sup 0.23 and rmail 1.1.4 is this (slightly better than before but still broken):

--- Encoding::CompatibilityError from thread: poll after loading inbox
incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:706:in `==='
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:706:in `block in get_tokenize'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:650:in `loop'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:650:in `get_tokenize'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:643:in `get'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:402:in `address_lookahead'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:426:in `address'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:366:in `address_list'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:302:in `parse'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/address.rb:789:in `parse'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/header.rb:854:in `block in address_list_fetch'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/header.rb:850:in `each'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/header.rb:850:in `address_list_fetch'
/usr/share/gems/gems/rmail-1.1.4/lib/rmail/header.rb:683:in `from'
/home/dan/sup/lib/sup/message.rb:499:in `message_to_chunks'
/home/dan/sup/lib/sup/message.rb:479:in `block in message_to_chunks'
...

@danc86
Copy link
Contributor

danc86 commented Jul 12, 2020

Okay so there are two different issues going on here, both related to non-ASCII characters in headers (which are illegal, they should be encoded using RFC2047 header encoding, but spammers don't follow the rules).

Non-ASCII bytes in the mail header itself was crashing sup, but that was fixed in rmail 1.1.2:

terceiro/rmail@0479219

If there is a nested message/rfc822 MIME part with non-ASCII characters in its headers, that will still crash because it goes through a different code path for parsing. That's the traceback in the previous comment. Specifically, the line:

body.normalize_whitespace

wrongly forces the raw message body to be marked as UTF-8 when it should still be ASCII-8BIT because it hasn't been decoded to text yet. The normalize_whitespace methods calls the very questionable fix_encoding! method.

danc86 added a commit to danc86/sup that referenced this issue Jul 12, 2020
This patch changes DummySource to follow a similar pattern to the
Maildir source, which works by opening the files directly for parsing.
Previously DummySource was parsing strings fed through StringIO,
which was introducing extra (or at least, different) complications with
encoding. See for example the comments in issue sup-heliotrope#205.
danc86 added a commit to danc86/sup that referenced this issue Jul 12, 2020
In the code for handling message/rfc822 MIME parts, message.rb line 498,
we were calling the #normalize_whitespace method on the body string
before it was decoded.

I'm not too sure if messing with whitespace is the right thing to do
there, but that aside, that method was then also calling #fix_encoding!
which would forcibly transcode the raw body to UTF-8. Instead, we want to
keep the body as ASCII-8BIT at that point, and let it be decoded using
all the normal message decoding mechanisms.

The only other calls to #normalize_whitespace are in the UI, and in the
code path which handles body text of messages, message.rb line 592,
where the body text has already been decoded. So it seems like we can
safely make #normalize_whitespace just mess with whitespace and leave
the string encoding alone.

Fixes sup-heliotrope#205.
@danc86 danc86 added this to the 1.1 milestone Jul 12, 2020
danc86 added a commit that referenced this issue Jul 12, 2020
This patch changes DummySource to follow a similar pattern to the
Maildir source, which works by opening the files directly for parsing.
Previously DummySource was parsing strings fed through StringIO,
which was introducing extra (or at least, different) complications with
encoding. See for example the comments in issue #205.
@danc86 danc86 closed this as completed in d3fbac1 Jul 12, 2020
@danc86 danc86 removed the notsup label Jul 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants