This repository has been archived by the owner on May 21, 2024. It is now read-only.

ruby 1.9.3 mysqlstream - invalid byte sequence in UTF-8 #89

Closed
sgrgic opened this issue Apr 20, 2012 · 10 comments


@sgrgic

sgrgic commented Apr 20, 2012

Hi,

We started getting this error after switching to ruby-1.9.3-p125. The error is in line 55 of mysql_streamer.rb:
line = line.gsub("\n","")
It's weird because with the same data I can't reproduce the problem in my local environment, yet on production
we get the exception.
From the exception log:
/lib/etl/control/source/mysql_streamer.rb:56:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)
I'm trying to catch the exception on production to get the string that causes it, but so far no luck.
Maybe you have dealt with this problem already.

Regards,
Sinisa.

@sgrgic
Author

sgrgic commented Apr 20, 2012

OK, in case someone looks into this: I just located the row that raises the exception. It contains this word:
lógica
When I replace ó with o, the ctl job finishes without an exception. Now I need to figure out how mysqlstream should
read this kind of data.

Sinisa.
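
A minimal sketch of why a word like this can make Ruby 1.9 complain. It assumes (this is not confirmed in the thread) that the mysql client emitted ó as the single Latin-1 byte 0xF3 while Ruby labelled the line as UTF-8. The variable names are illustrative only.

```ruby
# Assumption: the pipe delivered "ó" as the Latin-1 byte 0xF3,
# but the string is tagged as UTF-8 (as Ruby 1.9 would do by default).
raw = "l\xF3gica".dup.force_encoding(Encoding::UTF_8)

# 0xF3 announces a multi-byte UTF-8 sequence that never follows,
# so the string is not valid UTF-8.
raw_is_valid = raw.valid_encoding?

# Hypothetical repair: relabel the bytes as ISO-8859-1, then transcode to UTF-8.
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# The cleaned string is safe for the same call that failed in the streamer.
cleaned = fixed.gsub("\n", "")
```

Checking `valid_encoding?` on the offending line is a cheap way to log bad rows before `gsub` gets a chance to raise.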

@sgrgic
Author

sgrgic commented Apr 20, 2012

One solution for this problem, /lib/etl/control/source/mysql_streamer.rb, line 53:
mysql_command = """mysql --quick -h #{host} -u #{username} -e "#{@query.gsub("\n","")}" -D #{database} --password=#{password} -B"""
replace with:
query_utf8 = "SET CHARACTER SET 'utf8'; " + @query.gsub("\n","")
mysql_command = """mysql --quick -h #{host} -u #{username} -e "#{query_utf8}" -D #{database} --password=#{password} -B"""
I hope there are no side effects. So far I don't see any, but I'd like to double-check with you.

Thanks,
Sinisa.

@sgrgic
Author

sgrgic commented Apr 20, 2012

The same thing using a command-line option:

mysql_command = """mysql --quick -h #{host} -u #{username} -e "#{@query.gsub("\n","")}" -D #{database} --password=#{password} -B --default-character-set=utf8"""

Sinisa.
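
For illustration, a sketch of the same command built as an argument array, so the query needs no shell quoting. `build_mysql_command` and its parameters are hypothetical names, not part of the library; `--default-character-set` is the mysql client option used above.

```ruby
# Hypothetical helper (not part of activewarehouse-etl): returns the mysql
# invocation as an array, with the client character set made explicit.
def build_mysql_command(host, username, password, database, query, charset = "utf8")
  [
    "mysql", "--quick",
    "-h", host,
    "-u", username,
    "--password=#{password}",
    "-D", database,
    "-B",
    "--default-character-set=#{charset}",
    "-e", query.gsub("\n", "")   # same newline stripping as the original line
  ]
end

# Placeholder connection values, for illustration only.
cmd = build_mysql_command("localhost", "etl", "secret", "warehouse", "SELECT 1\n")

# Possible usage (not executed here): passing an array to IO.popen bypasses the shell.
# IO.popen(cmd) { |io| io.each_line { |line| puts line } }
```

Passing an array rather than one interpolated string also avoids surprises when the query itself contains double quotes.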

@thbar
Member

thbar commented Apr 23, 2012

@sgrgic sorry - I missed your earlier comments! It is probably a MySQL setup thing, and your fix is probably what should be done. You can check by adding data with accents to MySQL and seeing what comes out.

Historically, MySQL defaulted to LATIN1 rather than UTF-8. It's fairly common to see data which is actually UTF-8 stored as what MySQL believes to be LATIN1 (first issue), or simply to have the client set up for LATIN1.

Can you (out of curiosity) paste the output of what's there? http://stackoverflow.com/a/1049776/20302
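
A small sketch of the first issue described above, UTF-8 bytes being treated as LATIN1. It is only an illustration in Ruby and involves no MySQL connection; the variable names are made up.

```ruby
# "lógica" as valid UTF-8; ó is stored as the two bytes C3 B3.
original = "l\u00F3gica"

# Treat those UTF-8 bytes as if they were ISO-8859-1 (LATIN1) and transcode:
# each byte becomes its own character, which gives the familiar mojibake.
misread = original.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# Reversing the mistake: encode back to ISO-8859-1 bytes and relabel them as UTF-8.
recovered = misread.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)

puts misread    # lÃ³gica
puts recovered  # lógica
```

Seeing "Ã³" where "ó" is expected usually points to this kind of double encoding rather than to invalid bytes.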


@sgrgic
Author

sgrgic commented Apr 23, 2012

Sure, here is the output:
mysql> show variables like "character_set_database";
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| character_set_database | utf8  |
+------------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name      | Value           |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)

@thbar
Member

thbar commented Apr 23, 2012

@sgrgic looks like you're safe then :) If you want to go further, you could check your my.cnf as advised here http://stackoverflow.com/a/3513812/20302 (my bet is that some non-utf8 default may show up there).

It's perfectly OK to just pass --default-character-set on the command line like you did, IMO.

@thbar
Member

thbar commented Apr 23, 2012

@sgrgic can I close this one if you're ok with it?

I also opened #91 to track adding a clean way to pass extra args here.

I will definitely merge a pull request for that if you want to tackle it! (Otherwise I'll do it myself, but not right now.)

@sgrgic
Author

sgrgic commented Apr 23, 2012

Sure, we can close this. So far this looks good in our ctl jobs. If I notice something wrong, I will let you know.
Yeah, please add this fix when you find some time. We have our own branch of aw-etl where we add some stuff.
We can discuss this on the Google group sometime and maybe merge those changes.

Thanks,
Sinisa.

@thbar
Member

thbar commented Apr 23, 2012

OK! And sure, please drop a line on the Google group so we can discuss what could be merged back. Closing!

thbar closed this as completed Apr 23, 2012