Invalid continuation byte #30

Open
romafederico opened this issue Jun 16, 2017 · 10 comments

Comments

@romafederico

macOS, Webstorm 2017.1, Reactjs

I'm receiving UTF-8 encoded JSON, converting it to a string, and then calling utf8.decode(str).

At some point I get the error Invalid continuation byte. Is there a way to find the byte that is causing this error? It appears for some of the users in my DB, not all, and I need to compare them.
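
To locate the offending position, here is a minimal sketch. It assumes the input is meant to be a byte string (one character per byte). `findInvalidUtf8` is a hypothetical helper, not part of utf8.js. It reports the first character that either is not a byte at all or breaks the UTF-8 lead/continuation pattern. It does not check for overlong encodings or surrogate ranges.

```javascript
// Hypothetical helper (not part of utf8.js): scan a "byte string" and
// return details of the first position that is not valid UTF-8, or null.
function findInvalidUtf8(byteString) {
  let i = 0;
  while (i < byteString.length) {
    const byte = byteString.charCodeAt(i);
    if (byte > 0xff) {
      return { index: i, code: byte, reason: 'not a byte (above 0xFF)' };
    }
    // Number of continuation bytes implied by the lead byte.
    let needed;
    if (byte <= 0x7f) needed = 0;
    else if (byte >= 0xc2 && byte <= 0xdf) needed = 1;
    else if (byte >= 0xe0 && byte <= 0xef) needed = 2;
    else if (byte >= 0xf0 && byte <= 0xf4) needed = 3;
    else return { index: i, code: byte, reason: 'invalid lead byte' };

    for (let k = 1; k <= needed; k++) {
      // charCodeAt returns NaN past the end, which also fails the range test.
      const next = byteString.charCodeAt(i + k);
      if (!(next >= 0x80 && next <= 0xbf)) {
        return { index: i + k, code: next, reason: 'invalid continuation byte' };
      }
    }
    i += needed + 1;
  }
  return null;
}

// 'Ã' (U+00C3) followed by an en dash (U+2013), written with escapes.
console.log(findInvalidUtf8('\u00C3\u2013stliche')); // index 1, code 0x2013
console.log(findInvalidUtf8('\u00C3\u00B6stliche')); // null
```

Running this over the records that fail and the ones that pass should show which characters differ between them.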

Thanks

@PitPanda1

PitPanda1 commented Jul 17, 2017

I am having the same issue. I'm trying to convert strings like that:

let test1 = 'Östliche'; // must be "Östliche"
let test2 = 'Neuwiesenstraße'; // must be "Neuwiesenstraße"

console.log(utf8.decode(test1));
console.log(utf8.decode(test2));
Error: Invalid continuation byte
    at Error (native)
    at readContinuationByte (I:\dev\importer\node_modules\utf8\utf8.js:131:9)
    at decodeSymbol (I:\dev\importer\node_modules\utf8\utf8.js:160:12)
    at Object.utf8decode [as decode] (I:\dev\importer\node_modules\utf8\utf8.js:206:33)
    at Object.<anonymous> (I:\dev\importer\import.js:18:18)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.runMain (module.js:590:10)
    at run (bootstrap_node.js:394:7)
    at startup (bootstrap_node.js:149:9)
    at bootstrap_node.js:509:3
// german special characters
let test1 = "Ä"; // Ä fails
let test2 = "ä"; // ä passes
let test3 = "Ãœ"; // Ü fails
let test4 = "ü"; // ü passes
let test5 = "Ö"; // Ö fails
let test6 = "ö"; // ö passes
let test7 = "ß"; // ß fails

// other special characters
let test8 = "Ã�"; // Á passes
let test9 = "á"; // á passes

All lowercase letters pass the test; the uppercase ones do not. "ß" also fails (it has no upper/lowercase distinction in German). I tested some other special characters, and they passed.
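
One possible explanation for the uppercase/lowercase split, offered as an assumption rather than a confirmed diagnosis: the strings look like UTF-8 bytes that were read as Windows-1252. In that code page, bytes 0x80 to 0x9F become characters whose code points lie above 0xFF, so they can no longer stand for a single byte. The second bytes of Ä, Ö, Ü and ß (0x84, 0x96, 0x9C, 0x9F) fall in that range; those of ä, ö, ü and á (0xA4, 0xB6, 0xBC, 0xA1) do not. The sketch below reverses the mapping and decodes with the standard `TextDecoder`. `CP1252_HIGH`, `cp1252MojibakeToBytes` and `repairMojibake` are hypothetical names, not part of utf8.js.

```javascript
// Code points that Windows-1252 assigns to bytes 0x80..0x9F, in byte order.
// null marks the five bytes the code page leaves unassigned.
const CP1252_HIGH = [
  0x20ac, null, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, null, 0x017d, null,
  null, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, null, 0x017e, 0x0178,
];

// Turn mis-decoded text back into the bytes it was read from.
function cp1252MojibakeToBytes(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code <= 0xff) {
      bytes[i] = code; // same value as the original byte
    } else {
      const offset = CP1252_HIGH.indexOf(code);
      if (offset === -1) {
        throw new RangeError(
          `No Windows-1252 byte for U+${code.toString(16)} at index ${i}`
        );
      }
      bytes[i] = 0x80 + offset;
    }
  }
  return bytes;
}

// Decode the recovered bytes as UTF-8; throws if they are still invalid.
function repairMojibake(text) {
  const decoder = new TextDecoder('utf-8', { fatal: true });
  return decoder.decode(cp1252MojibakeToBytes(text));
}

// 'Ã' + en dash + 'stliche', written with escapes to avoid ambiguity.
console.log(repairMojibake('\u00C3\u2013stliche')); // Östliche
console.log(repairMojibake('Neuwiesenstra\u00C3\u0178e')); // Neuwiesenstraße
```

If a string already contains U+FFFD (the replacement character), the original byte is gone and this approach cannot recover it.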

@davide-scalzo

Similar issue with emojis. Does anybody have an idea how to fix it (other than a try/catch cop-out)?

@wouterdialogic

Similar issue; I'm working around it with a try/catch block.

Error:


Error: Invalid continuation byte
    at readContinuationByte (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:115:9)
    at decodeSymbol (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:156:12)
    at Object.utf8decode [as decode] (C:\Ampps\www\b5_revisited\node_modules\utf8\utf8.js:190:17)
    at try_to_utf8_decode (C:\Ampps\www\b5_revisited\b5_file_parser.js:104:16)
    at process_file (C:\Ampps\www\b5_revisited\b5_file_parser.js:146:13)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

this is an example of the input:

Wij de werkgroep “KREKEROCK “ organiseren al een paar jaar tijdens de kerstperiode, omdat deze periode zich uitstekend leent om eens stil te staan bij al het leed in de wereld, het muziekfestival KREKEROCK.
De opbrengst is steeds integraal voor CADAATAN KORTEMARK.
CADAATAN KORTEMARK houdt zich vooral bezig met het verbeteren van de omstandigheden waarin kinderen in bepaalde schooltjes op de Filipijnen de lessen volgen. De vereniging is vooral actief in het noorden van het eiland CEBU, meer bepaald in enkele barangay’s van SAN REMIGIO.

@AlejaRo

AlejaRo commented Feb 4, 2020

Similar issue trying to convert the word "Información".
Has anyone fixed this issue? I've spent all day trying to solve this, but I haven't found a solution :(

@mboughaba

mboughaba commented Feb 5, 2020

@AlejaRo

console.log(utf8.encode('Información')); // => InformaciÃ³n
console.log(utf8.decode(utf8.encode('Información'))); // => Información

Please show us a snippet.
I've surrounded it with a try/catch and it seems to work so far.

According to the tests, this error is thrown when an invalid sequence is encountered:

  • 3 bytes instead of 4 bytes
  • mix between unicode and hex sequences
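
The first case above can be reproduced without utf8.js. This sketch uses the standard `TextDecoder` instead, so it shows how a truncated sequence behaves in general, not the library's exact error path. `tryDecode` is a hypothetical helper name.

```javascript
// U+1F600 encodes to four UTF-8 bytes: F0 9F 98 80.
const emojiBytes = new TextEncoder().encode('\u{1F600}');
// Drop the last byte to simulate "3 bytes instead of 4".
const truncatedBytes = emojiBytes.slice(0, 3);

// Strict decode that reports failure as a value instead of throwing.
function tryDecode(bytes) {
  try {
    const decoder = new TextDecoder('utf-8', { fatal: true });
    return { ok: true, text: decoder.decode(bytes) };
  } catch (err) {
    return { ok: false, error: err };
  }
}

console.log(tryDecode(emojiBytes).ok);     // true
console.log(tryDecode(truncatedBytes).ok); // false

// Without { fatal: true } the decoder substitutes U+FFFD instead of throwing.
const lenient = new TextDecoder('utf-8').decode(truncatedBytes);
console.log(lenient.includes('\uFFFD')); // true
```

The lenient mode avoids the exception but silently loses data, so the strict mode is usually the better way to find bad records.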

raises(

@balwinder4264

This code is throwing the same error:

utf8.decode(
'Simplified Chinese: æˆ‘ä»¬ä¸ºæˆ‘ä»¬åˆ›é€ çš„æ¯�æ�°ä½œçš„å¥‰çŒ®ç²¾ç¥žå’Œå†³å¿ƒåŠ å‰§æ¯�个GWT代表的激情。但更比任何其他特质,在我们的机会心è„�的决定性特å¾�是GWTç»�销商补å�¿è®¡åˆ’。我们创建了一个消除了任何é™�åˆ¶ï¼Œé€Ÿåº¦é¢ ç°¸çš„æˆ�员访问他们赚å�–佣金和奖金世界上第一个自由æµ�动的å�¯å�˜è–ªé…¬è®¡åˆ’。我们清楚的ç»�销商å�‹å¥½çš„薪酬计划,使GWT业务的人æ�¥è¯´ï¼Œé‚£é‡Œçš„å¹³å�‡å…¼è�Œåˆ›ä¸šè€…çœŸæ­£æ‹¥æœ‰ä¸ºè‡ªå·±åˆ›é€ è´¢å¯Œï¼Œå¹¶ä¸Žä»–äººåˆ†äº«çš„æœºä¼šçš„æœºä¼šã€‚æˆ‘ä»¬æ„Ÿåˆ°è‡ªè±ªçš„æ˜¯æˆ‘ä»¬çš„é�©å‘½è‡ªç”±æµ�动的薪酬计划消除了直销其中å�ªæœ‰é¡¶çº§ç»�销商的精英能够实现财务伟大的现状。公平和æ„�图是我们å�šç”Ÿæ„�çš„æ–¹å¼�背å�Žçš„驱动力和区别使得GWTå…¬å�¸ä¹‹é—´åœ¨åŽ†å�²ä¸Šæœ€å¥½çš„家庭为基础的和基于互è�”网的机会。',
),

@MattChilders92

+1
Having this issue with the letter "ß" in the string

@sebastianDejoy

sebastianDejoy commented Jul 18, 2020

Same here: utf8.decode('è´¦å�•ä¿¡æ�¯') throws Error: Invalid continuation byte. It should decode to 账单信息. Does the library have issues with code points represented in 3 bytes or more (like Chinese and Korean)?
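
As a hedged check that three-byte sequences are not a problem in themselves, the sketch below round-trips the expected text through the standard `TextEncoder`/`TextDecoder` rather than utf8.js. `hasLostBytes` is a hypothetical helper. It assumes that a U+FFFD in the input, as the string above is rendered here, means a byte was already replaced before decoding.

```javascript
const original = '\u8D26\u5355\u4FE1\u606F'; // 账单信息

// Each of these four characters needs three UTF-8 bytes.
const encoded = new TextEncoder().encode(original);
console.log(encoded.length); // 12

// Decoding the real bytes restores the text.
const roundTrip = new TextDecoder('utf-8', { fatal: true }).decode(encoded);
console.log(roundTrip === original); // true

// Hypothetical check: U+FFFD means an original byte has already been lost.
function hasLostBytes(text) {
  return text.includes('\uFFFD');
}

// 'å', U+FFFD, '•': a damaged rendering of the bytes E5 8D 95.
console.log(hasLostBytes('\u00E5\uFFFD\u2022')); // true
```

When `hasLostBytes` returns true, the text has to be fetched again from the source with the correct encoding; no decoder can restore it.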

@paolobertani

paolobertani commented Aug 23, 2022

@romafederico

I'm receiving UTF-8 encoded JSON, converting it to a string, and then calling utf8.decode(str).

When you receive the UTF-8 encoded JSON and store it in a string, the text is already decoded into JavaScript's internal UCS-2 representation.

Decoding it again as if it were UTF-8 raises an error, as expected.
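
A minimal sketch of decoding exactly once, assuming the payload is available as raw bytes (for example a `Uint8Array` read from a socket or file). `parseJsonBytes` is a hypothetical helper and uses the standard `TextDecoder`, not utf8.js.

```javascript
// Hypothetical helper: decode raw UTF-8 bytes once, then parse the JSON.
function parseJsonBytes(bytes) {
  const jsonText = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  return JSON.parse(jsonText);
}

// Stand-in for bytes received over the wire.
const wireBytes = new TextEncoder().encode(
  '{"street":"Neuwiesenstra\u00DFe"}'
);

const payload = parseJsonBytes(wireBytes);
console.log(payload.street); // Neuwiesenstraße
```

Here `payload.street` is already ordinary text, so passing it to a UTF-8 decoder again would be a second decode.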

@paolobertani

@PitPanda1

I am having the same issue. I'm trying to convert strings like that:

let test1 = 'Östliche'; // must be "Östliche"
let test2 = 'Neuwiesenstraße'; // must be "Neuwiesenstraße"

'Östliche' is not UTF-8
