xml_to_df ends up ignoring encoding #3

Open
cejkiebo opened this issue Feb 3, 2023 · 0 comments

Comments

cejkiebo commented Feb 3, 2023

I am trying to open an XML file encoded in ISO-8859-1 (aka latin-1) using xmlconvert, yet even when I specify xml.encoding it still reports that my input is not proper UTF-8. My call and traceback are as follows:

> carbu_df <- xmlconvert::xml_to_df("./PrixCarburants_instantane.xml",
+                                   xml.encoding = "latin-1",
+                                   records.xpath = "//pdv | //prix",
+                                   fields = "attributes")
Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html,  : 
  Input is not proper UTF-8, indicate encoding !
Bytes: 0xE8 0x73 0x2D 0x4C [9]
> traceback()
4: read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, 
       options = options)
3: read_xml.character(text, encoding = xml.encoding)
2: xml2::read_xml(text, encoding = xml.encoding)
1: xmlconvert::xml_to_df("./PrixCarburants_instantane.xml", xml.encoding = "latin-1", 
       records.xpath = "//pdv | //prix", fields = "attributes")
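For reference, the four bytes quoted in the error can be checked in base R with no extra packages. They are not valid UTF-8, but they decode cleanly as ISO-8859-1, which suggests the file's latin-1 bytes reach the parser unconverted:

```r
# The four bytes quoted by libxml2 in the error message above
bytes <- as.raw(c(0xE8, 0x73, 0x2D, 0x4C))
s <- rawToChar(bytes)

# 0xE8 opens a three-byte UTF-8 sequence, but the next byte (0x73, "s")
# is not a continuation byte (0x80-0xBF), so this is not valid UTF-8
validUTF8(s)  # FALSE

# Interpreted as ISO-8859-1, 0xE8 is "è", so the fragment reads "ès-L";
# re-encoding it to UTF-8 turns 0xE8 into the two bytes 0xC3 0xA8
utf8 <- iconv(s, from = "latin1", to = "UTF-8")
charToRaw(utf8)  # c3 a8 73 2d 4c
```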

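Loading the file using xml2::read_xml("./PrixCarburants_instantane.xml", encoding="latin-1") does work, and so does opening the file in Notepad and saving it as UTF-8 (which is a bit tedious). It appears to me that enc2utf8 and charToRaw are not converting the latin-1 input: as the traceback shows, xml_to_df hands the file contents to xml2::read_xml as a string, and for string input xml2 calls read_xml.raw with a hard-coded "UTF-8" (frame 4), so the xml.encoding value never reaches the parser.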
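Until the encoding argument is honoured, the Notepad step could be scripted. This is a minimal sketch, assuming the whole file is ISO-8859-1 and that its XML declaration spells the encoding as `ISO-8859-1` or `latin1`/`latin-1`; `latin1_xml_to_utf8_file()` is a hypothetical helper, not part of xmlconvert or xml2. It reads the raw bytes, re-encodes them with base R's `iconv()`, rewrites the declared encoding, and writes a UTF-8 copy to a temporary file. It has not been checked against xmlconvert itself.

```r
# Hypothetical helper (not part of xmlconvert or xml2): write a UTF-8 copy
# of an ISO-8859-1 (latin-1) XML file and return the path of that copy.
latin1_xml_to_utf8_file <- function(path) {
  # Read the raw bytes so R does not guess or re-encode anything
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)

  # Reinterpret the bytes as latin-1 and re-encode them as UTF-8
  txt <- iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")

  # Assumption: the declaration uses one of these spellings; update it
  # so the copy no longer claims to be ISO-8859-1
  txt <- sub("encoding=[\"'](ISO-8859-1|latin-?1)[\"']", 'encoding="UTF-8"',
             txt, ignore.case = TRUE)

  # Write the UTF-8 bytes unchanged, independent of the session locale
  out <- tempfile(fileext = ".xml")
  writeBin(charToRaw(txt), out)
  out
}
```

The copy could then be passed in place of the original file, dropping `xml.encoding`: `xmlconvert::xml_to_df(latin1_xml_to_utf8_file("./PrixCarburants_instantane.xml"), records.xpath = "//pdv | //prix", fields = "attributes")`. Working on raw bytes avoids depending on how `readLines()` or the session locale marks the string's encoding.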

My dataset can be found here

cejkiebo changed the title from "xml_to_df ends up ignoring" to "xml_to_df ends up ignoring encoding" on Feb 3, 2023.