Encoding issue with GetBibEntryWithDOI #24

robert-winkler · 2016-12-01T23:57:13Z

Hi, I am processing a bib file with various Spanish authors.
Many of them are messed up due to accents, e.g. Gutia'errez-Uribe instead of Gutiérrez-Uribe.
Is there any solution?
Thanks, Robert

library(RefManageR)
bib <- ReadBib("draft.bib", check = FALSE)
bib <- GetDOIs(bib)
bibfromdoi <- GetBibEntryWithDOI(bib$doi)
WriteBib(bibfromdoi, file = "cleanfromdoi.bib", biblatex=TRUE, verbose=TRUE)

draft.bib.zip

mwmclean · 2016-12-09T16:21:50Z

You can certainly use RefManageR to update the fields you encounter with bad accent formatting, e.g.

bib[[1]]$author <- "Doe, John and Gutiérrez-Uribe, Jane"

if you need to change the authors of the first entry, but it's not going to be the most efficient approach. Recent versions of R allow users to specify additional LaTeX macros in tools::parse_Rd; you may want to have a look at that, but I think your best bet is to just use regular expressions and gsub.
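A minimal sketch of the gsub route, assuming the damage always takes the form of an `a'` sequence in front of the vowel that should carry the acute accent, as in the Gutia'errez-Uribe example. The `accent_map` table and the `fix_accents` helper are illustrative names, not part of RefManageR, and the table would need extending to whatever garbled sequences actually occur in the file.

```r
# Hypothetical lookup table: garbled sequence -> intended character.
# Unicode escapes keep the script independent of its own file encoding.
accent_map <- c(
  "a'a" = "\u00e1",  # a with acute accent
  "a'e" = "\u00e9",  # e with acute accent
  "a'i" = "\u00ed",  # i with acute accent
  "a'o" = "\u00f3",  # o with acute accent
  "a'u" = "\u00fa"   # u with acute accent
)

# Replace every garbled sequence in a character vector.
# fixed = TRUE treats each pattern as a literal string, not a regex.
fix_accents <- function(x, map = accent_map) {
  for (bad in names(map)) {
    x <- gsub(bad, map[[bad]], x, fixed = TRUE)
  }
  x
}

authors <- c("Gutia'errez-Uribe", "Doe, John")
cleaned <- fix_accents(authors)
print(cleaned)
```

The cleaned strings can then be assigned back with the `bib[[1]]$author <- ...` form shown above. Note that a literal replacement like this would also rewrite names that legitimately contain one of the sequences, so it is worth inspecting the matches before overwriting anything.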

hongyuanjia · 2018-05-20T08:34:47Z

I encountered a similar issue with both the RefManageR and bibtex packages when reading a BibTeX file with multi-byte characters on Windows. I opened an issue on bibtex, please see here.

# Get current locale info
Sys.getlocale()
#> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

# Set locale to Chinese
Sys.setlocale(locale = "Chinese")
#> [1] "LC_COLLATE=Chinese (Simplified)_China.936;LC_CTYPE=Chinese (Simplified)_China.936;LC_MONETARY=Chinese (Simplified)_China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_China.936"

bib_text <- "
    @misc{text,
        title = {{你好}},
        language = {zh-CN},
        author = {{你好}},
        month = jun,
        year = {2013},
        pages = {163}
    }
"
# change encoding to "UTF-8"
bib_text_utf8 <- enc2utf8(bib_text)
Encoding(bib_text_utf8)
#> [1] "UTF-8"

# make sure the saved BibTeX file is UTF-8 encoded
con <- file("test.bib", encoding = "UTF-8")
writeLines(bib_text_utf8, con)
close(con)

readLines("test.bib", encoding = "UTF-8")
#> [1] ""                                "        @misc{text,"            
#> [3] "            title = {{你好}},"   "            language = {zh-CN},"
#> [5] "            author = {{你好}},"  "            month = jun,"       
#> [7] "            year = {2013},"      "            pages = {163}"      
#> [9] "        }"                       "    "

It seems that bibtex::do_read_bib does not return correctly encoded multi-byte characters, even when all related encoding options are set to "UTF-8":

bib <- bibtex::do_read_bib("test.bib", "UTF-8", srcfile("test.bib", encoding = "UTF-8", "UTF-8"))
bib
#> [[1]]
#>      title   language     author      month       year      pages 
#> "{浣犲ソ}"    "zh-CN" "{浣犲ソ}"      "jun"     "2013"      "163" 
#> attr(,"entry")
#> [1] "misc"
#> attr(,"key")
#> [1] "text"
#> 
#> attr(,"include")
#> character(0)
#> attr(,"strings")
#> named character(0)
#> attr(,"preamble")
#> character(0)

Currently, one workaround is to manually mark the output of bibtex::do_read_bib as "UTF-8" in RefManageR::ReadBib. But I am not sure if this is the right way to do it.

Encoding(bib[[1]])
#> [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"

Encoding(bib[[1]]) <- "UTF-8"
bib
#> [[1]]
#>      title   language     author      month       year      pages 
#> "{你好}"    "zh-CN" "{你好}"      "jun"     "2013"      "163" 
#> attr(,"entry")
#> [1] "misc"
#> attr(,"key")
#> [1] "text"
#> 
#> attr(,"include")
#> character(0)
#> attr(,"strings")
#> named character(0)
#> attr(,"preamble")
#> character(0)
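The same workaround could be applied to every entry at once. The sketch below assumes the parsed result is a list of named character vectors, as in the output above; `mark_utf8` is a hypothetical helper, not a function from bibtex or RefManageR. It only relabels the bytes as UTF-8 and does not convert them, so it is only appropriate when the file really is UTF-8.

```r
# Mark every field of every parsed entry as UTF-8.
# `entries` is assumed to be a list of named character vectors.
mark_utf8 <- function(entries) {
  # Assigning into entries[] keeps attributes set on the list itself
  entries[] <- lapply(entries, function(entry) {
    Encoding(entry) <- "UTF-8"  # relabels the bytes; no conversion happens
    entry
  })
  entries
}

# Stand-in for one parsed entry whose UTF-8 bytes carry no encoding mark
entry <- c(title = "{\u4f60\u597d}", language = "zh-CN")
Encoding(entry) <- "unknown"

fixed <- mark_utf8(list(entry))
print(Encoding(fixed[[1]]))
```

Pure ASCII fields such as "zh-CN" still report "unknown" afterwards, because R never attaches an encoding mark to ASCII strings.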

@mwmclean Any insights? Thanks!

mwmclean closed this as completed Jan 9, 2017
