
fetch_bibtex #745

Open
ben-domingue opened this issue Dec 16, 2024 · 2 comments
Labels
rpackage things to include in R package

Comments

@ben-domingue
Owner

Hi Ben! I spent some time today writing the BibTeX generation function in R. Perhaps we can share it with Mike and include it in the irw library. It works as follows:

```r
fetch_bibtex(
  data_index_file = "IRW Data Dictionary - data index.csv",
  dataset_without_doi_file = "BibTex_Manual.csv",
  dataset_names = c("16_personalityfactors", "enem_2020_1mil_mt"),
  output_bib_file = "output.bib"
)
```

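It reads each dataset's DOI, or a manually written BibTeX entry for datasets without a DOI, from the two files attached below, which are the same files that will be used to render the docs on the IRW website. For datasets with a DOI, the BibTeX is fetched from doi.org.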
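The choice between a DOI and a manual entry can be illustrated in isolation. The sketch below is a hypothetical base R helper (`choose_source` is not part of the irw package) that mirrors the rule used in `fetch_bibtex`: prefer a non-empty DOI, otherwise fall back to the manual BibTeX, otherwise report that nothing is available. The DOI `10.1000/xyz123` is a placeholder, not a real dataset.

```r
# Hypothetical helper: decide where each dataset's citation would come from.
# Empty strings and NA are both treated as "missing".
choose_source <- function(doi, bibtex) {
  doi <- trimws(ifelse(is.na(doi), "", doi))
  bibtex <- trimws(ifelse(is.na(bibtex), "", bibtex))
  ifelse(nzchar(doi), "doi",              # fetch from https://doi.org/<doi>
         ifelse(nzchar(bibtex), "manual", # use the hand-written entry as-is
                "none"))                  # nothing to cite
}

choose_source(
  doi    = c("10.1000/xyz123", "", NA),
  bibtex = c(NA, "@misc{example, title = {Example}}", "")
)
```

Treating empty strings as missing matters here because both CSVs are read with `na = character()`, so blank cells arrive as `""` rather than `NA`.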

ben-domingue added the rpackage (things to include in R package) label on Dec 16, 2024
@ben-domingue
Owner Author

```r
library(dplyr)
library(readr)
library(glue)
library(httr)

fetch_bibtex <- function(data_index_file, dataset_without_doi_file, dataset_names, output_bib_file) {

  # Dataset names and DOIs from the IRW data index; strip the ".Rdata"/".RData" extension
  data_index <- read_csv(data_index_file, na = character()) %>%
    select(dataset = Filename, doi = DOI) %>%
    mutate(dataset = sub("\\.R[dD]ata$", "", dataset))

  # Hand-written BibTeX entries for datasets that have no DOI
  dataset_without_doi <- read_csv(dataset_without_doi_file, na = character()) %>%
    select(dataset = Filename, BibTex)

  # Prefer the DOI, fall back to the manual BibTeX, and treat empty strings as missing
  data_index <- data_index %>%
    left_join(dataset_without_doi, by = "dataset") %>%
    mutate(
      doi_or_bibtex = if_else(is.na(doi) | doi == "", BibTex, doi),
      doi_or_bibtex = na_if(trimws(doi_or_bibtex), "")
    ) %>%
    select(dataset, doi_or_bibtex)

  # Warn about requested datasets that are not in the data index
  missing <- setdiff(dataset_names, data_index$dataset)
  if (length(missing) > 0) {
    warning("Datasets not found in the data index: ", paste(missing, collapse = ", "))
  }

  filtered_data <- data_index %>%
    filter(dataset %in% dataset_names)

  bibtex_entries <- filtered_data %>%
    rowwise() %>%
    mutate(bibtex = {
      if (!is.na(doi_or_bibtex) && grepl("^@", doi_or_bibtex)) {
        # Use the BibTeX directly if it was provided manually
        doi_or_bibtex
      } else if (!is.na(doi_or_bibtex)) {
        # Fetch BibTeX for the DOI via doi.org content negotiation
        response <- tryCatch(
          GET(glue("https://doi.org/{doi_or_bibtex}"), add_headers(Accept = "application/x-bibtex")),
          error = function(e) NULL
        )
        if (!is.null(response) && status_code(response) == 200) {
          content(response, as = "text", encoding = "UTF-8")
        } else {
          as.character(glue("# Error fetching BibTeX for DOI: {doi_or_bibtex}"))
        }
      } else {
        as.character(glue("# No BibTeX or DOI available for dataset: {dataset}"))
      }
    }) %>%
    ungroup() %>%
    pull(bibtex)

  writeLines(bibtex_entries, output_bib_file)
  message(glue("BibTeX entries written to {output_bib_file}"))
}

# Example
fetch_bibtex(
  data_index_file = "IRW Data Dictionary - data index.csv",
  dataset_without_doi_file = "BibTex_Manual.csv",
  dataset_names = c("16_personalityfactors", "enem_2020_1mil_mt"),
  output_bib_file = "output.bib"
)
```
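Because failed lookups are written to the output as `# Error fetching BibTeX for DOI: ...` or `# No BibTeX or DOI available for dataset: ...` lines rather than stopping the run, it can help to check the resulting file afterwards. The following is a minimal base R sketch of such a check; `summarize_bib` is a hypothetical helper, not part of the function above. It assumes that every real entry begins on a line starting with `@`, optionally preceded by whitespace, since BibTeX returned by doi.org may have a leading space.

```r
# Hypothetical post-run check: count BibTeX entries vs. placeholder lines
# that fetch_bibtex writes when a citation could not be produced.
summarize_bib <- function(lines) {
  c(
    entries      = sum(grepl("^\\s*@", lines)),
    placeholders = sum(grepl("^# (Error fetching|No BibTeX)", lines))
  )
}

# Usage after running the example: summarize_bib(readLines("output.bib"))
```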

@KingArthur0205
Collaborator

Attaching the CSV files used:
IRW Data Dictionary - data index.csv
BibTex_Manual.csv
