Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Google Forms Data Extraction #29

Merged
merged 12 commits into from
Dec 4, 2023
2 changes: 1 addition & 1 deletion NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,13 @@ export(get_ga_user)
export(get_github)
export(get_github_metrics)
export(get_github_user)
export(get_google_files)
export(get_repo_list)
export(get_repos_metrics)
export(get_youtube_stats)
export(gh_repo_wrapper)
export(list_calendly_events)
export(request_ga)
export(request_google_forms)
importFrom(assertthat,assert_that)
importFrom(assertthat,is.string)
importFrom(dplyr,"%>%")
Expand Down
1 change: 1 addition & 0 deletions R/auth.R
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ authorize <- function(app_name = NULL,
scope = scopes_list,
...
)
googledrive::drive_auth(token = token)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why you had to add this line. Does this affect the Google Analytics code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At times we borrow the googledrive and googlesheets packages to do work for us. This makes it so that users don't have to do authorization for every single package. Instead they authorize metricminer and we send that authorization to the dependency packages we need.

}
set_token(token = token, app_name = app_name)
return(invisible(token))
Expand Down
252 changes: 247 additions & 5 deletions R/google-forms.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,251 @@

#' Get Google Forms
#' @description This is a function to get the Calendly API user info
#' @param token You can provide the a googledrive link that you'd like to narrow the search to
#' @return Calendly REST API response as a list
#' @importFrom utils menu installed.packages
#' @importFrom httr oauth_app oauth_endpoints oauth2.0_token
#' @param url The endpoint URL for the request
#' @param token credentials for access to Google using OAuth. `authorize("google")`
#' @param body_params The body parameters for the request
#' @param query_params The body parameters for the request
#' @param return_request Should a list of the request be returned as well?
#' @importFrom httr config accept_json content
#' @importFrom jsonlite fromJSON
#' @importFrom assertthat assert_that is.string
#' @export
#' @examples
request_google_forms <- function(token, url, query = NULL, body_params = NULL, query_params = NULL,
return_request = TRUE) {
if (is.null(token)) {
# Get auth token
token <- get_token(app_name = "google")
}
config <- httr::config(token = token)

result <- httr::GET(
url = url,
body = body_params,
query = query,
config = config,
httr::accept_json(),
encode = "json"
)

request_info <- list(
url = url,
token = token,
body_params = body_params,
query_params = query_params
)

if (httr::status_code(result) != 200) {
httr::stop_for_status(result)
return(result)
}

# Process and return results
result_content <- httr::content(result, "text")
result_list <- jsonlite::fromJSON(result_content)

if (return_request) {
return(list(result = result_list, request_info = request_info))
} else {
return(result_list)
}
}


#' Get Google Forms
#' @description This is a function to get the Calendly API user info
#' @param form_id The form ID we need to get
#' @param token credentials for access to Google using OAuth. `authorize("google")`
#' @param dataformat What format would you like the data? Options are "raw" or "dataframe". "dataframe" is the default.
#' @importFrom httr config accept_json content
#' @importFrom jsonlite fromJSON
#' @importFrom assertthat assert_that is.string
#' @examples \dontrun{
#'
#' authorize("google")
#' form_info <- get_google_form("https://docs.google.com/forms/d/1Z-lMMdUyubUqIvaSXeDu1tlB7_QpNTzOk3kfzjP2Uuo/edit")
#'
#' ### OR You can give it a direct form id
#' form_info <- get_google_form("1Z-lMMdUyubUqIvaSXeDu1tlB7_QpNTzOk3kfzjP2Uuo")
#' }
get_google_form <- function(form_id, token = NULL, dataformat = "dataframe") {
# If a URL is supplied, only take the ID from it.
if (grepl("https:", form_id[1])) {
form_id <- gsub("\\/viewform$|\\/edit$", "", form_id)
form_id <- gsub("https://docs.google.com/forms/d/e/|https://docs.google.com/forms/d/", "", form_id)
}

form_info_url <- gsub("\\{formId\\}", form_id, "https://forms.googleapis.com/v1/forms/{formId}")
form_response_url <- gsub("\\{formId\\}", form_id, "https://forms.googleapis.com/v1/forms/{formId}/responses")

message(paste0("Trying to grab form: ", form_id))

form_info <- request_google_forms(
url = form_info_url,
token = token
)

response_info <- request_google_forms(
url = form_response_url,
token = token
)

result <- list(
form_metadata = form_info,
response_info = response_info
)

if (dataformat == "dataframe") {

metadata <- get_question_metadata(form_info)

if (length(result$response_info$result) > 0) {
answers_df <- extract_answers(result)
} else {
answers_df <- "no responses yet"
}
return(list(title = result$form_metadata$result$info$title,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run form_info <- get_google_form("https://docs.google.com/forms/d/1Z-lMMdUyubUqIvaSXeDu1tlB7_QpNTzOk3kfzjP2Uuo/edit"), I get NULL as the value for form_info$title.

Is this expected behavior?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After running the code to grab multiple forms, I see that code returns the title of the Google form, making me think the above behavior may be a bug.

metadata = metadata,
answers = answers_df))
Comment on lines +108 to +109
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of making these tibbles? The output can get pretty ugly: #25

}
return(result)
}


#' Get multiple Google forms
#' @description This is a function to get the Calendly API user info
#' @param form_ids a vector of form ids you'd like to retrieve information for
#' @param token credentials for access to Google using OAuth. `authorize("google")`
#' @importFrom purrr map
#' @importFrom janitor make_clean_names
#' @examples \dontrun{
#'
#' authorize("google")
#' form_list <- googledrive::drive_find(shared_drive = googledrive::as_id("0AJb5Zemj0AAkUk9PVA"), type = "form")
#'
#' multiple_forms <- get_multiple_forms(form_ids = form_list$id)
#' }
get_multiple_forms <- function(form_ids = NULL, token = NULL) {

# Get all the forms info
all_form_info <- sapply(form_ids, function(form_id) {
get_google_form(
form_id = form_id,
token = token
)
}, simplify = FALSE, USE.NAMES = TRUE)


# Set up the names
titles <- purrr::map(all_form_info, ~ .x$title)
titles <- janitor::make_clean_names(titles)

# Set as names
names(all_form_info) <- titles

all_form_info
}

get_question_metadata <- function(form_info) {
metadata <- data.frame(
question_id = form_info$result$items$itemId,
title = form_info$result$items$title
)

if (length(form_info$result$items$questionItem$question$textQuestion) > 0) {
metadata <- data.frame(
metadata,
paragraph = form_info$result$items$questionItem$question$textQuestion
)
}
if (length(form_info$result$items$questionItem$question$choiceQuestion$type) > 0) {
metadata <- data.frame(
metadata,
choice_question = form_info$result$items$questionItem$question$choiceQuestion$type,
text_question = is.na(form_info$result$items$questionItem$question$choiceQuestion$type)
)
}

return(metadata)
}

extract_answers <- function(form_info) {
questions <- form_info$response_info$result$responses$answers

if (length(questions) > 0) {
# Extract the bits we want
answers <- purrr::map(
questions,
~ .x$textAnswers$answers
)

question_id <- purrr::map(
questions,
~ .x$questionId
)

# Reformat the answer info
answers <- purrr::map_depth(answers, 2, ~ ifelse(is.null(.x),
data.frame(value = "NA"),
.x
))

answers <- purrr::map_depth(answers, -1, ~ ifelse(length(.x) > 1,
paste0(.x, collapse = "|"),
.x
))
answers <- lapply(answers, purrr::map, -1)

# Turn into data frames
answers_df <- lapply(answers, paste0) %>% dplyr::bind_cols()
question_df <- lapply(question_id, paste0) %>% dplyr::bind_cols()

colnames(answers_df) <- paste0(colnames(answers_df), "_answers")
colnames(question_df) <- paste0(colnames(question_df), "_question")

# Put it all in a data.frame we will keep
info_df <- data.frame(
reponse_id = rep(form_info$response_info$result$responses$responseId, length(questions)),
answers_df,
question_df
)
} else {
info_df <- data.frame(value = "no responses yet")
}

return(info_df)
}


google_pagination <- function(first_page_result) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is needed for google forms that have a lot of responses, however I don't have any google forms that have that many responses so I can't test this. But I'll keep this infrastructure here for when we do have some data that needs it.

# Set up a while loop for us to store the multiple page requests in
cummulative_pages <- first_page_result$result$files
page <- 1

next_pg <- try(next_google(first_page_result), silent = TRUE)

while (!grepl("Error", next_pg$result[1])) {
cummulative_pages <- dplyr::bind_rows(cummulative_pages, next_pg$result$files)
next_pg <- try(next_google(first_page_result), silent = TRUE)
page <- page + 1
}
return(cummulative_pages)
}


next_google <- function(page_result) {
## TODO: Next page request is not working! Not sure why. It doesn't throw an error,
## but it just gives the same result everytime!
body_params <- c(page_result$request_info$body_params,
pageToken = page_result$result$nextPageToken
)

result <- request_google_forms(
token = token,
url = url,
body_params = body_params,
query_params = query_params,
return_request = TRUE
)

return(result)
}
55 changes: 0 additions & 55 deletions R/googledrive.R

This file was deleted.

3 changes: 2 additions & 1 deletion R/token-handlers.R
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,8 @@ get_token <- function(app_name, try = FALSE) {
get_stored_token <- function(app_name) {
if (app_name == "calendly") token <- getOption("calendly")
if (app_name == "github") token <- getOption("github")
if (app_name == "google") token <- try(readRDS(file.path(cache_secrets_folder(), "google.RDS")), silent = TRUE)
if (app_name == "google") token <- getOption("google")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized we don't need to readRDS we can just set this as an option too.


return(token)
}

Expand Down
2 changes: 1 addition & 1 deletion R/utils.R
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
utils::globalVariables(c(
"result", "num", "test_name", "scopes", "set_token", "browseURL", "remove_token", "get_token", "get_github", "get_calendly", "%>%"
"result", "num", "test_name", "scopes", "set_token", "browseURL", "remove_token", "get_token", "get_github", "get_calendly", "%>%"
))
#' Supported endpoints
#' @description This is function stores endpoints and supported app names
Expand Down
27 changes: 27 additions & 0 deletions man/get_google_form.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading