detect duplicated columns - columns with similar values across all rows #167
Comments
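For future reference: a potential implementation we discussed (still to be double-checked) is to convert each column to a factor and then compare the resulting integer codes. Two columns whose codes are identical in every row carry the same information, even if their labels differ.
library(magrittr)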
dat <- simulist::sim_linelist()
head(dat)
#> id case_name case_type sex age date_onset date_admission outcome
#> 1 1 Uqbah al-Omar confirmed m 30 2023-01-01 <NA> recovered
#> 2 2 Gaitha al-Alli confirmed f 15 2023-01-07 <NA> recovered
#> 3 3 Muna el-Siddique probable f 90 2023-01-05 <NA> recovered
#> 4 4 Keauna Vickers confirmed f 21 2023-01-12 <NA> recovered
#> 5 6 Nazeeha al-Habeeb confirmed f 26 2023-01-14 <NA> recovered
#> 6 8 Delaney Clark confirmed f 65 2023-01-09 <NA> recovered
#> date_outcome date_first_contact date_last_contact ct_value
#> 1 <NA> <NA> <NA> 24.2
#> 2 <NA> 2023-01-04 2023-01-05 24.2
#> 3 <NA> 2022-12-31 2023-01-05 NA
#> 4 <NA> 2023-01-04 2023-01-08 24.2
#> 5 <NA> 2023-01-05 2023-01-09 24.2
#> 6 <NA> 2023-01-04 2023-01-07 24.2
# Encode each column as integer codes, numbered in order of first appearance
lvls <- dat %>%
  vapply(function(col) as.integer(factor(col, levels = unique(as.character(col)))), integer(nrow(.)))
# Compare every pair of columns; identical codes mean the values map one-to-one
cols_to_compare <- combn(ncol(dat), 2, simplify = FALSE)
duplicated_columns <- vapply(cols_to_compare, function(x) {
  identical(lvls[, x[1]], lvls[, x[2]])
}, logical(1))
message("Duplicated columns: ", sprintf("\n- %s", lapply(cols_to_compare[duplicated_columns], paste, collapse = "/")))
#> Duplicated columns:
#> - 1/2
Created on 2024-08-07 with reprex v2.1.1
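As a follow-up to the reprex above, here is a minimal base-R sketch of the same idea wrapped in a function that reports column names instead of positions. The name `find_duplicated_columns()` and the toy data frame are hypothetical, made up for illustration; this is not an existing function in any package. It also uses `match()` rather than `factor()`, so `NA` is treated as a value of its own, which is an assumption that may or may not be the desired behaviour.

```r
# Hypothetical helper (illustration only): returns the pairs of column names
# whose values map one-to-one onto each other across all rows.
find_duplicated_columns <- function(dat) {
  stopifnot(is.data.frame(dat), ncol(dat) >= 2)

  # Encode each column as integer codes in order of first appearance.
  # Assumption: NA is treated as a distinct value (match() matches NA to NA).
  codes <- lapply(dat, function(col) {
    chr <- as.character(col)
    match(chr, unique(chr))
  })

  # Compare every pair of columns by position, so that repeated column
  # names do not cause the wrong column to be picked.
  pairs <- combn(seq_along(dat), 2, simplify = FALSE)
  is_dup <- vapply(pairs, function(p) {
    identical(codes[[p[1]]], codes[[p[2]]])
  }, logical(1))

  # Translate the positions of matching pairs back to column names.
  lapply(pairs[is_dup], function(p) names(dat)[p])
}

# Toy data: `id`/`code` and `sex`/`gender` are the same information recoded.
toy <- data.frame(
  id = 1:4,
  code = c("a", "b", "c", "d"),
  sex = c("m", "f", "f", "m"),
  gender = c("male", "female", "female", "male"),
  age = c(30, 15, 30, 21)
)

dups <- find_duplicated_columns(toy)
dup_labels <- vapply(dups, paste, character(1), collapse = "/")
message("Duplicated columns: ", paste0("\n- ", dup_labels, collapse = ""))
```

Note that, as in the reprex, any two columns with a unique value in every row (such as an ID and a name) are flagged as duplicates of each other, so the result probably needs a manual review rather than automatic column removal.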