
preschool_sel #736

Closed
ben-domingue opened this issue Dec 13, 2024 · 8 comments
Labels
data fix fixing an existing dataset

Comments

@ben-domingue
Owner

decompose into multiple tables?

Children aged 2-5 answer items that assess their social and emotional skills, cognitive skills, early math, early reading, and other indicators of kindergarten readiness.

@ben-domingue ben-domingue added the data fix fixing an existing dataset label Dec 13, 2024
@saviranadela saviranadela self-assigned this Dec 17, 2024
@saviranadela
Collaborator

Hi @ben-domingue! How can I find the link to the raw data?

@ben-domingue
Owner Author

see here: https://github.com/ben-domingue/irw/blob/main/data/preschool_sel.R
and here:
Children aged 2-5 answer items that assess their social and emotional skills, cognitive skills, early math, early reading, and other indicators of kindergarten readiness (CC BY 4.0). Bailey, C. S., Korucu, İrem, Eveleigh, A., Schnur, G., Costello, L., Tuttle, M., Knox-Lane, T., Cassidy, C., Ondrusek, A., McNaboe, T., Mazhar, A., & Xie, F. (2023). Preschool Social and Emotional Development Study—Connecticut Dataset. LDbase. https://doi.org/10.33009/ldbase.1680213217.8ed0

@saviranadela
Collaborator

data:
preschool_sel.zip

code:

library(tidyverse)
library(haven)
library(labelled)

df <- read_sav('R305A180293_child-level_CT_v35.sav')

names(df) <- tolower(names(df))

df <- df |>
  select(contains('ss1'),
         contains('ss2'),
         contains('ss3'),
         contains('ss4'),
         contains('ss5'),
         contains('ss6'),
         contains('ss7'),
         contains('ss8'),
         contains('ss9'),
         contains('ss10'),
         contains('as1'),
         contains('as2'),
         contains('as3'),
         contains('as4'),
         contains('as5'),
         contains('as6'),
         contains('as7'),
         contains('as8'),
         contains('as9'),
         contains('as10'),
         starts_with('wj_lw'),
         starts_with('wj_ap'),
         starts_with('dn_1'),
         starts_with('dn_2'),
         starts_with('dn_3'),
         starts_with('dn_4'),
         starts_with('dn_5'),
         starts_with('dn_6'),
         starts_with('dn_7'),
         starts_with('dn_8'),
         starts_with('dn_9'),
         contains('_1s_'),
         contains('_2s_'),
         contains('_3s_'),
         contains('_4s_'),
         contains('_5s_'),
         contains('_6s_'),
         starts_with('box1'),
         starts_with('box2'),
         starts_with('htks1_1'),
         starts_with('htks1_2'),
         starts_with('htks1_3'),
         starts_with('htks1_4'),
         starts_with('htks1_5'),
         starts_with('htks1_6'),
         starts_with('htks1_7'),
         starts_with('htks1_8'),
         starts_with('htks1_9')) |> 
  select(-starts_with('emt1'),
         -starts_with('emt2'),
         -starts_with('emt3'),
         -starts_with('emt4'),
         -ends_with('hapb'),
         -ends_with('sadb'),
         -ends_with('angb'),
         -ends_with('afrb'),
         -ends_with('hapa'),
         -ends_with('sada'),
         -ends_with('anga'),
         -ends_with('afra'),
         -wj_lww_t1,
         -wj_lwss_t1,
         -wj_apw_t1,
         -wj_apss_t1,
         -contains('notes')) |>
  # replace invalid values with NA, change others to 0/1 wrong/right binary
  mutate(across(starts_with('dn'), ~if_else(. == 1, NA, .)),
         across(starts_with('dn'), ~if_else(. == 2, 1, .)),
         across(contains('_1s_') | contains('_2s_') | contains('_3s_') | contains('_4s_') | contains('_5s_') | contains('_6s_'), 
                ~if_else(. == 1, 0, .)),
         across(contains('_1s_') | contains('_2s_') | contains('_3s_') | contains('_4s_') | contains('_5s_') | contains('_6s_'), 
                ~if_else(. == 2, 1, .)),
         across(starts_with('htks'), ~if_else(. == 1, 0, .)),
         across(starts_with('htks'), ~if_else(. == 2, 1, .)),
         across(starts_with('box'), ~if_else(. == 0.5, NA, .)),
         # create participant ID
         id = row_number())

# find variables with no responses (all NA) or only a single distinct
# non-missing response; these carry no item information, so drop them.
# (the earlier loop missed constant columns that contained no NAs)
drop_vars <- names(df)[vapply(df,
                              function(x) length(unique(x[!is.na(x)])) <= 1,
                              logical(1))]

# drop variables with no responses or a single response
df <- df |>
  select(-all_of(drop_vars)) |>
  # pivot df to be long by item
  pivot_longer(cols = -id,
               names_to = 'item',
               values_to = 'resp',
               values_drop_na = TRUE)

# split the long table into one table per instrument, using the item-name prefix
df_pl <- df |>
  filter(grepl("^pl", item))

df_emt <- df |>
  filter(grepl("^emt", item))

df_box <- df |>
  filter(grepl("^box", item))

df_htks <- df |>
  filter(grepl("^htks", item))

df_dn <- df |>
  filter(grepl("^dn", item))

df_akt <- df |>
  filter(grepl("^akt", item))

df_wj <- df |>
  filter(grepl("^wj", item))

write.csv(df_pl, "preschool_sel_pl.csv", row.names=FALSE)
write.csv(df_emt, "preschool_sel_emt.csv", row.names=FALSE)
write.csv(df_box, "preschool_sel_box.csv", row.names=FALSE)
write.csv(df_htks, "preschool_sel_htks.csv", row.names=FALSE)
write.csv(df_dn, "preschool_sel_dn.csv", row.names=FALSE)
write.csv(df_akt, "preschool_sel_akt.csv", row.names=FALSE)
write.csv(df_wj, "preschool_sel_wj.csv", row.names=FALSE)
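To make the recoding and column-dropping rules easier to check, here is a minimal sketch that applies them to a small, made-up data frame. The column names and values are hypothetical; they only mimic the codes the script appears to assume (for `dn` items 1 is invalid and 2 is correct, for `htks` items 1 is wrong and 2 is right, and 0.5 is invalid for `box` items). It uses `NA_real_` instead of a bare `NA` so that `if_else()` also type-checks on dplyr versions older than 1.1.0.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data mimicking the raw codes handled by the script
toy <- tibble(
  dn_1    = c(2, 1, 0, 2),    # dn: 1 -> NA (invalid), 2 -> 1 (correct)
  htks1_1 = c(1, 2, 2, NA),   # htks: 1 -> 0 (wrong), 2 -> 1 (right)
  box1    = c(1, 0.5, 0, 1),  # box: 0.5 -> NA (invalid)
  dn_2    = c(NA, 2, NA, NA)  # only one distinct non-missing value
)

# Recode in the same order as the script; mutate() evaluates sequentially,
# so the second across() sees the output of the first
recoded <- toy |>
  mutate(across(starts_with('dn'), ~ if_else(. == 1, NA_real_, .)),
         across(starts_with('dn'), ~ if_else(. == 2, 1, .)),
         across(starts_with('htks'), ~ if_else(. == 1, 0, .)),
         across(starts_with('htks'), ~ if_else(. == 2, 1, .)),
         across(starts_with('box'), ~ if_else(. == 0.5, NA_real_, .)),
         id = row_number())

# Items with no responses or a single distinct non-missing response
drop_vars <- names(recoded)[vapply(recoded,
                                   function(x) length(unique(x[!is.na(x)])) <= 1,
                                   logical(1))]

# Long format, one row per person-item response, missing responses removed
long <- recoded |>
  select(-all_of(drop_vars)) |>
  pivot_longer(cols = -id, names_to = 'item', values_to = 'resp',
               values_drop_na = TRUE)
```

Here `dn_2` is dropped because its only observed value is a single response, and the remaining 9 non-missing responses are all coded 0/1.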

@saviranadela
Collaborator

PR: #751

@ben-domingue
Owner Author

ben-domingue commented Dec 19, 2024

@saviranadela this is great. One question: what is PL? See row 741 in green:
https://docs.google.com/spreadsheets/d/1nhPyvuAm3JO8c9oa1swPvQZghAvmnf4xlYgbvsFH99s/edit?gid=0#gid=0

@saviranadela
Copy link
Collaborator

PL = preLAS. It's a language proficiency assessment for early learners! Two subtests were used here: Simon Says (ss) and Art Show (as). See more: https://laslinks.com/prelas/
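Because the `pl` prefix groups both preLAS subtests (Simon Says and Art Show) into one table, the seven `filter()`/`write.csv()` pairs in the script could also be written as a single split over instrument prefixes. This is a hypothetical alternative, not the code in the PR. The item names (e.g. `pl_ss1_t1`) are invented, and the sketch assumes each item name begins with its instrument prefix.

```r
library(dplyr)

prefixes <- c('pl', 'emt', 'box', 'htks', 'dn', 'akt', 'wj')

# Hypothetical long-format responses; item names are invented for illustration
long <- tibble(
  id   = c(1, 1, 1, 2, 2, 2),
  item = c('pl_ss1_t1', 'pl_as2_t1', 'dn_1_t1', 'htks1_1_t1', 'wj_lw1_t1', 'box1_t1'),
  resp = c(1, 0, 1, 1, 0, 1)
)

# Tag each item with the prefix its name starts with (NA if none match)
pattern <- paste0('^(', paste(prefixes, collapse = '|'), ')')
long <- long |>
  mutate(instrument = if_else(grepl(pattern, item),
                              sub(paste0(pattern, '.*$'), '\\1', item),
                              NA_character_))

# One table per instrument; rows with an NA instrument are dropped by split()
tables <- split(select(long, -instrument), long$instrument)

# Write each table as preschool_sel_<prefix>.csv (to a temp dir in this sketch)
out_dir <- tempdir()
for (p in names(tables)) {
  write.csv(tables[[p]], file.path(out_dir, paste0('preschool_sel_', p, '.csv')),
            row.names = FALSE)
}
```

Unlike the explicit version, a prefix with no remaining items (here `emt` and `akt`) produces no file at all rather than an empty CSV.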

@ben-domingue
Owner Author

fantastic, thank you!
