florida_twins_behavior #734

Open
ben-domingue opened this issue Dec 13, 2024 · 8 comments
Labels
data fix fixing an existing dataset

Comments

@ben-domingue
Owner

we should investigate whether this should be multiple tables.
code here: https://github.com/ben-domingue/irw/blob/main/data/florida_twins_behavior.R

ben-domingue added the "data fix" (fixing an existing dataset) label Dec 13, 2024
@saviranadela
Collaborator

saviranadela commented Dec 19, 2024

similar to florida_twins, so i guess we need to split it, but i might be wrong...

they have panas items, cads, friends, etc that can be split

@ben-domingue
Owner Author

yeah i think we should split here. apologies that i messed this up in the beginning!

@ben-domingue
Owner Author

see also the comment i made in #735. i think these are intertwined.

@ben-domingue
Owner Author

@saviranadela same here

@saviranadela
Collaborator

saviranadela commented Jan 9, 2025

@ben-domingue i think i understand why you separated this from the other one. based on the LDbase site, they categorize the data into four types:

  1. Parent survey
  2. Child survey (which isn’t necessarily a twin survey?)
  3. Behavior and environment survey (described on the site as 'corresponding to both the parent and twin self-report survey data')
  4. Twin progress monitoring

from what you’ve worked with previously, we only have # 2 (which is the florida_twins) and # 3 (florida_twins_behavior).

my guess is that # 3 is meant to include elements of both # 1 and # 4, which could explain why you didn’t include # 1 and # 4.

so my conclusion is, this might not be redundant with florida_twins #735

please let me know if this makes sense to you! 😬

more: https://ldbase.org/datasets/1c53beea-ddc1-4efa-a88b-dc18f311f1c6

@ben-domingue
Owner Author

i think this makes sense. the two datasets at the beginning were just not quite right; at this point, i am happy with just decomposing the big 2 into N smaller datasets but i'm not really sure what the right value for N is. it sounds like you're getting some traction on it though?
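
one way to get a first guess at N is to tabulate the instrument prefixes in the item column names. a minimal sketch, assuming each item name starts with an instrument prefix followed by an underscore or a digit; the column names below are made up for illustration and are not taken from the LDbase file:

```r
# hypothetical column names, for illustration only
cols <- c("panas1", "panas2", "ecs1", "ecs2", "rcads1",
          "cads_1", "cadsyv1", "tas1", "friends1", "friends2")

# strip everything from the first underscore or digit onward
prefix <- sub("[_0-9].*$", "", cols)

# one entry per candidate instrument, with its item count
counts <- table(prefix)
print(counts)

# number of candidate datasets
n_groups <- length(counts)
print(n_groups)
```

on the real data, `cols` would be the item columns left after dropping ids and scale scores. the prefix rule is only a heuristic, so the resulting groups still need to be checked against the codebook.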

@saviranadela
Collaborator

decomposed into 7 smaller datasets

data:
florida_twins_behavior.zip

code:

library(tidyverse)  # attaches readr, dplyr, and tidyr, so a separate library(readr) is not needed

df <- read_csv('multiparentandchild0311 LDBase.csv')

names(df) <- tolower(names(df))

df <- df |>
  select(-starts_with('panas_pa'),
         -starts_with('panas_na'),
         -starts_with('ecs_ec'),
         -starts_with('ecs_imp'),
         -starts_with('rcads_mdd'),
         -starts_with('rcads_ocd'),
         -starts_with('rcads_gad'),
         -starts_with('rcads_pda'),
         -starts_with('rcads_sad'),
         -starts_with('rcads_sp'),
         -starts_with('cadsyv_pos'),
         -starts_with('cadsyv_dar'),
         -starts_with('cadsyv_pro'),
         -starts_with('cadsyv_neg'),
         -starts_with('cadsyv_soc'),
         -starts_with('cadsyv_resp'),
         -starts_with('cadsyv_dis'),
         -starts_with('tas_autonomic'),
         -starts_with('tas_offtask'),
         -starts_with('tas_thoughts'),
         -starts_with('friends_bad'),
         -starts_with('friends_school'),
         -starts_with('friends_good'),
         -contains('hem'),
         -contains('chaos'),
         -starts_with('p_'),  # also covers the p_panas columns
         -contains('pdbd'),
         -contains('feeling'),
         -pair_gender,
         -zyg_par,
         -starts_with('bg_id'),
         -`...1`,
         -id1,
         -contains('swan'),
         -twinid,
         -starts_with('n'))|>
  pivot_longer(cols = -c(id0, famid),
               names_to = 'item',
               values_to = 'resp',
               values_drop_na = TRUE) |>
  rename(id = id0, family_id = famid)

unique(df$item)

# print response values
table(df$resp)

df_panas <- df %>%
  filter(grepl("panas", item))

df_ecs <- df %>%
  filter(grepl("ecs", item))

df_rcads <- df %>%
  filter(grepl("rcads", item))

df_cads <- df %>%
  filter(grepl("^cads_", item))

df_tas <- df %>%
  filter(grepl("tas", item))

df_friends <- df %>%
  filter(grepl("friends", item))

df_cadsyv <- df %>%
  filter(grepl("^cadsyv", item))

length(unique(df_panas$item))
length(unique(df_ecs$item))
length(unique(df_rcads$item))
length(unique(df_cads$item))
length(unique(df_tas$item))
length(unique(df_friends$item))
length(unique(df_cadsyv$item))

write.csv(df_panas, "florida_twins_behavior_panas.csv", row.names=FALSE)
write.csv(df_ecs, "florida_twins_behavior_ecs.csv", row.names=FALSE)
write.csv(df_rcads, "florida_twins_behavior_rcads.csv", row.names=FALSE)
write.csv(df_cads, "florida_twins_behavior_cads.csv", row.names=FALSE)
write.csv(df_tas, "florida_twins_behavior_tas.csv", row.names=FALSE)
write.csv(df_friends, "florida_twins_behavior_friends.csv", row.names=FALSE)
write.csv(df_cadsyv, "florida_twins_behavior_cadsyv.csv", row.names=FALSE)
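
because most of the grepl() patterns above are unanchored, it may be worth checking that every item lands in exactly one of the seven tables. a minimal sketch on toy data: the patterns are copied from the code above, but the item names are hypothetical stand-ins for unique(df$item):

```r
# patterns copied from the grepl() calls above
patterns <- c(panas = "panas", ecs = "ecs", rcads = "rcads",
              cads = "^cads_", tas = "tas", friends = "friends",
              cadsyv = "^cadsyv")

# hypothetical item names standing in for unique(df$item)
items <- c("panas1", "ecs3", "rcads10", "cads_2", "tas5",
           "friends4", "cadsyv7")

# logical matrix: one row per item, one column per pattern
hits <- sapply(patterns, grepl, x = items)
rownames(hits) <- items

# how many of the seven tables each item would be written to
n_tables <- rowSums(hits)

# items that match no pattern, or more than one
problems <- n_tables[n_tables != 1]
print(problems)
```

an empty `problems` vector would mean the split is a clean partition of the items. anything listed there is either dropped from all seven files or written to more than one.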

@saviranadela
Collaborator

image
