Merge branch 'master' of github.com:langcog/peekbank-data-import

peekbank · Aug 30, 2024 · 7be4527
2 parents 31661e1 + f19fa2b
commit 7be4527
Showing 10 changed files with 662 additions and 38 deletions.
diff --git a/.github/workflows/test_import.yml b/.github/workflows/test_import.yml
@@ -0,0 +1,36 @@
+# For help debugging build failures open an issue on the RStudio community with the 'github-actions' tag.
+# https://community.rstudio.com/new-topic?category=Package%20development&tags=github-actions
+on:
+  push:
+    branches:
+      - main
+      - master
+  pull_request:
+    branches:
+      - main
+      - master
+
+name: test-import
+
+jobs:
+  test-import:
+    runs-on: macOS-latest
+
+    env:
+      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+
+    steps:
+      - uses: actions/checkout@v2
+
+      - uses: r-lib/actions/setup-r@v2
+
+      - uses: r-lib/actions/setup-pandoc@v1
+
+      - name: Install dependencies
+        run: |
+          install.packages(c('remotes','purrr','dplyr','here','tidyr'))
+          remotes::install_github("langcog/peekbankr", force = TRUE)
+        shell: Rscript {0}
+
+      - name: Run pipeline 
+        run: Rscript helper_functions/pipeline.R
diff --git a/data/fernald_marchman_2012/ReadME.md b/data/fernald_marchman_2012/ReadME.md
@@ -27,10 +27,15 @@ Note: for images, some images were shown in slightly different versions or mirro
 
 For the manju and tempo trials, some were exposure trials, where the object was shown on a background, and some were test trials, where it was not.
 
-5. Importing ambiguity
+Some images were mirrored depending on left/right positioning - image labels L and R are from the participants' perspective.
 
+IMPORTANT: for related/unrelated prime noun/verb trials, the trials are represented in the raw data TWICE - once centered on the onset of the verb and once centered on the onset of the noun. We only keep the trial representation centered on the onset of the noun.
 
+5. Importing ambiguity
 
-ToDos:
-* check with Martin and/or Virginia about whether slightly different images (mirroring) matter
+Point of disambiguation is tricky for verb and adjective trials - should this be the first informative moment (e.g. when an informative verb was mentioned) or at the onset of the noun?
 
+In the raw data, the point of disambiguation is coded as follows:
+- exposure novel trials: F0 is the onset of the verb
+- 24 months, adjective trials: word onset is the onset of the adjective
+- 30 months, hard adjective trials: onset of the color/size word
diff --git a/data/fernald_marchman_2012/import.R b/data/fernald_marchman_2012/import.R
@@ -45,6 +45,12 @@ d_processed_24 <- d_raw_24 %>%
 d_raw_30 <- read_delim(fs::path(read_path, "TL230ABoriginalichartsn1-121toMF.txt"),
   delim = "\t"
 )
+# remove duplicated trials (recentered on verb instead of noun)
+d_raw_30 <- d_raw_30 |>
+  filter(
+    !(OriginalCondition %in% c("R-primeVerb", "UR-primeVerb"))
+  )
+
 # d_raw_30 has two slightly different types of rows mixed together
 d_processed_30_part_1 <- d_raw_30 |>
   filter(is.na(Shifts)) |>
@@ -60,15 +66,17 @@ d_processed_30_part_1 <- d_raw_30 |>
 
 d_processed_30_part_2 <- d_raw_30 |>
   filter(!is.na(Shifts)) |>
-  # these *do* have looking data in non-looking cols
-  rename(
-    f01 = `Frames - word starts at frame 45 `,
-    f02 = `First Shift Gap`,
-    f03 = `RT`,
-    f04 = `CritOnSet`,
-    f05 = `CritOffSet`
-  ) |>
+  # # these *do* have looking data in non-looking cols
+  # rename(
+  #   f01 = `Frames - word starts at frame 45 `,
+  #   f02 = `First Shift Gap`,
+  #   f03 = `RT`,
+  #   f04 = `CritOnSet`,
+  #   f05 = `CritOffSet`
+  # ) |>
   preprocess_raw_data() %>%
+  # drop final x column
+  select(-x270) %>%
   relabel_time_cols(
     metadata_names = extract_col_types(.)[["metadata_names"]],
     pre_dis_names = extract_col_types(.)[["pre_dis_names"]],
@@ -143,9 +151,6 @@ d_tidy <- d_tidy %>%
     TRUE ~ right_image
   ))
 
-
-## TODO See Readme for some questions about stimulus table
-
 # create stimulus table
 stimulus_table_link <- d_tidy %>%
   distinct(target_image, target_label) |>
@@ -230,7 +235,7 @@ d_tidy <- d_tidy %>%
   )
 
 # create zero-indexed ids for trial_types
-d_trial_type_ids <- d_tidy %>% 
+d_trial_type_ids <- d_tidy %>%
   distinct(
     target_id, distractor_id, target_side,
     condition
@@ -252,7 +257,7 @@ d_tidy_semifinal <- d_tidy %>%
   left_join(d_administration_ids) %>%
   left_join(d_trial_type_ids) |>
   select(-condition2, -original_condition, -cond_orig)
-  
+
 
 # get zero-indexed trial ids for the trials table
 d_trial_ids <- d_tidy_semifinal %>%
@@ -262,13 +267,13 @@ d_trial_ids <- d_tidy_semifinal %>%
   ) %>%
   # the prescreen notes are not attached to all rows of a trial (sub_num x session x months x trial_type_id), so we fix this
   group_by(sub_num, session, months, trial_type_id) %>%
-  summarize(prescreen_notes = first(na.omit(prescreen_notes)), .groups = 'drop') %>% 
+  summarize(prescreen_notes = first(na.omit(prescreen_notes)), .groups = 'drop') %>%
   mutate(excluded = !is.na(prescreen_notes)) |>
   rename(exclusion_reason = prescreen_notes) |>
   group_by(sub_num, session, months) %>%
   mutate(trial_order = cumsum(trial_type_id != lag(trial_type_id, default = first(trial_type_id)))) %>%
-  ungroup() %>% 
-  mutate(trial_id = 0:(n()-1)) %>% 
+  ungroup() %>%
+  mutate(trial_id = 0:(n()-1)) %>%
   distinct()
 
 # join
@@ -464,5 +469,5 @@ write_and_validate(
   aoi_region_sets = NA,
   xy_timepoints = NA,
   aoi_timepoints,
-  upload = TRUE
+  upload = FALSE
 )
diff --git a/data/fernald_marchman_2012/notes b/data/fernald_marchman_2012/notes
diff --git a/data/kremin_2021/README.md b/data/kremin_2021/README.md
@@ -0,0 +1,32 @@
+# kremin_2021 dataset
+
+## Reference
+Kremin, L. V., Jardak, A., Lew-Williams, C., & Byers-Heinlein, K. (2023). Bilingual children’s comprehension of code-switching at an uninformative adjective. Language Development Research, 3(1), 249–276.
+
+## Abstract
+Bilingual children regularly hear sentences that contain words from both languages, also known as code-switching. Investigating how bilinguals process code-switching is important for understanding bilingual language acquisition, because young bilinguals have been shown to experience processing costs and reduced comprehension when encountering code-switched nouns. Studies have yet to investigate if processing costs are present when children encounter code-switches at other parts of speech within a sentence. The current study examined how 30 young bilinguals (age range: 37 – 48 months) processed sentences with code-switches at an uninformative determiner-adjective pair before the target noun (e.g., “Can you find le bon [the good] duck?”) compared to single-language sentences (e.g., “Can you find the good duck?”). Surprisingly, bilingual children accurately identified the target object in both sentence types, contrasting with previous findings that sentences containing code-switching lead to processing difficulties. Indeed, children showed similar (and in some cases, better) comprehension of sentences with a code-switch at an uninformative adjective phrase, relative to single-language sentences. We conclude that functional information conveyed by a code-switch may contribute to bilingual children’s sentence processing.
+
+## Original study info
+Participants were bilingual children aged roughly 3 years (37 to 48 months, per the abstract): English-French bilinguals from Montreal and English-Spanish bilinguals from Princeton.
+The key manipulation was code-mixing in prenominal adjectives before the target noun (e.g., "Can you see the good cow?" vs "Can you see le bon cow?"); note that the adjectives were uninformative.
+
+Data from Montreal was collected with a Tobii T60-XL eyetracker, and data from Princeton was collected using a video camera and manual gaze coding.
+
+Note that the data only include the "single" and "mixed" conditions; the study also had "filler" trials that are not in the data (although the filler data for the English-French subset can be found in Sander-Montant et al. ([2022](https://osf.io/2m345/))).
+
+## Importing decisions
+Stimuli are available, but they are embedded in video files and will need to be extracted.
+CDI and language exposure data for the Montreal subset are also available in Perez et al. ([2024](https://osf.io/mxksz/)) and could be imported (they have not been yet).
+There are also vocabulary data from DVAP; these could be imported in the lang_measures field for subject_aux_data.
+
+We adopted the importing decisions from the [processing script](https://osf.io/ug7t3/files/github/01_load.R), including the rectification of AOIs and the removal of duplicated header rows.
+Age was calculated from years, months, and days using the formula: years * 365.2425 + months * (365.2425/12) + days.
+
+Monitor size for the Montreal data was set as 1920x1200 based on the Tobii export.
+
+We decided that the data reflect non-vanilla trials, although the "single" condition trials (e.g., "Can you see the good cow?") could be construed as "vanilla" in the sense that the carrier phrase is unlikely to bias the results.
+(This would also be true for the "filler" condition trials, should we decide to include these data.)
+The data may be worth further analysis despite their non-vanilla status.
+
+## Importing ambiguity
+None other than those reported above.
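
The age formula stated in the kremin_2021 README can be sketched in R as follows. This is a minimal illustration, not the code from the import script: the function name `age_in_days` and the constant name `DAYS_PER_YEAR` are hypothetical, and the result is assumed to be in days because the `days` term is added unscaled.

```r
# Hypothetical helper illustrating the README formula:
#   years * 365.2425 + months * (365.2425 / 12) + days
# 365.2425 is the mean length of a Gregorian calendar year in days.
DAYS_PER_YEAR <- 365.2425

age_in_days <- function(years, months, days) {
  # All arithmetic is vectorized, so whole columns can be passed in.
  years * DAYS_PER_YEAR + months * (DAYS_PER_YEAR / 12) + days
}

# Example: a child aged 3 years, 2 months, and 10 days.
example_age <- age_in_days(years = 3, months = 2, days = 10)

# Converting back to months with the same month length (illustrative only).
example_age_months <- example_age / (DAYS_PER_YEAR / 12)
```

Using one constant for both the year and month terms keeps the two conversions consistent, so 12 months always equals exactly one year.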