Merge branch 'master' of github.com:langcog/peekbank-data-import
adriansteffan committed Aug 30, 2024
2 parents 31661e1 + f19fa2b commit 7be4527
Showing 10 changed files with 662 additions and 38 deletions.
36 changes: 36 additions & 0 deletions .github/workflows/test_import.yml
@@ -0,0 +1,36 @@
# For help debugging build failures open an issue on the RStudio community with the 'github-actions' tag.
# https://community.rstudio.com/new-topic?category=Package%20development&tags=github-actions
on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master

name: test-import

jobs:
  test-import:
    runs-on: macOS-latest

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

    steps:
      - uses: actions/checkout@v2

      - uses: r-lib/actions/setup-r@v2

      - uses: r-lib/actions/setup-pandoc@v1

      - name: Install dependencies
        run: |
          install.packages(c('remotes','purrr','dplyr','here','tidyr'))
          remotes::install_github("langcog/peekbankr", force=T)
        shell: Rscript {0}

      - name: Run pipeline
        run: Rscript helper_functions/pipeline.R
11 changes: 8 additions & 3 deletions data/fernald_marchman_2012/ReadME.md
@@ -27,10 +27,15 @@ Note: for images, some images were shown in slightly different versions or mirro

For the manju and tempo trials, some were exposure trials in which the object appeared on a background, and some were test trials in which it appeared without a background.

5. Importing ambiguity
Some images were mirrored depending on left/right positioning; the image labels L and R are given from the participant's perspective.

IMPORTANT: for related/unrelated prime noun/verb trials, each trial is represented in the raw data TWICE: once centered on the onset of the verb and once centered on the onset of the noun. We keep only the representation centered on the onset of the noun.
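
A minimal sketch of that rule on toy data, written in base R so it runs without extra packages. Only the two verb-centered condition labels (`R-primeVerb`, `UR-primeVerb`) are taken from `import.R`; the data frame, the other condition labels, and the variable names are hypothetical.

```r
# Toy stand-in for the raw 30-month data. Only the two verb-centered
# labels come from import.R; the other labels are hypothetical.
d_raw <- data.frame(
  sub_num = c(1, 1, 1, 1),
  OriginalCondition = c(
    "R-primeVerb", "hypothetical-noun-centered",
    "UR-primeVerb", "hypothetical-other"
  ),
  stringsAsFactors = FALSE
)

verb_centered <- c("R-primeVerb", "UR-primeVerb")

# Keep every row whose condition is not one of the verb-centered copies
d_dedup <- d_raw[!(d_raw$OriginalCondition %in% verb_centered), ]
nrow(d_dedup)
```

Comparing `nrow()` before and after the filter is a quick way to confirm that only the verb-centered copies were dropped.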

5. Importing ambiguity

ToDos:
* check with Martin and/or Virginia about whether slightly different images (mirroring) matter
The point of disambiguation is tricky for verb and adjective trials: should it be the first informative moment (e.g., when an informative verb is mentioned) or the onset of the noun?

In the raw data, the point of disambiguation is:
- exposure novel trials: F0 is the onset of the verb
- 24mos: adjective: word onset is the adjective
- 30mos: hard adjective trials: onset of the color/size
39 changes: 22 additions & 17 deletions data/fernald_marchman_2012/import.R
@@ -45,6 +45,12 @@ d_processed_24 <- d_raw_24 %>%
d_raw_30 <- read_delim(fs::path(read_path, "TL230ABoriginalichartsn1-121toMF.txt"),
  delim = "\t"
)
# remove duplicated trials (recentered on verb instead of noun)
d_raw_30 <- d_raw_30 |>
  filter(
    !(OriginalCondition %in% c("R-primeVerb","UR-primeVerb"))
  )

# d_raw_30 has two slightly different types of rows mixed together
d_processed_30_part_1 <- d_raw_30 |>
  filter(is.na(Shifts)) |>
@@ -60,15 +66,17 @@ d_processed_30_part_1 <- d_raw_30 |>

d_processed_30_part_2 <- d_raw_30 |>
  filter(!is.na(Shifts)) |>
  # these *do* have looking data in non-looking cols
  rename(
    f01 = `Frames - word starts at frame 45 `,
    f02 = `First Shift Gap`,
    f03 = `RT`,
    f04 = `CritOnSet`,
    f05 = `CritOffSet`
  ) |>
  # # these *do* have looking data in non-looking cols
  # rename(
  #   f01 = `Frames - word starts at frame 45 `,
  #   f02 = `First Shift Gap`,
  #   f03 = `RT`,
  #   f04 = `CritOnSet`,
  #   f05 = `CritOffSet`
  # ) |>
  preprocess_raw_data() %>%
  #drop final x column
  select(-x270) %>%
  relabel_time_cols(
    metadata_names = extract_col_types(.)[["metadata_names"]],
    pre_dis_names = extract_col_types(.)[["pre_dis_names"]],
@@ -143,9 +151,6 @@ d_tidy <- d_tidy %>%
TRUE ~ right_image
))


## TODO See Readme for some questions about stimulus table

# create stimulus table
stimulus_table_link <- d_tidy %>%
  distinct(target_image, target_label) |>
@@ -230,7 +235,7 @@ d_tidy <- d_tidy %>%
)

# create zero-indexed ids for trial_types
d_trial_type_ids <- d_tidy %>%
d_trial_type_ids <- d_tidy %>%
  distinct(
    target_id, distractor_id, target_side,
    condition
@@ -252,7 +257,7 @@ d_tidy_semifinal <- d_tidy %>%
  left_join(d_administration_ids) %>%
  left_join(d_trial_type_ids) |>
  select(-condition2, -original_condition, -cond_orig)


# get zero-indexed trial ids for the trials table
d_trial_ids <- d_tidy_semifinal %>%
@@ -262,13 +267,13 @@ d_trial_ids <- d_tidy_semifinal %>%
  ) %>%
  # the prescreen notes are not attached to all rows of a trial (sub_num x session x months x trial_type_id), so we fix this
  group_by(sub_num, session, months, trial_type_id) %>%
  summarize(prescreen_notes = first(na.omit(prescreen_notes)), .groups = 'drop') %>%
  summarize(prescreen_notes = first(na.omit(prescreen_notes)), .groups = 'drop') %>%
  mutate(excluded = !is.na(prescreen_notes)) |>
  rename(exclusion_reason = prescreen_notes) |>
  group_by(sub_num, session, months) %>%
  mutate(trial_order = cumsum(trial_type_id != lag(trial_type_id, default = first(trial_type_id)))) %>%
  ungroup() %>%
  mutate(trial_id = 0:(n()-1)) %>%
  ungroup() %>%
  mutate(trial_id = 0:(n()-1)) %>%
  distinct()

# join
@@ -464,5 +469,5 @@ write_and_validate(
  aoi_region_sets = NA,
  xy_timepoints = NA,
  aoi_timepoints,
  upload = TRUE
  upload = FALSE
)
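
The `trial_order` expression in `import.R` (`cumsum(trial_type_id != lag(trial_type_id, default = first(trial_type_id)))`) is a run counter: it starts at 0 and increases by one each time `trial_type_id` differs from the previous row. Below is a hedged base-R sketch of the same idea. The helper name `run_index` and the example id sequence are hypothetical, and the sketch ignores the `group_by()` that the pipeline applies first.

```r
# Base-R equivalent of
#   cumsum(x != dplyr::lag(x, default = dplyr::first(x)))
# The first element is compared with itself, so the counter starts at 0.
run_index <- function(x) {
  if (length(x) == 0) {
    return(integer(0))
  }
  # TRUE wherever an element differs from its predecessor
  changed <- c(FALSE, x[-1] != x[-length(x)])
  cumsum(changed)
}

# Hypothetical trial_type_id sequence for one administration
trial_type_id <- c(4, 4, 4, 7, 7, 4, 9)
run_index(trial_type_id)
```

Consecutive rows that share an id receive the same order value, so the counter only distinguishes trials when neighbouring rows have different `trial_type_id` values.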
6 changes: 0 additions & 6 deletions data/fernald_marchman_2012/notes

This file was deleted.

32 changes: 32 additions & 0 deletions data/kremin_2021/README.md
@@ -0,0 +1,32 @@
# kremin_2021 dataset

## Reference
Kremin, L. V., Jardak, A., Lew-Williams, C., & Byers-Heinlein, K. (2023). Bilingual children’s comprehension of code-switching at an uninformative adjective. Language Development Research 3(1), 249–276.

## Abstract
Bilingual children regularly hear sentences that contain words from both languages, also known as code-switching. Investigating how bilinguals process code-switching is important for understanding bilingual language acquisition, because young bilinguals have been shown to experience processing costs and reduced comprehension when encountering code-switched nouns. Studies have yet to investigate if processing costs are present when children encounter code-switches at other parts of speech within a sentence. The current study examined how 30 young bilinguals (age range: 37 – 48 months) processed sentences with code-switches at an uninformative determiner-adjective pair before the target noun (e.g., “Can you find le bon [the good] duck?”) compared to single-language sentences (e.g., “Can you find the good duck?”). Surprisingly, bilingual children accurately identified the target object in both sentence types, contrasting with previous findings that sentences containing code-switching lead to processing difficulties. Indeed, children showed similar (and in some cases, better) comprehension of sentences with a code-switch at an uninformative adjective phrase, relative to single-language sentences. We conclude that functional information conveyed by a code-switch may contribute to bilingual children’s sentence processing.

## Original study info
Participants were 36-month-old bilinguals (Eng-Fre from Montreal, and Eng-Spa from Princeton).
The key manipulation was code-mixing in prenominal adjectives before the target noun (e.g., "Can you see the good cow?" vs "Can you see le bon cow?"); note that the adjectives were uninformative.

Data from Montreal was collected with a Tobii T60-XL eyetracker, and data from Princeton was collected using a video camera and manual gaze coding.

Note that the data only include "single" and "mixed" conditions; there are also other "filler" trials that are not in the data (although the filler data for the Eng-Fre subset can be found in Sander-Montant et al. ([2022](https://osf.io/2m345/))).

## Importing decisions
Stimuli are available, but they are embedded in video files and will need to be extracted.
CDI and language exposure data for the Montreal subset are also available in Perez et al. ([2024](https://osf.io/mxksz/)) and could be imported (although they have not yet been).
There are also vocabulary data from DVAP; these could be imported in the lang_measures field for subject_aux_data.

We included importing decisions from the [processing script](https://osf.io/ug7t3/files/github/01_load.R), including the rectification of AOIs and the removal of duplicated header rows.
Age was calculated from years, months, and days using the formula: `years * 365.2425 + months * (365.2425/12) + days`.
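
As a worked illustration of the age formula, the sketch below evaluates it for a made-up child. The function and constant names are hypothetical, and the conversion to months is an assumption about how the value might be used downstream. The formula as written yields an age in days.

```r
# Average Gregorian year length in days, as used in the formula above
DAYS_PER_YEAR <- 365.2425
DAYS_PER_MONTH <- DAYS_PER_YEAR / 12

# Age in days from separate year, month, and day components
age_in_days <- function(years, months, days) {
  years * DAYS_PER_YEAR + months * DAYS_PER_MONTH + days
}

# Assumed follow-up step (not stated in this README): the same age in months
age_in_months <- function(years, months, days) {
  age_in_days(years, months, days) / DAYS_PER_MONTH
}

# Hypothetical child aged 3 years, 2 months, and 10 days
age_in_days(3, 2, 10)
age_in_months(3, 2, 10)
```

Using the average year length keeps the month term consistent with the year term, so 12 months always equal exactly one year in this scheme.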

Monitor size for the Montreal data was set as 1920x1200 based on the Tobii export.

We decided that the data reflect non-vanilla trials, although the "single" condition trials (e.g., "Can you see the good cow?") could be construed as "vanilla" in the sense that the carrier phrase is unlikely to bias the results.
(This would also be true for the "filler" condition trials should we decide to include these data).
The data may be worth further analysis despite their non-vanilla status.

## Importing ambiguity
None other than those reported above.