Social contacts and other risk factors for respiratory infections among internally displaced people in Somaliland. Data and analyses.
This repository contains the data and code used for all analyses described in our manuscript:
Van Zandvoort K, Bobe MO, Hassan AI, Abdi MI, Ahmed MS, Soleman MS, Warsame MY, Wais MA, Diggle E, McGowan CR, Satzke C, Mulholland K, Egeh MM, Hassan MM, Hergeeye MA, Eggo RM, Checchi F, Flasche S. Social contacts and other risk factors for respiratory infections among internally displaced people in Somaliland. Available at https://doi.org/10.1016/j.epidem.2022.100625.
This work is part of a larger study: Evaluating Strategies for Pneumococcal Immunization Campaigns in Crises (ESPICC).
You can clone the repository or download the zip from this URL: https://github.com/kevinvzandvoort/espicc-somaliland-digaale-survey-2019/archive/refs/heads/main.zip.
This survey was implemented using Open Data Kit. Android tablets were provided by LSHTM Open Data Kit (https://opendatakit.lshtm.ac.uk).
- Questionnaires were programmed in xlsx, and can be found in the ./questionnaire/xlsx folder.
- They were converted to xls files, which were uploaded to an ODK server and used with ODK Collect during fieldwork. The xls files can be found in the ./questionnaire/xls folder.
  - The xls for the household questionnaire was manually edited to randomly select household members (based on their age) for inclusion in the contact survey. Edits are wrapped within comments marked <MANUAL EDIT> and <END MANUAL>.
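The marked edits can be located programmatically. The following is a minimal sketch in R, assuming the markers appear literally as <MANUAL EDIT> and <END MANUAL> in a form file that can be read as text, and that marker pairs are not nested. The helper name `find_manual_edits` and the toy input are hypothetical and not part of the repository.

```r
# Hypothetical helper (not part of the repository): return the first and
# last line of every block wrapped in the manual-edit markers.
find_manual_edits <- function(lines) {
  starts <- grep("<MANUAL EDIT>", lines, fixed = TRUE)
  ends <- grep("<END MANUAL>", lines, fixed = TRUE)
  # Assumption: every opening marker has one closing marker, without nesting
  stopifnot(length(starts) == length(ends), all(starts <= ends))
  data.frame(start = starts, end = ends)
}

# Toy input standing in for the text of a form file; the real layout may differ
example_lines <- c(
  "unedited line",
  "comment: <MANUAL EDIT>",
  "manually edited line",
  "comment: <END MANUAL>",
  "unedited line"
)
manual_edits <- find_manual_edits(example_lines)
print(manual_edits)
```

For a real file, the lines could be obtained with `readLines()` on a path of your choosing, provided the file is plain text.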
The following questionnaires are available:
- s1_household
  - A household survey asking about household-level risk factors and household demographics
- s2_contacts
  - A contact survey asking about social contacts within the 24 hours before the survey, and individual-level risk factors for respiratory infections
- s3_anthropometry
  - A form to enter anthropometric measures
- s4_missing_houses
  - A form to ask neighbours of shelters that were absent on all visits about the status of these shelters
Only a subset of the data collected with these questionnaires during the survey has been used for this analysis.
Data has been anonymized, and links between household-, contact-, and nutrition- data have been removed.
The anonymized data can be used to replicate all analyses, figures, and tables in the manuscript.
All data is stored in the ./data folder. A data dictionary is provided in ./data/data_dictionary.xlsx.
The following datasets are included:
- household_data.RDS
  - Reported household-level risk factors
  - Collected with the s1_household form
- household_data_members.RDS
  - Age group and sex of household members
  - Collected with the s1_household form
- household_data_members_migration.RDS
  - Age of people reported to have left surveyed households in the six months preceding the survey
  - Collected with the s1_household form
- household_data_members_mortality.RDS
  - Age of people reported to have died in surveyed households in the six months preceding the survey
  - Collected with the s1_household form
- missing_houses_manual_categories.RDS
  - Status of shelters where no individual was present on repeat visits, according to their neighbours
  - Collected with the s4_missing_houses form
- participant_data.RDS
  - Non-contact related individual-level risk factors
  - Collected with the s2_contacts form
- contact_data_contactors.RDS
  - Contact-related information from contactors (participants in the contact survey)
  - Collected with the s2_contacts form
- contact_data_contactees.RDS
  - Information about contactees reported by contactors
  - Collected with the s2_contacts form
- nutrition_data.RDS
  - Anthropometric assessments of children aged 6 to 59 months who were included in the contact survey
  - Collected with the s3_anthropometry form
- regression_data.RDS
  - Combined (aggregated) dataset of contact, participant, nutrition, and household level data, used for the logistic regression analysis
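The datasets are standard R serialised objects and can be read with `readRDS()`. As a minimal sketch, assuming the working directory is the repository root, the helper below reads every .RDS file in a folder into a named list. The name `load_rds_folder` is hypothetical and not part of the repository.

```r
# Hypothetical helper (not part of the repository): read every .RDS file in
# a folder into a list named after the files (without extension).
load_rds_folder <- function(data_dir) {
  files <- list.files(data_dir, pattern = "\\.RDS$", full.names = TRUE)
  datasets <- lapply(files, readRDS)
  names(datasets) <- tools::file_path_sans_ext(basename(files))
  datasets
}

# Only runs when a ./data folder exists in the working directory
if (dir.exists("./data")) {
  digaale_data <- load_rds_folder("./data")
  print(names(digaale_data))
}
```

A single dataset can be read directly instead, for example `readRDS("./data/household_data.RDS")`.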
Code for the analysis can be found in the ./scripts folder.
The analysis can be replicated by running the index.R file (in R), which sources these scripts.
Figures and tables will be created in a newly created ./output folder.
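A minimal sketch of running the full analysis from an R session follows. It assumes that index.R sits in the repository root and resolves its paths relative to that folder, which is not confirmed here. The wrapper name `run_analysis` is hypothetical.

```r
# Hypothetical wrapper (not part of the repository): source index.R and
# return the files found in the output folder afterwards.
run_analysis <- function(repo_dir = ".") {
  index_file <- file.path(repo_dir, "index.R")
  if (!file.exists(index_file)) stop("index.R not found in: ", repo_dir)
  # chdir = TRUE temporarily sets the working directory to the folder that
  # holds index.R, so relative paths in the script resolve from there
  source(index_file, chdir = TRUE)
  list.files(file.path(repo_dir, "output"), recursive = TRUE)
}

# Only runs when index.R is present in the working directory
if (file.exists("index.R")) print(run_analysis("."))
```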
The ./scripts/socialmixr_zenodo_data.R script generates the data that can be used with the socialmixr package.
This data has been uploaded to Zenodo: https://doi.org/10.5281/zenodo.5226280.
To use the data in socialmixr:
pacman::p_load(magrittr, socialmixr, data.table)
#' Get data from Zenodo
digaale_contact_data =
  socialmixr::get_survey("https://zenodo.org/record/5226280")
#' The estimated population size in Digaale (for provided age groups)
#' can manually be downloaded
digaale_survey_population =
  data.table::fread("https://zenodo.org/record/7071876/files/espicc_somaliland_digaale_survey_population.csv")
#' Note that weekends fall on Fridays and Saturdays in Somaliland.
#' - The dayofweek variable provided in the dataset has been kept
#' consistent with R defaults (0: Sunday to 6: Saturday)
digaale_contact_data$participants[, c("dayofweek", "dayofweek_name", "weekend")] %>%
  unique %>% setorder(dayofweek) %>% .[]
#' socialmixr currently assumes the weekend to fall on dayofweek
#' 6 (Saturday) and 0 (Sunday)
#' - dayofweek can be manually edited so that Fridays and Saturdays
#' are taken as the weekend, if you wish to weight contacts by
#' weekday
digaale_contact_data$participants[, dayofweek := ifelse(dayofweek == 6, 0, dayofweek + 1)]
#' The contact matrix can then be constructed as follows
#' - The provided survey_population can be used to construct a
#' population representative matrix for Digaale IDP camp
#' - As the sample is not self-weighting (young age groups were
#' oversampled), it is recommended to apply the survey_weight
#' as weights
digaale_contact_matrix = digaale_contact_data %>%
  socialmixr::contact_matrix(survey.pop = digaale_survey_population,
    age.limits = digaale_survey_population$lower.age.limit,
    symmetric = TRUE, weights = "survey_weight", weigh.dayofweek = TRUE)
#' Note socialmixr's contact matrices show contactors in rows
#' and contactees in columns
digaale_contact_matrix$matrix %>% round(1)
     contact.age.group
      [0,10) [10,20) [20,30) [30,40) [40,50) [50,60) 60+
 [1,]    3.9     1.2     0.6     0.8     0.6     0.3 0.4
 [2,]    1.6     4.5     1.0     0.8     0.6     0.4 0.3
 [3,]    1.9     2.6     2.7     1.7     1.2     0.7 0.6
 [4,]    2.8     2.1     1.7     2.9     1.5     1.1 0.9
 [5,]    2.3     1.9     1.6     1.8     1.7     1.1 1.2
 [6,]    1.8     2.0     1.5     2.1     1.6     1.6 1.6
 [7,]    1.7     1.2     0.8     1.2     1.2     1.2 2.0
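
The day-of-week shift used in the example can be checked on a small table. This sketch uses base R only and a toy data frame (`days`, a hypothetical name) instead of the survey data. It applies the same `ifelse()` expression and flags which original days end up on the codes that socialmixr treats as the weekend (6 and 0, as stated in the comments above).

```r
# Day-of-week codes as stored in the dataset (R default: 0 = Sunday)
days <- data.frame(
  dayofweek = 0:6,
  name = c("Sunday", "Monday", "Tuesday", "Wednesday",
           "Thursday", "Friday", "Saturday")
)
# Same shift as applied to the participants table: move every day one code
# forward and wrap Saturday (6) round to 0
days$recoded <- ifelse(days$dayofweek == 6, 0, days$dayofweek + 1)
# Codes 6 and 0 are the ones treated as weekend days
days$treated_as_weekend <- days$recoded %in% c(0, 6)
print(days)
```

After the shift, Friday carries code 6 and Saturday carries code 0, so both are weighted as weekend days, while Sunday (now code 1) is weighted as a weekday.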