Sycamore documentation updates (#164)
* use bolding instead of headers
* Fix FAQ section heading
* Clean up Markdown styling
---------
Co-authored-by: Ilya Sytchev <[email protected]>
Co-authored-by: Zachary Clement <[email protected]>
joannakennedyharvard authored May 10, 2023
1 parent 093264a commit 0ee6ab2
Showing 1 changed file with 37 additions and 51 deletions.
88 changes: 37 additions & 51 deletions docs/source/sycamore.md
@@ -1,6 +1,6 @@
# Sycamore

## Executive Summary:
## Executive Summary

Use `sycamore` to process and analyze Beiwe survey data.

@@ -17,28 +17,30 @@ For more information, see the [librosa documentation](https://librosa.org/doc/la

User-facing functions can be imported directly from sycamore.

### Main Function:
### Main Function
`from forest.sycamore import compute_survey_stats`

### Less commonly used functions:
`from forest.sycamore import aggregate_surveys_config`
`from forest.sycamore import survey_submits`
`from forest.sycamore import survey_submits_no_config`
`from forest.sycamore import agg_changed_answers_summary`
### Less commonly used functions
```python
from forest.sycamore import aggregate_surveys_config
from forest.sycamore import survey_submits
from forest.sycamore import survey_submits_no_config
from forest.sycamore import agg_changed_answers_summary
```

Note: Most users will only use compute_survey_stats. However, other functions are listed for users interested in code development, or for users running Sycamore on studies with a very large number of surveys. If a very large number of surveys are collected, the main function (`compute_survey_stats`), which runs all of the other functions, may take a long time when a researcher may only be interested in a specific output
Note: Most users will only use `compute_survey_stats`. The other functions are listed for users interested in code development and for users running Sycamore on studies with a very large number of surveys. For such studies, the main function (`compute_survey_stats`), which runs all the other functions, may take a long time even though a researcher may only be interested in a specific output.

## Usage:
## Usage
Download raw data from your Beiwe server and use this package to process survey data generated by the Beiwe app. Summary data provides metrics around survey submissions and survey question completion. Sycamore also takes various auxiliary files which can be downloaded from the Beiwe website to ensure accurate output.

## Data:
Methods are designed for use on the `survey_timings`, `survey_answers`, and `audio_recordings` data from the Beiwe app.
## Data
Methods are designed for use on the `survey_timings`, `survey_answers`, and `audio_recordings` data from the Beiwe app.

The `survey_timings` and `survey_answers` data streams are required for optimal data processing. The `survey_timings` stream is the best source of survey data because it has information on when a user responded to each question. Because survey files are not always uploaded to the Beiwe server, the `survey_answers` data stream is used as a backup to the `survey_timings` stream. The `survey_answers` stream only contains information about survey responses and the time of the survey's final submission, so the `survey_answers` stream alone shouldn't be used for survey processing.
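
The "primary stream with a backup" relationship described above can be sketched roughly as follows. This is a simplified illustration, not Sycamore's actual implementation: `pick_survey_source` and the record layout are hypothetical names chosen for the example.

```python
from typing import Optional


def pick_survey_source(timings: Optional[list], answers: Optional[list]) -> tuple:
    """Return (source_name, records), preferring survey_timings.

    Hypothetical helper: it only illustrates the fallback idea,
    not how Sycamore actually combines the two streams.
    """
    if timings:  # per-question timestamps are available
        return "survey_timings", timings
    if answers:  # only responses and the final submission time
        return "survey_answers", answers
    return "none", []


# survey_timings was never uploaded for this made-up survey,
# so the survey_answers records are used instead.
source, records = pick_survey_source(
    timings=None,
    answers=[{"question": "q1", "answer": "3"}],
)
print(source)
```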

The `audio_recordings` data stream can also be included in survey summary outputs. Sycamore does not process the audio data returned as part of audio surveys, but it can generate summaries with submission frequencies and survey duration for audio surveys.

## Auxiliary files:
## Auxiliary files
Sycamore requires users to manually download files from the Beiwe website to create some outputs. These files can be downloaded by clicking "Edit this Study" on the study page, and clicking on the relevant file.

The file supplied to `config_path` can be downloaded by clicking "Export study settings JSON file" under "Export/Import study settings" on the study settings page. If the `config_path` argument is not supplied, the `submits_and_deliveries.csv`, `submits_summary_daily.csv`, and `submits_summary_hourly.csv` files will not be generated. This is because these files rely on an estimate of when surveys were delivered, and Sycamore gets information about when survey deliveries are made from the study configuration file.
@@ -48,15 +50,16 @@ The file supplied to `interventions_filepath` can be downloaded by clicking "Dow
The file supplied to `history_path` can be downloaded by clicking "Download Surveys" next to "Survey History" on the study settings page. If this file is not supplied, Sycamore will not be able to provide prompts corresponding to audio surveys in output files. In addition, if this file is not supplied, and if the text of survey questions was changed during the study, surveys recovered from `survey_answers` files may not have the correct question IDs.

___
## Functions
1. [`sycamore.base.compute_survey_stats`](#1-sycamorebasecompute_survey_stats)
2. [`sycamore.common.aggregate_surveys_config`](#2-sycamorecommonaggregate_surveys_config)
3. [`sycamore.submits.survey_submits`](#3-sycamoresubmitssurvey_submits)
4. [`sycamore.submits.survey_submits_no_config`](#4-sycamoresubmitssurvey_submits_no_config)
5. [`sycamore.responses.agg_changed_answers_summary`](#5-sycamoreresponsesagg_changed_answers_summary)
## Functions

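* [`sycamore.base.compute_survey_stats`](#sycamorebasecompute_survey_stats)
* [`sycamore.common.aggregate_surveys_config`](#sycamorecommonaggregate_surveys_config)
* [`sycamore.submits.survey_submits`](#sycamoresubmitssurvey_submits)
* [`sycamore.submits.survey_submits_no_config`](#sycamoresubmitssurvey_submits_no_config)
* [`sycamore.responses.agg_changed_answers_summary`](#sycamoreresponsesagg_changed_answers_summary)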
___
## 1. `sycamore.base.compute_survey_stats`

### `sycamore.base.compute_survey_stats`

`compute_survey_stats` runs `aggregate_surveys_config`, `survey_submits`, `survey_submits_no_config`, and `agg_changed_answers_summary`, and writes their output to CSV files. It takes the following arguments:

@@ -71,9 +74,8 @@ compute_survey_stats runs aggregate_surveys_config, survey_submits, survey_submi
`end_date`: the latest date for which you might want survey information. Beiwe will generate survey deliveries ending at this date, and it will not include any surveys taken after this date in any outputs.
`submits_timeframe`: the timeframe over which to generate submission summaries. This must be one of the frequencies specified in `forest.constants.Frequency`. It determines whether `submits_summary_daily.csv` or `submits_summary_hourly.csv` (which aggregate survey deliveries and submissions at the daily or hourly level) is generated. By default, both hourly and daily summaries are generated, so you can usually leave this argument alone and delete any unwanted files. To generate only one summary, specify a single timeframe.


*Example (without config file)*
```
```python
from forest.sycamore import compute_survey_stats

study_dir = "path/to/data"
@@ -90,7 +92,7 @@ compute_survey_stats(
```

*Example (with config file)*
```
```python
from forest.constants import Frequency
config_path = "path/to/config file"
interventions_filepath = "path/to/interventions file"
@@ -109,30 +111,28 @@ compute_survey_stats(
end_date=end_date, config_path=config_path, interventions_filepath=interventions_filepath,
history_path=history_path, submits_timeframe=submits_timeframe
)
```

Most users should be able to use `compute_survey_stats` for all of their survey processing needs. However, if a study has collected a very large number of surveys, subprocesses are also exposed to reduce processing time.

___
## 2. `sycamore.common.aggregate_surveys_config`
### `sycamore.common.aggregate_surveys_config`

Aggregate all survey information from a study, using the config file to infer information about surveys.

*Example*
```
```python
from forest.sycamore import aggregate_surveys_config

agg_data = aggregate_surveys_config(study_dir, config_path, study_tz, history_path=history_path)
```

___
## 3. `sycamore.submits.survey_submits`
### `sycamore.submits.survey_submits`

Extract and summarize delivery and submission times.

*Example*
```
```python
from forest.sycamore.submits import survey_submits

config_path = "path/to/config file"
@@ -152,27 +152,24 @@ submits_detail, submits_summary = survey_submits(
history_path
)
```

___
## 4. `sycamore.submits.survey_submits_no_config`
### `sycamore.submits.survey_submits_no_config`
Used to extract an alternative survey submits table that does not include delivery times.

*Example*
```
```python
from forest.sycamore import survey_submits_no_config

study_dir = path/to/data
study_dir = "path/to/data"

submits_tbl = survey_submits_no_config(study_dir)
```

___
## 5. `sycamore.responses.agg_changed_answers_summary`
### `sycamore.responses.agg_changed_answers_summary`
Used to extract data summarizing user responses.

*Example*
```
```python
from forest.sycamore import agg_changed_answers_summary

config_path = "path/to/config file"
@@ -187,31 +184,20 @@ study_tz = Timezone of study (if not defined, defaults to 'UTC')
agg_data = aggregate_surveys_config(study_dir, config_path, study_tz, history_path=history_path)

ca_detail, ca_summary = agg_changed_answers_summary(config_path, agg_data)
```

#FAQ:
## FAQ

#### In the `submits_summary.csv` file, there are some rows where `num_submitted_surveys` is greater than `num_surveys`. How could a user have submitted more surveys than were delivered to them?
**In the `submits_summary.csv` file, there are some rows where `num_submitted_surveys` is greater than `num_surveys`. How could a user have submitted more surveys than were delivered to them?**

Sycamore doesn't know exactly when surveys were delivered to users. Survey delivery times are estimated using the study configuration file which you enter when you run the code. For example, imagine that you started running a study in March with survey deliveries happening daily, and in April you decided to switch your surveys to be delivered weekly. If you ran Sycamore in April, your config file would tell Sycamore that surveys were delivered weekly throughout the whole study. So, if you had a user submitting surveys daily during March, they would have ~30 survey submissions, but Sycamore would think that only ~5 surveys had been delivered during that time.

In addition, this may happen if a researcher manually re-sends surveys, because Sycamore has no information about manual (unscheduled) deliveries.
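
The March/April scenario above can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic, assuming deliveries fall at a fixed interval starting on the first day of the window. `estimated_deliveries` is a hypothetical helper, not a Sycamore function, and the year is arbitrary.

```python
from datetime import date, timedelta


def estimated_deliveries(start: date, end: date, every_days: int) -> list:
    """List delivery dates from start to end (inclusive) at a fixed interval."""
    deliveries = []
    current = start
    while current <= end:
        deliveries.append(current)
        current += timedelta(days=every_days)
    return deliveries


march_start, march_end = date(2023, 3, 1), date(2023, 3, 31)

# What actually happened in March: one delivery (and one submission) per day.
num_submitted_surveys = len(estimated_deliveries(march_start, march_end, 1))

# What the April config file implies for March: one delivery per week.
num_surveys = len(estimated_deliveries(march_start, march_end, 7))

print(num_submitted_surveys, num_surveys)  # 31 5
```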

#### In the `submits_and_deliveries.csv` file, there are a ton of rows with deliveries but no submissions. Why is this happening?
**In the `submits_and_deliveries.csv` file, there are a ton of rows with deliveries but no submissions. Why is this happening?**

If surveys are sent on a weekly schedule, Sycamore assumes that there is a survey delivered every week between the `start_date` and `end_date` which you entered. If you want there to be fewer empty rows in your output, you can move `start_date` and `end_date` to be closer to the actual start and end dates of your study.
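
To get a feel for how much the date window matters, the sketch below counts weekly deliveries for a wide window and for one that matches a made-up study period. `weekly_delivery_count` is a hypothetical helper that assumes one delivery every seven days starting on `start`. Sycamore's own scheduling logic may differ.

```python
from datetime import date


def weekly_delivery_count(start: date, end: date) -> int:
    """Count weekly deliveries from start through end (inclusive)."""
    if end < start:
        return 0
    return (end - start).days // 7 + 1


# Hypothetical study that really ran from 1 March to 30 April 2023.
wide = weekly_delivery_count(date(2023, 1, 1), date(2023, 12, 31))
tight = weekly_delivery_count(date(2023, 3, 1), date(2023, 4, 30))

print(wide, tight)  # 53 9
```

With the wide window, most of the 53 estimated deliveries fall outside the study and would appear as rows with no submission.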

#### What does `surv_inst_flg` mean in the outputs?
**What does `surv_inst_flg` mean in the outputs?**

`surv_inst_flg` is a unique identifying number to distinguish different times when the same individual took the same survey. This column is useful for joining outputs together.
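
As a rough illustration of such a join, the sketch below matches rows from two made-up tables on the user, the survey, and `surv_inst_flg`. Apart from `surv_inst_flg`, the column names (`beiwe_id`, `survey_id`, `submitted`, `answer`) and the data are assumptions for the example. Check the headers of your own output files before adapting it.

```python
# Two made-up output tables, each a list of row dictionaries.
submits = [
    {"beiwe_id": "user1", "survey_id": "s1", "surv_inst_flg": 1, "submitted": True},
    {"beiwe_id": "user1", "survey_id": "s1", "surv_inst_flg": 2, "submitted": False},
]
responses = [
    {"beiwe_id": "user1", "survey_id": "s1", "surv_inst_flg": 1, "answer": "3"},
]

# A survey instance is identified by who took which survey, and which time.
KEY = ("beiwe_id", "survey_id", "surv_inst_flg")


def row_key(row: dict) -> tuple:
    """Build the join key for one row."""
    return tuple(row[name] for name in KEY)


responses_by_key = {row_key(row): row for row in responses}

# Left join: keep every submits row, adding response columns where they exist.
joined = [{**row, **responses_by_key.get(row_key(row), {})} for row in submits]

for row in joined:
    print(row)
```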









