update sycamore readme

onnela-lab · Mar 8, 2023 · a2e6fb3
1 parent 94de089
Showing 1 changed file with 35 additions and 10 deletions.
diff --git a/docs/source/sycamore.md b/docs/source/sycamore.md
@@ -15,22 +15,34 @@ For more information, see the [librosa documentation](https://librosa.org/doc/la
 
 ## Import
 
-User-facing functions can be imported directly from sycamore:
+User-facing functions can be imported directly from sycamore. 
 
 `from forest.sycamore import compute_survey_stats`  
 `from forest.sycamore import aggregate_surveys_config` 
 `from forest.sycamore import survey_submits` 
 `from forest.sycamore import survey_submits_no_config` 
 `from forest.sycamore import agg_changed_answers_summary` 
 
+Note: Most users will only use `compute_survey_stats`. The other functions are listed for users interested in code development.
+
 ## Usage:   
-Download raw data from your Beiwe server and use this package to process the data in the `survey_timings`, `survey_answers`, and `audio_recordings` data streams, using `survey_answers` as a backup for possible missing `survey_timings` files. Summary data provides metrics around survey submissions and survey question completion. Sycamore takes various auxiliary files which can be downloaded from the Beiwe website to ensure accurate output.  
+Download raw data from your Beiwe server and use this package to process survey data generated by the Beiwe app. Sycamore combines survey data from various data sources in order to facilitate research. It generates summary metrics on survey submissions and survey question completion.  
 
 ## Data:   
-Methods are designed for use on the `survey_timings` and `survey_answers` data from the Beiwe app.
+Methods are designed for use on the `survey_timings`, `survey_answers`, and `audio_recordings` data from the Beiwe app.  
+
+The `survey_timings` data stream is always necessary for data processing. However, because survey files are not always uploaded to the Beiwe server, the `survey_answers` data stream is also used as a backup to the `survey_timings` stream. The `survey_answers` stream only contains information about survey responses and the time of the survey's final submission, so the `survey_answers` stream alone shouldn't be used for survey processing. 
+
+The `audio_recordings` data stream can also be included in survey summary outputs. Sycamore does not process the audio data returned as part of audio surveys, but it can generate summaries with submission frequencies and survey duration for audio surveys. 
 
 ## Auxiliary files:   
-Sycamore requires users to manually download files from the Beiwe website to create some outputs. These files can be downloaded by clicking "Edit this Study" on the study page, and clicking on the relevant file. When running Sycamore, pass the path to the file downloaded by clicking "Export study settings JSON file" under "Export/Import study settings" to the `config_path` argument. Pass the file downloaded by clicking "Download Interventions" next to "Intervention Data" to the `interventions_filepath` argument. And, pass the file downloaded by clicking "Download Surveys" next to "Survey History" to the `history_path` argument. 
+Sycamore requires users to manually download files from the Beiwe website to create some outputs. These files can be downloaded by clicking "Edit this Study" on the study page, and clicking on the relevant file. 
+
+The file supplied to `config_path` can be downloaded by clicking "Export study settings JSON file" under "Export/Import study settings" on the study settings page.  If the `config_path` argument is not supplied, the `submits_and_deliveries.csv`, `submits_summary_daily.csv`, and `submits_summary_hourly.csv` files will not be generated. This is because these files rely on an estimate of when surveys were delivered, and Sycamore gets information about when survey deliveries are made from the study configuration file.  
+
+The file supplied to `interventions_filepath` can be downloaded by clicking "Download Interventions" next to "Intervention Data" on the study settings page. If the `interventions_filepath` argument is not supplied, and if your study used relative surveys, the `submits_and_deliveries.csv`, `submits_summary_daily.csv`, and `submits_summary_hourly.csv` files will not be generated. This is because the interventions file contains information about each Beiwe user's intervention date, and Sycamore cannot guess a user's intervention date from survey data alone. When running Sycamore, be sure to use an up-to-date version of the interventions file which contains intervention dates for users recently added to your study.  
+
+The file supplied to `history_path` can be downloaded by clicking "Download Surveys" next to "Survey History" on the study settings page. If this file is not supplied, Sycamore will not be able to provide prompts corresponding to audio surveys in output files. In addition, if this file is not supplied, and if the text of survey questions was changed during the study, surveys recovered from `survey_answers` files may not have the correct question IDs.
 
 ___
 ## Functions  
@@ -43,7 +55,18 @@ ___
 ___
 ## 1. `sycamore.base.compute_survey_stats` 
 
-compute_survey_stats runs aggregate_surveys_config, survey_submits, survey_submits_no_config, and agg_changed_answers_summary, and writes their output to csv files
+`compute_survey_stats` runs `aggregate_surveys_config`, `survey_submits`, `survey_submits_no_config`, and `agg_changed_answers_summary`, and writes their output to CSV files. `compute_survey_stats` takes the following arguments:
+
+`data_dir`: the path to the directory where Beiwe data is stored.  
+`output_dir`: the path to the directory where output is to be written.  
+`tz_str`: the time zone where the study was conducted (if not defined, defaults to 'UTC'). You can see a list of all possible time zone names by importing `pytz` and using `pytz.all_timezones`.  
+`beiwe_ids`: the list of Beiwe IDs to run Sycamore on. If this is not specified, Sycamore will run on all users in the `data_dir` directory.  
+`config_path`: the filepath to your downloaded survey config file. See above for explanations about downloading auxiliary files.  
+`interventions_filepath`: the filepath to your downloaded interventions timing file.  
+`history_path`: the filepath to your downloaded survey history file.  
+`start_date`: the earliest date for which you might want survey information. Sycamore will generate survey deliveries starting at this date, and it will not include any surveys taken prior to this date in any outputs.  
+`end_date`: the latest date for which you might want survey information. Sycamore will generate survey deliveries ending at this date, and it will not include any surveys taken after this date in any outputs.  
+`submits_timeframe`: the timeframe over which to generate submission summaries. This must be one of the frequencies specified in `forest.constants.Frequency`. It determines whether `submits_summary_daily.csv` or `submits_summary_hourly.csv` gets generated. By default, both hourly and daily summaries are generated, so you can usually leave this argument alone and delete any unwanted files. If you prefer, you can specify a single timeframe.
 
 
 *Example (without config file)*    
@@ -55,16 +78,17 @@ output_dir = path/to/output
 beiwe_ids = list of ids in study_dir
 start_date = "2022-01-01"
 end_date = "2022-06-04"
-study_tz = Timezone of study (if not defined, defaults to 'UTC')
+tz_str = "America/New_York"
 
 compute_survey_stats(
-    study_dir, output_dir, study_tz, beiwe_ids, start_date=start_date, 
+    study_dir, output_dir, tz_str, beiwe_ids, start_date=start_date, 
     end_date=end_date
 )
 ```
 
 *Example (with config file)* 
 ```
+from forest.constants import Frequency
 config_path = path/to/config file
 interventions_filepath = path/to/interventions file
 history_path = path/to/history/file
@@ -73,13 +97,14 @@ output_dir = path/to/output
 beiwe_ids = list of ids in study_dir
 start_date = "2022-01-01"
 end_date = "2022-06-04"
-study_tz = Timezone of study (if not defined, defaults to 'UTC')
+tz_str = "America/New_York"
+submits_timeframe = Frequency.HOURLY_AND_DAILY
 
 
 compute_survey_stats(
     study_dir, output_dir, tz_str, beiwe_ids, start_date=start_date, 
-    end_date=end_date, config_path, interventions_filepath, 
-    history_path=history_path
+    end_date=end_date, config_path=config_path, interventions_filepath=interventions_filepath,
+    history_path=history_path, submits_timeframe=submits_timeframe
 )
 
 ```
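
The auxiliary-file rules added to the README in this diff can be restated as a small decision function. The sketch below is only a reading aid: `expected_delivery_outputs` and its `uses_relative_surveys` flag are hypothetical names, not part of `forest.sycamore`, and the logic assumes nothing beyond what the README text says about `config_path` and `interventions_filepath`. It ignores `submits_timeframe`, which can further restrict the daily and hourly summaries.

```python
from typing import List, Optional

# Output files that, per the README text, depend on estimated survey deliveries.
DELIVERY_BASED_OUTPUTS = [
    "submits_and_deliveries.csv",
    "submits_summary_daily.csv",
    "submits_summary_hourly.csv",
]


def expected_delivery_outputs(
    config_path: Optional[str] = None,
    interventions_filepath: Optional[str] = None,
    uses_relative_surveys: bool = False,
) -> List[str]:
    """Hypothetical helper: list the delivery-based files the README says to expect."""
    # Without the study config file there is no delivery schedule to estimate from.
    if config_path is None:
        return []
    # Relative surveys also need each user's intervention date.
    if uses_relative_surveys and interventions_filepath is None:
        return []
    return list(DELIVERY_BASED_OUTPUTS)


# A study with relative surveys but no interventions file yields none of the files.
print(expected_delivery_outputs("config.json", None, uses_relative_surveys=True))
```

A check like this could be run before calling `compute_survey_stats` to catch a missing auxiliary file early, rather than noticing absent output files afterwards.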