Jasmine physical cyrcadian rhythm #205

Merged: 31 commits merged on Oct 30, 2023
31 commits (the file changes below are from 29 commits)
7e170b5
Physical cyrcadian rhythm feature
GeorgeEfstathiadis Sep 22, 2023
72d80f1
remove rare argument from gps_summaries and place in Hyperparameters
GeorgeEfstathiadis Oct 3, 2023
c7ef436
typos plus pcr_window parameter for lookup
GeorgeEfstathiadis Oct 3, 2023
f6180a2
update documentation for refactoring of parameters and pcr
GeorgeEfstathiadis Oct 3, 2023
7bd5b6e
add new pcr parameters for bool to run and sampling rate
GeorgeEfstathiadis Oct 4, 2023
9107699
update gps_stats_main README
GeorgeEfstathiadis Oct 14, 2023
5e49b24
update link at README
GeorgeEfstathiadis Oct 14, 2023
cba42a3
Merge branch 'develop' into jasmine-physical_cyrcadian_rhythm
GeorgeEfstathiadis Oct 16, 2023
0d8d3c9
flake8 changes
GeorgeEfstathiadis Oct 16, 2023
9acaf14
dont include pcr when not used in columns
GeorgeEfstathiadis Oct 16, 2023
5ac8e12
Merge branch 'jasmine-physical_cyrcadian_rhythm' of https://github.co…
GeorgeEfstathiadis Oct 16, 2023
5f0fd2f
typing fix for return
GeorgeEfstathiadis Oct 16, 2023
dd129fa
typing fix for return 2
GeorgeEfstathiadis Oct 16, 2023
903d4d4
update tests
GeorgeEfstathiadis Oct 16, 2023
4a971f0
Merge branch 'develop' into jasmine-physical_cyrcadian_rhythm
hackdna Oct 17, 2023
62eb0b2
typos, import orders, re-assignments and order of operations
GeorgeEfstathiadis Oct 17, 2023
35b062f
PEP8 format and typos
GeorgeEfstathiadis Oct 17, 2023
c2b0fa2
Merge branch 'jasmine-physical_cyrcadian_rhythm' of https://github.co…
GeorgeEfstathiadis Oct 17, 2023
27a1b74
typo parameters.parameters
GeorgeEfstathiadis Oct 17, 2023
5607f6d
create new columns to reformat gps_summaries: split_day_night_cols, g…
GeorgeEfstathiadis Oct 17, 2023
d217db1
reformat gps_summaries
GeorgeEfstathiadis Oct 18, 2023
ef95fbf
Merge branch 'develop' into jasmine-physical_cyrcadian_rhythm
GeorgeEfstathiadis Oct 18, 2023
a0039f9
Merge branch 'develop' into jasmine-physical_cyrcadian_rhythm
hackdna Oct 18, 2023
1b107ec
documentation, naming changes
GeorgeEfstathiadis Oct 19, 2023
04e96b3
docs returns fix
GeorgeEfstathiadis Oct 19, 2023
643996a
add raises in docs of function
GeorgeEfstathiadis Oct 19, 2023
82aa9b0
add unit test for traj2stats functions
GeorgeEfstathiadis Oct 19, 2023
06c300e
Merge branch 'jasmine-physical_cyrcadian_rhythm' of https://github.co…
GeorgeEfstathiadis Oct 19, 2023
f97df96
Merge branch 'develop' into jasmine-physical_cyrcadian_rhythm
GeorgeEfstathiadis Oct 19, 2023
ce79648
style and naming changes
GeorgeEfstathiadis Oct 20, 2023
1aaeae7
Simplify chained comparison
hackdna Oct 30, 2023

README.md: 10 changes (2 additions, 8 deletions)
@@ -116,7 +116,7 @@ sample_gps_data = sim_gps_data(n_persons, location, start_date, end_date, cycle,
gps_to_csv(sample_gps_data, path_to_synthetic_gps_data, start_date, end_date)

# 2. Specify parameters for imputation
# See https://github.com/onnela-lab/forest/wiki/Jasmine-documentation#input for details
# See https://forest.beiwe.org/en/latest/jasmine.html for details
# time zone where the study took place (assumes that all participants were always in this time zone)
tz_str = "Etc/GMT-1"
# Generate summary metrics e.g. Frequency.HOURLY, Frequency.DAILY or Frequency.HOURLY_AND_DAILY (see Frequency class in constants.py)
@@ -127,12 +127,8 @@ save_traj = False
parameters = None
# list of locations to track if visited, leave None if you don't want these summary statistics
places_of_interest = ['cafe', 'bar', 'hospital']
# True if you want to save a log of all visited locations and their attributes
save_osm_log = True
# list of OpenStreetMap tags to use for identifying locations, leave None to default to amenity and leisure tagged locations or if you don't want to use OSM (see OSMTags class in constants.py)
osm_tags = None
# threshold of time spent in a location to count as being in that location, in minutes
threshold = 15

# 3. Impute location data and generate mobility summary metrics using the simulated data above
gps_stats_main(
@@ -143,9 +139,7 @@ gps_stats_main(
save_traj = save_traj,
parameters = parameters,
places_of_interest = places_of_interest,
save_osm_log = save_osm_log,
osm_tags = None,
threshold = threshold,
osm_tags = osm_tags,
)

# 4. Generate daily summary metrics for call/text logs

docs/source/index.md: 6 changes (3 additions, 3 deletions)
@@ -170,15 +170,15 @@ The summary statistics that are generated are listed below:
- Entropy measure based on the proportion of time spent at significant locations over the course of a day
- Letting p_i be the proportion of the day spent at significant location i, significant location entropy is calculated as -\sum_{i} p_i*log(p_i), where the sum occurs over all non-zero p_i for that day.
* - mis_duration
- Float
- Not Available
- Number of hours of GPS data missing over the course of a day
-
* - Physical circadian rhythm
- Not Available
- Float
- A continuous measurement of routine in the interval [0,1] that scores a day with 0 if there was a complete break from routine and 1 if the person followed exactly the same routine as on every other day of follow-up
- For a detailed description of how this measure is calculated, see Canzian and Musolesi's 2015 paper in the Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, titled "Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis." Their procedure was followed using 30-minute bins.
* - Physical circadian rhythm stratified
- Not Available
- Float
- A continuous measurement of routine in the interval [0,1] that scores a day with 0 if there was a complete break from routine and 1 if the person followed exactly the same routine as on every other day of follow-up
- Calculated in the same way as Physical circadian rhythm, except the procedure is repeated separately for weekends and weekdays.
```
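To make the significant location entropy formula concrete, here is a minimal Python sketch. It assumes the daily proportions p_i have already been computed and uses the natural logarithm; the function name `significant_location_entropy` is hypothetical and not part of forest.

```python
import math
from typing import Iterable


def significant_location_entropy(proportions: Iterable[float]) -> float:
    """Return -sum(p_i * log(p_i)), skipping zero proportions.

    `proportions` holds, for one day, the share of the day spent at each
    significant location. Zero entries are skipped because log(0) is
    undefined (the limit of p * log(p) as p -> 0 is 0 anyway).
    """
    return sum(-p * math.log(p) for p in proportions if p > 0)


# Example: half the day at one place, a third at another, a sixth at a third.
day_entropy = significant_location_entropy([1 / 2, 1 / 3, 1 / 6])
print(round(day_entropy, 4))
```

The value is 0 when the whole day is spent at a single location and grows as time is spread more evenly across locations.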

docs/source/jasmine.md: 39 changes (22 additions, 17 deletions)
@@ -11,7 +11,7 @@ For instructions on how to install forest, please visit [here](https://github.co

### Input

When using jasmine, you should call the function `gps_stats_main(study_folder, output_folder, tz_str, frequency, save_traj, parameters = None, save_osm_log = None, osm_tags = None, threshold, split_day_night, person_point_radius = 2, place_point_radius = 7.5, time_start = None, time_end = None, participant_ids = None, all_memory_dict = None, all_BV_set = None, quality_threshold = 0.05)` in the `traj2stats` module and specify:
When using jasmine, you should call the function `gps_stats_main(study_folder, output_folder, tz_str, frequency, save_traj, places_of_interest = None, osm_tags = None, time_start = None, time_end = None, participant_ids = None, parameters = None, all_memory_dict = None, all_bv_set = None)` in the `traj2stats` module and specify:
- `study_folder`, string, the path of the study folder. The study folder should contain one folder per participant, each with a `gps` subfolder inside
- `output_folder`, string, the path of the folder where you want to save results
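Before calling `gps_stats_main`, it can help to confirm that the study folder has the layout described above. The helper below is a hypothetical pre-flight check written for this page (it is not part of forest) and assumes participant folders sit directly under `study_folder`.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def participants_with_gps(study_folder: str) -> List[str]:
    """Return the sorted names of participant folders that contain a `gps` subfolder."""
    root = Path(study_folder)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "gps").is_dir()
    )


# Build a throwaway study folder: two participants with GPS data, one without.
with tempfile.TemporaryDirectory() as study_folder:
    os.makedirs(os.path.join(study_folder, "participant_a", "gps"))
    os.makedirs(os.path.join(study_folder, "participant_b", "accelerometer"))
    os.makedirs(os.path.join(study_folder, "participant_c", "gps"))
    found = participants_with_gps(study_folder)
    print(found)  # ['participant_a', 'participant_c']
```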

@@ -27,17 +27,13 @@ In addition, the main function takes four arguments that provide further flexibi
- `tz_str`, string, the timezone where the study is/was conducted. See `pytz.all_timezones` for all options. For example, "America/New_York".
- `frequency`, Frequency class, the frequency of the summary stats (resolution for summary statistics) e.g. Frequency.HOURLY, Frequency.DAILY, etc.
- `save_traj`, bool, True if you want to save the trajectories as a csv file, False if you don't (default: False).
- `parameters`, a list of parameters, by default it is set to None. The details are as below.
- `places_of_interest`, a list of places of interest, by default it is set to None. Place names are as used in OpenStreetMap.
- `save_osm_log`, bool, True if you want to output a log of locations visited and their tags (default: False).
- `osm_tags`, list of OSMTags class, a list of tags to filter the places of interest, by default it is set to None. Tags are as used in OpenStreetMap. Avoid using many tags if a large area is covered.
- `threshold`, int, time spent in a pause must exceed this threshold (in minutes) for the pause to be placed in the log
- `split_day_night`, bool, True if you want to split all metrics into daytime and nighttime patterns (only for Frequency.DAILY)
- `person_point_radius`, float, radius of the person's circle when discovering places near them during pauses (default: 2)
- `place_point_radius`, float, radius of a place's circle when the place is returned as centre coordinates from OSM (default: 7.5)
- `all_memory_dict` and `all_BV_set` are dictionaries from a previous run (None if it's the first run).
- `parameters`, a list of parameters, by default it is set to None. The details are as below.
- `all_memory_dict` and `all_bv_set` are dictionaries from a previous run (None if it's the first run).
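`person_point_radius` and `place_point_radius` describe two circles: one around the participant during a pause and one around a place that OpenStreetMap returns as a single coordinate. As a rough illustration, assuming both radii are in metres and that a place counts as nearby when the two circles intersect, the check reduces to comparing the great-circle distance between the centres with the sum of the radii. The function names are hypothetical; forest's own `get_nearby_locations` may implement this differently.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; assumes a spherical Earth

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    hav = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(hav))


def place_is_nearby(
    person: LatLon,
    place: LatLon,
    person_point_radius: float = 2.0,
    place_point_radius: float = 7.5,
) -> bool:
    """True if the person's circle and the place's circle overlap or touch."""
    return haversine_m(person, place) <= person_point_radius + place_point_radius


pause_location = (42.3601, -71.0589)
cafe = (42.3601, -71.0588)  # about 8 m east of the pause location
print(place_is_nearby(pause_location, cafe))  # True
```

Two circles intersect exactly when the distance between their centres is at most the sum of their radii, so no polygon geometry is needed for this yes/no question.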

You can also tweak the parameters that change the assumptions of the imputation and summary statistics. The parameters are
(1) `l1`: the scale parameter in the abs function in the daily kernel;
(2) `l2`: the scale parameter in the abs function in the weekly kernel;
(3) `l3`: the scale parameter in the geographical kernel if only latitude or longitude is used;
@@ -58,7 +54,17 @@ You can also tweak the parameters that change the assumptions of the imputation
(18) `accuracylim`: GPS records with an accuracy value higher than this threshold are filtered out;
(19) `r`: the maximum radius of a pause;
(20) `w`: a distance threshold; if the distance to the great circle is greater than this threshold, we consider there to be a knot;
(21) `h`: a distance threshold; if the movement between two timestamps is less than h, it is considered a pause and a knot;
(22) `save_osm_log`: bool, True if you want to output a log of locations visited and their tags (default: False);
(23) `log_threshold`: int, time spent in a pause must exceed this threshold (in minutes) for the pause to be placed in the log;
(24) `split_day_night`: bool, True if you want to split all metrics into daytime and nighttime patterns (only for Frequency.DAILY);
(25) `person_point_radius`: float, radius of the person's circle when discovering places near them during pauses (default: 2);
(26) `place_point_radius`: float, radius of a place's circle when the place is returned as centre coordinates from OSM (default: 7.5);
(27) `pcr_bool`: bool, True if you want to calculate the physical circadian rhythm (default: False);
(28) `pcr_window`: int, number of days to look back and forward when calculating the physical circadian rhythm (default: 14);
(29) `pcr_sample_rate`: int, number of seconds between samples when calculating the physical circadian rhythm (default: 30).
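The thresholds `accuracylim` and `h` (items 18 and 21) act on raw GPS records. The sketch below shows one plausible reading: drop records whose reported accuracy exceeds `accuracylim`, then flag each step between consecutive records as a pause when the distance moved is below `h`. The record layout, the metre units, and the function names are assumptions for illustration, not forest's implementation.

```python
import math
from typing import List, NamedTuple, Sequence

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; assumes a spherical Earth


class GpsRecord(NamedTuple):
    """Hypothetical record layout: time in seconds, position in degrees, accuracy in metres."""
    timestamp: float
    latitude: float
    longitude: float
    accuracy: float


def distance_m(a: GpsRecord, b: GpsRecord) -> float:
    """Haversine distance in metres between two records."""
    phi1, phi2 = math.radians(a.latitude), math.radians(b.latitude)
    dphi = phi2 - phi1
    dlmb = math.radians(b.longitude - a.longitude)
    hav = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(hav))


def filter_by_accuracy(records: Sequence[GpsRecord], accuracylim: float) -> List[GpsRecord]:
    """Keep only records whose accuracy value does not exceed `accuracylim`."""
    return [r for r in records if r.accuracy <= accuracylim]


def pause_steps(records: Sequence[GpsRecord], h: float) -> List[bool]:
    """For each pair of consecutive records, True if the movement is less than `h` metres."""
    return [distance_m(a, b) < h for a, b in zip(records, records[1:])]


raw = [
    GpsRecord(0.0, 0.0, 0.0, 5.0),
    GpsRecord(30.0, 0.0, 0.00001, 5.0),  # ~1 m from the previous record
    GpsRecord(60.0, 0.0, 0.001, 5.0),    # ~110 m further east
    GpsRecord(90.0, 0.0, 0.002, 300.0),  # poor accuracy, filtered out below
]
clean = filter_by_accuracy(raw, accuracylim=50.0)
steps = pause_steps(clean, h=10.0)
print(steps)  # [True, False]
```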


### Output

(1) summary statistics for all specified participants (.csv)
@@ -70,8 +76,8 @@ You can also tweak the parameters that change the assumptions of the imputation
- Contains start date/time and end date/time for each participant.\
- Is useful for tracking whose data during which time range have been processed, especially for the online algorithm.

(4) all_BV_set (.pkl)\
- It is a dictionary, with the user ID as the key and a numpy array as the value, where each column represents [start_timestamp, start_latitude, start_longitude, end_timestamp, end_latitude, end_longitude]. The first time you run the code, it is set to None by default. If you want to continue your analysis from here in the future, all_BV_set is expected to be an input to your new analysis, and it will be updated in that run. The size of the file should remain fixed over time.
(4) all_bv_set (.pkl)\
- It is a dictionary, with the user ID as the key and a numpy array as the value, where each column represents [start_timestamp, start_latitude, start_longitude, end_timestamp, end_latitude, end_longitude]. The first time you run the code, it is set to None by default. If you want to continue your analysis from here in the future, all_bv_set is expected to be an input to your new analysis, and it will be updated in that run. The size of the file should remain fixed over time.

(5) all_memory_dict (.pkl)\
- It is also a dictionary, with the user ID as the key and a numpy array of other parameters for the user as the value. The first time you run the code, it is set to None by default. If you want to continue your analysis from here in the future, all_memory_dict is expected to be an input to your new analysis, and it will be updated in that run. The size of the file should remain fixed over time.
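Because `all_bv_set` and `all_memory_dict` are dictionaries saved as pickle files, carrying them into a later run amounts to loading them if they exist and passing `None` otherwise. A minimal sketch with the standard `pickle` module follows; the file path and the helper names are hypothetical, so check your own output folder for the actual file names.

```python
import os
import pickle
import tempfile
from typing import Dict, Optional

import numpy as np


def load_previous(path: str) -> Optional[Dict[str, np.ndarray]]:
    """Return the dictionary saved by an earlier run, or None if there is none yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def save_for_next_run(state: Dict[str, np.ndarray], path: str) -> None:
    """Pickle the dictionary so a later run can continue from it."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


with tempfile.TemporaryDirectory() as output_folder:
    bv_path = os.path.join(output_folder, "all_bv_set.pkl")  # hypothetical file name
    first_run = load_previous(bv_path)  # None: nothing has been saved yet
    # Columns: start_timestamp, start_latitude, start_longitude,
    #          end_timestamp, end_latitude, end_longitude
    all_bv_set = {
        "participant_a": np.array(
            [[1_696_118_400.0, 42.36, -71.06, 1_696_119_000.0, 42.37, -71.05]]
        ),
    }
    save_for_next_run(all_bv_set, bv_path)
    second_run = load_previous(bv_path)
```

On a later run, the loaded dictionary would be passed as the `all_bv_set` argument, and the same pattern applies to `all_memory_dict`.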
@@ -115,8 +121,7 @@ This file imputes the missing trajectories based on the observed trajectory matr

`traj2stats.py`
This file converts the imputed trajectory matrix to summary statistics.

- `Hyperparameters`: @dataclass to store the hyperparameters for the imputation process.
- `Hyperparameters`: dataclass to store the hyperparameters for the imputation and summary statistics.
- `transform_point_to_circle`: transforms a set of coordinates to a shapely circle with a provided radius.
- `get_nearby_locations`: return a dictionary of nearby locations, a dictionary of nearby locations' names, and a dictionary of nearby locations' coordinates.
- `gps_summaries`: converts the imputed trajectory matrix to summary statistics.
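`transform_point_to_circle` returns a shapely circle around a coordinate. For readers without shapely at hand, the sketch below swaps in a dependency-free approximation. It places `n_vertices` points on a circle of the given radius using a local equirectangular projection, which is reasonable for radii of a few metres away from the poles. The function name `circle_vertices` and the metre units are assumptions, not forest's code.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; assumes a spherical Earth


def circle_vertices(
    lat: float, lon: float, radius_m: float, n_vertices: int = 32
) -> List[Tuple[float, float]]:
    """Approximate a circle of `radius_m` metres around (lat, lon) as (lat, lon) vertices.

    Offsets are computed in a local equirectangular projection: a degree of
    latitude spans a fixed distance, while a degree of longitude shrinks with cos(lat).
    """
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    vertices = []
    for k in range(n_vertices):
        theta = 2 * math.pi * k / n_vertices
        vertices.append((lat + dlat * math.sin(theta), lon + dlon * math.cos(theta)))
    return vertices


# A circle of radius 7.5 (the documented place_point_radius default, read as metres).
ring = circle_vertices(42.3601, -71.0589, 7.5)
```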
@@ -147,9 +152,9 @@ The summary statistics that are generated are listed below:
| Average pause duration | Float | Average of the duration of all pauses that took place over the course of a day (in hours) | We consider that a participant has a pause if the distance that they have moved during a 30-s period is less than `r` m. By default, `r`=10.|
| Standard deviation of pause duration | Float | Standard deviation of the duration of all pauses that took place over the course of a day (in hours) | GPS is converted into a sequence of flights (straight line movement) and pauses (time spent stationary). The standard deviation of duration of pauses over the course of a day is reported. |
| Significant location entropy | Float | Entropy measure based on the proportion of time spent at significant locations over the course of a day | Letting p_i be the proportion of the day spent at significant location i, significant location entropy is calculated as -\sum_{i} p_i*log(p_i), where the sum occurs over all non-zero p_i for that day. |
| Minutes of GPS data missing | Float | Number of minutes of GPS data missing over the course of a day | |
| Physical circadian rhythm | Not Available | A continuous measurement of routine in the interval [0,1] that scores a day with 0 if there was a complete break from routine and 1 if the person followed exactly the same routine as on every other day of follow-up | For a detailed description of how this measure is calculated, see Canzian and Musolesi's 2015 paper in the Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, titled "Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis." Their procedure was followed using 30-minute bins.|
| Physical circadian rhythm stratified | Not Available | A continuous measurement of routine in the interval [0,1] that scores a day with 0 if there was a complete break from routine and 1 if the person followed exactly the same routine as on every other day of follow-up | Calculated in the same way as Physical circadian rhythm, except the procedure is repeated separately for weekends and weekdays. |
| Minutes of GPS data missing | Not Available | Number of minutes of GPS data missing over the course of a day | |
| Physical circadian rhythm | Float | A continuous measurement of routine in the interval [0,1] that scores a day with 0 if there was a complete break from routine and 1 if the person followed exactly the same routine as on every other day of follow-up | For a detailed description of how this measure is calculated, see Canzian and Musolesi's 2015 paper in the Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, titled "Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis." Their procedure was followed using 30-minute bins.|
| Physical circadian rhythm stratified | Float | A continuous measurement of routine in the interval [0,1] that scores a day with 0 if there was a complete break from routine and 1 if the person followed exactly the same routine as on every other day of follow-up | Calculated in the same way as Physical circadian rhythm, except the procedure is repeated separately for weekends and weekdays. |
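The routine score can be illustrated with a simplified sketch in the spirit of the 30-minute binning described above. It is not a reimplementation of Canzian and Musolesi's procedure or of forest's code. It assumes each day is reduced to 48 location labels (for example a significant-location ID, or `None` when data are missing), scores agreement as the share of jointly observed bins with the same label, and averages that over the other days within `pcr_window` days. The stratified variant compares weekdays only with weekdays and weekends only with weekends. Collapsing samples taken every `pcr_sample_rate` seconds into bins by majority label is likewise an assumption, and all function names are hypothetical.

```python
import datetime as dt
from collections import Counter
from typing import Dict, List, Optional, Sequence

BIN_SECONDS = 30 * 60  # 30-minute bins
BINS_PER_DAY = 24 * 60 * 60 // BIN_SECONDS  # 48

Labels = Sequence[Optional[str]]


def to_bins(samples: Labels, pcr_sample_rate: int = 30) -> List[Optional[str]]:
    """Collapse one day of labels, sampled every pcr_sample_rate seconds, into 48 bins.

    Each bin takes its most common non-missing label, or None if it has no data.
    """
    per_bin = BIN_SECONDS // pcr_sample_rate
    bins: List[Optional[str]] = []
    for i in range(BINS_PER_DAY):
        observed = [s for s in samples[i * per_bin:(i + 1) * per_bin] if s is not None]
        bins.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return bins


def bin_agreement(a: Labels, b: Labels) -> Optional[float]:
    """Share of bins observed on both days in which the two days have the same label."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum(x == y for x, y in shared) / len(shared)


def routine_score(
    days: Dict[dt.date, Labels],
    target: dt.date,
    pcr_window: int = 14,
    stratified: bool = False,
) -> Optional[float]:
    """Mean agreement in [0, 1] between `target` and other days within +/- pcr_window days."""
    target_is_weekend = target.weekday() >= 5  # Saturday = 5, Sunday = 6
    scores: List[float] = []
    for day, labels in days.items():
        if day == target or abs((day - target).days) > pcr_window:
            continue
        if stratified and (day.weekday() >= 5) != target_is_weekend:
            continue
        score = bin_agreement(days[target], labels)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


weekday = ["home"] * 16 + ["work"] * 20 + ["home"] * 12
weekend = ["home"] * 48
days = {
    dt.date(2023, 10, 2): weekday,  # Monday
    dt.date(2023, 10, 3): weekday,  # Tuesday
    dt.date(2023, 10, 7): weekend,  # Saturday
}
print(routine_score(days, dt.date(2023, 10, 2), stratified=True))  # 1.0
```

In this sketch, limiting comparisons to `pcr_window` days keeps the score local in time, so a lasting change of routine lowers the score mainly around the transition rather than across the whole follow-up.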


### Other technical details