Vignette for harmonizing full dataset #126
base: dev
I think this vignette is clear. Only one small change from me: the current merged dataset is saved as .RData and I need an extra step to save it as .rds so I can work with it in targets. Not sure if we should just save it as .rds.
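For reference, a minimal sketch of the extra conversion step described above, using only base R. The file names and the `merged_example` object are hypothetical stand-ins, not the names used by the package or the published dataset.

```r
# Create a small stand-in .RData file so the sketch is runnable (hypothetical data).
rdata_path <- file.path(tempdir(), "merged.RData")
merged_example <- data.frame(
  DHH_SEX = c(1, 2),
  data_name = c("cchs2001_p", "cchs2003_p")
)
save(merged_example, file = rdata_path)

# Load into an isolated environment so nothing in the workspace is overwritten.
env <- new.env()
loaded_names <- load(rdata_path, envir = env)
merged <- get(loaded_names[1], envir = env)

# Re-save as .rds, which stores a single object and is read back with readRDS().
rds_path <- file.path(tempdir(), "merged.rds")
saveRDS(merged, rds_path)
restored <- readRDS(rds_path)
```

Saving as `.rds` directly would remove the need for the `load()` and `get()` steps, since `readRDS()` returns the object instead of restoring it under its original name.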
- Can you tell me what command you use to build and view the vignettes?
- Could we move one of these code samples to the examples on the front page? I think it's an important use case for users. I would keep this vignette, though, and link to it from the home page example.
vignettes/how_to_harmonize.Rmd
Outdated
## Introduction

This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all cchsflow variables
Is it possible to link to the actual dataset on odesi?
There are a couple of CCHS cycles on odesi, but I wasn't sure whether to add links to each individual cycle in this vignette or have the general link.
It looks like odesi updated their website https://odesi.ca/en/browse? Maybe just add text telling users to go to this link and search for the cycle they want?
To show outputs in first chunk only, fix 2011 and 2012 outputs, use sample data
Sorry @kittychenn, I think you missed these comments,
@reikookamoto Adding you to this PR on Doug's suggestion; it's good to get an "outside" perspective on this feature. For some reason I can't add you as a reviewer (you don't show up when I search your username). Maybe it's because you're not part of the team? In any case, I sent an invitation to join the GitHub team.
Can you try adding me as a reviewer now?
Reiko
Should be there now. I think it was because you weren't a collaborator on the repo and not because you were not on the team.
@yulric I used knit to HTML to build and view the vignettes. The code sample is also available on the 'Get Started' page, so should I include it on the main page too?
Left my comments from an "outside" perspective @yulric
## Introduction

This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables
Consider writing out the first instances of acronyms like CCHS and PUMF in full.
- This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables
+ This vignette explains how you can transform variables across multiple Canadian Community Health Survey (CCHS) cycles using complete datasets with the _cchsflow_ package. The Public Use Microdata Files (PUMF) containing the complete data can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables
I'm not sure if I've correctly described the relationship between CCHS and PUMF, but something like this would provide more context to someone new to this area of study.
This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables
can be found [here](https://osf.io/j5wgu). With the original PUMF datasets, data file should be renamed such that it specifies the survey and cycle year, which follows the format of the _p sample data (ex. cchs2001_p, cchs2013_2014_p).

To harmonize the data files, the `rec_with_table()` function is used to transform the indicated variables.
- To harmonize the data files, the `rec_with_table()` function is used to transform the indicated variables.
+ To harmonize the data files, the `cchsflow::rec_with_table()` function is used to transform the indicated variables.
I know eventually we want users to use `recodeflow::rec_with_table()`, but, for the time being, we could specify the package name to avoid confusion.
## How to combine a single variable across multiple cycles

In this example, the sex variable from 2001 to 2018 CCHS datasets will be transformed and labeled using `rec_with_table()`, which is then combined into one dataset and labeled using `merge_rec_data()`.
I'm a little confused as to why we're harmonizing this variable from 2001 to 2018 when, in the previous section, users were advised not to harmonize data from cycles before 2014 with those from 2015 and onwards.
### Option 1: Using _cchsflow_ variable_details sheet

When the variable argument in `rec_with_table()` is not specified, all variables listed in `variables.csv` and `variable_details.csv` will be transformed. In this example, all variables from the _cchsflow_ `variables.csv` and `variable_details.csv` sheets from 2001 to 2018 CCHS datasets will be transformed and labeled using `rec_with_table()`, which is then combined into one dataset and labeled using `merge_rec_data()`.
Where will `variables.csv` and `variable_details.csv` be on the user's computer when they install/load the package (i.e., expected file path)?
The sheets will be in the `inst/extdata` folder. `rec_with_table()` uses the sheets from that folder if the user does not pass in those parameters.
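A small sketch of how a user could look up that location after installation. It relies on the base R behaviour that the contents of `inst/` are copied to the top level of an installed package, so `inst/extdata` is reached as `extdata`. The file name is taken from the comments above; whether the sheet ships under exactly that name in a given release is an assumption.

```r
# Locate a sheet shipped with an installed package.
# system.file() returns "" when the package or the file cannot be found.
sheet_path <- system.file("extdata", "variables.csv", package = "cchsflow")

if (nzchar(sheet_path)) {
  # Read the packaged sheet into a data frame for inspection.
  variables_sheet <- read.csv(sheet_path, stringsAsFactors = FALSE)
  message("Found variables.csv at: ", sheet_path)
} else {
  message("variables.csv not found; is cchsflow installed?")
}
```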
### Option 2: Using your own variable_details sheet

In this example, all variables from personalized `variables.csv` and `variable_details.csv` sheets from 2001 to 2018 CCHS datasets will be transformed and labeled using `rec_with_table()`, which is then combined into one dataset and labeled using `merge_rec_data()`.
I would consider showing the relationship between `variables.csv` and `sample_variables`, and between `variable_details.csv` and `sample_variable_details`. Is the user expected to do something like `sample_variables <- readr::read_csv('variables.csv')` in their workspace before using the personalized spreadsheets?
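One possible shape for that step, sketched with base R `read.csv()` rather than `readr`. The CSV contents and column names are made up for illustration and do not reflect the real sheet layout; the object name follows the comment above.

```r
# Write a tiny stand-in for a personalized variables.csv (hypothetical columns).
csv_path <- file.path(tempdir(), "variables.csv")
writeLines(c("variable,label", "DHH_SEX,Sex"), csv_path)

# Read the sheet into the workspace under the name mentioned in the comment above.
sample_variables <- read.csv(csv_path, stringsAsFactors = FALSE)
```

If the vignette spelled this out, readers would see that the data frame in the workspace is simply the CSV read into memory.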
…so that pkgdown would not complain about including them in the references when building the documentation website
@kittychenn Sorry about getting back so late. All of Reiko's suggestions look good, can you address them? In addition I was building the website using the following commands,
and I'm getting an error in the
Finally, I pushed some commits to fix some of the website build issues.
This vignette provides examples of how to harmonize variables using the full dataset. It's a good starting point to harmonize variables, but users can also implement different pipeline tools to harmonize data. Should we include pipeline methods in the example or leave it as is?