Contribution instructions should state dataset limitations (CRAN) #29

tomjemmett · 2020-10-06T14:49:12Z

Currently the contribution section on Readme.md doesn't state any of the limitations imposed by CRAN, namely that the entire package must be <5MB in size.

chrismainey · 2020-10-06T17:20:03Z

Agreed. We should probably write guidance for other ways to submit larger datasets. Emphasis on this being for training, not a general data warehouse. Hacktoberfest?

tomjemmett · 2021-03-08T21:18:32Z

Here are some of my (highly opinionated) thoughts on this. I think we should aim to follow the Tidyverse Style Guide where possible.

datasets should be designed for teaching how to do things in R, so they should be easy to understand and relevant to a general audience
datasets need to be relatively small; CRAN has a limit of 5MB for the entire package, so each dataset should be no more than 500KB in size. You can check the size with object.size()
datasets should not contain any sensitive or disclosive information; they are being released publicly. The data ideally should be from a published source, or synthetic/generated data
datasets that come from other sources must be licensed under a suitable license for resharing, e.g. MIT, GPL, OGL, CC. Attribution to the source data must be included
datasets should be saved as a tibble - you can use as_tibble() to convert
datasets should be named using snake case (in line with the Tidyverse Style Guide), as should all columns within the dataset
datasets should be documented with roxygen2; this documentation should be a high level overview
datasets should have a vignette that describes the data in more detail than the documentation does and contains at least one useful example (ideally several) of how to use the data, demonstrating useful R functions
vignettes should use tidyverse functions and avoid base R and data.table; this is mainly because the introductory training NHS-R offers is focussed on the tidyverse
vignettes should not require the use of too many extra packages. Any packages you use must be included in the Suggests section of DESCRIPTION
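
As a rough illustration of the size and tibble points above, the sketch below converts a small data frame to a tibble and compares its in-memory size with the 500KB guideline. The object example_dataset and its columns are made up for illustration. Note that object.size() reports the in-memory size, which may differ from the compressed size of the saved data file.

```r
# Hypothetical example data; replace with the dataset you want to contribute.
example_dataset <- data.frame(
  org_code = c("RA1", "RB2", "RC3"),
  attendances = c(120L, 95L, 143L)
)

# Convert to a tibble (requires the tibble package to be installed).
example_dataset <- tibble::as_tibble(example_dataset)

# In-memory size in bytes, compared with the 500KB per-dataset guideline
# suggested in this thread.
size_bytes <- as.numeric(object.size(example_dataset))
size_limit <- 500 * 1024

if (size_bytes > size_limit) {
  warning("Dataset is larger than the suggested 500KB guideline")
}

# Human-readable size for a quick check.
format(object.size(example_dataset), units = "Kb")
```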

Lextuga007 · 2024-10-01T19:22:24Z

Adding to this list:

each function should have an example that goes beyond glimpse(), which is now covered in the Get Started vignette. For instance, the ons_mortality example shows how to view the data in wide form with each date as a column.
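
A minimal sketch of that kind of example, assuming the tidyr package is available. The tibble deaths_long and its columns are a hypothetical stand-in, not the actual structure of ons_mortality.

```r
library(tidyr)

# Hypothetical long-format data: one row per category and date.
deaths_long <- tibble::tibble(
  category = c("Total deaths", "Total deaths", "Respiratory", "Respiratory"),
  date = as.Date(c("2020-01-03", "2020-01-10", "2020-01-03", "2020-01-10")),
  counts = c(100L, 110L, 20L, 25L)
)

# Reshape to wide form: one row per category, one column per date.
deaths_wide <- pivot_wider(
  deaths_long,
  names_from = date,
  values_from = counts
)

deaths_wide
```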

Lextuga007 added the "documentation" label (Improvements or additions to documentation) on Oct 1, 2024
