The commit history of this repository reflects what a student might do as she works through this activity from STAT 545. This fully developed example shows:
- [x] How to run an R script non-interactively
- [x] How to use `make`
  - to record which files are inputs vs. intermediates vs. outputs
  - to capture how scripts and commands convert inputs to outputs
  - to re-run parts of an analysis that are out-of-date
- [x] The intersection of R and `make`, i.e. how to
  - run snippets of R code
  - run an entire R script
  - render an R Markdown document (or R script)
- [x] The interface between RStudio and `make`
- [x] How to use `make` from the shell
- [x] How Git facilitates the process of building a pipeline
File | |
---|---|
make | |
R script 1 | |
R script 2 | |
master rmd generator | |
words.txt | |
python script | |
Original output
Final output
[makefile2dot][makefile2dot] is used to produce the preceding image of the `make` pipeline. The image is generated as follows:
make output.png # output.dot is automatically removed after png is made
[makefile2dot]: https://github.com/vak/makefile2dot
- If the word file is available at `/usr/share/dict/words`, it is copied from that location as `words.txt`; if it is unavailable, a `.py` script downloads the word list from an online source. The choice between copying and downloading is made by an `if` `else` shell snippet (Bash/`sh`).
- An `R` script containing a `for` loop creates a character vector of 26 elements, each a `^` followed by one of `letters` (such as `^S`), built with `paste0` and used as `regex` input to match against `words.txt`. The matches are then counted into a tibble showing how often each letter occurs in the starting position of the words in the `words.txt` dataset. The table is saved as `freq_let.tsv`.
- Another `R` script then generates the plot from the `tsv`, producing the output `freq_let.png` (the snippets in the `makefile` are commented out as they seem too chunky).
- A different approach is taken for the reports: a master file, `reportgen.txt`, is the starting point, and a combination of `readLines` and `writeLines` reads different lines from it so that different `.rmd` files can be generated, `report.rmd` or `report2.rmd`. This reduces the number of `rmd` files in the repo, keeping it reproducible and less cluttered. `report.rmd` is the complete report, a submission by @zeeva85 along with the original work from @jennyBC. `report2.rmd` is a condensed submission containing only @zeeva85's modified version. They can be accessed via `analysis1`/`analysis2` (see below). The `rmd` generates an `md` and an `html` file, which are kept as per the original assignment. Usage to access the individual reports:
make analysis1 # jennyBC version
make analysis2 # zeeva85 version
make # complete report with @zeeva85 + @jennyBC
- Usage to clean:
make clean_old # @jennyBC
make clean2 # @zeeva85
make clean # cleans all versions, including removal of output.png
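The letter-frequency step described above can be approximated in shell. This is a rough stand-in for the `R` script, not the script itself: it builds the same kind of anchored patterns (`^A` through `^Z`) and counts matching lines with `grep -c`. The function name `letter_freq`, the `letter`/`freq` column headers, and case-sensitive matching on capital letters are assumptions, and the demo uses a small made-up word list rather than the real `words.txt`.

```sh
#!/bin/sh
# Sketch only: count how many words start with each capital letter,
# mirroring the "^A" ... "^Z" regex idea from the R script.
# letter_freq, the column names, and case-sensitive matching are assumptions.

letter_freq() {
    words_file=$1
    printf 'letter\tfreq\n'
    for letter in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z; do
        # grep -c prints the number of lines matching the anchored pattern
        # (it prints 0 and exits non-zero when nothing matches).
        count=$(grep -c "^${letter}" "$words_file")
        printf '%s\t%s\n' "$letter" "$count"
    done
}

# Demo on a small made-up word list (not the real words.txt).
freq_dir=$(mktemp -d)
printf '%s\n' Apple Sand Sun Bee ant > "$freq_dir/words.txt"
letter_freq "$freq_dir/words.txt" > "$freq_dir/freq_let.tsv"
cat "$freq_dir/freq_let.tsv"
```

Against the real data the call would presumably be `letter_freq words.txt > freq_let.tsv`; adding `-i` to `grep` would make the count case-insensitive if that is what the original intends.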
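The master-file approach can also be illustrated with a short shell sketch. The repository uses `readLines` and `writeLines` in `R`; here `sed -n` plays the same role of copying a chosen range of lines from `reportgen.txt` into a new `.rmd`. The function name `make_report`, the line ranges, and the sample master content are all hypothetical, since the real ranges are not stated here.

```sh
#!/bin/sh
# Sketch only: derive several .rmd files from one master file by line range.
# make_report and the ranges used below are illustrative assumptions.

make_report() {
    master=$1; first=$2; last=$3; out=$4
    # Print only lines first..last of the master file into the output file.
    sed -n "${first},${last}p" "$master" > "$out"
}

# Demo with a made-up six-line master file.
report_dir=$(mktemp -d)
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' 'line 6' \
    > "$report_dir/reportgen.txt"

make_report "$report_dir/reportgen.txt" 1 6 "$report_dir/report.rmd"   # full report
make_report "$report_dir/reportgen.txt" 4 6 "$report_dir/report2.rmd"  # condensed report
wc -l "$report_dir/report.rmd" "$report_dir/report2.rmd"
```

Keeping one master file means a change to shared text is made once and picked up by every generated report the next time the pipeline runs.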
- `.py` python script is used to download data
- `sh` snippet evaluates whether to download, using `IF` `ELSE`, when `words.txt` is unavailable
- `.py` python script is used to make the makefile workflow graph for better visualization
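The copy-or-download decision for `words.txt` can be sketched as a small shell function. This is a minimal illustration of the `if` `else` idea, not the snippet from the repository's makefile: `get_words` and `download_words` are hypothetical names, and the download step is a placeholder standing in for the Python script, whose name and source URL are not reproduced here.

```sh
#!/bin/sh
# Sketch only: copy the system dictionary if present, otherwise download.
# get_words and download_words are hypothetical names; the real download
# is done by the repository's Python script, which is not reproduced here.

download_words() {
    # Placeholder for the Python download step (assumption).
    printf 'placeholder\n' > "$1"
}

get_words() {
    dict=$1; out=$2
    if [ -f "$dict" ]; then
        cp "$dict" "$out"        # dictionary exists: copy it
    else
        download_words "$out"    # dictionary missing: fall back to download
    fi
}

# Demo using temporary paths instead of /usr/share/dict/words.
words_dir=$(mktemp -d)
printf '%s\n' alpha beta > "$words_dir/dict"
get_words "$words_dir/dict" "$words_dir/copied.txt"
get_words "$words_dir/missing" "$words_dir/downloaded.txt"
```

In the real pipeline the call would presumably look like `get_words /usr/share/dict/words words.txt`, with `download_words` replaced by the actual Python invocation.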