
Example project we use in the reproducibility lesson.



coderefinery/word-count



Word count example

This example project will count words in a given text and plot a bar chart of the 10 most common words.
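To illustrate the counting idea, here is a minimal Python sketch built on `collections.Counter`. It assumes a "word" is a lowercased run of letters and apostrophes. That is not necessarily how code/count.py tokenizes text, and the function names `count_words` and `top_words` are hypothetical.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency.

    Assumption: a word is a run of letters or apostrophes; the project's
    own count.py may split text differently.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def top_words(text, n=10):
    """Return the n most common (word, count) pairs, most frequent first."""
    return count_words(text).most_common(n)
```

Calling `top_words(open("data/isles.txt").read())` would then give the ten (word, count) pairs a bar chart needs. `Counter.most_common` keeps ties in first-seen order.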

Dependencies

See environment.yml.

Usage

In this example we wish to:

  1. Analyze word frequencies using code/count.py for 4 books (all of them are in the data directory).
  2. Plot a bar chart of the most common words using code/plot.py.

For one book (isles.txt) use the scripts like this:

$ python code/count.py data/isles.txt > statistics/isles.data
$ python code/plot.py --data-file statistics/isles.data --plot-file plot/isles.png

To run these scripts for all books, you can collect the calls into one bash script and run it with bash run_all.sh. Going one step further, with less code, you can loop over all known book titles in a bash script and run it with bash run_all_loop.sh.
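The same loop can also be written in Python. The sketch below is not the contents of run_all_loop.sh. It assumes every book is a *.txt file in data/ and reuses the output naming shown for isles.txt. The names `commands_for_book` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def commands_for_book(book):
    """Build the count and plot commands for one book file, e.g. data/isles.txt.

    Outputs follow the statistics/<name>.data and plot/<name>.png pattern
    used in the manual commands for isles.txt.
    """
    name = Path(book).stem
    data_file = f"statistics/{name}.data"
    count_cmd = ["python", "code/count.py", str(book)]
    plot_cmd = ["python", "code/plot.py", "--data-file", data_file,
                "--plot-file", f"plot/{name}.png"]
    return count_cmd, data_file, plot_cmd


def run_all(data_dir="data"):
    """Run count.py and then plot.py for every *.txt file in data_dir (assumed layout)."""
    Path("statistics").mkdir(exist_ok=True)
    Path("plot").mkdir(exist_ok=True)
    for book in sorted(Path(data_dir).glob("*.txt")):
        count_cmd, data_file, plot_cmd = commands_for_book(book)
        # count.py prints to stdout, so redirect its output into the .data file
        with open(data_file, "w") as out:
            subprocess.run(count_cmd, stdout=out, check=True)
        subprocess.run(plot_cmd, check=True)
```

Calling `run_all()` from the repository root would process each book in turn. Because of `check=True`, it stops at the first failing command, much like `set -e` in a bash script.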

Workflow

The workflow is implemented with Snakemake in the file Snakefile.
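Snakemake uses the inputs and outputs declared for each rule in the Snakefile to decide which steps need to run. The core make-style check can be sketched in Python: an output is rebuilt when it is missing or older than one of its inputs. This is a simplification, because Snakemake also considers other rerun triggers. The name `needs_update` is hypothetical.

```python
import os


def needs_update(output, inputs):
    """Return True if output is missing or older than any of its inputs.

    Only modification times are compared here; inputs are assumed to exist.
    """
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(src) > out_mtime for src in inputs)


# Dependency chain for one book, restating the two manual steps:
#   data/isles.txt -> statistics/isles.data -> plot/isles.png
```

For example, `needs_update("plot/isles.png", ["code/plot.py", "statistics/isles.data"])` would return True once count.py has regenerated statistics/isles.data.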

Tests

End-to-end tests are provided in the test directory.
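A common shape for such a test is a golden-file comparison: run a script on a known input and compare its output with a stored reference. The sketch below assumes that approach. The reference path test/reference/isles.data and the names `run_and_compare` and `test_count_isles` are hypothetical and may not match what the test directory actually contains.

```python
import subprocess
import sys
from pathlib import Path


def run_and_compare(cmd, expected_file):
    """Run cmd, capture its stdout, and compare it with the contents of expected_file."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout == Path(expected_file).read_text()


def test_count_isles():
    # Hypothetical reference output stored alongside the tests
    assert run_and_compare(
        [sys.executable, "code/count.py", "data/isles.txt"],
        "test/reference/isles.data",
    )
```

A test runner such as pytest, started from the repository root, would pick up `test_count_isles` automatically.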

Acknowledgement

Inspired by and derived from https://hpc-carpentry.github.io/hpc-python/ which is distributed under Creative Commons Attribution license (CC-BY 4.0).

CodeRefinery workshop

We use this example in the CodeRefinery workshop in this lesson: