---
title: "Syllabus"
author:
  name: "Max Held"
  affiliation: "Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)"
date: "Summer Term 2020"
bibliography: library.bib
---

```{r setup, echo=FALSE, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(printr)
```


```{r readme, child="README.md"}
```

<div class="jumbotron" style="color:white; background: linear-gradient( rgba(0, 0, 0, 0.7), rgba(0, 0, 0, 0.7) ), url(img/keyboard-keys-2.jpg) no-repeat center center fixed; -webkit-background-size: cover; -moz-background-size: cover; -o-background-size: cover; background-size: cover;">
  <h2>Software Carpentry: Hacking Skills for Data Science</h2>
  <p>... because learning from hackers is learning to win?</p>
  <p> <span class="label label-default">
  #DataScience
  </span>
  <span class="label label-primary">
  #rstats
  </span>
  <span class="label label-info">
  Git(Hub)
  </span>
    <span class="label label-success">
  #ReproducibleResearch
  </span>
  </p>
  <p><small><sub>
    Image Credit: Red Alt [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/) [hjl](https://www.flickr.com/photos/hjl/8205547070/in/photolist-dv6zgu-nffY2e)
  </sub></small></p>
</div>

---

<div class="embed-responsive embed-responsive-16by9">
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/dU1xS07N-FA?rel=0" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
</div>

> *[Coding – ] it’s the next best thing we have to a superpower.*
> -- [Drew Houston](@drewhouston) via [code.org](https://code.org)

> *So we were very worried that what if the astronaut, during mid-course, would select pre-launch, for example?*
> *Never would happen, they said.*
> *Never would happen.* 
> *It happened.*
> -- [Margaret Hamilton](https://www.metaltoad.com/blog/history-computer-girls-part-2-margaret)

> *Computers ... a bicycle for the mind*
> -- [Steven Jobs](https://www.brainpickings.org/2011/12/21/steve-jobs-bicycle-for-the-mind-1990/)

> *To me programming is more than an important practical art.*
> *It is also a gigantic undertaking in the foundations of knowledge.*
> -- [Grace Murray Hopper](https://en.wikipedia.org/wiki/Grace_Hopper)

> *Think of free speech, not free beer.*
> -- [Richard Stallman](https://stallman.org/)

> *Open source isn't like free sunshine; it's like a free puppy.*
> -- [Sarah Novotny](https://sarahnovotny.com/)

> *Most learning is not the result of instruction.*
> *It is rather the result of unhampered participation in a meaningful setting.*
> -- Ivan @Illich-1971


## Prerequisites {.alert .alert-success}

*Everyone* is welcome to this seminar.
This is *not* a "proper" computer science class, and participants do *not* need any background in CS, statistics or math.

You should just be curious and ready to:

- learn to use specialised command-line software and open-source tools for collaboration,
- read and write technical documents in simple, readable English, and
- collaborate intensively using (perhaps unfamiliar) web-based tools.

No worries, we'll bring everyone up to speed in very little time.

You do *not* need to have completed a prior version of this class, or any other class.
If you *have* some prior training, you will start the class at a different level.


## Time and Place (Summer Term 2020) {.alert .alert-warning}

This will be an **all-remote**, largely **asynchronous** seminar, held via [Gitter chat](https://gitter.im/soztag/fossos), [GitHub](http://github.com/soztag/fossos) and occasional [Zoom](http://zoom.us) video conferences.


### Preparatory Meeting

**Thursday, April 30th, 2020 15:00-17:00** (in a video conference, see below).


### Asynchronous Collaboration

Throughout the semester, students can work through the material at their own pace and schedule.
Support from the instructor and fellow students is available on the [Gitter chat](https://gitter.im/soztag/fossos).

Depending on needs, short video conferences will be held for selected topics.


### Digital Venues

We're going to use a few digital tools to work asynchronously.

- Static information will be at https://datascience.phil.fau.de/fossos/, the **class website**.
  You can find all the resources (~ readings) and software links on https://datascience.phil.fau.de/fossos/stack.html.
- Pretty much all *individual* activity (i.e. to be done by one or a few students) is tracked as issues on our class repository **issue tracker** at https://github.com/soztag/fossos/issues.
  If you have a question, have an idea to work on, or are looking for inspiration for a task, this is your place.
  Issues are organised using labels and assignees.
  Milestones are currently not in use.
- A (currently relevant) subset of these issues is also listed on our **Kanban board** at https://github.com/soztag/fossos/projects/2.
  This board gives you an overview of what everyone is busy with at any given point.
  You can move your "own" issues around the board as appropriate, and you can also add issues that you want to see addressed.
- There is a [Gitter chat](https://gitter.im/soztag/fossos) that students can use throughout the semester to get support from the instructor and fellow students.
  If you have your own repo for your own project (advanced students), you can open your own Gitter chat and invite the instructor or fellow students for support.

These venues are also linked from the top bar of the class website, so you can always easily find them.


## Language requirements

Depending on who attends the class, instruction may be in English or German.
In any event, all of the readings and other course material are in English, and participants are expected to be proficient in reading and writing English technical documents.


## A Multi-Semester Series {.alert .alert-info}

It is obviously impossible (for most students) to cover all of the material in this course in *one* semester.

This course (with a slightly different name) will therefore be taught *every semester*, in a non-consecutive series.
Students can join the class every semester, and take the class for however many semesters they wish (if they still have new things to learn).
Do not be confused by the name this class takes in some semester (say, "Advanced R ...") -- you can still join as a beginner.
Depending on the listing (see below) students can also take this class for credit *multiple times*.

By implication, the group of students in the class in any *given* semester will be *heterogeneous*, working at different levels.
For example, some students may already have taken a course in the series previously, while others are just starting out.
Because the previous experiences and learning speed of students vary greatly anyway, this is not a significant (additional) hindrance.
Tasks, expectations and material covered will accordingly differ for each student, depending on the background.


## Credits and Listings

You can generally take this class as an undergraduate (Bachelor) lower-division seminar (**Proseminar**) worth 5 ECTS points, or an upper-division seminar (**Hauptseminar**) worth 7.5 ECTS points.
The workload will be adjusted accordingly.

Depending on your major, you may also take the class to fulfill requirements for a *Master's* program.
Please be in touch to discuss the details.

This class was/is listed as:

- 2018/2019 Winter Term: "Open Source Werkzeuge für die wissenschaftliche Datenverarbeitung" (the original *FOSSOS*), crosslisted in the following modules:
    - Bachelor Sociology
        - Sociological Methods (Module `SOZ M`, [Soziologische Methodenlehre](https://www.soziologie.phil.fau.de/institut/arbeitsbereiche/methoden-der-empirischen-sozialforschung/))
        - Labor and Organisation (Module `Soz Qf4`, [Arbeit und Organisation](https://www.soziologie.phil.fau.de/institut/arbeitsbereiche/arbeit-und-organisation/))
    - Bachelor Digital Humanities and Social Sciences ("BA Zweitfach")
        - Elective (Wahlpflichtbereich FPO 2018)
        - Elective (Wahlpflichtbereich FPO 2016)
- 2019 Summer Term: "Advanced R and Open Social Data Science"
    - Bachelor Sociology
        - Sociological Methods (Module `SOZ M1`, `SOZ M2` [Soziologische Methodenlehre](https://www.soziologie.phil.fau.de/institut/arbeitsbereiche/methoden-der-empirischen-sozialforschung/))
    - Bachelor Digital Humanities and Social Sciences ("BA Zweitfach")
        - Elective (Wahlpflichtbereich FPO 2018)
        - Elective (Wahlpflichtbereich FPO 2016)
    - "Soft Skills" (Schlüsselqualifikationen)
- 2019/2020 Winter Term: "Open Source Software for the Humanities and Social Sciences", crosslisted in:
    - Bachelor Sociology
        - Sociological Methods (Module `SOZ M`, [Soziologische Methodenlehre](https://www.soziologie.phil.fau.de/institut/arbeitsbereiche/methoden-der-empirischen-sozialforschung/))
    - "Soft Skills" (Schlüsselqualifikationen)
    - Bachelor Digital Humanities and Social Sciences ("BA Zweitfach")
        - Elective (Wahlpflichtbereich FPO 2018)
        - Elective (Wahlpflichtbereich FPO 2016)
- 2020 Summer Term: "Software Carpentry -- Hacking Skills for Data Science"


## Related Classes

[Daniel Lemmer](https://www.pol.phil.fau.eu/person/daniel-lemmer/) is (again) offering an [introduction to R](https://univis.uni-erlangen.de/form?__s=2&dsc=anew/lecture_view&lvs=phil/dsp/isoz/zentr/einfhr_3&anonymous=1&founds=phil/dsp/ipowi/zentr/argent,/spanie,/wahlpa,///isoz/zentr/einfhr_3&sem=2019w&__e=183) (in German) as a seminar in the winter term 2019/2020.
Daniel's introduction to R is a great complement to *FOSSOS*, though it is *not* a prerequisite (and the same holds vice-versa).
His introduction is focused on running common statistical analyses in R.
*FOSSOS* is focused on open source tooling *around* R, R as a data science glue language and more advanced R.

If you have *not* taken (and will not take) Daniel's (or another) introduction to R, you will probably spend your time in *FOSSOS* learning the broader open source tooling (["Software Carpentry"](https://datascience.phil.fau.de/fossos/stack.html#software_carpentry)) around R, which is still plenty of exciting material to keep you busy for a semester.

Daniel is *also* kindly hosting an open working group to learn statistics from first principles.
Contact [Daniel Lemmer](https://www.pol.phil.fau.eu/person/daniel-lemmer/) if you'd like to attend.


## Course Description

Digitisation has created both new challenges and as yet unrealised potential for the empirical social sciences.
Larger and often streamed datasets require more programmatic and dynamic statistical analyses.
Existing commercial programs with graphical user interfaces (GUIs) are expensive, and analyses can easily become opaque, sometimes contributing to a crisis of reproducibility in the social sciences and beyond [e.g., @MairThouShaltBe2016] or even propagating outright bugs [e.g., @ReinhartGrowthTimeDebt2010].

Happily, the open source community has already pioneered a set of technologies and conventions for their software development efforts that have proven useful in solving these problems in many academic fields.
Additionally, open source software offers new ways to analyse and visualize data, as well as to present interactive results.

Together, these tools promise a radically open and participatory approach to science, and productive yet skeptical use of emerging data streams.

Unfortunately, learning these tools takes more time than is usually available before any given project deadline.

The goal of this series of seminars is therefore to train participants in a coherent set of leading tools and best practices, including:

- Software Carpentry
    - Open source issue trackers to manage projects and their learning.
    - Using leading community resources and services to troubleshoot issues.
    - Writing text in a lightweight markup language (markdown).
    - The world of UNIX-style command-line interface (CLI) programs ...
    - ... and package managers, such as Homebrew or APT.
    - Establishing an efficient plain-text workflow using editors and an Integrated Development Environment (IDE), including Atom and RStudio.
    - Source control management (SCM) and massively collaborative development using Git and GitHub.
    - Separating content and presentation using plain-text formats for technical and scientific writing, including LaTeX, Pandoc Markdown and RMarkdown and rendering results in a variety of formats (Word, HTML, PDF).
- Introductory R
    - Introduction to "base" R.
    - Literate programming in R.
- Intermediate R
    - Importing, transforming and modeling data using tools from the R tidyverse ecosystem.
    - Visualising data using ggplot2.
- Interactive R
    - Interactive visualisations using leading JavaScript libraries (via plotly, htmlwidgets).
    - Web dashboards using flexdashboard.
    - Interactive webapps using shiny.
- Advanced R
    - Types, functional programming, object oriented programming (only S3), metaprogramming and techniques, all following Hadley Wickham's [Advanced R](https://adv-r.hadley.nz).
- Cloud Computing
    - Offloading computationally intensive or regularly automated tasks to cloud services.
    - Using containerisation (Docker).
    - Applying continuous integration and deployment (CI/CD) tools such as Travis CI.
- Reproducible Research
    - Improving code quality by applying assertions using checkmate.
    - Storing datasets in public repositories such as the Harvard Dataverse.
    - Releasing, publishing and indexing finished research using GitHub releases and Zenodo.
    - Other tools and practices for open and reproducible science.
    - Strengthening reproducibility and portability by using dependency management (packrat) and containerisation (Docker).
- Package Development
    - Including documentation (roxygen2), defensive programming (checkmate), testing (testthat) and more best practices, all following Hadley Wickham's [R packages](http://r-pkgs.had.co.nz).

Towards the end of each of the seminars, participants will be able to use (parts of) this toolchain to work on their own projects, or to contribute to existing free and open source software.

```{r venn, fig.cap="The [Data Science Venn Diagram](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) by Drew Conway (2010)", out.width='100%'}
knitr::include_graphics(path = "img/Data_Science_VD.png")
```

This course will *not* focus on math and statistics knowledge or substantive domain expertise, though both are essential for solid data science work.
Rather, the emphasis is on what Drew Conway loosely called *hacking skills* in his [Data Science Venn Diagram](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram), that is, simply getting these tools to work together, learning how to troubleshoot them, and -- aspirationally -- absorbing some best practices of open source development.

While the course is *not* a proper computer science class, it should also be valuable to students with coding experience or a CS background who may be interested in the tooling and practices covered.

We will not cover the scaling and efficiency issues of proper “Big Data”, but confine ourselves to in-memory problems.
We also limit ourselves to the R ecosystem, though some tools and problems will be similar for other scripting languages such as Python.

An introduction to data science and open source may well open up new job opportunities, or serve as a first stepping stone to a career in tech, but that is arguably not the only reason why social scientists should be excited about it.
Instead, to learn the way of open source is perhaps to update the ideals of the scientific process for the modern day:
radical openness and rigorous reproducibility, maximal inclusivity and promised meritocracy, generous sharing and personal attribution.
Open source may also be a worthwhile exercise in participant observation for social scientists:
here is a real, if surely flawed, utopia that massively coordinates individuals and is *neither* market nor state.

Less loftily, but not least, the seminar also promises a starter dose of gratification from having built something that actually works, and is of some immediate use to our fellow humans -- a good feeling sometimes hard to come by in the social sciences.


## Philosophy

This course is a little different from most seminars.

Teaching R (and the broader ecosystem) at FAU sociology (as at most other smaller, non-tech-focused institutions) faces a couple of important constraints:

- Participants will have vastly different levels of previous experience, and will learn at different speeds.
- Given the relatively small number of interested students and complicated timetables, strictly consecutive seminars are difficult to organize.
  Too few students would ever meet the requirements (and schedule) to attend the advanced seminars.
- There is already plenty of high quality teaching material out there, and there is little point in re-inventing (an inferior) wheel.

To meet these constraints, this course will be held as a **non-consecutive multi-semester series of seminars**, and will, for the most part, operate on a **flipped classroom model**.


## Flipped Classroom

Because students will learn at different speeds, and from different starting points -- among other reasons -- teacher-centered teaching will be minimal in this class.

Instead, students will study the assigned material outside of class, including online documents, videos and interactive learning applications.

As they encounter problems, or develop their own (small) projects, students will track such work on the issue tracker used in class.
In class, students will work on their own problems or projects, in small groups and assisted by the instructor as necessary.

This class does *not* offer a one-size-fits-all set of pre-defined materials and assignments necessary for successful participation.
What the class offers is:

- A carefully curated list of external learning resources, organised in a (somewhat) linear syllabus.
- A social setting (the class settings) and electronic fora (GitHub repo) to keep organised, motivated and to help one another.
- Guidance and assistance by the instructor for each *individual* student.


## Expectations

Happily, there are a *lot* of great resources for learning data science tools out there, many of them free, some of them even open source themselves.
We will be reusing a lot of these resources, and I (the instructor) do not have to reinvent an (inferior) wheel.
There is no *one* curriculum that's quite right for us, so I have cobbled together material from different sources.

All resources are listed, in roughly advisable chronological order, along with the [stack](/stack).

<div class="alert alert-warning" role="alert">
<b>Resources</b> listed in the <a href="/stack">stack</a> are <em>mandatory reading</em>.
<b>Additional Resources</b> listed in the <a href="/stack">stack</a> are <em>recommended or optional reading</em>.
</div>

The good news is that there are no academic papers or books for this class and everything students need is available online.
There is, however, still a lot of material to work through (to the tune of hours per week), though it is written in a hopefully more accessible style than many academic documents.
The listed resources are guaranteed to cover everything you need to use the software, often including tutorials, videos and exercises.
Students are not limited to the listed resources; they can also choose their own material, as long as it covers roughly the same ground.
In fact, students are encouraged to share good additional resources with the rest of the class.

There is a lot of duplicate content between the alternative resources listed.
Students should browse *each* of the resources, and then work in-depth through whichever they find most suitable.

<div class="alert alert-warning" role="alert">
First-time students of <em>FOSSOS</em> are expected to work through (not just read) all the material listed in the <a href="https://datascience.phil.fau.de/fossos/stack.html#introduction">Introduction</a> and <a href="https://datascience.phil.fau.de/fossos/stack.html#software_carpentry">Software Carpentry</a> sections.

Repeat participants in the seminar who have mastered this material can advance to any of the other sections according to their interests and should prepare accordingly.
</div>

Whenever you run into a problem, or have a question, raise an issue on our [GitHub issue tracker](https://github.com/soztag/fossos/issues).
Please also make sure that:

- the issue does not *already exist* (always *search first!*)
- the issue is properly *labelled* (so we can all navigate through the issues)
- the issue is *answerable*, *actionable* and *closable*.
  Good issues are framed in such a way that they *can* be closed.


## Schedule

Because students will learn at different speeds, and from different starting points, there is not *a* schedule for the class.
The [stack](/stack) lists the tools (and resources) in the rough order in which they should be studied.

Students can work through this material at their own pace.
Likewise, some students may wish to cover a lot of breadth (at shallow depth), while others want to dig in on a particular topic.
This is all fine, but students should ensure that they learn *something* at a *useful* level to solve real-world problems, as will also be required for the assessment.
If in doubt, ask the instructor for guidance.

Every student should first become competent in the practices and tools covered in ["Software Carpentry"](https://datascience.phil.fau.de/fossos/stack.html#software_carpentry); these are required for all later topics.

As a loose guide, *every* student should cover at least *one* top-level heading ("Interactive R", "Intermediate R", etc.) per semester.

There are often several heavily overlapping resources recommended for a tool; students should study whichever best suits their taste.
It's a good idea to browse through all of the resources to make sure you don't miss anything.


## Assessment

Assessments are an unfortunate, tedious and arguably needless part of teaching -- but here we are, so we are going to make the best of it.

Instead of some *make-believe* work or hobby project, assignments in this class are, for the most part, designed to be *actually useful* to other people.
This can be motivating, but it also means that other people are relying upon our work:
it has to be delivered on time and at the quality expected.

You can work on pretty much anything you like -- improving this very class (and its repo), some existing project that you like or even your own new (or existing) project.
The only conditions are:

1. The work needs to be related to the tools and practices covered in class.
2. The work needs to be on GitHub or otherwise transparent.
3. The instructor needs to be able to assess the quality of the work, and advise you in your work.
    This unfortunately rules out any projects not using the technologies covered in this class.

We will begin with relatively easy, small tasks to serve other students in class, then address smaller issues with resources for the broader community, and eventually fix "real" bugs or enhance the functionality of open source data science software.

All tasks, big and small, are listed and tracked on the [class GitHub repository issue tracker](https://github.com/soztag/fossos).
Students should assign themselves to tasks they will be working on, and report / link to any progress on these tasks in the issue thread.


### Pass/Fail

**All students**, including those who **just want a "Sitzschein" (pass/fail option)** must contribute to a number of issues labelled as [`pass/fail`](https://github.com/soztag/fossos/labels/pass%2Ffail).
These are issues that are smaller in scale and scope.

There is no straightforward minimum metric (say, number of closed issues) to pass the class.
Instead, students should display substantial contributions across a range of helpful activities, as recorded in the issue tracker.

Before working on these issues, students should *assign themselves*, to avoid us doing duplicate work.


### Graded

Students who want to receive a grade for the class also have to complete a couple of issues tagged with `graded-x`.

The numbers next to the labels roughly indicate the **estimated workload and difficulty** of a task (also known as "story points" in agile development).
Estimates are frequently wrong, and these points can be adjusted in consultation with the instructor, if some task turns out to be much harder or easier than expected.
These story points correspond to ECTS credit points; if you are taking this as a "Proseminar", you will need to have owned and closed issues worth 5 story points.
If you are taking this as a "Hauptseminar", you will need to have owned and closed 7.5 story points worth of issues.
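
As a rough illustration of this arithmetic, the following sketch sums the story points of a student's closed issues and compares them with the two thresholds. The helper `story_points()`, the example labels and the exact label pattern (`graded-` followed by a number) are assumptions made up for this example, not part of the class tooling.

```r
# Hypothetical helper: sum the story points encoded in issue labels.
# Assumes graded labels look like "graded-<number>", e.g. "graded-2".
story_points <- function(labels) {
  graded <- grep("^graded-[0-9.]+$", labels, value = TRUE)
  sum(as.numeric(sub("^graded-", "", graded)))
}

# Required story points per seminar type, as stated in this syllabus.
required <- c(Proseminar = 5, Hauptseminar = 7.5)

# Made-up labels of issues a student has owned and closed.
closed_labels <- c("graded-2", "reprex", "graded-3", "pass/fail")

earned <- story_points(closed_labels)

# Named logical vector: which requirements are met so far?
earned >= required
```

Labels that do not match the assumed pattern (such as `reprex` or `pass/fail`) are simply ignored by the helper.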

You will be graded based on how well you have adhered to the best practices and tooling covered in class, as well as (if applicable) the guidelines and standards of the external project (some other repo) or platform (Stack Overflow).

There are **different *kinds* of graded issues**:


#### Reproducible Example

Labels:

- `community.rstudio`, `stack-overflow` or `bug report`,
- and `reprex`, and `question` respectively.

Though it may also benefit yourself, a well-formulated question or bug report with a reproducible example can also serve the community.
This is what we're aiming for here.

A well-formulated question, in the context of open source development, is often a reproducible example, or *reprex* for short.
This means that you should provide a code snippet (or, if not applicable, a very precise description of steps) that will *allow any other user to reproduce the behavior in question, with no additional resources*.
Producing this can be harder than it sounds, and just narrowing down a problem like that may often help you solve it.
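
To give a rough idea of what such a snippet might look like, here is a minimal sketch in base R. The data and the question are invented for this example; the point is only that everything needed to reproduce the behavior is contained in the snippet itself.

```r
# Hypothetical question: "Why does mean() return NA for my data?"
# All data is created inline, so no external files are needed.
scores <- data.frame(
  student = c("a", "b", "c"),
  points = c(4, NA, 5)
)

# The behavior in question: the result is NA rather than a number.
result <- mean(scores$points)
print(result)

# What has already been tried: dropping missing values gives a number.
result_na_rm <- mean(scores$points, na.rm = TRUE)
print(result_na_rm)

# Include version information so others can match your setup.
print(R.version.string)
```

The [reprex](https://reprex.tidyverse.org) package can help here: copying a snippet like this and calling `reprex::reprex()` should render the code together with its output, ready to paste into a question.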

Make sure to read and adhere to all the resources listed under [community and help](https://www.maxheld.de/fossos/stack.html#community__help).

The three target platforms can be listed roughly in ascending order of precision of the question:

1. http://community.rstudio.com:
  Open to *relatively* open/vague questions, though you are absolutely expected to do your own research.
2. http://stackoverflow.com:
  Questions should be very precise and reproducible, and be *definitively answerable*.
  Not good for opinionated stuff.
  Consider the resources listed under [community and help](/stack.html#help).
3. Bug report:
  *If* you're absolutely sure that you have run into a bug, then it can be a good idea to raise it on the repository in question.
  For most things, you should raise it on S-O or community.rstudio first, to be sure that it really *is* a bug.

Here, as with all things open source, we must ensure that other people's time is well-spent engaging with our question (or bug report).
To ensure that, please follow this procedure:

```{r reprex, fig.cap="Sequence Chart for a Reprex"}
DiagrammeR::mermaid(diagram = "reprex.mmd", height = 1200, width = 800)
```


#### Answer on S-O or community.rstudio

Labels:

- `community.rstudio`, `stack-overflow`,
-  `reprex`, and `answer` respectively.

Same process as for the above.


#### External Contribution

Labels: `external documentation`, `external software`.

These are improvements to *external* repos (typically also on GitHub), either other software (typically R repositories) or documentation and learning resources (typically those covered in class).
The actual work (forking, raising a pull request, etc.) consequently occurs in the external target repository, and this activity is merely *tracked* in a placeholder issue in the class repository.
Simply link to any relevant issues, commits or pull requests on the target repo in a placeholder issue.

This sounds quite challenging, but it can be quite doable, especially if you're starting by improving the documentation.

To start contributing to open source, you might also find these resources helpful:

- code.likeagirl.io:
  [How to find a newcomer-friendly open source project](https://code.likeagirl.io/the-new-developers-guide-to-open-source-228ca257dd68)
- Look for open issues on projects that you like, labelled as "needs help", "good first issue" or similar.
  (Some maintainers will especially highlight starter issues.)

For contributions to external documentation or software, it is very important that we do not burden the respective maintainers with sub-par work.
To ensure that we deliver high quality work, you **must follow this procedure**:

```{r external, fig.cap="Sequence Chart for an External Contribution"}
DiagrammeR::mermaid(diagram = "external.mmd", height = 1800, width = 800)
```

Grading criteria are listed for each of the issues.
Generally, a good grade will require following the practices and standards appropriate for the type of contribution in question, and students will need to demonstrate adequate command of the toolchain covered in class.
For an excellent grade, students will need to go (a bit) beyond the covered material, and work on an especially pressing or complicated problem.


#### Own Project

As an alternative to this (graded) assessment, if students already have some prior knowledge and a ready project they wish to work on, this can also be arranged.
Students should contact the instructor, and also track their progress on their *own* project in a placeholder issue on the fossos issue tracker.


### Grading Rubric

The graded tasks (see above) will be graded using the rubrics below.
The grading rubric is taken from the [University of British Columbia Master of Data Science program](http://ubc-mds.github.io) ([CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/us/)).


```{r grading-rubric, echo=FALSE}
accuracy <- c(
  Poor = c(
    "Code fails to run, doesn't have clear output, or performs the wrong task."
  ),
  Unsatisfactory = c(
    "Code performs only some of the correct tasks, the output is not easily understandable and the methods used to achieve the result are inefficient if performance is a concern."
  ),
  Satisfactory = c(
    "Code performs most of the correct tasks, the output is understandable, however the methods used to achieve the result are inefficient if performance is a concern."
  ),
  Good = c(
    "Code performs the correct tasks, the output is reasonably easy to understand, however the methods used to achieve the result are not the most efficient if performance is a concern."
  ),
  Excellent = glue::glue(
    "Code runs correctly without crashing, the output is very clear, and the intended or suitably correct methods are employed to achieve the correct result.",
    "Student has chosen the most efficient algorithm reasonable if performance is a concern.",
    .sep = " "
  )
)

mechanics <- c(
  Poor = glue::glue(
    "Evaluator was unable to run/open/read assignment submission despite best efforts.",
    "This may be because the student forgot to include certain files in the submission or tailored the software to only work on their local machine e.g. the code only works when run from a certain directory on the student's machine, contains paths to files only on the student's machine, etc., or they did not submit their assignment correctly or completely, or it was unclear where the relevant parts of the assignment are included in the submission.",
    .sep = " "
  ),
  Unsatisfactory = c(
    "Evaluator had to spend some time to get the raw submission to work correctly"
  ),
  Satisfactory = c(
    "Evaluator had to make an obvious, small, quick fix to get things working or the wrong file format was submitted"
  ),
  Good = c(
    "The submission is self-contained and works flawlessly; it just works in anybody's hands."
  ),
  Excellent = glue::glue(
    "The student did not forget to include all the files in the submission.",
    "Any necessary libraries to install are either included or are installed by a script, or are made obvious that the evaluator must install them.",
    "Student used the asked for file format.",
    "All assignment instructions were followed.",
    "All files were put in a repository, in a reasonable place, with reasonable names; any source files .tex, .Rmd are rendered to a readable output format e.g. .pdf, all figures are included, there is a README file indicating where to find the different aspects of the assignment, etc.",
    .sep = " "
  )
)

code_quality <- c(
  Poor = glue::glue(
    "Code is difficult to read and understand due to many issues that affect readability.",
    "Code is also poorly organized.",
    .sep = " "
  ),
  Unsatisfactory = c(
    "Code is generally easy to read and understand, with few non-recurring issues and at most two recurring issues that affect readability."
  ),
  Satisfactory = c(
    "Code is generally easy to read and understand, with few non-recurring issues and at most one recurring issue that affects readability."
  ),
  Good = c(
    "Code is easy to read and understand with only 1-2 minor, non-recurring issues that affect readability."
  ),
  Excellent = glue::glue(
    "Code is exceptionally easy to read and understand.",
    "For example, variable names are clear, an appropriate amount of whitespace is used to maximize visibility, tabs and spaces are not mixed for indentation, sufficient comments are given.",
    "Any coding sections of the assignment that were not completed have documentation explaining what a coded solution would look like.",
    "Overall, the code is extremely well organized and documented.",
    .sep = " "
  )
)

robustness <- c(
  Poor = c(
    "Multiple issues with code repetition exist, and several tests are absent and/or ineffective."
  ),
  Unsatisfactory = c(
    "Some form of recurring code repetition exists, or test efficacy is poor."
  ),
  Satisfactory = c(
    "Some form of recurring code repetition exists, or test efficacy is poor."
  ),
  Good = c(
    "Code repetition is mostly minimized and effective tests are present for most functions."
  ),
  Excellent = glue::glue(
    "Code repetition is minimized via the use of loops/mapping functions, functions or classes or scripts/files as needed without becoming overly complicated.",
    "Functions are short, concise, and cohesive without losing clarity; code can be easily modified.",
    "Tests are present to ensure functions work as expected.",
    "Exceptions are caught and thrown if necessary, once students have learned about exceptions.",
    .sep = " "
  )
)

rubric <- dplyr::bind_rows(
  `Accuracy 25%` = accuracy,
  `Code Quality 25%` = code_quality,
  `Mechanics 25%` = mechanics,
  `Robustness 25%` = robustness,
  .id = "Dimension"
)
rubric
```
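
To make the *Robustness* criteria more concrete, here is a minimal sketch in base R of what they might look like in practice: a short, single-purpose function, a mapping function instead of copy-pasted calls, a couple of simple tests, and a caught exception.
The function `fahrenheit_to_celsius()` and the sample readings are hypothetical, invented purely for illustration; they are not part of any assignment, and other designs can satisfy the rubric equally well.

```r
# Hypothetical helper (not part of any assignment):
# a short, single-purpose function that validates its input.
fahrenheit_to_celsius <- function(temp_f) {
  if (!is.numeric(temp_f)) {
    stop("`temp_f` must be numeric.", call. = FALSE)
  }
  (temp_f - 32) * 5 / 9
}

# Minimize repetition: map over the inputs instead of copy-pasting calls.
# The readings are made-up sample values.
readings_f <- list(freezing = 32, boiling = 212)
readings_c <- vapply(readings_f, fahrenheit_to_celsius, FUN.VALUE = numeric(1))

# Tests: check that the function works as expected.
stopifnot(
  isTRUE(all.equal(fahrenheit_to_celsius(32), 0)),
  isTRUE(all.equal(fahrenheit_to_celsius(212), 100))
)

# Exceptions: catch the error raised for bad input and fall back to NA.
safe_result <- tryCatch(
  fahrenheit_to_celsius("warm"),
  error = function(e) {
    message("Conversion failed: ", conditionMessage(e))
    NA_real_
  }
)
```

In a larger project, the `stopifnot()` checks would typically move into a dedicated test file, for example using a testing package, so that they can be re-run whenever the code changes.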


## Technical Requirements {#reqs}

Unfortunately, FAU has no computer lab facilities suitable for teaching this class and participants will have to **bring their own computers**.
This has the advantage that students will learn to set up their own development environments, but adds some unwelcome complexity (different OSes, etc.).

The class will assist students in installing software on their devices, but **students are responsible for maintaining their computers**.
In particular, student laptops must:

- have a reasonably current *desktop* operating system (macOS >= 10.13, Microsoft Windows >= Vista, Linux),
- have a current version of a web browser installed,
- *not* be virus-infested or in some other borked-up state,
- *not* be a mobile device (iOS or Android won't work, unless you can SSH into a Linux box or something),
- and have ready access to one of the WiFi networks at FAU: `FAU-STUD`, `eduroam` or `FAU.fm`.
  (If you need help setting up your WiFi, consult the RRZE Website.)

Emphatically, none of this requires a new, powerful or expensive device, let alone software.
You can get a used laptop with / ready for Ubuntu Linux on eBay for well under €100 (if you buy a used computer, make sure that the hardware has good Linux support).
With some [tweaking](https://leanpub.com/universities/courses/jhu/cbds-chromebook), you can even use an inexpensive (`x86`) Google Chromebook (which runs on Linux).
For more information, see [stack](/stack.html#moving_to_linux).

If you are facing financial difficulties in obtaining a laptop for the class, please contact the instructor.
We'll figure something out for you.


### Operating System Maintenance {.alert .alert-warning}

It is *your* responsibility to maintain your own computer and operating system (OS), as well as to figure out how to install the software below on your machine (though we will all help one another within reason).


### In-Browser Development

For a ready-made development environment, you can use the RStudio IDE (integrated development environment) *inside your web browser*.
RStudio is best for R development, but has decent support for other languages and includes access to a terminal and version control.

Using RStudio in the browser means that all the software you're using won't ever *really* be installed on your system, but only exist in a virtual image or online service.
If you want to do serious development work or are facing edge cases, you may require a "real" installation on your client (see instructions in [stack](/stack.html#moving_to_linux)).
However, in-browser development is a great way to have a standardized environment ready quickly.

You can run the RStudio IDE in your web browser in two ways:


#### `rocker/verse` Docker Image (Recommended)

Docker is an open-source industry standard to define, provision and share computing environments, known as *containers*.
Containers let you run the same computing environment on different computers.
Containers are similar to virtual machines (a computer inside a computer), but slimmer and generally neater.

A lot of the software you need to run in this class is included in the `rocker/verse` image published by the [Rocker Project](https://rocker-project.org).
For a list of things you *still* need to install "locally", consult the [stack](/stack.html).

For installation instructions, see [here](/stack.html#docker).

Unfortunately, Docker has some [system requirements](https://docs.docker.com/docker-for-windows/install/) that many Windows versions do not meet.


#### Cloud Alternative (Not Recommended)

As a backup plan to using Docker on your own operating system, you may use [RStudio Cloud](https://rstudio.cloud), a data science Software-as-a-Service (SaaS).
RStudio Cloud furnishes you with a ready RStudio session in a Docker image similar to `rocker/verse` with all necessary system dependencies.

RStudio Cloud is still in *alpha* and may not always be reliable.
Once out of alpha, it may also become a paid service, which you may have to pay for yourself.

Full disclosure: the instructor has worked for RStudio PBC.

You are strongly encouraged to invest the time and effort to set up and maintain a development environment on your own computer.

Otherwise:

<a class="btn btn-primary" href="https://rstudio.cloud" role="button">Sign up to RStudio Cloud</a>

<div class="alert alert-warning" role="alert">
It's best to sign up with your GitHub account, but this <em>does not</em> give your RStudio Cloud instance read or write privileges to your repos.
Remember to also configure <a href="https://maurolepore.github.io/cloudgithub/">RStudio Cloud with your git credentials</a>.
</div>

You should also study the [RStudio Cloud guide](https://rstudio.cloud/learn/guide).


### Linux

<a class="btn btn-info" href="linux.html" role="button">Learn More</a>

If you want to install the programs used in this class on your system, rather than use them through a (Docker) container, you may find it easier to do that on Unix-compatible operating systems, including macOS and Linux.
Getting Windows to play nicely with open source software can be harder, and some convenient system utilities (such as a package manager) are often missing.
It *is* technically possible to use most, if not all, of the tools above on Windows, but they may behave slightly differently, and supporting them may be more involved.

If you are using a Windows machine, you may consider the following alternatives to get a more Unix-compatible operating system, roughly ranked from easiest to most involved:

1. Replace your existing operating system with, say, [Ubuntu](https://tutorials.ubuntu.com/tutorial/tutorial-install-ubuntu-desktop#0), a frequently used Linux distribution.
  Before you do this, make sure that your hardware has good Linux support.
  This would also delete all of your data and applications, and you might have to choose and use new replacement applications.
2. Same as 1, but with a [dual boot setup](https://opensource.com/article/18/5/dual-boot-linux).
  This way, you can retain both your old operating system, and a new Linux install.
  However, you always have to restart to switch between the two systems.
3. Same as 2, but in a [virtual machine](https://itsfoss.com/install-linux-in-virtualbox/) which can run alongside and *inside* your Windows install.
  ([Here](https://www.lifewire.com/install-ubuntu-linux-windows-10-steps-2202108) are alternative instructions).
  Apparently, if your computer and Windows 10 version support it, there is also now a fancier/more efficient way to do this via [Hyper-V](https://www.windowscentral.com/how-run-linux-distros-windows-10-using-hyper-v).
  Running Linux in a virtual machine carries some performance penalty.
4. [Install the Windows Subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/install-win10).
  This solution is available only for recent versions of Windows 10.
  It seems pretty elegant, but has some limitations (no GUIs) and may be quite involved.
5. Buy an x86 Chromebook and use [crouton](https://github.com/dnschneid/crouton) or (better, but still in beta?) [crostini](https://www.zdnet.com/article/how-to-add-linux-to-your-chromebook/) to run Linux on your Chromebook.
6. Use a virtual machine (VM, as in 3), but on a rented cloud host.
  You can access everything through a browser, but there is a (small) fee, depending on your setup.

There is no guarantee that any of these alternatives or links will work for you; you will have to research them on your own.


## Contributors

A big **Thank You** to all contributors (in alphabetical order by username):

```{r contribs, echo=FALSE, results='asis', message=FALSE}
# wrappers are necessary to shut up chatty function
# function apparently chats via cat, which cannot be disabled in chunk options
invisible(capture.output(
  contribs <- usethis::use_tidy_thanks()
))
# collapse = ", " joins the links without a leading space or trailing comma
cat(paste0("[", contribs, "](https://www.github.com/", contribs, ")", collapse = ", "))
```

## References