deploy: 9f0eb53
roualdes committed Jan 10, 2024
1 parent 1522320 commit e4fe799
Showing 21 changed files with 377 additions and 337 deletions.
Binary file not shown.
Binary file not shown.
133 changes: 130 additions & 3 deletions _sources/and-beyond.md
@@ -16,6 +16,133 @@ kernelspec:

## Python w/out Google Colab

* [JupyterLab Desktop](https://github.com/jupyterlab/jupyterlab-desktop)
* [JupyterLab](https://jupyterlab.readthedocs.io/en/latest/)
* [Virtual environment](https://docs.python.org/3/library/venv.html) -> Emacs/Vim/VS Code/other
* <a href="https://github.com/jupyterlab/jupyterlab-desktop" target="_blank">JupyterLab Desktop</a>
* <a href="https://jupyterlab.readthedocs.io/en/latest/" target="_blank">JupyterLab</a>
* <a href="https://docs.python.org/3/library/venv.html">Virtual environment -> Emacs/Vim/VS Code/other</a>

## Creating your own dataset

This section is for you once you start creating your own small to
medium-sized datasets. It offers some general advice on creating a
dataset.

Entering data into a spreadsheet is easy. And that's good. But there
are some gotchas that you should avoid. Below you'll find a list of
dos and a list of don'ts for creating your own datasets, each followed
by explanations.

### DOs

* be consistent
* use simple variable names
* prefer all lower case letters
* minimize numbers and special characters
* use underscore `_` instead of space ` `
* organize files within directories

**Be consistent**. When programming, having to repeatedly look back at your
spreadsheet to figure out your variable names is beyond annoying, because it
interrupts your programming. Programming is hard enough; don't add friction
that consistent naming would have avoided.

**Use simple variable names**. Consider two variables you might want
to name with multiple words, like miles per gallon and brain to body
weight ratio. It is easy to end up naming one variable in camel case,
e.g. `MilesPerGallon`, and another with haphazard capitalization,
e.g. `Brain(g)bodyWeight(KG)`. The first name is fine, so long as you
are consistent and choose camel case for all of your variable names.
The second variable name is both not simple and inconsistent. Camel
case would have you capitalize each new word, as in `BrainBodyWeight`.
In this case, even the units are not capitalized the same. This is a
recipe for frustration. Also see below, *don't put units in variable
names*.

It is recommended to make yourself a simple rule, like *prefer all
lowercase letters*. Maybe that's not the rule for you, but don't get
caught up on the rule. The rule itself doesn't matter. Just be
simple and consistently so.

My go to rule is all lowercase letters, no numbers or special
characters other than `_`, and to separate words when there are
contiguous repeated letters, `ee` or `ss`, and otherwise don't
separate words. The separator I prefer is underscore `_` instead of
space ` `, which is mostly a carry over rule from programming in R.
Remember, the rule matters less than consistency with the rule.
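
A naming rule like the ones above can also be applied by code instead of by hand. The following is a minimal sketch, assuming a simpler rule than my own (all lowercase, every word separated by `_`); the function `clean_name` and the example names are hypothetical.

```python
import re

def clean_name(name):
    """Lowercase a column name and join its words with underscores."""
    # Split camel case: "MilesPerGallon" -> "Miles Per Gallon"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Replace every run of non-alphanumeric characters with one underscore
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    # Drop leading/trailing underscores and lowercase everything
    return name.strip("_").lower()

raw_names = ["MilesPerGallon", "Brain Body Weight", "sleep total"]
print([clean_name(n) for n in raw_names])
```

Whatever rule you pick, running every name through one function is an easy way to stay consistent.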

**Organize files within directories**. When editing files, it is
tempting to write metadata into the file name. For instance, it is
unfortunately common for people to write file names such as
`draft_manuscript.docx`, `draft2_manuscript.docx`,
`draft3_manuscript.docx`, `final_manuscript.docx`,
`final_final_manuscript.docx`. File names are not intended to carry
the metadata associated with draft versions.

If you really need to maintain copies of drafts, and I suspect you
usually do not, then you should create directories such as `draft` and
`final`. Each directory should contain a single copy of only those
files that belong to that version. Any files, such as data, that are
the same across all versions should have their own directory. It
might help future you to put a separate notes file in each directory
that reminds you of the exact purpose of the directory.
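
As a rough sketch of such a layout, the code below builds the directories with Python's standard `pathlib` module. The project name `manuscript_project`, the directory `data`, and the file name `notes.txt` are assumptions made for illustration, and everything is created in a temporary location so no real files are touched.

```python
from pathlib import Path
import tempfile

# A throwaway root so the sketch never touches real files.
root = Path(tempfile.mkdtemp()) / "manuscript_project"

# One directory per purpose, instead of draft2_..., final_final_... file names.
for name in ["data", "draft", "final"]:
    directory = root / name
    directory.mkdir(parents=True, exist_ok=True)
    # A short notes file reminds future you what the directory is for.
    (directory / "notes.txt").write_text(f"Purpose of the {name} directory: ...\n")

print(sorted(p.name for p in root.iterdir()))
```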

### DON'Ts

* don't start a variable name with a number
* don't use special characters in variable names
* don't put units in variable names
* don't use abbreviations
* don't organize through file names
* don't put dates in your file names
* don't have multiple copies of your data

**Don't start a variable name with a number**. In most programming
languages, you can't start a variable name with a number. So it's
easiest to just avoid putting numbers in variable names altogether.
Occasionally, it makes sense to use a number in a variable name. Just
don't start your variable name with a number.

**Don't use special characters in variable names**. This rule is much
like the rule above. In my experience, special characters,
e.g. `~!@#$%^&*()+=,<>/|\`, only make remembering a variable name more
difficult. The only special character that you should allow, when
necessary, in your variable names is underscore `_`. See **Use simple
variable names** above.
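
Both rules above can be checked in Python itself. This is a small sketch using the built-in string method `isidentifier()` and the standard `keyword` module; the helper `is_valid_name` and the example names are made up for illustration.

```python
import keyword

def is_valid_name(name):
    """True if Python accepts `name` as a variable name."""
    # isidentifier() rejects leading digits, spaces, and special characters;
    # the keyword check additionally rejects reserved words such as "class".
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["weight2", "2weight", "body_weight", "body weight", "weight(kg)"]:
    print(name, is_valid_name(name))
```

Names that pass this check are not automatically good names, but names that fail it will be awkward to use once the data is loaded into Python.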

**Don't put units in variable names**. Units in variable names just
open the door for inconsistent variable names. It is easiest to just
avoid putting units or other metadata into variable names. Your data
should instead have a separate file of all the associated metadata.
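
One way to keep such metadata is a small table with one row per variable, sometimes called a data dictionary. The sketch below writes one with Python's standard `csv` module; the column layout (`variable`, `units`, `description`) and the example rows are assumptions, not a required format.

```python
import csv
import io

# Hypothetical data dictionary: one row per variable in the dataset.
metadata = [
    {"variable": "miles_per_gallon", "units": "miles per gallon", "description": "fuel efficiency"},
    {"variable": "brain_weight", "units": "grams", "description": "brain weight"},
    {"variable": "body_weight", "units": "kilograms", "description": "body weight"},
]

# Written to an in-memory buffer here; a real project might write a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["variable", "units", "description"])
writer.writeheader()
writer.writerows(metadata)
print(buffer.getvalue())
```

The variable names stay short and unit-free, while the units remain recorded in exactly one place.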

**Don't use abbreviations**. Abbreviated variable names are
attractive, because they save typing. For instance, one could imagine
abbreviating micrograms as `ug`, `mg`, or `μg`. This creates
opportunity for misremembering and inconsistency. Such abbreviations
in variable names also break the rule **Don't put units in variable
names**. Further, see **Use simple variable names** above.
Instead, put such metadata in a separate file.

**Don't organize through file names**. The only metadata a file name
should contain is the name of the file. Instead, use directories to
organize your files. See **Organize files within directories** above.

**Don't put dates in your file names**. Dates are metadata, see
**Don't organize through file names** above.

**Don't have multiple copies of your data**. Generally, you should
only have one copy of your dataset. See **Don't put dates in your
file names** above. If there are necessary edits to your data for a
specific analysis, then you should program those edits in Python code
and save that code for future re-use. This way you can re-create data
changes as necessary, and you minimize introducing permanent errors
into your dataset.
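
Here is a minimal sketch of that idea, assuming a tiny made-up dataset and two made-up edits. It uses only the standard `csv` module to stay self-contained; in this course you would more likely do the same with a data frame. The raw data is read but never modified, and the edits live in the function `load_for_analysis`.

```python
import csv
import io

# Stand-in for the single raw data file; a hypothetical example, not course data.
raw_csv = """name,body_weight
cow,600
mouse,0.02
unknown,
"""

def load_for_analysis(text):
    """Read the raw data and apply analysis-specific edits to a copy in memory."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Edit 1: drop rows with a missing body weight
    rows = [row for row in rows if row["body_weight"] != ""]
    # Edit 2: store body weight as a number rather than text
    for row in rows:
        row["body_weight"] = float(row["body_weight"])
    return rows

rows = load_for_analysis(raw_csv)
print(rows)
```

Because the edits are code, they can be re-run, reviewed, and corrected, and the one original copy of the data stays as it was collected.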

### Tidy data

The most complete reference for the advice above, and more, is
Hadley Wickham's paper <a
href="https://vita.had.co.nz/papers/tidy-data.pdf"
target="_blank">Tidy Data (pdf)</a>. The paper lays out a framework
with the goal of making it easier to clean up (tidy) data, so that
subsequent analysis is easier.
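
To give a flavor of what tidying means, here is a small sketch in plain Python. It assumes a made-up table in which years were used as column names, and reshapes it so that each row is one observation and each column is one variable; the names `wide`, `tidy`, `country`, `year`, and `cases` are all hypothetical.

```python
# Hypothetical "wide" table: one column per year, so the year is hidden in the header.
wide = [
    {"country": "a", "2019": 10, "2020": 12},
    {"country": "b", "2019": 7, "2020": 9},
]

# Tidy ("long") form: each row is one observation, each column is one variable.
tidy = [
    {"country": row["country"], "year": int(year), "cases": row[year]}
    for row in wide
    for year in ("2019", "2020")
]

for row in tidy:
    print(row)
```

Note that the wide table also breaks a rule from above, since its column names start with numbers; the tidy version does not.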
6 changes: 3 additions & 3 deletions _sources/data.md
@@ -15,6 +15,6 @@ kernelspec:
# Data

Dr. Robin Donatello hosts a number of datasets on her website:
<https://www.norcalbiostat.com/data/>. You can use any of these
datasets for practice or for the Exploratory Data Analysis Project
which concludes MATH 131.
<https://www.norcalbiostat.com/data/>. Consider using the datasets
`Email Spam`, `HIV`, `Depression`, or `Police Shootings` for the
Exploratory Data Analysis Project which concludes MATH 131.
26 changes: 15 additions & 11 deletions _sources/week-00.md
@@ -44,13 +44,15 @@ [email protected] account).

## Google Colab

[Google Colab](https://colab.research.google.com) provides a notebook
environment where the user can develop a reproducible document that blends text
and code together. Such reproducible documents are popular in the world of data
science, statistics, machine learning, and the various applied sciences that use
programming. By combining text and code, you can walk (via text) your audience
through an analysis (usually via code and/or math), showing the exact code you
used to draw any conclusions about the data or otherwise.
<a href="https://colab.research.google.com" target="_blank">Google
Colab</a> provides a notebook environment where the user can develop a
reproducible document that blends text and code together. Such
reproducible documents are popular in the world of data science,
statistics, machine learning, and the various applied sciences that
use programming. By combining text and code, you can walk (via text)
your audience through an analysis (usually via code and/or math),
showing the exact code you used to draw any conclusions about the data
or otherwise.

We will use Google Colab for free, as part of your campus Google
account [email protected]. The free aspect means we'll have
@@ -61,8 +63,10 @@ install Python on your personal machine, because I believe we can get
started faster this way. If you want to follow along with this course
using different tools, and you understand the consequences you face
for doing so, please see your options on the page [Week 06 and
beyond][./and-beyond.md].
beyond](./and-beyond.md).

From here, there's really no better way to learn about Google Colab than to go
touch it. Here's a link to [the Colab notebook associated with Week 00: Start
here](https://colab.research.google.com/drive/1weKuFgd98W76BloyuuB4d2HudB5KLYew?usp=sharing).
From here, there's really no better way to learn about Google Colab
than to go touch it. Here's a link to <a
href="https://colab.research.google.com/drive/1weKuFgd98W76BloyuuB4d2HudB5KLYew?usp=sharing"
target="_blank">the Colab notebook associated with Week 00: Start
here</a>.
8 changes: 4 additions & 4 deletions _sources/week-01.md
@@ -14,8 +14,8 @@ kernelspec:

# Week 01: Python basics

* [Week 01 Notes](https://colab.research.google.com/drive/1VQhUmSxM6WfSw1ZZeKfhkRhkfM9JPXQx?usp=sharing)
* [Week 01 Assignment](https://colab.research.google.com/drive/1h9Ck7kWNN9_I2Yun9Yc4uBoI2lgv6chi?usp=sharing)
* <a href="https://colab.research.google.com/drive/1VQhUmSxM6WfSw1ZZeKfhkRhkfM9JPXQx?usp=sharing" target="_blank">Week 01 Notes</a>
* <a href="https://colab.research.google.com/drive/1h9Ck7kWNN9_I2Yun9Yc4uBoI2lgv6chi?usp=sharing" target="_blank">Week 01 Assignment</a>

## Learning objectives

@@ -39,7 +39,7 @@ To follow along with this Lesson, please open the Colab notebook [Week
Notes](https://colab.research.google.com/drive/1VQhUmSxM6WfSw1ZZeKfhkRhkfM9JPXQx?usp=sharing).
The first code cell of this notebook calls to the remote computer, on
which the notebook is running, and installs the necessary packages.
For practice, you are repsonible for importing the necessary packages.
For practice, you are responsible for importing the necessary packages.

## Variable

@@ -602,5 +602,5 @@ Such tools have a steep learning curve and a huge payoff.
```

```{seealso}
[Week 01 Assignment](https://colab.research.google.com/drive/1h9Ck7kWNN9_I2Yun9Yc4uBoI2lgv6chi?usp=sharing)
<a href="https://colab.research.google.com/drive/1h9Ck7kWNN9_I2Yun9Yc4uBoI2lgv6chi?usp=sharing" target="_blank">Week 01 Assignment</a>
```
6 changes: 3 additions & 3 deletions _sources/week-02.md
@@ -14,8 +14,8 @@ kernelspec:

# Week 02: Introduction to working with data

* [Week 02 Notes](https://colab.research.google.com/drive/1qHzeZ_1RdfNe1l3KQsZi7xsSjLMVHbel?usp=sharing)
* [Week 02 Assignment](https://colab.research.google.com/drive/1os3hSTKNFblsA1MUTe25pvCjtaKfId30?usp=sharing)
* <a href="https://colab.research.google.com/drive/1qHzeZ_1RdfNe1l3KQsZi7xsSjLMVHbel?usp=sharing" target="_blank">Week 02 Notes</a>
* <a href="https://colab.research.google.com/drive/1os3hSTKNFblsA1MUTe25pvCjtaKfId30?usp=sharing" target="_blank">Week 02 Assignment</a>

## Learning objectives

@@ -281,5 +281,5 @@ msleep["smrt"]
```

```{seealso}
[Week 02 Assignment](https://colab.research.google.com/drive/1os3hSTKNFblsA1MUTe25pvCjtaKfId30?usp=sharing)
<a href="https://colab.research.google.com/drive/1os3hSTKNFblsA1MUTe25pvCjtaKfId30?usp=sharing" target="_blank">Week 02 Assignment</a>
```
6 changes: 3 additions & 3 deletions _sources/week-03.md
@@ -14,8 +14,8 @@ kernelspec:

# Week 03: Graphing and aggregating

* [Week 03 Notes](https://colab.research.google.com/drive/1HqqhJvfHsWJAj_3dgBt0SOV5E90Sq1pG?usp=sharing)
* [Week 03 Assignment](https://colab.research.google.com/drive/1_ZTWGesIh5DUB_l3UdTmR5KK9AFPEy_9?usp=sharing)
* <a href="https://colab.research.google.com/drive/1HqqhJvfHsWJAj_3dgBt0SOV5E90Sq1pG?usp=sharing" target="_blank">Week 03 Notes</a>
* <a href="https://colab.research.google.com/drive/1_ZTWGesIh5DUB_l3UdTmR5KK9AFPEy_9?usp=sharing" target="_blank">Week 03 Assignment</a>

## Learning outcomes

@@ -344,5 +344,5 @@ p.draw()
```

```{seealso}
[Week 03 Assignment](https://colab.research.google.com/drive/1_ZTWGesIh5DUB_l3UdTmR5KK9AFPEy_9?usp=sharing)
<a href="https://colab.research.google.com/drive/1_ZTWGesIh5DUB_l3UdTmR5KK9AFPEy_9?usp=sharing" target="_blank">Week 03 Assignment</a>
```