From 4e945be45820b6d7fd5f60b6de80d36e1c6cd4e8 Mon Sep 17 00:00:00 2001 From: Chris Lo Date: Tue, 10 Sep 2024 13:58:53 -0700 Subject: [PATCH] plot blob --- 05-data-visualization.Rmd | 2 +- slides/W1.ipynb | 58 ------- slides/W1.qmd | 26 --- slides/lesson1_slides.qmd | 329 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 330 insertions(+), 85 deletions(-) delete mode 100644 slides/W1.ipynb delete mode 100644 slides/W1.qmd create mode 100644 slides/lesson1_slides.qmd diff --git a/05-data-visualization.Rmd b/05-data-visualization.Rmd index 7b0c6f1..a5967fb 100644 --- a/05-data-visualization.Rmd +++ b/05-data-visualization.Rmd @@ -56,7 +56,7 @@ expression = pd.read_csv("classroom_data/expression.csv") To create a histogram, we use the function [`sns.displot()`](https://seaborn.pydata.org/generated/seaborn.displot.html) and we specify the input argument `data` as our dataframe, and the input argument `x` as the column name in a String. ```{python} -sns.displot(data=metadata, x="Age") +plot = sns.displot(data=metadata, x="Age") ``` A common parameter to consider when making histogram is how big the bins are. You can specify the bin width via `binwidth` argument, or the number of bins via `bins` argument. diff --git a/slides/W1.ipynb b/slides/W1.ipynb deleted file mode 100644 index 48b8709..0000000 --- a/slides/W1.ipynb +++ /dev/null @@ -1,58 +0,0 @@ -{ - "cells": [ - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "---\n", - "title: \"Week 1\"\n", - "format: revealjs\n", - "editor: visual\n", - "---" - ], - "id": "a9602686" - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Quarto\n", - "\n", - "Quarto enables you to weave together content and executable code into a finished presentation. 
To learn more about Quarto presentations see .\n", - "\n", - "## Bullets\n", - "\n", - "When you click the **Render** button a document will be generated that includes:\n", - "\n", - "- Content authored with markdown\n", - "- Output from executable code\n", - "\n", - "## Code\n", - "\n", - "When you click the **Render** button a presentation will be generated that includes both content and the output of embedded code. You can embed code like this:\n" - ], - "id": "bfb19035" - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "1 + 1\n", - "blob = [2, 3, \"hello\"]\n", - "print(hello)" - ], - "id": "45f3af8f", - "execution_count": null, - "outputs": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file diff --git a/slides/W1.qmd b/slides/W1.qmd deleted file mode 100644 index 678020b..0000000 --- a/slides/W1.qmd +++ /dev/null @@ -1,26 +0,0 @@ ---- -title: "Week 1" -format: revealjs -editor: visual ---- - -## Quarto - -Quarto enables you to weave together content and executable code into a finished presentation. To learn more about Quarto presentations see . - -## Bullets - -When you click the **Render** button a document will be generated that includes: - -- Content authored with markdown -- Output from executable code - -## Code - -When you click the **Render** button a presentation will be generated that includes both content and the output of embedded code. You can embed code like this: - -```{python} -1 + 1 -blob = [2, 3, "hello"] -print(hello) -``` diff --git a/slides/lesson1_slides.qmd b/slides/lesson1_slides.qmd new file mode 100644 index 0000000..4cda9dc --- /dev/null +++ b/slides/lesson1_slides.qmd @@ -0,0 +1,329 @@ +--- +title: "W1: Intro to Computing" +format: revealjs + #smaller: true + #scrollable: true +execute: + echo: true +output-location: fragment +--- + +## Welcome! 
+ +![](images/R-3.1%20(Final).png) + +## Introductions + +- Who am I? + +. . . + +- What is DaSL? + +. . . + +- Who are you? + + - Name, pronouns, group you work in + + - What you want to get out of the class + + - Favorite spring activity + +## Goals of the course + +. . . + +- Fundamental concepts in programming languages: *How do programs run, and how do we solve problems effectively using functions and data structures?* + +. . . + +- Data science fundamentals: *How do you translate your scientific question to a data wrangling problem and answer it?* + + ![Data science workflow](https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png){width="550"} + +## Culture of the course + +. . . + +- Learning on the job is challenging + - I will move at the learner's pace + - Teach not for mastery, but to empower you to learn effectively. + +. . . + +- Various personal goals and applications + - Curate content toward the end of the course + +. . . + +- Respect the Code of Conduct + +## Format of the course + +. . . + +- 6 classes: April 17, 24, May 1, 8, 15, 22 + +. . . + +- Streamed online; recordings will be available. + +. . . + +- 1-2 hour exercises after each session are strongly encouraged as they provide practice. + +- Optional time to work on exercises together on Fridays Noon - 1pm PT. + +. . . + +- Online discussion via Slack. + +## Content of the course + +1. Intro to Computing + +. . . + +2. Data structures + +. . . + +3. Data wrangling 1 + +. . . + +4. Data wrangling 2 + +. . . + +5. Data visualization + +. . . + +6. Loading your own data in, celebratory lunch!! + +## What is a computer program? + +. . . + +- A sequence of instructions to manipulate data for the computer to execute. + +. . . + +- A series of translations: English \<-\> Programming Code for Interpreter \<-\> Machine Code for Central Processing Unit (CPU) + +. . . + +We will focus on English \<-\> Programming Code for the R Interpreter in this class. 
+ +. . . + +Another way of putting it: **How we organize ideas \<-\> Instructing a computer to do something**. + +## Setting up Posit Cloud and trying out your first analysis! + +What's the connection between English \<-\> Programming Code for the R Interpreter? + +## Break + +A pre-course survey: + +https://forms.gle/Hr59ZbAan1JTumCa7 + +## Grammar Structure 1: Evaluation of Expressions + +. . . + +- **Expressions** are built out of **operations** or **functions**. + +. . . + +- Operations and functions take in **data types** and return another data type. + +. . . + +- We can combine multiple expressions together to form more complex expressions: an expression can have other expressions nested inside it. + +## Examples + +```{r} +18 + 21 +``` + +. . . + +```{r} +max(18, 21) +``` + +. . . + +```{r} +max(18 + 21, 65) +``` + +. . . + +```{r} +18 + (21 + 65) +``` + +. . . + +```{r} +nchar("ATCG") +``` + +::: notes +If an expression is made out of multiple, nested operations, how should the R Console interpret it? As a programmer, being able to read nested operations and nested functions is very important. +::: + +## Function machine from algebra class + +. . . + +![](https://cs.wellesley.edu/~cs110/lectures/L16/images/function.png){alt="Function machine from algebra class." width="300"} + +. . . + +Operations are just functions. We could have written: + +```{r} +sum(18, 21) +``` + +. . . + +```{r} +sum(18, sum(21, 65)) +``` + +::: notes +Lastly, a note on the use of functions: a programmer should not need to know how the function is implemented in order to use it. This emphasizes abstraction and modular thinking, a foundation of any programming language. +::: + +## Data types + +- **Numeric**: 18, -21, 65, 1.25 + +- **Character**: "ATCG", "Whatever", "948-293-0000" + +- **Logical**: TRUE, FALSE + +## Grammar Structure 2: Storing data types in the environment + +. . . 
+ +To build up a computer program, we need to store our returned data type from our expression somewhere for downstream use. + +```{r} +x = 18 + 21 +``` + +. . . + +::: callout-tip +## Execution rule for variable assignment + +Evaluate the expression to the right of `=`. + +Bind the variable on the left of `=` to the resulting value. + +The variable is stored in the environment. + +`<-` is okay too! +::: + +::: notes +The environment is where all variables are stored; once a variable is defined, it can be used in an expression at any time. Each variable name must be unique: assigning to an existing name replaces its value. + +The variable is stored in the working memory of your computer, Random Access Memory (RAM). This is temporary memory storage on the computer that can be accessed quickly. Typically a personal computer has 8, 16, or 32 gigabytes of RAM. When we work with large datasets, if you assign a variable to data larger than the available RAM, the assignment will fail. More on this later. +::: + +## Downstream + +Look, now `x` can be reused downstream: + +```{r} +x - 2 +``` + +. . . + +```{r} +y = x * 2 +y +``` + +## Grammar Structure 3: Evaluation of Functions + +A function has a **function name**, **arguments**, and **returns** a data type. + +. . . + +::: callout-tip +## Execution rule for functions: + +Evaluate the function on its arguments; if the arguments contain expressions, evaluate those expressions first. + +The output of a function is called the **returned value**. +::: + +. . . + +```{r} +sqrt(nchar("hello")) +``` + +. . . + +```{r} +(nchar("hello") + 4) * 2 +``` + +## A programming language has the following features: + +. . . + +- Grammar structure to construct expressions + +. . . + +- Combining expressions to create more complex expressions + +. . . + +- Encapsulating complex expressions via functions to create modular and reusable tasks + +. . . + +- Encapsulating complex data via data structures to allow efficient manipulation of data + +## Tips on writing your first code + +. . . 
+ +`Computer = powerful + stupid` + +Even the smallest spelling and formatting changes can cause unexpected output and errors! + +. . . + +- Write incrementally, test often + +. . . + +- Check your assumptions, especially when using new functions, operations, and data types. + +. . . + +- Live environments are great for testing, but not great for reproducibility. + +. . . + +- **Ask for help!** + +## That's all! + +Maybe see you Friday Noon - 1pm PT to practice together!