From 64b3e29942a121a93413737c1dc3a9a364914766 Mon Sep 17 00:00:00 2001 From: Rodolfo Lourenzutti Date: Wed, 2 Oct 2024 08:30:44 -0700 Subject: [PATCH] Remove ipynb --- slides/05_project_intro.ipynb | 260 ------------------- slides/05_version_control.ipynb | 436 -------------------------------- 2 files changed, 696 deletions(-) delete mode 100644 slides/05_project_intro.ipynb delete mode 100644 slides/05_version_control.ipynb diff --git a/slides/05_project_intro.ipynb b/slides/05_project_intro.ipynb deleted file mode 100644 index 45b12a8..0000000 --- a/slides/05_project_intro.ipynb +++ /dev/null @@ -1,260 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "# DSCI 100: Introduction to Data Science\n", - "\n", - "![](https://plaicraft.ai/PLAICraftTitle.png)\n", - "\n", - "## Today: Project kickoff\n", - "\n", - "- Overview and requirements\n", - "- Get a brief understanding of classification and regression (more in the next lecture!)\n", - "- Project repository setup" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "## Project: Predicting Usage of a Video Game Research Server\n", - "\n", - "This year we have a **real data science project** with **real stakeholders** (new this semester!)\n", - "- A group in CS at UBC is interested in understanding how people play video games\n", - "- They've set up a server running MineCraft and are recording play sessions\n", - "- Running the server is *not easy*: need to have the right hardware resources, software licenses, recruiting efforts, etc.\n", - "- Their questions:\n", - " - What kinds of player tend to play the most?\n", - " - Demand forecasting: when do the most players tend to play?\n", - " - Can we tell whether a player will continue to contribute given past play sessions and demographics?\n", - 
"- The data:\n", - " - Player skill level, demographic information\n", - " - Past play sessions\n", - "\n", - "## Your task\n", - "\n", - "Formulate and answer a **predictive question** about the data. Present the full analysis, from reading the data to communicating results." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "- What is *classification*?\n", - " - Predict a *class/category* for a new observation/measurement\n", - " - Using past observations with *known* class/category\n", - " - Learn more in lecture 6 & 7!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "fragment" - }, - "tags": [] - }, - "source": [ - "- What is *regression*?\n", - " - Predict a *numerical value* for a new observation/measurement\n", - " - Using past observations with *known* numerical value\n", - " - Learn more in lecture 8 & 9! (you will have to read ahead if you want to do this!)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "## Deliverables\n", - "\n", - "- **Team Contract**\n", - "- **Project Planning Stage (Individual)**\n", - " - all the project details are in this item on Canvas!\n", - "- **Final Project Report**\n", - "- **Play Time**\n", - " - in exchange for this data, you'll be asked to contribute back to the study by playing *at least 3 hours* (total) over the semester.\n", - " - *Note:* The server has **limited capacity** (I think roughly 50 concurrent slots). So if you try to play but receive an error, it's because the cap has been reached. Try again later, you have the whole semester.\n", - " - If you enjoy your time, keep playing beyond 3 hours! 
It's free.\n", - "    - The group in CS will monitor usage and increase capacity over time if the cap keeps getting hit.\n", - "\n", - "See dates on Canvas" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "## One Important Note\n", - "\n", - "This is new, untested *real data*. So the final conclusion of your project might be \"we couldn't get *anything* out of this data.\"\n", - "\n", - "**That's totally fine and is just as valuable as a report that says \"wow look at all the cool things we learned!\"**.\n", - "\n", - "Just make sure you:\n", - "- Critically analyze what happened: *why* things worked, or *why* they didn't\n", - "- Come up with suggestions for next steps (\"collect this other data to actually see a useful signal!\" or \"this data has a lot of info left over, maybe try a fancier model like X!\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "# Create Repo with Template\n", - "\n", - "![](img/group_project/project-template-use_template.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "- Template URL: https://github.com/UBC-DSCI/dsci-100-project_template\n", - "\n", - "- Click on \"Use this template\"\n", - "\n", - "- Why are we doing this?\n", - "    - jupyter creates temp \"checkpoint\" files (as a backup in case of issues) in a folder `.ipynb_checkpoints`\n", - "    - We don't want to version control this folder because it's not the actual work you want to track in version control\n", - "    - You only want to version control files that matter, not temporary and backup files\n", - "\n", - "- The `.gitignore` file will \"disappear\" on jupyter lab, but you will see it in github\n", - "\n", - "- Using Git + Github is optional for the project, 
but it's by far the easiest way to collaborate effectively." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "# Give the project a name\n", - "\n", - "![](img/group_project/project-template-create_repository.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "# Project Template Setup Recap\n", - "\n", - "1. Go to the template repository\n", - " - https://github.com/UBC-DSCI/dsci-100-project_template\n", - "2. Click \"Use this template\" and create a repository with the template repository\n", - "3. Clone the repository from the owner's repo into their JupyterHub\n", - " - https://datasciencebook.ca/version-control.html#cloning-a-repository-using-jupyter\n", - " - Note: Put this in your home folder, **DO NOT** clone into your `dsci-100-student` folder. If you do, move it out.\n", - " - multiple people can have the same repo name, but within any given user, all the repo names need to be unique\n", - " \n", - "![](img/group_project/project-template-home_folder.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "editable": true, - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "## Activity \\#1: Explore Datasets - Preliminary\n", - " project assignment details on Canvas\n", - "- Using what you have learnt in weeks 1-4, read the dataset, take a look at it, and write a short description about the dataset. 
\n", - "- Some questions you should try to answer:\n", - "    - What is the dataset about?\n", - "    - How many variables are there?\n", - "    - How many observations are there?\n", - "## Activity \\#2: Explore Datasets Part 2 - Outcome Variable\n", - "- Try to answer these questions now:\n", - "    - Identify the main outcome/categorical/label variable in the dataset.\n", - "    - How many values/groups are in this variable?\n", - "    - How many observations are there in each value/group?\n", - "- Tip: Think about how you are organising your workbook: add more code and markdown cells (and arrange them!) to keep your notebook neat\n", - "## Activity \\#3: Explore Datasets Part 3 - Visualisations!\n", - "- Make some visualisations of the outcome variable:\n", - "    - What does the distribution of the variable look like?\n", - "    - What relationship does it have with some of the other variables?\n", - "- Tip: Try using a range of box plots, scatterplots, bar charts, line graphs, etc. " - ] - } - ], - "metadata": { - "celltoolbar": "Slideshow", - "kernelspec": { - "display_name": "R", - "language": "R", - "name": "ir" - }, - "language_info": { - "codemirror_mode": "r", - "file_extension": ".r", - "mimetype": "text/x-r-source", - "name": "R", - "pygments_lexer": "r", - "version": "4.3.3" - }, - "rise": { - "transition": "fade" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/slides/05_version_control.ipynb b/slides/05_version_control.ipynb deleted file mode 100644 index b2cc6ab..0000000 --- a/slides/05_version_control.ipynb +++ /dev/null @@ -1,436 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# DSCI 100 - Introduction to Data Science\n", - "\n", - "\n", - "## Lecture 5 - Collaboration with version control\n", - "\n", - "\n", - "\n", - "Source: " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# Housekeeping 
\n", - "- Group projects posted\n", - "- Project contract due Saturday of next week (Oct 12)\n", - "- No Tutorial assignment this week.\n", - "- The midterm will be on Oct 16 during the tutorial. \n", - "    - The midterm's duration will be 70 minutes.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "## Course policy on plagiarism.\n", - "\n", - "- The quiz format is closed-book, you can only consult the [Python Reference Sheet](https://canvas.ubc.ca/courses/153793/modules/items/7177701); \n", - " - No need to print this, you will have it during the exam;\n", - "\n", - "- You can find more information on what happens if you violate academic integrity here: https://science.ubc.ca/students/blog/academic-integrity.\n", - " - This also applies to worksheets/tutorials\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "## What is version control?\n", - "\n", - "- **Version control:** the process of keeping a record of changes to documents, including when the changes were made and who made them\n", - "- lets you view earlier versions and revert changes\n", - "- facilitates resolving conflicting edits\n", - "- originally for software development, but is now used for many tasks (e.g. data analysis!)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "## Why do we need tools to help us collaborate? \n", - "\n", - "No big deal. Just send files to your teammates in emails. Right?\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "source": [ - "Problems:\n", - "- which version is the newest?\n", - "- who made edits, when were they made?\n", - "- what were you working on? 
(when you revisit the project 3 months from now)\n", - "- can't easily revert changes (if something breaks)\n", - "- no sane way to discuss todo items, issues, etc.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "## Why do we need tools to help us collaborate? \n", - "\n", - "OK, fine. Then let's just share and edit files on dropbox/google drive. Right?\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "source": [ - "These solve *only* the problem of knowing which version is the newest\n", - "\n", - "Still:\n", - "- can't tell who made edits, when they were made\n", - "- can't tell what you were working on when you revisit the project 3 months from now\n", - "- can't easily revert changes\n", - "- no sane way to discuss todo items, issues, etc\n", - "\n", - "(and honestly, you still usually end up with `final_revision_v3_Oct2020_final.docx`...)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### Git and GitHub\n", - "\n", - "In this course we use two major tools for version control\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "source": [ - "**Git:** \n", - "- keeps track of files in a **repository** (a folder that you tell Git to pay attention to)\n", - "- responsible for keeping track of changes, sharing files with others, handling conflict resolution, etc\n", - "- Git runs on your (and your teammates') machine" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "source": [ - "**GitHub:**\n", - "- a service that hosts your repository in the cloud\n", - "- helps manage permissions (who can view your project, who can edit it)\n", - "- provides tools for project-specific communication (organized 
into *issues*)\n", - "- can be used to build and host websites/blogs" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "Git - works on your local computer (e.g., JupyterHub workspace or your laptop)\n", - "\n", - "GitHub - remote repository hosting service (stores a copy of your work on the cloud)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### Key version control concepts and commands\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "Now we will introduce key version control concepts and commands. 4 \"places\" we need to know about to understand how these work are: your working directory, the staging area, the hidden `.git` directory, and the remote repository. Only the staging area is not a real location on your computer, it is a conceptual/abstract place that acts as a holding area. We will learn more about this in a minute.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### Key version control concepts and commands\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "Here we made changes to three files, however, we only want to share the changes to `README.md` and `analysis.ipynb` as `notes.txt` is our own private notes file that we are not quite ready to share yet (or maybe it's a file we will always keep private)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### Key version control concepts and commands\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "To tell Git which files' changes we would like to log as part of our version control, we tell Git what files we want to **add**. This moves the changes to an abstract place called the \"staging area\"." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### Key version control concepts and commands\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "Then, to actually log the changes in our version control history, we tell Git that we'd like to **commit** the changes, and you'll see later in our demo, that when we do that, we will also provide a relevant message that gets stored with the changes - allowing us to later understand what those changes were about.\n", - "\n", - "These changes get archived in a hidden `.git` folder. This special folder contains all the changes we ever logged, as well as who logged them, the messages associated with them, and the address of the remote repository (if one exists)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### Key version control concepts and commands\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "Finally, we tell Git to **push** our changes. When we do this, Git uses the address in the hidden `.git` folder to send the changes from our local computer to the remote repository (e.g., on GitHub)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### Key version control concepts and commands\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "Sometimes changes exist on the remote repository, but you don't yet have them on your local computer. This can happen because you edited a file directly using GitHub's web interface, or a collaborator pushed changes to the remote repository." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### Key version control concepts and commands\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "To get these changes on your local computer, you need to tell Git to **pull** these changes. This will bring the changes into your working directory and the version control history log in the hidden `.git` folder." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "## Demo time!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, - "source": [ - "Show the students how to:\n", - "\n", - "1. create a public GitHub repo with a README\n", - " - Use the template repo since this is what students will use https://github.com/UBC-DSCI/dsci-100-project_template\n", - " - The advantage for the demo is that checkpoints files do not show up as untracked which makes it less confusing.\n", - "2. edit a file there using the pen tool\n", - "3. clone that repo to the JupyterHub using the Jupyter Git extension\n", - "4. create a new Jupyter notebook that does something simple (like print hello world) and put it under version control (add and commit)\n", - "5. 
push the committed changes to GitHub\n", - " - You also need to create a PAT https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token\n", - "6. visit GitHub and see the changes (ooohhh ahhh!)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### What did we learn?\n", - "\n", - "- \n", - "- \n", - "- \n" - ] - } - ], - "metadata": { - "anaconda-cloud": {}, - "celltoolbar": "Slideshow", - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.6" - }, - "rise": { - "transition": "fade" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -}