From c4c13ddb967d954e045024d6b96b50cf6bbfcccf Mon Sep 17 00:00:00 2001 From: Lindsey Heagy Date: Sat, 7 Jan 2023 12:40:14 -0800 Subject: [PATCH] deploy with Jupyter + version control (#110) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * welcome (#74) * update the welcome page to indicate this is the python version, include links to the R version * minor edits (newline for better diffs) Co-authored-by: Trevor Campbell * put headings on the table of contents for easier navigation (#75) * Update metadata and add chapter numbering (#44) * Update _config.yml * Update repo and branch info * Make some visible changes * Include packages we actually use and updated jupyterbook to fix netlify * Test rebuilding notebooks again * Fix typo * Update ToC * Add ibis * Always execute all notebooks * Delete requirements.txt * Update source/_config.yml * Update source/_config.yml * Change path and branch * Increase timeout * Update _config.yml (#90) * Ch3: wrangling (#76) * wip on ch3 * working on wrangling chapter * move chaining content to the intro * update content on summary statistics * update the disucssion on apply * remove the discussion on What is a List * move the assign content to the very end * minor wordsmithing on welcome page * edited learning objs in ch1 for new chaining * update discussion on chaining, edits from Trevor * update discussion of split * remove unnecessary call to
* take care of colons preceding code blocks * take care of chapter references * add discussion of lists and dicts * add table and discussion on basic data structures in python: * add description of info * some general cleanup in apply, assign * typo fix in the intro * a couple of type fixes: * polish chaining/multiline exps * polish ch3 up to and incl tidy data * polish indexing * more polish ch3 * fix python exercises link * more on groupby * improve groupy and discussion of lambda functions * try re-ordering the assign and apply content * global find replace to remove . in naming conventions * caption on fig24 fixed * polish up to apply * cleanup through Using to create new columns * through the summery * add :tags: [output_scroll] for large code outputs, change figure types * trim vertical whitespace on figures: * Update source/wrangling.md Co-authored-by: Joel Ostblom * Apply suggestions from code review Co-authored-by: Joel Ostblom * polishing ch3 * final polish on wrangling * final polish on ch3 joel comments Co-authored-by: Trevor Campbell Co-authored-by: Joel Ostblom * added altair_saver extension * update build_html.sh script with new docker image * Ch4: Viz (#77) * code formatting for viz * update viz chapter * updating the viz chapter * comments addressed through faithful dataset * more progress on the viz chapter (part way through morley data) * add back code to create the csv for mauna_loa * a couple minor typo fixes * polishing ch4 * minor polish on ch4 * code tags in learning objs * polish on ch4, fixed number -> percentage in figure labels * re-added other filetypes... 
* better line formatting in saving section * ignore altair warnings; committed faithful plots * moved faithful plots to img/ * done polishing ch4 Co-authored-by: Trevor Campbell * removed unused material * Front matter (#96) * preface python * remove foreword * added editors page * fix appendix,references * added py acks * minor ed * Update editors.md add Lindsey bio * Add joels bio Co-authored-by: Lindsey Heagy Co-authored-by: Joel Ostblom * Add jupyterlab help section (#101) * Ch1 fig cleanup (#99) * first figures in ch1: * code figures for ch1, including ppt to edit them * update figure sizes * remove old lingering image * removed hidden pptx cache file Co-authored-by: Trevor Campbell * Ch2 fig cleanup (#102) * update output scrolling for ch2 * update scrolling of large output tables * Ch3 fig cleanup (#103) * figure polishing for ch3 * more ch3 figures * Jupyter and Version Control (#98) * added jupyter and version control back in * converted bookdown to jupyterbook images in jupyter chapter * jupyterbook index entries in jupyter * fig references to jupyterbook in jupytermd * updated unicode characters in jupyter; added a caveat at the beginning regarding images * initial index and figure template add to version control * R -> Python in version control * fixed figure refs bookdown->jupyterbook in vsn ctl * jupyterbook index style vsn ctl * fixed images in vsn ctl * minor polish * added caveat * Update source/version-control.md * admonitions for caveats Co-authored-by: Lindsey Heagy * remove the visible call of `glue` in the saving section (#109) * remove the visible call of in the saving section * remove size calculation code Co-authored-by: Trevor Campbell Co-authored-by: Joel Ostblom Co-authored-by: GitHub Action --- source/_toc.yml | 2 + source/jupyter.md | 323 ++++++++------ source/version-control.md | 857 +++++++++++++++++++++++--------------- source/viz.md | 31 +- 4 files changed, 740 insertions(+), 473 deletions(-) diff --git a/source/_toc.yml 
b/source/_toc.yml index 58497d23..a36c7b04 100644 --- a/source/_toc.yml +++ b/source/_toc.yml @@ -23,6 +23,8 @@ parts: - file: regression2.md - file: clustering.md - file: inference.md + - file: jupyter.md + - file: version-control.md - caption: Appendix chapters: - file: appendixA.md diff --git a/source/jupyter.md b/source/jupyter.md index 33e4d4c8..bff3eaed 100644 --- a/source/jupyter.md +++ b/source/jupyter.md @@ -13,17 +13,8 @@ kernelspec: name: python3 --- -# Combining code and text with Jupyter {#getting-started-with-jupyter} - -```{r jupyter-setup, echo = FALSE, message = FALSE, warning = FALSE} -library(magick) -library(magrittr) -library(knitr) -library(fontawesome) - -knitr::opts_chunk$set(message = FALSE, - fig.align = "center") -``` +(getting-started-with-jupyter)= +# Combining code and text with Jupyter ## Overview @@ -32,46 +23,65 @@ that help tell the story of the analysis. In fact, ideally, we would like to *in with the text and images serving as narration for the code and its output. In this chapter we will show you how to accomplish this using Jupyter notebooks, a common coding platform in data science. Jupyter notebooks do precisely what we need: they let you combine text, images, and (executable!) code in a single -document. In this chapter, we will focus on the *use* of Jupyter notebooks to program in R and write +document. In this chapter, we will focus on the *use* of Jupyter notebooks to program in Python and write text via a web interface. These skills are essential to getting your analysis running; think of it like getting dressed in the morning! Note that we assume that you already have Jupyter set up and ready to use. If that is not the case, please first read -Chapter \@ref(move-to-your-own-machine) to learn how to install and configure Jupyter on your own +the {ref}`move-to-your-own-machine` chapter to learn how to install and configure Jupyter on your own computer. 
+```{note} +This book was originally written for the R programming language, and +has been edited to focus instead on Python. This chapter on Jupyter notebooks +has not yet been fully updated to focus on Python; it has images and examples from +the R version of the book. But the concepts related to Jupyter notebooks are generally +the same. We are currently working on producing new Python-based images and examples +for this chapter. +``` + ## Chapter learning objectives By the end of the chapter, readers will be able to do the following: - Create new Jupyter notebooks. -- Write, edit, and execute R code in a Jupyter notebook. +- Write, edit, and execute Python code in a Jupyter notebook. - Write, edit, and view text in a Jupyter notebook. - Open and view plain text data files in Jupyter. - Export Jupyter notebooks to other standard file types (e.g., `.html`, `.pdf`). ## Jupyter +```{index} Jupyter notebook, reproducible +``` + Jupyter is a web-based interactive development environment for creating, editing, -and executing documents called Jupyter notebooks. Jupyter notebooks \index{Jupyter notebook} are +and executing documents called Jupyter notebooks. Jupyter notebooks are documents that contain a mix of computer code (and its output) and formattable text. Given that they combine these two analysis artifacts in a single document—code is not separate from the output or written report—notebooks are one of the leading tools to create reproducible data analyses. Reproducible data -analysis \index{reproducible} is one where you can reliably and easily re-create the same results when +analysis is one where you can reliably and easily re-create the same results when analyzing the same data. Although this sounds like something that should always be true of any data analysis, in reality, this is not often the case; one needs to make a conscious effort to perform data analysis in a reproducible manner. 
An example of what a Jupyter notebook looks like is shown in -Figure \@ref(fig:img-jupyter). +{numref}`img-jupyter`. + -```{r img-jupyter, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "A screenshot of a Jupyter Notebook.", fig.retina = 2, out.width="100%"} -knitr::include_graphics("img/jupyter.png") +```{figure} img/jupyter.png +--- +name: img-jupyter +--- +A screenshot of a Jupyter Notebook. ``` ### Accessing Jupyter +```{index} JupyterHub +``` + One of the easiest ways to start working with Jupyter is to use a -web-based platform called \index{JupyterHub} JupyterHub. JupyterHubs often have Jupyter, R, a number of R +web-based platform called JupyterHub. JupyterHubs often have Jupyter, Python, a number of Python packages, and collaboration tools installed, configured and ready to use. JupyterHubs are usually created and provisioned by organizations, and require authentication to gain access. For example, if you are reading @@ -79,39 +89,51 @@ this book as part of a course, your instructor may have a JupyterHub already set up for you to use! Jupyter can also be installed on your -own computer; see Chapter \@ref(move-to-your-own-machine) for instructions. +own computer; see the {ref}`move-to-your-own-machine` chapter for instructions. ## Code cells +```{index} Jupyter notebook; code cell +``` + The sections of a Jupyter notebook that contain code are referred to as code cells. -A code cell \index{Jupyter notebook!code cell} that has not yet been +A code cell that has not yet been executed has no number inside the square brackets to the left of the cell -(Figure \@ref(fig:code-cell-not-run)). Running a code cell will execute all of +({numref}`code-cell-not-run`). Running a code cell will execute all of the code it contains, and the output (if any exists) will be displayed directly underneath the code that generated it. Outputs may include printed text or numbers, data frames and data visualizations. 
Cells that have been executed also have a number inside the square brackets to the left of the cell. This number indicates the order in which the cells were run -(Figure \@ref(fig:code-cell-run)). +({numref}`code-cell-run`). -```{r code-cell-not-run, echo = FALSE, fig.cap = "A code cell in Jupyter that has not yet been executed.", fig.retina = 2, out.width="100%"} -image_read("img/code-cell-not-run.png") |> - image_crop("3632x1000") +```{figure} img/code-cell-not-run.png +--- +name: code-cell-not-run +--- +A code cell in Jupyter that has not yet been executed. ``` -```{r code-cell-run, echo = FALSE, fig.cap = "A code cell in Jupyter that has been executed.", fig.retina = 2, out.width="100%"} -image_read("img/code-cell-run.png") |> - image_crop("3632x2000") +```{figure} img/code-cell-run.png +--- +name: code-cell-run +--- +A code cell in Jupyter that has been executed. ``` + + +++ ### Executing code cells -Code cells \index{Jupyter notebook!cell execution} can be run independently or as part of executing the entire notebook +```{index} Jupyter notebook; cell execution +``` + +Code cells can be run independently or as part of executing the entire notebook using one of the "**Run all**" commands found in the **Run** or **Kernel** menus in Jupyter. Running a single code cell independently is a workflow typically -used when editing or writing your own R code. Executing an entire notebook is a +used when editing or writing your own Python code. Executing an entire notebook is a workflow typically used to ensure that your analysis runs in its entirety before sharing it with others, and when using a notebook as part of an automated process. @@ -119,48 +141,62 @@ process. To run a code cell independently, the cell needs to first be activated. This is done by clicking on it with the cursor. Jupyter will indicate a cell has been activated by highlighting it with a blue rectangle to its left. 
After the cell -has been activated (Figure \@ref(fig:activate-and-run-button)), the cell can be run by either pressing the **Run** (`r fa("play", height = "11px")`) -button in the toolbar, or by using a keyboard shortcut of +has been activated ({numref}`activate-and-run-button`), the cell can be run by either pressing +the **Run** (▸) button in the toolbar, or by using a keyboard shortcut of `Shift + Enter`. -```{r activate-and-run-button, echo = FALSE, fig.cap = "An activated cell that is ready to be run. The red arrow points to the blue rectangle to the cell's left. The blue rectangle indicates that it is ready to be run. This can be done by clicking the run button (circled in red).", fig.retina = 2, out.width="100%"} -image_read("img/activate-and-run-button-annotated.png") |> - image_crop("3632x900") +```{figure} img/activate-and-run-button-annotated.png +--- +name: activate-and-run-button +--- +An activated cell that is ready to be run. The red arrow points to the blue +rectangle to the cell's left. The blue rectangle indicates that it is ready to +be run. This can be done by clicking the run button (circled in red). ``` To execute all of the code cells in an entire notebook, you have three options: 1. Select **Run** >> **Run All Cells** from the menu. -2. Select **Kernel** >> **Restart Kernel and Run All Cells...** from the menu (Figure \@ref(fig:restart-kernel-run-all)). +2. Select **Kernel** >> **Restart Kernel and Run All Cells...** from the menu ({numref}`restart-kernel-run-all`). -3. Click the (`r fa("fast-forward", height = "11px")`) button in the tool bar. +3. Click the (⏭) button in the tool bar. All of these commands result in all of the code cells in a notebook being run. However, there is a slight difference between them. In particular, only -options 2 and 3 above will restart the R session before running all of the -cells; option 1 will not restart the session. 
Restarting the R session means +options 2 and 3 above will restart the Python session before running all of the +cells; option 1 will not restart the session. Restarting the Python session means that all previous objects that were created from running cells before this command was run will be deleted. In other words, restarting the session and then running all cells (options 2 or 3) emulates how your notebook code would run if you completely restarted Jupyter before executing your entire notebook. -```{r restart-kernel-run-all, echo = FALSE, fig.cap = "Restarting the R session can be accomplished by clicking Restart Kernel and Run All Cells...", fig.retina = 2, out.width="100%"} -image_read("img/restart-kernel-run-all.png") |> - image_crop("3632x900") +```{figure} img/restart-kernel-run-all.png +--- +name: restart-kernel-run-all +--- +Restarting the Python session can be accomplished by clicking Restart Kernel and Run All Cells... ``` + ### The Kernel -The kernel \index{kernel}\index{Jupyter notebook!kernel|see{kernel}} is a program that executes the code inside your notebook and + +```{index} kernel, Jupyter notebook; kernel +``` + +The kernel is a program that executes the code inside your notebook and outputs the results. Kernels for many different programming languages have been created for Jupyter, which means that Jupyter can interpret and execute -the code of many different programming languages. To run R code, your notebook -will need an R kernel. In the top right of your window, you can see a circle -that indicates the status of your kernel. If the circle is empty (`r fa("circle", fill = "white", stroke = "black", stroke_width = "10px", height = "11px")`), -the kernel is idle and ready to execute code. If the circle is filled in (`r fa("circle", fill = "black", stroke = "black", stroke_width = "10px", height = "12px")`), -the kernel is busy running some code. +the code of many different programming languages. 
To run Python code, your notebook +will need a Python kernel. In the top right of your window, you can see a circle that indicates the status of your kernel. If the circle is empty +(◯), the kernel is idle and ready to execute code. If the circle is filled in +(⬤), the kernel is busy running some code. + +```{index} kernel; interrupt, kernel; restart +``` -You may run into problems where your kernel \index{kernel!interrupt, restart} is stuck for an excessive amount +You may run into problems where your kernel is stuck for an excessive amount of time, your notebook is very slow and unresponsive, or your kernel loses its connection. If this happens, try the following steps: @@ -170,19 +206,24 @@ connection. If this happens, try the following steps: ### Creating new code cells -To create a new code cell in Jupyter (Figure \@ref(fig:create-new-code-cell)), click the `+` button in the +To create a new code cell in Jupyter ({numref}`create-new-code-cell`), click the `+` button in the toolbar. By default, all new cells in Jupyter start out as code cells, -so after this, all you have to do is write R code within the new cell you just +so after this, all you have to do is write Python code within the new cell you just created! -```{r create-new-code-cell, echo = FALSE, fig.cap = "New cells can be created by clicking the + button, and are by default code cells.", fig.retina = 2, out.width="100%"} -image_read("img/create-new-code-cell.png") |> - image_crop("3632x900") +```{figure} img/create-new-code-cell.png +--- +name: create-new-code-cell +--- +New cells can be created by clicking the + button, and are by default code cells. ``` ## Markdown cells -Text cells inside a Jupyter notebook are \index{markdown} \index{Jupyter notebook!markdown cell} called Markdown cells. Markdown cells +```{index} markdown, Jupyter notebook; markdown cell +``` + +Text cells inside a Jupyter notebook are called Markdown cells.
Markdown cells are rich formatted text cells, which means you can **bold** and *italicize* text, create subject headers, create bullet and numbered lists, and more. These cells are given the name "Markdown" because they use *Markdown language* to specify the rich text formatting. @@ -196,19 +237,23 @@ where you can start learning Markdown. To edit a Markdown cell in Jupyter, you need to double click on the cell. Once you do this, the unformatted (or *unrendered*) version of the text will be -shown (Figure \@ref(fig:markdown-cell-not-run)). You +shown ({numref}`markdown-cell-not-run`). You can then use your keyboard to edit the text. To view the formatted -(or *rendered*) text (Figure \@ref(fig:markdown-cell-run)), click the **Run** (`r fa("play", height = "11px")`) button in the toolbar, +(or *rendered*) text ({numref}`markdown-cell-run`), click the **Run** (▸) button in the toolbar, or use the `Shift + Enter` keyboard shortcut. -```{r markdown-cell-not-run, echo = FALSE, fig.cap = "A Markdown cell in Jupyter that has not yet been rendered and can be edited.", fig.retina = 2, out.width="100%"} -image_read("img/markdown-cell-not-run.png") |> - image_crop("3632x900") +```{figure} img/markdown-cell-not-run.png +--- +name: markdown-cell-not-run +--- +A Markdown cell in Jupyter that has not yet been rendered and can be edited. ``` -```{r markdown-cell-run, echo = FALSE, fig.cap = "A Markdown cell in Jupyter that has been rendered and exhibits rich text formatting. ", fig.retina = 2, out.width="100%"} -image_read("img/markdown-cell-run.png") |> - image_crop("3632x900") +```{figure} img/markdown-cell-run.png +--- +name: markdown-cell-run +--- +A Markdown cell in Jupyter that has been rendered and exhibits rich text formatting. ``` ### Creating new Markdown cells @@ -218,11 +263,13 @@ By default, all new cells in Jupyter start as code cells, so the cell format needs to be changed to be recognized and rendered as a Markdown cell. 
To do this, click on the cell with your cursor to ensure it is activated. Then click on the drop-down box on the toolbar that says "Code" (it -is next to the `r fa("fast-forward", height = "11px")` button), and change it from "**Code**" to "**Markdown**" (Figure \@ref(fig:convert-to-markdown-cell)). +is next to the ⏭ button), and change it from "**Code**" to "**Markdown**" ({numref}`convert-to-markdown-cell`). -```{r convert-to-markdown-cell, echo = FALSE, fig.cap = "New cells are by default code cells. To create Markdown cells, the cell format must be changed.", fig.retina = 2, out.width="100%"} -image_read("img/convert-to-markdown-cell.png") |> - image_crop("3632x900") +```{figure} img/convert-to-markdown-cell.png +--- +name: convert-to-markdown-cell +--- +New cells are by default code cells. To create Markdown cells, the cell format must be changed. ``` ## Saving your work @@ -239,104 +286,121 @@ Mac OS). ### Best practices for executing code cells +```{index} Jupyter notebook; best practices +``` + As you might know (or at least imagine) by now, Jupyter notebooks are great for -interactively editing, writing and running R code; this is what they were +interactively editing, writing and running Python code; this is what they were designed for! Consequently, Jupyter notebooks are flexible in regards to code cell execution order. This flexibility means that code cells can be run in any -arbitrary order using the **Run** (`r fa("play", height = "11px")`) button. But this flexibility has a downside: +arbitrary order using the **Run** (▸) button. But this flexibility has a downside: it can lead to Jupyter notebooks whose code cannot be executed in a linear order (from top to bottom of the notebook). A nonlinear notebook is problematic because a linear order is the conventional way code documents are run, and others will have this expectation when running your notebook. 
Finally, if the code is used in some automated process, it will need to run in a linear order, -from top to bottom of the notebook. \index{Jupyter notebook!best practices} +from top to bottom of the notebook. The most common way to inadvertently create a nonlinear notebook is to rely solely -on using the `r fa("play", height = "11px")` button to execute cells. For example, -suppose you write some R code that creates an R object, say a variable named +on using the (▸) button to execute cells. For example, +suppose you write some Python code that creates a Python object, say a variable named `y`. When you execute that cell and create `y`, it will continue -to exist until it is deliberately deleted with R code, or when the Jupyter -notebook R session (*i.e.*, kernel) is stopped or restarted. It can also be -referenced in another distinct code cell (Figure \@ref(fig:out-of-order-1)). +to exist until it is deliberately deleted with Python code, or when the Jupyter +notebook Python session (*i.e.*, kernel) is stopped or restarted. It can also be +referenced in another distinct code cell ({numref}`out-of-order-1`). Together, this means that you could then write a code cell further above in the notebook that references `y` and execute it without error in the current session -(Figure \@ref(fig:out-of-order-2)). This could also be done successfully in +({numref}`out-of-order-2`). This could also be done successfully in future sessions if, and only if, you run the cells in the same unconventional order. However, it is difficult to remember this unconventional order, and it is not the order that others would expect your code to be executed in. Thus, in the future, this would lead to errors when the notebook is run in the conventional -linear order (Figure \@ref(fig:out-of-order-3)). +linear order ({numref}`out-of-order-3`).
-```{r out-of-order-1, echo = FALSE, fig.cap = "Code that was written out of order, but not yet executed.", fig.retina = 2, out.width="100%"} -image_read("img/out-of-order-1.png") |> - image_crop("3632x800") +```{figure} img/out-of-order-1.png +--- +name: out-of-order-1 +--- +Code that was written out of order, but not yet executed. ``` -```{r out-of-order-2, echo = FALSE, fig.cap = "Code that was written out of order, and was executed using the run button in a nonlinear order without error. The order of execution can be traced by following the numbers to the left of the code cells; their order indicates the order in which the cells were executed.", fig.retina = 2, out.width="100%"} -image_read("img/out-of-order-2.png") |> - image_crop("3632x800") +```{figure} img/out-of-order-2.png +--- +name: out-of-order-2 +--- +Code that was written out of order, and was executed using the run button in a +nonlinear order without error. The order of execution can be traced by +following the numbers to the left of the code cells; their order indicates the +order in which the cells were executed. ``` +++ -(ref:out-of-order-3) Code that was written out of order, and was executed in a linear order using "Restart Kernel and Run All Cells..." This resulted in an error at the execution of the second code cell and it failed to run all code cells in the notebook. -```{r out-of-order-3, echo = FALSE, fig.cap = '(ref:out-of-order-3)', fig.retina = 2, out.width="100%"} -image_read("img/out-of-order-3.png") |> - image_crop("3632x800") +```{figure} img/out-of-order-3.png +--- +name: out-of-order-3 +--- +Code that was written out of order, and was executed in a linear order using +"Restart Kernel and Run All Cells..." This resulted in an error at the +execution of the second code cell and it failed to run all code cells in the +notebook. ``` + + You can also accidentally create a nonfunctioning notebook by creating an object in a cell that later gets deleted. 
In such a -scenario, that object only exists for that one particular R session and will +scenario, that object only exists for that one particular Python session and will not exist once the notebook is restarted and run again. If that object was referenced in another cell in that notebook, an error would occur when the notebook was run again in a new session. -These events may not negatively affect the current R session when +These events may not negatively affect the current Python session when the code is being written; but as you might now see, they will likely lead to errors when that notebook is run in a future session. Regularly executing -the entire notebook in a fresh R session will help guard +the entire notebook in a fresh Python session will help guard against this. If you restart your session and new errors seem to pop up when you run all of your cells in linear order, you can at least be aware that there is an issue. Knowing this sooner rather than later will allow you to fix the issue and ensure your notebook can be run linearly from start to finish. -We recommend as a best practice to run the entire notebook in a fresh R session +We recommend as a best practice to run the entire notebook in a fresh Python session at least 2–3 times within any period of work. Note that, -critically, you *must do this in a fresh R session* by restarting your kernel. +critically, you *must do this in a fresh Python session* by restarting your kernel. We recommend using either the **Kernel** >> -**Restart Kernel and Run All Cells...** command from the menu or the `r fa("fast-forward", height = "11px")` +**Restart Kernel and Run All Cells...** command from the menu or the ⏭ button in the toolbar. Note that the **Run** >> **Run All Cells** menu item will not restart the kernel, and so it is not sufficient to guard against these errors. 
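To see why running all cells without restarting is not enough, consider the following rough sketch in plain Python. It rests on assumptions made purely for illustration: a dictionary models the kernel session, the strings in `remaining_cells` model the cells still present in the notebook, and the variable `threshold` models an object created by a cell that has since been deleted. The helper `run_all` is hypothetical and not a Jupyter function.

```python
# Leftover state: `threshold` was created by a cell that was later deleted,
# but the running session (modelled as a dict) still remembers it.
running_session = {}
exec("threshold = 10", running_session)

# The cells that remain in the notebook, in top-to-bottom order.
remaining_cells = [
    "values = [4, 12, 25]",
    "large = [v for v in values if v > threshold]",
]


def run_all(cells, session):
    """Run every cell in order inside the given session namespace."""
    for source in cells:
        exec(source, session)
    return session


# Running all cells in the existing session: the stale variable hides the problem.
run_all(remaining_cells, running_session)
print("without restart:", running_session["large"])

# Restarting first (an empty session) exposes the missing definition.
try:
    run_all(remaining_cells, {})
    restart_error = None
except NameError as err:
    restart_error = str(err)
print("after restart:", restart_error)
```

The first run appears healthy because the old session still holds `threshold`; only the run that starts from an empty session reveals that the notebook no longer defines it.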
-### Best practices for including R packages in notebooks +### Best practices for including Python packages in notebooks -Most data analyses these days depend on functions from external R packages that -are not built into R. One example is the `tidyverse` metapackage that we +Most data analyses these days depend on functions from external Python packages that +are not built into Python. One example is the `pandas` package that we heavily rely on in this book. This package provides us access to functions like -`read_csv` for reading data, `select` for subsetting columns, and `ggplot` for -creating high-quality graphics. +`read_csv` for reading data, and `loc[]` for subsetting rows and columns. +We also use the `altair` package for creating high-quality graphics. -As mentioned earlier in the book, external R packages need to be loaded before +As mentioned earlier in the book, external Python packages need to be loaded before the functions they contain can be used. Our recommended way to do this is via -`library(package_name)`. But where should this line of code be written in a +`import package_name`, and perhaps also to give it a shorter alias like +`import package_name as pn`. But where should this line of code be written in a Jupyter notebook? One idea could be to load the library right before the function is used in the notebook. However, although this technically works, this -causes hidden, or at least non-obvious, R package dependencies when others view +causes hidden, or at least non-obvious, Python package dependencies when others view or try to run the notebook. These hidden dependencies can lead to errors when -the notebook is executed on another computer if the needed R packages are not +the notebook is executed on another computer if the needed Python packages are not installed. 
Additionally, if the data analysis code takes a long time to run, uncovering the hidden dependencies that need to be installed so that the analysis can run without error can take a great deal of time to uncover. -Therefore, we recommend you load all R packages in a code cell near the top of +Therefore, we recommend you load all Python packages in a code cell near the top of the Jupyter notebook. Loading all your packages at the start ensures that all packages are loaded before their functions are called, assuming the notebook is run in a linear order from top to bottom as recommended above. It also makes it -easy for others viewing or running the notebook to see what external R packages +easy for others viewing or running the notebook to see what external Python packages are used in the analysis, and hence, what packages they should install on their computer to run the analysis successfully. @@ -346,40 +410,49 @@ their computer to run the analysis successfully. 2. As you write code in a Jupyter notebook, run the notebook in a linear order and in its entirety often (2–3 times every work session) via the **Kernel** >> -**Restart Kernel and Run All Cells...** command from the Jupyter menu or the `r fa("fast-forward", height = "11px")` +**Restart Kernel and Run All Cells...** command from the Jupyter menu or the ⏭ button in the toolbar. -3. Write the code that loads external R packages near the top of the Jupyter +3. Write the code that loads external Python packages near the top of the Jupyter notebook. ## Exploring data files -It is essential to preview data files before you try to read them into R to see +It is essential to preview data files before you try to read them into Python to see whether or not there are column names, what the delimiters are, and if there are lines you need to skip. 
In Jupyter, you preview data files stored as plain text -files (e.g., comma- and tab-separated files) in their plain text format (Figure \@ref(fig:open-data-w-editor-2)) by +files (e.g., comma- and tab-separated files) in their plain text format ({numref}`open-data-w-editor-2`) by right-clicking on the file's name in the Jupyter file explorer, selecting -**Open with**, and then selecting **Editor** (Figure \@ref(fig:open-data-w-editor-1)). +**Open with**, and then selecting **Editor** ({numref}`open-data-w-editor-1`). Suppose you do not specify to open the data file with an editor. In that case, Jupyter will render a nice table for you, and you will not be able to see the column delimiters, and therefore you will not know which function to use, nor which arguments to use and values to specify for them. -```{r open-data-w-editor-1, echo = FALSE, fig.cap = "Opening data files with an editor in Jupyter.", fig.retina = 2, out.width="100%"} -image_read("img/open_data_w_editor_01.png") |> - image_crop("3632x2000") +```{figure} img/open_data_w_editor_01.png +--- +name: open-data-w-editor-1 +--- +Opening data files with an editor in Jupyter. ``` -```{r open-data-w-editor-2, echo = FALSE, fig.cap = "A data file as viewed in an editor in Jupyter.", fig.retina = 2, out.width="100%"} -image_read("img/open_data_w_editor_02.png") |> - image_crop("3632x2000") +```{figure} img/open_data_w_editor_02.png +--- +name: open-data-w-editor-2 +--- +A data file as viewed in an editor in Jupyter. ``` + + ## Exporting to a different file format -In Jupyter, viewing, editing and running R code is done in the Jupyter notebook -file format with \index{Jupyter notebook!export} file extension `.ipynb`. This file format is not easy to open and +```{index} Jupyter notebook; export +``` + +In Jupyter, viewing, editing and running Python code is done in the Jupyter notebook +file format with file extension `.ipynb`. This file format is not easy to open and view outside of Jupyter. 
Thus, to share your analysis with people who do not commonly use Jupyter, it is recommended that you export your executed analysis as a more common file type, such as an `.html` file, or a `.pdf`. We recommend @@ -410,13 +483,15 @@ like. The font, page margins, and other details will appear different in the `.p At some point, you will want to create a new, fresh Jupyter notebook for your own project instead of viewing, running or editing a notebook that was started by someone else. To do this, navigate to the **Launcher** tab, and click on -the R icon under the **Notebook** heading. If no **Launcher** tab is visible, +the Python icon under the **Notebook** heading. If no **Launcher** tab is visible, you can get a new one via clicking the **+** button at the top of the Jupyter -file explorer (Figure \@ref(fig:launcher)). +file explorer ({numref}`launcher`). -```{r launcher, echo = FALSE, fig.cap = "Clicking on the R icon under the Notebook heading will create a new Jupyter notebook with an R kernel.", fig.retina = 2, out.width="100%"} -image_read("img/launcher-annotated.png") |> - image_crop("3632x2000") +```{figure} img/launcher-annotated.png +--- +name: launcher +--- +Clicking on the Python icon under the Notebook heading will create a new Jupyter notebook with a Python kernel.
``` +++ diff --git a/source/version-control.md b/source/version-control.md index 054bbe42..962df7b3 100644 --- a/source/version-control.md +++ b/source/version-control.md @@ -13,18 +13,8 @@ kernelspec: name: python3 --- -# Collaboration with version control {#Getting-started-with-version-control} - -```{r 12-getting-started-with-version-control, echo = FALSE, message = FALSE, warning = FALSE} -library(magick) -library(magrittr) -library(knitr) - -knitr::opts_chunk$set(message = FALSE, - echo = FALSE, - warning = FALSE, - fig.align = "center") -``` +(getting-started-with-version-control)= +# Collaboration with version control > *You mostly collaborate with yourself, > and me-from-two-months-ago never responds to email.* @@ -35,18 +25,30 @@ knitr::opts_chunk$set(message = FALSE, ## Overview +```{index} git, GitHub +``` + This chapter will introduce the concept of using version control systems to track changes to a project over its lifespan, to share and edit code in a collaborative team, and to distribute the finished project to its intended audience. This chapter will also introduce how to use -the two most common version control tools: Git \index{git} for local version control, -and GitHub \index{GitHub} for remote version control. +the two most common version control tools: Git for local version control, +and GitHub for remote version control. We will focus on the most common version control operations used day-to-day in a standard data science project. There are many user interfaces for Git; in this chapter we will cover the Jupyter Git interface. +```{note} +This book was originally written for the R programming language, and +has been edited to focus instead on Python. This chapter on version control +has not yet been fully updated to focus on Python; it has images and examples from +the R version of the book. But the concepts related to version control are generally +the same. 
We are currently working on producing new Python-based images and examples +for this chapter. +``` + ## Chapter learning objectives By the end of the chapter, readers will be able to do the following: @@ -96,7 +98,10 @@ and multiple people often end up editing the project simultaneously. In such a situation, determining who has the latest version of the project—and how to resolve conflicting edits—can be a real challenge. -*Version control* \index{version control} helps solve these challenges. Version control is the process +```{index} version control +``` + +*Version control* helps solve these challenges. Version control is the process of keeping a record of changes to documents, including when the changes were made and who made them, throughout the history of their development. It also provides the means both to view earlier versions of the project and to revert @@ -112,8 +117,11 @@ and what you're planning to do next! +++ +```{index} version control;system, version control;repository hosting +``` + To version control a project, you generally need two things: -a *version control system* \index{version control!system} and a *repository hosting service*. \index{version control!repository hosting} +a *version control system* and a *repository hosting service*. The version control system is the software responsible for tracking changes, sharing changes you make with others, obtaining changes from others, and resolving conflicting edits. @@ -145,33 +153,41 @@ and repository hosting services in use today. ## Version control repositories +```{index} repository, repository;local, repository;remote +``` + Typically, when we put a data analysis project under version control, -we create two copies of the repository \index{repository} (Figure \@ref(fig:vc1-no-changes)). +we create two copies of the repository ({numref}`vc1-no-changes`). One copy we use as our primary workspace where we create, edit, and delete files. 
-This copy is commonly referred to as \index{repository!local} the **local repository**. The local +This copy is commonly referred to as the **local repository**. The local repository most commonly exists on our computer or laptop, but can also exist within a workspace on a server (e.g., JupyterHub). The other copy is typically stored in a repository hosting service (e.g., GitHub), where we can easily share it with our collaborators. -This copy is commonly referred to as \index{repository!remote} the **remote repository**. +This copy is commonly referred to as the **remote repository**. -```{r vc1-no-changes, fig.cap = 'Schematic of local and remote version control repositories.', fig.retina = 2, out.width="100%"} -image_read("img/vc1-no-changes.png") |> - image_crop("3632x2000") +```{figure} img/vc1-no-changes.png +--- +name: vc1-no-changes +--- +Schematic of local and remote version control repositories. ``` -Both copies of the repository have a **working directory** \index{working directory} +```{index} working directory, git;commit +``` + +Both copies of the repository have a **working directory** where you can create, store, edit, and delete -files (e.g., `analysis.ipynb` in Figure \@ref(fig:vc1-no-changes)). +files (e.g., `analysis.ipynb` in {numref}`vc1-no-changes`). Both copies of the repository also maintain a full project history -(Figure \@ref(fig:vc1-no-changes)). This history is a record of all versions of the +({numref}`vc1-no-changes`). This history is a record of all versions of the project files that have been created. The repository history is not automatically generated; Git must be explicitly told when to record -a version of the project. These records are \index{git!commit} called **commits**. They +a version of the project. These records are called **commits**. They are a snapshot of the file contents as well as metadata about the repository at the time the record was created (who made the commit, when it was made, etc.).
In the local and remote repositories shown in -Figure \@ref(fig:vc1-no-changes), there are two commits represented as gray +{numref}`vc1-no-changes`, there are two commits represented as gray circles. Each commit can be identified by a human-readable **message**, which you write when you make a commit, and a **commit hash** that Git automatically adds for you. @@ -182,14 +198,19 @@ Messages act as a very useful narrative of the changes to a project over its lifespan. If you ever want to view or revert to an earlier version of the project, the message can help you identify which commit to view or revert to. -In Figure \@ref(fig:vc1-no-changes), you can see two such messages, +In {numref}`vc1-no-changes`, you can see two such messages, one for each commit: `Created README.md` and `Added analysis draft`. -The hash \index{hash} is a string of characters consisting of about 40 letters and numbers. +```{index} hash +``` + + + +The hash is a string of characters consisting of about 40 letters and numbers. The purpose of the hash is to serve as a unique identifier for the commit, and is used by Git to index project history. Although hashes are quite long—imagine having to type out 40 precise characters to view an old project version!—Git is able -to work with shorter versions of hashes. In Figure \@ref(fig:vc1-no-changes), you can see +to work with shorter versions of hashes. In {numref}`vc1-no-changes`, you can see two of these shortened hashes, one for each commit: `Daa29d6` and `884c7ce`. ## Version control workflows @@ -205,78 +226,86 @@ editing, and deleting files as you normally would—you must: In this section we will discuss all three of these steps in detail. 
-### Committing changes to a local repository {#commit-changes} +(commit-changes)= +### Committing changes to a local repository When working on files in your local version control repository (e.g., using Jupyter) and saving your work, these changes will only initially exist in the -working directory of the local repository (Figure \@ref(fig:vc2-changes)). +working directory of the local repository ({numref}`vc2-changes`). -```{r vc2-changes, fig.cap = 'Local repository with changes to files.', fig.retina = 2, out.width="100%"} -image_read("img/vc2-changes.png") |> - image_crop("3632x2000") +```{figure} img/vc2-changes.png +--- +name: vc2-changes +--- +Local repository with changes to files. +``` + +```{index} git;add, staging area ``` Once you reach a point that you want Git to keep a record of the current version of your work, you need to commit (i.e., snapshot) your changes. A prerequisite to this is telling Git which files should be included in that snapshot. We call this step **adding** the -files to the **staging area**. \index{git!add, staging area} +files to the **staging area**. Note that the staging area is not a real physical location on your computer; it is instead a conceptual placeholder for these files until they are committed. The benefit of the Git version control system using a staging area is that you can choose to commit changes in only certain files. For example, -in Figure \@ref(fig:vc-ba2-add), we add only the two files +in {numref}`vc-ba2-add`, we add only the two files that are important to the analysis project (`analysis.ipynb` and `README.md`) and not our personal scratch notes for the project (`notes.txt`). -```{r vc-ba2-add, fig.cap = 'Adding modified files to the staging area in the local repository.', fig.retina = 2, out.width="100%"} -image_read("img/vc-ba2-add.png") |> - image_crop("3632x1200") +```{figure} img/vc-ba2-add.png +--- +name: vc-ba2-add +--- +Adding modified files to the staging area in the local repository. 
``` + + Once the files we wish to commit have been added -to the staging area, we can then commit those files to the repository history (Figure \@ref(fig:vc-ba3-commit)). +to the staging area, we can then commit those files to the repository history ({numref}`vc-ba3-commit`). When we do this, we are required to include a helpful *commit message* to tell collaborators (which often includes future you!) about the changes that were -made. In Figure \@ref(fig:vc-ba3-commit), the message is `Message about changes...`; in +made. In {numref}`vc-ba3-commit`, the message is `Message about changes...`; in your work you should make sure to replace this with an informative message about what changed. It is also important to note here that these changes are only being committed to the local repository's history. The remote repository on GitHub has not changed, and collaborators would not yet be able to see your new changes. -```{r vc-ba3-commit, fig.cap = "Committing the modified files in the staging area to the local repository history, with an informative message about what changed.", fig.retina = 2, out.width="100%"} -image_read("img/vc-ba3-commit.png") |> - image_crop("3632x1100") +```{figure} img/vc-ba3-commit.png +--- +name: vc-ba3-commit +--- +Committing the modified files in the staging area to the local repository history, with an informative message about what changed. ``` + + ### Pushing changes to a remote repository +```{index} git;push +``` + + + Once you have made one or more commits that you want to share with your collaborators, -you need \index{git!push} to **push** (i.e., send) those commits back to GitHub (Figure \@ref(fig:vc5-push)). This updates +you need to **push** (i.e., send) those commits back to GitHub ({numref}`vc5-push`). This updates the history in the remote repository (i.e., GitHub) to match what you have in your local repository. Now when collaborators interact with the remote repository, they will be able to see the changes you made. 
And you can also take comfort in the fact that your work is now backed up in the cloud! -```{r vc5-push, fig.cap = 'Pushing the commit to send the changes to the remote repository on GitHub.', fig.retina = 2, out.width="100%"} -image_read("img/vc5-push.png") |> - image_crop("3632x3000") +```{figure} img/vc5-push.png +--- +name: vc5-push +--- +Pushing the commit to send the changes to the remote repository on GitHub. ``` - ### Pulling changes from a remote repository @@ -284,38 +313,52 @@ If you are working on a project with collaborators, they will also be making cha (e.g., to the analysis code in a Jupyter notebook and the project's README file), committing them to their own local repository, and pushing their commits to the remote GitHub repository to share them with you. When they push their changes, those changes will only initially exist in -the remote GitHub repository and not in your local repository (Figure \@ref(fig:vc6-remote-changes)). +the remote GitHub repository and not in your local repository ({numref}`vc6-remote-changes`). + +```{figure} img/vc6-remote-changes.png +--- +name: vc6-remote-changes +--- +Changes pushed by collaborators, or created directly on GitHub will not be automatically sent to your local repository. +``` -```{r vc6-remote-changes, fig.cap = 'Changes pushed by collaborators, or created directly on GitHub will not be automatically sent to your local repository.', fig.retina = 2, out.width="100%"} -image_read("img/vc6-remote-changes.png") |> - image_crop("3632x2000") +```{index} git;pull ``` To obtain the new changes from the remote repository on GitHub, you will need -to **pull** \index{git!pull} those changes to your own local repository. By pulling changes, -you synchronize your local repository to what is present on GitHub (Figure \@ref(fig:vc7-pull)). +to **pull** those changes to your own local repository. By pulling changes, +you synchronize your local repository to what is present on GitHub ({numref}`vc7-pull`). 
Additionally, until you pull changes from the remote repository, you will not be able to push any more changes yourself (though you will still be able to work and make commits in your own local repository). -```{r vc7-pull, fig.cap = 'Pulling changes from the remote GitHub repository to synchronize your local repository.', fig.retina = 2, out.width="100%"} -image_read("img/vc7-pull.png") |> - image_crop("3632x2000") +```{figure} img/vc7-pull.png +--- +name: vc7-pull +--- +Pulling changes from the remote GitHub repository to synchronize your local repository. ``` + + ## Working with remote repositories using GitHub +```{index} repository;remote, GitHub, git;clone +``` + + + Now that you have been introduced to some of the key general concepts and workflows of Git version control, we will walk through the practical steps. There are several different ways to start using version control with a new project. For simplicity and ease of setup, -we recommend creating a remote repository \index{repository!remote} first. -This section covers how to both create and edit a remote repository on \index{GitHub} GitHub. -Once you have a remote repository set up, we recommend **cloning** (or copying) that \index{git!clone} +we recommend creating a remote repository first. +This section covers how to both create and edit a remote repository on GitHub. +Once you have a remote repository set up, we recommend **cloning** (or copying) that repository to create a local repository in which you primarily work. You can clone the repository either on your own computer or in a workspace on a server (e.g., a JupyterHub server). -Section \@ref(local-repo-jupyter) below will cover this second step in detail. +Section {numref}`local-repo-jupyter` below will cover this second step in detail. ### Creating a remote repository on GitHub @@ -325,80 +368,95 @@ at [https://github.com/](https://github.com/). 
Once you have logged into your account, you can create a new repository to host your project by clicking on the "+" icon in the upper right-hand corner, and then on "New Repository," as shown in -Figure \@ref(fig:new-repository-01). +{numref}`new-repository-01`. -(ref:new-repository-01) New repositories on GitHub can be created by clicking on "New Repository" from the + menu. +```{figure} img/version_control/new_repository_01.png +--- +name: new-repository-01 +--- +New repositories on GitHub can be created by clicking on "New Repository" from the + menu. +``` -```{r new-repository-01, fig.cap = '(ref:new-repository-01)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/new_repository_01.png") |> - image_flop() |> - image_crop("3632x1148") |> - image_flop() +```{index} repository;public ``` + Repositories can be set up with a variety of configurations, including a name, optional description, and the inclusion (or not) of several template files. One of the most important configuration items to choose is the visibility to the outside world, -either public or private. *Public* repositories \index{repository!public} can be viewed by anyone. +either public or private. *Public* repositories can be viewed by anyone. *Private* repositories can be viewed by only you. Both public and private repositories are only editable by you, but you can change that by giving access to other collaborators. To get started with a *public* repository having a template `README.md` file, take the -following steps shown in Figure \@ref(fig:new-repository-02): +following steps shown in {numref}`new-repository-02`: 1. Enter the name of your project repository. In the example below, we use `canadian_languages`. Most repositories follow a similar naming convention involving only lowercase letter words separated by either underscores or hyphens. 2. Choose an option for the privacy of your repository. 3. Select "Add a README file." 
This creates a template `README.md` file in your repository's root folder. 4. When you are happy with your repository name and configuration, click on the green "Create Repository" button. -```{r new-repository-02, fig.cap = 'Repository configuration for a project that is public and initialized with a README.md template file.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/new_repository_02.png") |> - image_flop() |> - image_crop("1700x2240+1000-100") |> - image_flop() +```{figure} img/version_control/new_repository_02.png +--- +name: new-repository-02 +--- +Repository configuration for a project that is public and initialized with a README.md template file. ``` + + A newly created public repository with a `README.md` template file should look something -like what is shown in Figure \@ref(fig:new-repository-03). +like what is shown in {numref}`new-repository-03`. -```{r new-repository-03, fig.cap = 'Respository configuration for a project that is public and initialized with a README.md template file.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/new_repository_03.png") |> - image_flop() |> - image_crop("3584x1700") |> - image_flop() +```{figure} img/version_control/new_repository_03.png +--- +name: new-repository-03 +--- +Repository configuration for a project that is public and initialized with a README.md template file. ``` + + +++ ### Editing files on GitHub with the pen tool -The pen tool \index{GitHub!pen tool} can be used to edit existing plain text files. When you click on +```{index} GitHub; pen tool +``` + +The pen tool can be used to edit existing plain text files. When you click on the pen tool, the file will be opened in a text box where you can use your -keyboard to make changes (Figures \@ref(fig:pen-tool-01) and \@ref(fig:pen-tool-02)). +keyboard to make changes ({numref}`pen-tool-01` and {numref}`pen-tool-02`).
-```{r pen-tool-01, fig.cap = 'Clicking on the pen tool opens a text box for editing plain text files.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/pen-tool_01.png") |> - image_flop() |> - image_crop("3584x1500") |> - image_flop() +```{figure} img/version_control/pen-tool_01.png +--- +name: pen-tool-01 +--- +Clicking on the pen tool opens a text box for editing plain text files. ``` -```{r pen-tool-02, fig.cap = 'The text box where edits can be made after clicking on the pen tool.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/pen-tool_02.png") |> - # image_flop() |> - image_crop("3584x2000") # |> -# image_flop() + +```{figure} img/version_control/pen-tool_02.png +--- +name: pen-tool-02 +--- +The text box where edits can be made after clicking on the pen tool. ``` +```{index} GitHub; commit +``` + + + After you are done with your edits, they can be "saved" by *committing* your changes. When you *commit a file* in a repository, the version control system takes a snapshot of what the file looks like. As you continue working on the project, over time you will possibly make many commits to a single file; this generates a useful version history for that file. On GitHub, if you click the -green "Commit changes" button, \index{GitHub!commit} it will save the file and then make a commit -(Figure \@ref(fig:pen-tool-03)). +green "Commit changes" button, it will save the file and then make a commit +({numref}`pen-tool-03`). -Recall from Section \@ref(commit-changes) that you normally have to add files +Recall from {numref}`commit-changes` that you normally have to add files to the staging area before committing them. Why don't we have to do that when we work directly on GitHub? Behind the scenes, when you click the green "Commit changes" button, GitHub *is* adding that one file to the staging area prior to committing it. @@ -409,75 +467,90 @@ changes to multiple files simultaneously. 
This is especially useful when one You can also do things like run code when working in a local repository, which you cannot do on GitHub. In general, editing on GitHub is reserved for small edits to plain text files. -```{r pen-tool-03, fig.pos = "H", out.extra="", fig.cap = 'Saving changes using the pen tool requires committing those changes, and an associated commit message.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/pen-tool_03.png") |> - image_crop("3583x1500+1+500") +```{figure} img/version_control/pen-tool_03.png +--- +name: pen-tool-03 +--- +Saving changes using the pen tool requires committing those changes, and an associated commit message. ``` ### Creating files on GitHub with the "Add file" menu -The "Add file" menu \index{GitHub!add file} can be used to create new plain text files and upload files +```{index} GitHub; add file +``` + + + +The "Add file" menu can be used to create new plain text files and upload files from your computer. To create a new plain text file, click the "Add file" drop-down menu and select the "Create new file" option -(Figure \@ref(fig:create-new-file-01)). +({numref}`create-new-file-01`). -```{r create-new-file-01, fig.cap = 'New plain text files can be created directly on GitHub.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/create-new-file_01.png") |> - image_flop() |> - image_crop("3584x1600") |> - image_flop() +```{figure} img/version_control/create-new-file_01.png +--- +name: create-new-file-01 +--- +New plain text files can be created directly on GitHub. +``` + +```{index} markdown ``` + + A page will open with a small text box for the file name to be entered, and a larger text box where the desired file content text can be entered. Note the two tabs, "Edit new file" and "Preview". Toggling between them lets you enter and edit text and view what the text will look like when rendered, respectively -(Figure \@ref(fig:create-new-file-02)). 
-Note that GitHub understands and renders `.md` files \index{markdown} using a +({numref}`create-new-file-02`). +Note that GitHub understands and renders `.md` files using a [markdown syntax](https://guides.github.com/pdfs/markdown-cheatsheet-online.pdf) very similar to Jupyter notebooks, so the "Preview" tab is especially helpful for checking markdown code correctness. -```{r create-new-file-02, fig.cap = 'New plain text files require a file name in the text box circled in red, and file content entered in the larger text box (red arrow).', fig.retina = 2, out.width="100%"} -image_read("img/version_control/create-new-file_02.png") |> - image_flop() |> - image_crop("3584x1300") |> - image_flop() +```{figure} img/version_control/create-new-file_02.png +--- +name: create-new-file-02 +--- +New plain text files require a file name in the text box circled in red, and file content entered in the larger text box (red arrow). ``` Save and commit your changes by clicking the green "Commit changes" button at the -bottom of the page (Figure \@ref(fig:create-new-file-03)). +bottom of the page ({numref}`create-new-file-03`). -```{r create-new-file-03, fig.cap = 'To be saved, newly created files are required to be committed along with an associated commit message.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/create-new-file_03.png") |> - image_crop("3584x1500+1+500") +```{figure} img/version_control/create-new-file_03.png +--- +name: create-new-file-03 +--- +To be saved, newly created files are required to be committed along with an associated commit message. ``` You can also upload files that you have created on your local machine by using the "Add file" drop-down menu and selecting "Upload files" -(Figure \@ref(fig:upload-files-01)). +({numref}`upload-files-01`). 
To select the files from your local computer to upload, you can either drag and drop them into the gray box area shown below, or click the "choose your files" link to access a file browser dialog. Once the files you want to upload have been selected, click the green "Commit changes" button at the bottom of the -page (Figure \@ref(fig:upload-files-02)). +page ({numref}`upload-files-02`). -```{r upload-files-01, fig.cap = 'New files of any type can be uploaded to GitHub.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/upload-files_01.png") |> - image_flop() |> - image_crop("3584x1600") |> - image_flop() +```{figure} img/version_control/upload-files_01.png +--- +name: upload-files-01 +--- +New files of any type can be uploaded to GitHub. ``` -(ref:upload-files-02) Specify files to upload by dragging them into the GitHub website (red circle) or by clicking on "choose your files." Uploaded files are also required to be committed along with an associated commit message. - -```{r upload-files-02, fig.pos = "H", out.extra="", fig.cap = '(ref:upload-files-02)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/upload-files_02.png") |> - image_flop() |> - image_crop("3584x2200") |> - image_flop() +```{figure} img/version_control/upload-files_02.png +--- +name: upload-files-02 +--- +Specify files to upload by dragging them into the GitHub website (red circle) +or by clicking on "choose your files." Uploaded files are also required to be +committed along with an associated commit message. ``` + Note that Git and GitHub are designed to track changes in individual files. **Do not** upload your whole project in an archive file (e.g., `.zip`). If you do, then Git can only keep track of changes to the entire `.zip` file, which will not @@ -485,23 +558,33 @@ be human-readable. 
Committing one big archive defeats the whole purpose of using version control: you won't be able to see, interpret, or find changes in the history of any of the actual content of your project! -## Working with local repositories using Jupyter {#local-repo-jupyter} +## Working with local repositories using Jupyter + +```{index} git;Jupyter extension +``` + + Although there are several ways to create and edit files on GitHub, they are not quite powerful enough for efficiently creating and editing complex files, or files that need to be executed to assess whether they work (e.g., files containing code). For example, you wouldn't be able to run an analysis written -with R code directly on GitHub. Thus, it is useful to be able to connect the +with Python code directly on GitHub. Thus, it is useful to be able to connect the remote repository that was created on GitHub to a local coding environment. This can be done by creating and working in a local copy of the repository. In this chapter, we focus on interacting with Git via Jupyter using -the Jupyter Git extension. The Jupyter Git \index{git!Jupyter extension} extension +the Jupyter Git extension. The Jupyter Git extension can be run by Jupyter on your local computer, or on a JupyterHub server. -*Note: we recommend reading Chapter \@ref(getting-started-with-jupyter)* +*Note: we recommend reading the {ref}`getting-started-with-jupyter` chapter* *to learn how to use Jupyter before reading this chapter.* ### Generating a GitHub personal access token +```{index} GitHub; personal access token +``` + + + To send and retrieve work between your local repository and the remote repository on GitHub, you will frequently need to authenticate with GitHub @@ -510,40 +593,47 @@ There are several methods to do this, but for beginners we recommend using the HTTPS method because it is easier and requires less setup. In order to use the HTTPS method, -GitHub requires you to provide a *personal access token*. 
\index{GitHub!personal access token} +GitHub requires you to provide a *personal access token*. A personal access token is like a password—so keep it a secret!—but it gives you more fine-grained control over what parts of your account the token can be used to access, and lets you set an expiry date for the authentication. To generate a personal access token, you must first visit [https://github.com/settings/tokens](https://github.com/settings/tokens), which will take you to the "Personal access tokens" page in your account settings. -Once there, click "Generate new token" (Figure \@ref(fig:generate-pat-01)). +Once there, click "Generate new token" ({numref}`generate-pat-01`). Note that you may be asked to re-authenticate with your username and password to proceed. -(ref:generate-pat-01) The "Generate new token" button used to initiate the creation of a new personal access token. It is found in the "Personal access tokens" section of the "Developer settings" page in your account settings. -```{r generate-pat-01, fig.cap = '(ref:generate-pat-01)', fig.retina = 2, out.width="100%"} -image_read("img/generate-pat_01.png") +```{figure} img/generate-pat_01.png +--- +name: generate-pat-01 +--- +The "Generate new token" button used to initiate the creation of a new personal +access token. It is found in the "Personal access tokens" section of the +"Developer settings" page in your account settings. ``` + You will be asked to add a note to describe the purpose for your personal access token. Next, you need to select permissions for the token; this is where you can control what parts of your account the token can be used to access. Make sure to choose only those permissions that you absolutely require. In -Figure \@ref(fig:generate-pat-02), we tick only the "repo" box, which gives the +{numref}`generate-pat-02`, we tick only the "repo" box, which gives the token access to our repositories (so that we can push and pull) but none of our other GitHub account features. 
Finally, to generate the token, scroll to the bottom of that page -and click the green "Generate token" button (Figure \@ref(fig:generate-pat-02)). +and click the green "Generate token" button ({numref}`generate-pat-02`). -(ref:generate-pat-02) Webpage for creating a new personal access token. - -```{r generate-pat-02, fig.pos = "H", out.extra="", fig.cap = '(ref:generate-pat-02)', fig.retina = 2, out.width="100%"} -image_read("img/generate-pat_02.png") +```{figure} img/generate-pat_02.png +--- +name: generate-pat-02 +--- +Webpage for creating a new personal access token. ``` + Finally, you will be taken to a page where you will be able to see -and copy the personal access token you just generated (Figure \@ref(fig:generate-pat-03)). +and copy the personal access token you just generated ({numref}`generate-pat-03`). Since it provides access to certain parts of your account, you should treat this token like a password; for example, you should consider securely storing it (and your other passwords and tokens, too!) using a password manager. @@ -554,71 +644,94 @@ store it, though, do not fret—you can delete that token by clicking the To learn more about GitHub authentication, see the additional resources section at the end of this chapter. -(ref:generate-pat-03) Display of the newly generated personal access token. - -```{r generate-pat-03, fig.pos = "H", out.extra="", fig.cap = '(ref:generate-pat-03)', fig.retina = 2, out.width="100%"} -image_read("img/generate-pat_03.png") +```{figure} img/generate-pat_03.png +--- +name: generate-pat-03 +--- +Display of the newly generated personal access token. ``` + + ### Cloning a repository using Jupyter -*Cloning* a \index{git!clone} remote repository from GitHub +```{index} git;clone +``` + + + +*Cloning* a remote repository from GitHub to create a local repository results in a copy that knows where it was obtained from so that it knows where to send/receive new committed edits. 
In order to do this, first copy the URL from the HTTPS tab -of the Code drop-down menu on GitHub (Figure \@ref(fig:clone-02)). +of the Code drop-down menu on GitHub ({numref}`clone-02`). -(ref:clone-02) The green "Code" drop-down menu contains the remote address (URL) corresponding to the location of the remote GitHub repository. - -```{r clone-02, fig.pos = "H", out.extra="", fig.cap = '(ref:clone-02)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/clone_02.png") |> - image_crop("3584x1050") +```{figure} img/version_control/clone_02.png +--- +name: clone-02 +--- +The green "Code" drop-down menu contains the remote address (URL) corresponding to the location of the remote GitHub repository. ``` Open Jupyter, and click the Git+ icon on the file browser tab -(Figure \@ref(fig:clone-01)). +({numref}`clone-01`). -```{r clone-01, fig.pos = "H", out.extra="", fig.cap = 'The Jupyter Git Clone icon (red circle).', fig.retina = 2, out.width="100%"} -image_read("img/version_control/clone_01.png") |> - image_crop("2400x1300+1") +```{figure} img/version_control/clone_01.png +--- +name: clone-01 +--- +The Jupyter Git Clone icon (red circle). ``` + + Paste the URL of the GitHub project repository you -created and click the blue "CLONE" button (Figure \@ref(fig:clone-03)). +created and click the blue "CLONE" button ({numref}`clone-03`). -```{r clone-03, fig.pos = "H", out.extra="", fig.cap = 'Prompt where the remote address (URL) corresponding to the location of the GitHub repository needs to be input in Jupyter.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/clone_03.png") |> - image_crop("2400x1430+1") +```{figure} img/version_control/clone_03.png +--- +name: clone-03 +--- +Prompt where the remote address (URL) corresponding to the location of the GitHub repository needs to be input in Jupyter. ``` On the file browser tab, you will now see a folder for the repository. 
-Inside this folder will be all the files that existed on GitHub (Figure \@ref(fig:clone-04)). +Inside this folder will be all the files that existed on GitHub ({numref}`clone-04`). -```{r clone-04, fig.pos = "H", out.extra="", fig.cap = 'Cloned GitHub repositories can been seen and accessed via the Jupyter file browser.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/clone_04.png") |> - image_crop("2400x1200+1") +```{figure} img/version_control/clone_04.png +--- +name: clone-04 +--- +Cloned GitHub repositories can be seen and accessed via the Jupyter file browser. ``` + ### Specifying files to commit Now that you have cloned the remote repository from GitHub to create a local repository, you can get to work editing, creating, and deleting files. For example, suppose you created and saved a new file (named `eda.ipynb`) that you would -like to send back to the project repository on GitHub (Figure \@ref(fig:git-add-01)). +like to send back to the project repository on GitHub ({numref}`git-add-01`). To "add" this modified file to the staging area (i.e., flag that this is a file whose changes we would like to commit), click the Jupyter Git extension -icon on the far left-hand side of Jupyter (Figure \@ref(fig:git-add-01)). +icon on the far left-hand side of Jupyter ({numref}`git-add-01`). + +```{figure} img/version_control/git_add_01.png +--- +name: git-add-01 +--- +Jupyter Git extension icon (circled in red). +``` -```{r git-add-01, fig.pos = "H", out.extra="", fig.cap = 'Jupyter Git extension icon (circled in red).', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_add_01.png") |> - image_crop("3584x1200") +```{index} git;add ``` + This opens the Jupyter Git graphical user interface pane. Next, 
Note that because this is the +click the plus sign (+) beside the file(s) that you want to "add" +({numref}`git-add-02`). Note that because this is the first change for this file, it falls under the "Untracked" heading. However, next time you edit this file and want to add the changes, you will find it under the "Changed" heading. @@ -628,32 +741,38 @@ This is a temporary "checkpoint file" created by Jupyter when you work on `eda.i You generally do not want to add auto-generated files to Git repositories; only add the files you directly create and edit. -(ref:git-add-02) `eda.ipynb` is added to the staging area via the plus sign (+). - -```{r git-add-02, fig.cap = '(ref:git-add-02)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_add_02.png") |> - image_crop("3584x1200") +```{figure} img/version_control/git_add_02.png +--- +name: git-add-02 +--- +`eda.ipynb` is added to the staging area via the plus sign (+). ``` Clicking the plus sign (+) moves the file from the "Untracked" heading to the "Staged" heading, so that Git knows you want a snapshot of its current state -as a commit (Figure \@ref(fig:git-add-03)). -Now you are ready to "commit" the changes. +as a commit ({numref}`git-add-03`). Now you are ready to "commit" the changes. Make sure to include a (clear and helpful!) message about what was changed so that your collaborators (and future you) know what happened in this commit. -(ref:git-add-03) Adding `eda.ipynb` makes it visible in the staging area. -```{r git-add-03, fig.pos = "H", out.extra="", fig.cap = '(ref:git-add-03)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_add_03.png") |> - image_crop("3584x1200") +```{figure} img/version_control/git_add_03.png +--- +name: git-add-03 +--- +Adding `eda.ipynb` makes it visible in the staging area. 
``` + ### Making the commit +```{index} git;commit +``` + + + To snapshot the changes with an associated commit message, you must put a message in the text box at the bottom of the Git pane -and click on the blue "Commit" button (Figure \@ref(fig:git-commit-01)). \index{git!commit} +and click on the blue "Commit" button ({numref}`git-commit-01`). It is highly recommended to write useful and meaningful messages about what was changed. These commit messages, and the datetime stamp for a given commit, are the primary means to navigate through the project's history in the @@ -663,109 +782,142 @@ When you click the "Commit" button for the first time, you will be prompted to enter your name and email. This only needs to be done once for each machine you use Git on. -```{r git-commit-01, fig.pos = "H", out.extra="", fig.cap = 'A commit message must be added into the Jupyter Git extension commit text box before the blue Commit button can be used to record the commit.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_commit_01.png") +```{figure} img/version_control/git_commit_01.png +--- +name: git-commit-01 +--- +A commit message must be added into the Jupyter Git extension commit text box before the blue Commit button can be used to record the commit. ``` + After "committing" the file(s), you will see there are 0 "Staged" files. You are now ready to push your changes -to the remote repository on GitHub (Figure \@ref(fig:git-commit-03)). +to the remote repository on GitHub ({numref}`git-commit-03`). -```{r git-commit-03, fig.pos = "H", out.extra="", fig.cap = 'After recording a commit, the staging area should be empty.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_commit_03.png") |> - image_crop("3584x1500") + +```{figure} img/version_control/git_commit_03.png +--- +name: git-commit-03 +--- +After recording a commit, the staging area should be empty. 
``` + + ### Pushing the commits to GitHub +```{index} git;push +``` + + + To send the committed changes back to the remote repository on -GitHub, you need to *push* them. \index{git!push} To do this, +GitHub, you need to *push* them. To do this, click on the cloud icon with the up arrow on the Jupyter Git tab -(Figure \@ref(fig:git-push-01)). - -(ref:git-push-01) The Jupyter Git extension "push" button (circled in red). +({numref}`git-push-01`). -```{r git-push-01, fig.pos = "H", out.extra="", fig.cap = '(ref:git-push-01)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_push_01.png") |> - image_crop("3584x1500") +```{figure} img/version_control/git_push_01.png +--- +name: git-push-01 +--- +The Jupyter Git extension "push" button (circled in red). ``` + You will then be prompted to enter your GitHub username and the personal access token that you generated earlier (not your account password!). Click -the blue "OK" button to initiate the push (Figure \@ref(fig:git-push-02)). +the blue "OK" button to initiate the push ({numref}`git-push-02`). -```{r git-push-02, fig.pos = "H", out.extra="", fig.cap = 'Enter your Git credentials to authorize the push to the remote repository.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_push_02.png") |> - image_crop("3584x1900") +```{figure} img/version_control/git_push_02.png +--- +name: git-push-02 +--- +Enter your Git credentials to authorize the push to the remote repository. ``` + If the files were successfully pushed to the project repository on -GitHub, you will be shown a success message (Figure \@ref(fig:git-push-03)). +GitHub, you will be shown a success message ({numref}`git-push-03`). Click "Dismiss" to continue working in Jupyter. 
-```{r git-push-03, fig.pos = "H", out.extra="", fig.cap = 'The prompt that the push was successful.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_push_03.png") |> - image_crop("3584x1900") +```{figure} img/version_control/git_push_03.png +--- +name: git-push-03 +--- +The prompt indicating that the push was successful. ``` + If you visit the remote repository on GitHub, you will see that the changes now exist there too -(Figure \@ref(fig:git-push-04))! +({numref}`git-push-04`)! -```{r git-push-04, fig.pos = "H", out.extra="", fig.cap = 'The GitHub web interface shows a preview of the commit message, and the time of the most recently pushed commit for each file.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_push_04.png") |> - image_crop("3584x1900") +```{figure} img/version_control/git_push_04.png +--- +name: git-push-04 +--- +The GitHub web interface shows a preview of the commit message, and the time of the most recently pushed commit for each file. ``` + ## Collaboration ### Giving collaborators access to your project +```{index} GitHub; collaborator access +``` + + + As mentioned earlier, GitHub allows you to control who has access to your project. The default for both public and private projects is that only the -person who created the GitHub \index{GitHub!collaborator access} repository has permissions to create, edit and +person who created the GitHub repository has permissions to create, edit and delete files (*write access*). To give your collaborators write access to the -projects, navigate to the "Settings" tab (Figure \@ref(fig:add-collab-01)). - -(ref:add-collab-01) The "Settings" tab on the GitHub web interface. +project, navigate to the "Settings" tab ({numref}`add-collab-01`). 
-```{r add-collab-01, fig.pos = "H", out.extra="", fig.cap = '(ref:add-collab-01)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/add_collab_01.png") |> - image_crop("3584x1250") +```{figure} img/version_control/add_collab_01.png +--- +name: add-collab-01 +--- +The "Settings" tab on the GitHub web interface. ``` -Then click "Manage access" (Figure \@ref(fig:add-collab-02)). - -(ref:add-collab-02) The "Manage access" tab on the GitHub web interface. +Then click "Manage access" ({numref}`add-collab-02`). -```{r add-collab-02, fig.pos = "H", out.extra="", fig.cap = '(ref:add-collab-02)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/add_collab_02.png") |> - image_crop("3584x1200") +```{figure} img/version_control/add_collab_02.png +--- +name: add-collab-02 +--- +The "Manage access" tab on the GitHub web interface. ``` -(Figure \@ref(fig:add-collab-03)). +Then click the green "Invite a collaborator" button ({numref}`add-collab-03`). -(ref:add-collab-03) The "Invite a collaborator" button on the GitHub web interface. - -```{r add-collab-03, fig.pos = "H", out.extra="", fig.cap = '(ref:add-collab-03)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/add_collab_03.png") |> - image_crop("3584x2200") +```{figure} img/version_control/add_collab_03.png +--- +name: add-collab-03 +--- +The "Invite a collaborator" button on the GitHub web interface. ``` Type in the collaborator's GitHub username or email, -and select their name when it appears (Figure \@ref(fig:add-collab-04)). +and select their name when it appears ({numref}`add-collab-04`). 
-```{r add-collab-04, fig.pos = "H", out.extra="", fig.cap = "The text box where a collaborator's GitHub username or email can be entered.", fig.retina = 2, out.width="100%"} -image_read("img/version_control/add_collab_04.png") |> - image_crop("3584x1250") +```{figure} img/version_control/add_collab_04.png +--- +name: add-collab-04 +--- +The text box where a collaborator's GitHub username or email can be entered. ``` -Finally, click the green "Add to this repository" button (Figure \@ref(fig:add-collab-05)). +Finally, click the green "Add to this repository" button ({numref}`add-collab-05`). -```{r add-collab-05, fig.pos = "H", out.extra="", fig.cap = 'The confirmation button for adding a collaborator to a repository on the GitHub web interface.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/add_collab_05.png") |> - image_crop("3584x1250") +```{figure} img/version_control/add_collab_05.png +--- +name: add-collab-05 +--- +The confirmation button for adding a collaborator to a repository on the GitHub web interface. ``` After this, you should see your newly added collaborator listed under the @@ -777,61 +929,76 @@ to enable write access. We will now walk through how to use the Jupyter Git extension tool to pull changes to our `eda.ipynb` analysis file that were made by a collaborator -(Figure \@ref(fig:git-pull-00)). +({numref}`git-pull-00`). + +```{figure} img/version_control/git_pull_00.png +--- +name: git-pull-00 +--- +The GitHub interface indicates the name of the last person to push a commit to the remote repository, a preview of the associated commit message, the unique commit identifier, and how long ago the commit was snapshotted. 
+``` -```{r git-pull-00, fig.pos = "H", out.extra="", fig.cap = 'The GitHub interface indicates the name of the last person to push a commit to the remote repository, a preview of the associated commit message, the unique commit identifier, and how long ago the commit was snapshotted.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_pull_00.png") |> - image_crop("3584x1600") +```{index} git;pull ``` -You can tell Git to "pull" by \index{git!pull} clicking on the cloud icon with -the down arrow in Jupyter (Figure \@ref(fig:git-pull-01)). +You can tell Git to "pull" by clicking on the cloud icon with +the down arrow in Jupyter ({numref}`git-pull-01`). -```{r git-pull-01, fig.pos = "H", out.extra="", fig.cap = 'The Jupyter Git extension clone button.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_pull_01.png") |> - image_crop("3584x1430") +```{figure} img/version_control/git_pull_01.png +--- +name: git-pull-01 +--- +The Jupyter Git extension pull button. ``` Once the files are successfully pulled from GitHub, you need to click "Dismiss" -to keep working (Figure \@ref(fig:git-pull-02)). +to keep working ({numref}`git-pull-02`). -```{r git-pull-02, fig.pos = "H", out.extra="", fig.cap = 'The prompt after changes have been successfully pulled from a remote repository.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_pull_02.png") |> - image_crop("3584x1450") +```{figure} img/version_control/git_pull_02.png +--- +name: git-pull-02 +--- +The prompt after changes have been successfully pulled from a remote repository. ``` And then when you open (or refresh) the files whose changes you just pulled, -you should be able to see them (Figure \@ref(fig:git-pull-03)). +you should be able to see them ({numref}`git-pull-03`). -(ref:git-pull-03) Changes made by the collaborator to `eda.ipynb` (code highlighted by red arrows). 
- -```{r git-pull-03, fig.pos = "H", out.extra="", fig.cap = '(ref:git-pull-03)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_pull_03.png") |> - image_crop("3584x1450") +```{figure} img/version_control/git_pull_03.png +--- +name: git-pull-03 +--- +Changes made by the collaborator to `eda.ipynb` (code highlighted by red arrows). ``` It can be very useful to review the history of the changes to your project. You can do this directly in Jupyter by clicking "History" in the Git tab -(Figure \@ref(fig:git-pull-04)). +({numref}`git-pull-04`). -```{r git-pull-04, fig.pos = "H", out.extra="", fig.cap = 'Version control repository history viewed using the Jupyter Git extension.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/git_pull_04.png") |> - image_crop("3584x1600") +```{figure} img/version_control/git_pull_04.png +--- +name: git-pull-04 +--- +Version control repository history viewed using the Jupyter Git extension. ``` + It is good practice to pull any changes at the start of *every* work session before you start working on your local copy. If you do not do this, and your collaborators have pushed some changes to the project to GitHub, then you will be unable to push your changes to GitHub until you pull. This situation can be recognized by the error message -shown in Figure \@ref(fig:merge-conflict-01). +shown in {numref}`merge-conflict-01`. -```{r merge-conflict-01, fig.pos = "H", out.extra="", fig.cap = 'Error message that indicates that there are changes on the remote repository that you do not have locally.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/merge_conflict_01.png") |> - image_crop("3584x1450") +```{figure} img/version_control/merge_conflict_01.png +--- +name: merge-conflict-01 +--- +Error message that indicates that there are changes on the remote repository that you do not have locally. ``` + Usually, getting out of this situation is not too troublesome. 
First you need to pull the changes that exist on GitHub that you do not yet have in the local repository. Usually when this happens, Git can automatically merge the changes @@ -842,22 +1009,36 @@ If, however, you and your collaborators made changes to the same line of the same file, Git will not be able to automatically merge the changes; it will not know whether to keep your version of the line(s), your collaborators' version of the line(s), or some blend of the two. When this happens, Git will -tell you that you have a merge conflict in certain file(s) (Figure \@ref(fig:merge-conflict-03)). +tell you that you have a merge conflict in certain file(s) ({numref}`merge-conflict-03`). -```{r merge-conflict-03, fig.cap = 'Error message that indicates you and your collaborators made changes to the same line of the same file and that Git will not be able to automatically merge the changes.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/merge_conflict_03.png") |> - image_crop("3584x1450") +```{figure} img/version_control/merge_conflict_03.png +--- +name: merge-conflict-03 +--- +Error message that indicates you and your collaborators made changes to the +same line of the same file and that Git will not be able to automatically merge +the changes. ``` + + ### Handling merge conflicts -To fix the merge conflict, \index{git!merge conflict} you need to open the offending file +```{index} git;merge conflict +``` + + + +To fix the merge conflict, you need to open the offending file in a plain text editor and look for special marks that Git puts in the file to -tell you where the merge conflict occurred (Figure \@ref(fig:merge-conflict-04)). 
-```{r merge-conflict-04, fig.cap = 'How to open a Jupyter notebook as a plain text file view in Jupyter.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/merge_conflict_04.png") |> - image_crop("3584x1200") + +```{figure} img/version_control/merge_conflict_04.png +--- +name: merge-conflict-04 +--- +How to open a Jupyter notebook as a plain text file view in Jupyter. ``` The beginning of the merge @@ -865,22 +1046,26 @@ conflict is preceded by `<<<<<<< HEAD` and the end of the merge conflict is marked by `>>>>>>>`. Between these markings, Git also inserts a separator (`=======`). The version of the change before the separator is your change, and the version that follows the separator was the change that existed on GitHub. -In Figure \@ref(fig:merge-conflict-05), you can see that in your local repository +In {numref}`merge-conflict-05`, you can see that in your local repository there is a line of code that calls `scale_color_manual` with three color values (`deeppink2`, `cyan4`, and `purple1`). It looks like your collaborator made an edit to that line too, except with different colors (to `blue3`, `red3`, and `black`)! -```{r merge-conflict-05, fig.cap = 'Merge conflict identifiers (highlighted in red).', fig.retina = 2, out.width="100%"} -image_read("img/version_control/merge_conflict_05.png") |> - image_crop("3584x1400") +```{figure} img/version_control/merge_conflict_05.png +--- +name: merge-conflict-05 +--- +Merge conflict identifiers (highlighted in red). ``` Once you have decided which version of the change (or what combination!) to keep, you need to use the plain text editor to remove the special marks that -Git added (Figure \@ref(fig:merge-conflict-06)). +Git added ({numref}`merge-conflict-06`). 
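
The marker layout described above (`<<<<<<< HEAD`, then your version, then `=======`, then the version from GitHub, then `>>>>>>>`) can be illustrated with a small Python sketch. This is only an illustration, not part of the chapter's workflow: the function name `resolve_conflicts`, the sample file contents, and the identifier `1a2b3c4` after the closing marker are all invented for the example. In practice you would normally edit the file by hand, since the right resolution is often a blend of both versions.

```python
def resolve_conflicts(text, keep="ours"):
    """Remove Git conflict markers, keeping one side of each conflict.

    keep="ours" keeps the lines before the ======= separator (your change);
    keep="theirs" keeps the lines after it (the change that was on GitHub).
    """
    kept_lines = []
    side = None  # None: outside a conflict; otherwise "ours" or "theirs"
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "ours"      # start of the conflict: your version follows
        elif line.startswith("=======") and side == "ours":
            side = "theirs"    # separator: the remote version follows
        elif line.startswith(">>>>>>>") and side == "theirs":
            side = None        # end of the conflict
        elif side is None or side == keep:
            kept_lines.append(line)
    return "\n".join(kept_lines)


# Hypothetical file contents with one conflict (invented for illustration).
conflicted = """\
import altair as alt
<<<<<<< HEAD
colors = ["deeppink", "cyan", "purple"]
=======
colors = ["blue", "red", "black"]
>>>>>>> 1a2b3c4
chart_title = "Example"
"""

print(resolve_conflicts(conflicted, keep="theirs"))
```

Whichever side is kept, the result contains none of the special marks, which is the state the file needs to be in before it is saved, added to the staging area, and committed.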
-```{r merge-conflict-06, fig.cap = 'File where a merge conflict has been resolved.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/merge_conflict_06.png") |> - image_crop("3584x1400") +```{figure} img/version_control/merge_conflict_06.png +--- +name: merge-conflict-06 +--- +File where a merge conflict has been resolved. ``` The file must be saved, added to the staging area, and then committed before you will be able to @@ -895,7 +1080,10 @@ communication surrounding the project. Email and messaging apps are both very po designed for project-specific communication: they both generally do not have facilities for organizing conversations by project subtopics, searching for conversations related to particular bugs or software versions, etc. -GitHub *issues* \index{GitHub!issues} are an alternative written communication medium to email and +```{index} GitHub;issues +``` + +GitHub *issues* are an alternative written communication medium to email and messaging apps, and were designed specifically to facilitate project-specific communication. Issues are *opened* from the "Issues" tab on the project's GitHub page, and they persist there even after the conversation is over and the issue is *closed* (in @@ -908,68 +1096,71 @@ thread. Replying to issues from email is also possible. Given all of these advan we highly recommend the use of issues for project-related communication. To open a GitHub issue, -first click on the "Issues" tab (Figure \@ref(fig:issue-01)). +first click on the "Issues" tab ({numref}`issue-01`). -(ref:issue-01) The "Issues" tab on the GitHub web interface. - -```{r issue-01, fig.pos = "H", out.extra="", fig.cap = '(ref:issue-01)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/issue_01.png") |> - image_crop("3584x1700") +```{figure} img/version_control/issue_01.png +--- +name: issue-01 +--- +The "Issues" tab on the GitHub web interface. 
``` -\newpage - -Next click the "New issue" button (Figure \@ref(fig:issue-02)). +Next click the "New issue" button ({numref}`issue-02`). -(ref:issue-02) The "New issues" button on the GitHub web interface. - -```{r issue-02, fig.pos = "H", out.extra="", fig.cap = '(ref:issue-02)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/issue_02.png") |> - image_crop("3584x1250") +```{figure} img/version_control/issue_02.png +--- +name: issue-02 +--- +The "New issue" button on the GitHub web interface. ``` Add an issue title (which acts like an email subject line), and then put the body of the message in the larger text box. Finally, click "Submit new issue" -to post the issue to share with others (Figure \@ref(fig:issue-03)). +to post the issue to share with others ({numref}`issue-03`). -```{r issue-03, fig.pos = "H", out.extra="", fig.cap = 'Dialog boxes and submission button for creating new GitHub issues.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/issue_03.png") |> - image_crop("3584x2200") +```{figure} img/version_control/issue_03.png +--- +name: issue-03 +--- +Dialog boxes and submission button for creating new GitHub issues. ``` You can reply to an issue that someone opened by adding your written response to -the large text box and clicking comment (Figure \@ref(fig:issue-04)). +the large text box and clicking "Comment" ({numref}`issue-04`). -```{r issue-04, fig.pos = "H", out.extra="", fig.cap = 'Dialog box for replying to GitHub issues.', fig.retina = 2, out.width="100%"} -image_read("img/version_control/issue_04.png") |> - image_crop("3584x2000") +```{figure} img/version_control/issue_04.png +--- +name: issue-04 +--- +Dialog box for replying to GitHub issues. ``` + When a conversation is resolved, you can click "Close issue". The closed issue can be later viewed by clicking the "Closed" header link -in the "Issue" tab (Figure \@ref(fig:issue-06)). +in the "Issues" tab ({numref}`issue-06`).
-(ref:issue-06) The "Closed" issues tab on the GitHub web interface. - -```{r issue-06, fig.pos = "H", out.extra="", fig.cap = '(ref:issue-06)', fig.retina = 2, out.width="100%"} -image_read("img/version_control/issue_06.png") |> - image_crop("3584x1900") +```{figure} img/version_control/issue_06.png +--- +name: issue-06 +--- +The "Closed" issues tab on the GitHub web interface. ``` ## Exercises Practice exercises for the material covered in this chapter can be found in the accompanying -[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme) +[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-python-worksheets#readme) in the "Collaboration with version control" row. You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button. You can also preview a non-interactive version of the worksheet by clicking "view worksheet." If you instead decide to download the worksheet and run it on your own machine, make sure to follow the instructions for computer setup -found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback +found in the {ref}`move-to-your-own-machine` chapter. This will ensure that the automated feedback and guidance that the worksheets provide will function as intended. -## Additional resources {#vc-add-res} +## Additional resources Now that you've picked up the basics of version control with Git and GitHub, you can expand your knowledge through the resources listed below: @@ -989,7 +1180,5 @@ you can expand your knowledge through the resources listed below: perfectly fine to just stick with GitHub. Just be aware that you have options! 
- GitHub's [documentation on creating a personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) - and the *Happy Git and GitHub for the useR* - [personal access tokens chapter](https://happygitwithr.com/https-pat.html) are both - excellent additional resources to consult if you need additional help + is an excellent additional resource to consult if you need help generating and using personal access tokens. diff --git a/source/viz.md b/source/viz.md index 5522124e..d5c0f49b 100644 --- a/source/viz.md +++ b/source/viz.md @@ -508,7 +508,7 @@ visualization. Let's create a scatter plot using the `altair` package with the `waiting` variable on the horizontal axis, the `eruptions` variable on the vertical axis, and the `mark_point` geometric object. By default, `altair` draws only the outline of each point. If we would -like to fill them in, we pass the argument `filled=True` to `mark_point`. In +like to fill them in, we pass the argument `filled=True` to `mark_point`. In place of `mark_point(filled=True)`, we can also use `mark_circle`. The result is shown in {numref}`faithful_scatter`. @@ -1225,9 +1225,9 @@ The plot in {numref}`islands_plot_sorted` is now a very effective visualization for answering our original questions. Landmasses are organized by their size, and continents are colored differently than other landmasses, making it quite clear that continents are the largest seven landmasses. -We can make one more finishing touch in {numref}`islands_plot_titled`: we will +We can make one more finishing touch in {numref}`islands_plot_titled`: we will add a title to the chart by specifying the `title` argument in the `alt.Chart` function.
-Note that plot titles are not always required; usually plots appear as part +Note that plot titles are not always required; usually plots appear as part of other media (e.g., in a slide presentation, on a poster, in a paper) where the title may be redundant with the surrounding context. @@ -1353,10 +1353,10 @@ Note that *vertical lines* are used to denote quantities on the *horizontal axis*, while *horizontal lines* are used to denote quantities on the *vertical axis*. -To add the dashed line on top of the histogram, we -**add** the `mark_rule` chart to the `morley_hist` +To add the dashed line on top of the histogram, we +**add** the `mark_rule` chart to the `morley_hist` using the `+` operator. -Adding features to a plot using the `+` operator is known as *layering* in `altair`. +Adding features to a plot using the `+` operator is known as *layering* in `altair`. This is a very powerful feature of `altair`; you can continue to iterate on a single plot object, adding and refining one layer at a time. If you stored your plot as a named object @@ -1446,7 +1446,7 @@ To fix this issue we can convert the `Expt` variable into a `nominal` (i.e., categorical) type variable by adding a suffix `:N` to the `Expt` variable. Adding the `:N` suffix ensures that `altair` will treat a variable as a categorical variable, and -hence use a discrete color map in visualizations. +hence use a discrete color map in visualizations. We also specify the `stack=False` argument in the `y` encoding so that the bars are not stacked on top of each other. @@ -1831,8 +1831,8 @@ perfectly re-created when loading and displaying, with the hope that the change is not noticeable. *Lossless* formats, on the other hand, allow a perfect display of the original image. 
-- *Common file types:* - - [JPEG](https://en.wikipedia.org/wiki/JPEG) (`.jpg`, `.jpeg`): lossy, usually used for photographs +- *Common file types:* + - [JPEG](https://en.wikipedia.org/wiki/JPEG) (`.jpg`, `.jpeg`): lossy, usually used for photographs - [PNG](https://en.wikipedia.org/wiki/Portable_Network_Graphics) (`.png`): lossless, usually used for plots / line drawings - [BMP](https://en.wikipedia.org/wiki/BMP_file_format) (`.bmp`): lossless, raw image data, no compression (rarely used) - [TIFF](https://en.wikipedia.org/wiki/TIFF) (`.tif`, `.tiff`): typically lossless, no compression, used mostly in graphic arts, publishing @@ -1845,8 +1845,8 @@ display of the original image. objects (lines, surfaces, shapes, curves). When the computer displays the image, it redraws all of the elements using their mathematical formulas. -- *Common file types:* - - [SVG](https://en.wikipedia.org/wiki/Scalable_Vector_Graphics) (`.svg`): general-purpose use +- *Common file types:* + - [SVG](https://en.wikipedia.org/wiki/Scalable_Vector_Graphics) (`.svg`): general-purpose use - [EPS](https://en.wikipedia.org/wiki/Encapsulated_PostScript) (`.eps`), general-purpose use (rarely used) - *Open-source software:* [Inkscape](https://inkscape.org/) @@ -1875,7 +1875,7 @@ Let's learn how to save plot images to `.png` and `.svg` file formats using the `faithful_scatter_labels` scatter plot of the [Old Faithful data set](https://www.stat.cmu.edu/~larry/all-of-statistics/=data/faithful.dat) {cite:p}`faithfuldata` that we created earlier, shown in {numref}`faithful_scatter_labels`. To save the plot to a file, we can use the `save` -method. The `save` method takes the path to the filename where you would like to +method. The `save` method takes the path to the filename where you would like to save the file (e.g., `img/filename.png` to save a file named `filename.png` to the `img` directory). The kind of image to save is specified by the file extension. 
For example, to create a PNG image file, we specify that the file extension is `.png`. Below @@ -1891,6 +1891,7 @@ faithful_scatter_labels.save("img/faithful_plot.svg") ``` ```{code-cell} ipython3 +:tags: [remove-cell] import os import numpy as np png_size = np.round(os.path.getsize("img/faithful_plot.png")/(1024*1024), 2) @@ -1915,10 +1916,10 @@ glue("svg_size", svg_size) - {glue:}`svg_size` MB ``` -Take a look at the file sizes in {numref}`png-vs-svg-table` -Wow, that's quite a difference! In this case, the `.png` image is almost 4 times +Take a look at the file sizes in {numref}`png-vs-svg-table`. +Wow, that's quite a difference! In this case, the `.png` image is almost 4 times smaller than the `.svg` image. Since there are a decent number of points in the plot, -the vector graphics format image (`.svg`) is bigger than the raster image (`.png`), which +the vector graphics format image (`.svg`) is bigger than the raster image (`.png`), which just stores the image data itself. In {numref}`png-vs-svg`, we show what the images look like when we zoom in to a rectangle with only 3 data points.
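The reasoning about vector file size can be illustrated without `altair`. The sketch below is a simplified, hypothetical stand-in for a real chart export: it writes a hand-built SVG with one `<circle>` element per data point and measures the file with `os.path.getsize`, the same call used to compute the sizes reported above. The helper names (`make_svg`, `svg_size_bytes`) and the point coordinates are assumptions for illustration; real `.svg` files saved by `altair` contain more markup than this.

```python
import os
import tempfile


def make_svg(n_points: int) -> str:
    """Build a toy SVG scatter plot with one <circle> element per point."""
    circles = "\n".join(
        f'  <circle cx="{(i * 37) % 400}" cy="{(i * 91) % 300}" r="3"/>'
        for i in range(n_points)
    )
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">\n'
        f"{circles}\n"
        "</svg>\n"
    )


def svg_size_bytes(n_points: int) -> int:
    """Write the toy SVG to a temporary file and return its size in bytes."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "toy_plot.svg")
        with open(path, "w", encoding="utf-8") as f:
            f.write(make_svg(n_points))
        return os.path.getsize(path)


# A vector file stores every element, so its size grows with the number of points.
for n in (3, 300, 3000):
    print(n, "points:", svg_size_bytes(n), "bytes")
```

A raster image with fixed pixel dimensions does not grow in this way, because it stores pixel values rather than one entry per plotted object.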