From 88c884e3c838b6602650335add5acbed8bfc1856 Mon Sep 17 00:00:00 2001
From: Chris Lo
Date: Thu, 26 Sep 2024 16:25:29 -0700
Subject: [PATCH] lecture 2

---
 01-intro-to-computing.Rmd  |   2 +-
 02-data-structures.Rmd     |   2 +-
 slides/lesson2_slides.html | 289 ++++++++++++++++++++++++++-----------
 slides/lesson2_slides.qmd  | 140 ++++++++++++++----
 4 files changed, 317 insertions(+), 116 deletions(-)

diff --git a/01-intro-to-computing.Rmd b/01-intro-to-computing.Rmd
index 78172c6..23e41a4 100644
--- a/01-intro-to-computing.Rmd
+++ b/01-intro-to-computing.Rmd
@@ -222,7 +222,7 @@ And there is an operational equivalent:
 We will mostly look at functions with input arguments and return types in this course, but not all functions need to have input arguments and output return. Let's look at some examples of functions that don't always have an input or output:

 | Function call | What it takes in | What it does | Returns |
-|----------------|----------------|-------------------------|----------------|
+|---------------------------------------------------------------------------|--------------------------|---------------------------------------------------------------|---------|
 | [`pow(a, b)`](https://docs.python.org/3/library/functions.html#pow) | integer `a`, integer `b` | Raises `a` to the `b`th power. | Integer |
 | [`time.sleep(x)`](https://docs.python.org/3/library/time.html#time.sleep) | Integer `x` | Waits for `x` seconds. | None |
 | [`dir()`](https://docs.python.org/3/library/functions.html#dir) | Nothing | Gives a list of all the variables defined in the environment. | List |
diff --git a/02-data-structures.Rmd b/02-data-structures.Rmd
index edd1a23..6576ab6 100644
--- a/02-data-structures.Rmd
+++ b/02-data-structures.Rmd
@@ -88,7 +88,7 @@ The list data structure we have been working with is an example of an **Object**

 - **Attributes** that hold subset or additional data for the object.

-- Functions called **Methods** that automatically takes the object as input.
+- Functions called **Methods** that are for the object and *have to* take in the variable referenced as an input

 This organizing structure on an object applies to pretty much all Python data types and data structures.

diff --git a/slides/lesson2_slides.html b/slides/lesson2_slides.html
index f5fc703..17ecc1a 100644
--- a/slides/lesson2_slides.html
+++ b/slides/lesson2_slides.html
@@ -405,7 +405,7 @@

Solving problems…

Lists

List is a data structure that stores many elements of various data types, and the order matters.

You can create a List via the bracket [ ] operator:
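A minimal sketch of creating lists with square brackets in plain Python. The names in `staff` are hypothetical placeholders (only its length of 3 comes from the slides); `chrNum` and `mixedList` reuse the values shown on the `count()` slides.

```python
# Lists are created with square brackets; elements may mix data types.
staff = ["Alice", "Bob", "Carol"]  # hypothetical names, for illustration only
chrNum = [2, 3, 1, 2, 2]
mixedList = [False, False, False, "A", "B", 92]

print(len(staff))    # 3
print(type(chrNum))  # <class 'list'>
print(mixedList[3])  # A
```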

@@ -423,7 +423,7 @@

Lists

len(staff)
3
@@ -437,7 +437,7 @@

Subset to an element of a list

chrNum[0]
2
@@ -446,12 +446,15 @@

Subset to an element of a list

chrNum[2]
1

Let the fifth element of chrNum be the sum of the first and second elements of chrNum:

chrNum[4] = chrNum[0] + chrNum[1]
@@ -464,7 +467,7 @@

Subset to an element of a list

chrNum
[2, 3, 1, 2, 5]
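The sketch below replays these indexing steps, assuming `chrNum` starts as `[2, 3, 1, 2, 2]` (the values on the `count()` slide). Indexing is zero-based, so index 4 refers to the fifth element.

```python
chrNum = [2, 3, 1, 2, 2]  # assumed starting values

# Indexing starts at 0: chrNum[0] is the first element, chrNum[2] the third.
first = chrNum[0]
third = chrNum[2]

# Assigning to an index replaces that element in place.
chrNum[4] = chrNum[0] + chrNum[1]
print(first, third)  # 2 1
print(chrNum)        # [2, 3, 1, 2, 5]

# Negative indices count from the end: -1 is the last element.
print(chrNum[-1])    # 5
```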
@@ -477,7 +480,7 @@

Subsetting multiple elements of lists

chrNum
[2, 3, 1, 2, 5]
@@ -487,7 +490,7 @@

Subsetting multiple elements of lists

chrNum[:3]
[2, 3, 1]
@@ -498,7 +501,7 @@

Subsetting multiple elements of lists

chrNum[-3:]
[1, 2, 5]
@@ -510,67 +513,92 @@

Subsetting multiple elements of lists

Learn more about subsetting lists in full complexity.
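A short sketch of slice behavior on the same `chrNum` list. The `[1:4]` and `[::2]` slices are extra illustrations that do not appear on the original slides.

```python
chrNum = [2, 3, 1, 2, 5]

# A slice start:stop includes start but excludes stop.
print(chrNum[:3])   # [2, 3, 1]  (indices 0, 1, 2)
print(chrNum[-3:])  # [1, 2, 5]  (the last three elements)
print(chrNum[1:4])  # [3, 1, 2]  (indices 1, 2, 3)

# An optional third value is the step.
print(chrNum[::2])  # [2, 1, 5]  (every other element)

# Slicing returns a new list, so changing the slice leaves chrNum unchanged.
tail = chrNum[-3:]
tail[0] = 99
print(chrNum)       # [2, 3, 1, 2, 5]
```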

Objects in Python

The list data structure has an organization and functionality that metaphorically represents a pen-and-paper list in our physical world. Like a physical object, we have examined:

List Methods

Methods are functions for a specific data structure, such as a list.

Such organization is called an Object.

chrNum.count(2) is a method for lists with chrNum and 2 as inputs.

The method returns the number of times 2 appears as an element of chrNum.

chrNum = [2, 3, 1, 2, 2]
chrNum.count(2)
3
mixedList
[False, False, False, 'A', 'B', 92]
mixedList.count(False)
3
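The calls above can be reproduced as follows. The `count(0)` and `count(7)` lines are added illustrations based on standard Python behavior: `count()` compares elements with `==`, and `False == 0` evaluates to `True`.

```python
chrNum = [2, 3, 1, 2, 2]
mixedList = [False, False, False, "A", "B", 92]

# count() is called on the list itself; 2 is the only explicit argument.
print(chrNum.count(2))         # 3
print(mixedList.count(False))  # 3

# count() compares elements with ==, and in Python False == 0,
# so counting 0 also matches the three False values.
print(mixedList.count(0))      # 3

# An absent value is counted as 0 rather than raising an error.
print(chrNum.count(7))         # 0
```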

Objects in Python

Formally, an object contains the following:

What does it contain?

  • Value that holds the essential data for the object.

  • Attributes that hold subset or additional data for the object.

What can it do?

  • Functions called Methods that automatically take the object as input.

This organizing structure on an object applies to pretty much all Python data types and data structures.
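To show value, attributes, and methods on a single type that has all three, the sketch below uses a built-in complex number. This type is an illustrative choice and does not appear in the lecture.

```python
z = 3 + 4j  # a complex number is also an object

# Value: the data itself.
print(z)               # (3+4j)
# Attributes: stored data, accessed with a dot and no parentheses.
print(z.real, z.imag)  # 3.0 4.0
# Methods: functions attached to the object, called with parentheses.
print(z.conjugate())   # (3-4j)
# Every value in Python is an object of some type.
print(type(z), type([2, 3, 4]))  # <class 'complex'> <class 'list'>
```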

Lists as Objects

What does it contain?

  • Value: the contents of the list, such as [2, 3, 4].

  • Attributes: None.

What can it do?

  • Methods that can be used on the object: chrNum.count(2) returns the number of times 2 appears as an element of chrNum.
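One way to see which methods a list offers is the built-in `dir()` function. The sketch filters out names that begin with an underscore, which by convention mark Python's special or internal names, leaving the everyday list methods.

```python
chrNum = [2, 3, 1, 2, 2]

# dir() returns the sorted names defined on an object; skip underscore names.
public_methods = [name for name in dir(chrNum) if not name.startswith("_")]
print(public_methods)
# ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

print(chrNum.count(2))  # 3
```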

Methods vs Functions

Methods have to take in the object of interest as an input: chrNum.count(2) automatically treats chrNum as an input. Methods are built for a specific Object type.

Functions do not have an implied input: len(chrNum) requires specifying a list in the input.

Methods have to take in the variable referenced as an input: chrNum.count(2) automatically treats chrNum as an input. Methods belong to a specific data type.

Functions do not have an implied input: len(chrNum) requires specifying a list in the input. Functions are less tied to a data type: len("hello") also works on a String.

Otherwise, there is no distinction between the two.
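A sketch of the method/function distinction. Calling `list.count` on the class and passing the list explicitly gives the same result as the method call, which makes the implied input visible. The string `count()` call is an added example of a method that belongs to a different data type.

```python
chrNum = [2, 3, 1, 2, 2]

# A method call passes the object before the dot as an implicit input...
method_result = chrNum.count(2)
# ...which matches calling the method on the class with the list passed explicitly.
explicit_result = list.count(chrNum, 2)
print(method_result, explicit_result)  # 3 3

# A function takes every input explicitly and works across data types.
print(len(chrNum))   # 5
print(len("hello"))  # 5

# Methods belong to a data type: strings have their own count() method.
print("hello".count("l"))  # 2
```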

Objects in Python

In a List, we have explored:

  • What does it contain (in terms of data)?

  • What can it do (in terms of methods)?

Such organization is called an Object. Pretty much every data type and structure in Python is an object. We will formalize this later.

Dataframes

A Dataframe is a two-dimensional data structure that is similar to a spreadsheet.

import pandas as pd
-
-metadata = pd.read_csv("../classroom_data/metadata.csv")
-type(metadata)
import pandas as pd
+
+metadata = pd.read_csv("../classroom_data/metadata.csv")
+type(metadata)
pandas.core.frame.DataFrame

Let’s investigate the Dataframe as an object:

  • What does a Dataframe contain? (values, attributes)

  • What does a Dataframe contain (data)?

    • the spreadsheet, columns, column names, shape, subsetting
@@ -583,13 +611,13 @@

Dataframes

What does a Dataframe contain?

Attributes: columns

metadata.ModelID
-metadata['ModelID']

Columns

metadata.ModelID
+metadata['ModelID']
0       ACH-000001
 1       ACH-000002
 2       ACH-000003
@@ -605,12 +633,12 @@ 

What does a Dataframe contain?

Attribute: column names

metadata.columns

Column names

metadata.columns
Index(['ModelID', 'PatientID', 'CellLineName', 'StrippedCellLineName', 'Age',
        'SourceType', 'SangerModelID', 'RRID', 'DepmapModelType', 'AgeCategory',
        'GrowthPattern', 'LegacyMolecularSubtype', 'PrimaryOrMetastasis',
@@ -624,12 +652,12 @@ 

What does a Dataframe contain?

Attribute: shape

metadata.shape

Shape

metadata.shape
(1864, 30)
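Because `metadata.csv` is not bundled with these notes, the sketch below uses the small `df` table from the subsetting slide to show the same attributes. The `.mean()` call is an added example of a method (called with parentheses), shown next to attributes (accessed without parentheses). It requires pandas.

```python
import pandas as pd

# The small table from the subsetting slide, used here instead of metadata.csv.
df = pd.DataFrame(data={'status': ["treated", "untreated", "untreated", "discharged", "treated"],
                        'age_case': [25, 43, 21, 65, 7],
                        'age_control': [49, 20, 32, 25, 32]})

# Attributes are accessed without parentheses.
print(df.shape)          # (5, 3)
print(list(df.columns))  # ['status', 'age_case', 'age_control']

# A column can be selected with dot notation or brackets; both give the same Series.
print(df.age_case.equals(df['age_case']))  # True

# Methods, in contrast, are called with parentheses.
print(df['age_case'].mean())  # 32.2
```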
@@ -637,15 +665,15 @@

What does a Dataframe contain?

Dataframe subsetting

Using the iloc attribute and bracket operations, you give two slices: one for the rows and one for the columns.

df = pd.DataFrame(data={'status': ["treated", "untreated", "untreated", "discharged", "treated"],
-                            'age_case': [25, 43, 21, 65, 7],
-                            'age_control': [49, 20, 32, 25, 32]})
-df

Using iloc and bracket operations, you give two slices: one for the rows and one for the columns.

df = pd.DataFrame(data={'status': ["treated", "untreated", "untreated", "discharged", "treated"],
+                            'age_case': [25, 43, 21, 65, 7],
+                            'age_control': [49, 20, 32, 25, 32]})
+df
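A sketch of `iloc` slicing on the `df` table from the slide above. Each position takes an integer slice, and, as with list slices, the stop index is excluded. The specific slices chosen here are illustrative and do not come from the lecture.

```python
import pandas as pd

df = pd.DataFrame(data={'status': ["treated", "untreated", "untreated", "discharged", "treated"],
                        'age_case': [25, 43, 21, 65, 7],
                        'age_control': [49, 20, 32, 25, 32]})

# iloc[row slice, column slice], both by integer position; stop is excluded.
subset = df.iloc[0:2, 1:3]  # rows 0-1, columns 'age_case' and 'age_control'
print(subset)

# A bare ':' keeps every row (or every column).
print(df.iloc[:, 0].tolist())  # ['treated', 'untreated', 'untreated', 'discharged', 'treated']

# A single integer in each position returns one value.
print(df.iloc[3, 1])  # 65
```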