Merge branch 'edcarp-gh-pages' into gh-pages

NOC-OI · Dec 10, 2024 · 337d6ba
2 parents bca5470 + 791cc65
commit 337d6ba

Showing 7 changed files with 61 additions and 36 deletions.
diff --git a/_episodes/02-numpy.md b/_episodes/02-numpy.md
@@ -811,7 +811,7 @@ numpy.savetxt("reshaped_data.csv", reshaped_data, delimiter=',')
 >
 > The `numpy.diff()` function takes an array and returns the differences
 > between two successive values. Let's use it to examine the changes
-> each day across the first 6 months of waves in year 3 from our dataset.
+> each day across the first 6 months of waves in year 4 from our dataset.
 >
 > ~~~
 > year4 = reshaped_data[3, :]

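Review note: the behaviour of `numpy.diff()` described in this hunk can be checked on a small array. This is a minimal sketch; the `heights` values are made up for illustration and are not taken from the wave dataset.

```python
import numpy

# Hypothetical monthly wave heights in metres (not from the real dataset).
heights = numpy.array([3.0, 3.5, 2.5, 2.0, 2.0, 4.0])

# numpy.diff returns heights[i + 1] - heights[i] for each successive pair,
# so the result has one element fewer than the input.
changes = numpy.diff(heights)
print(changes)
```

Six input values give five differences, which is why plots of `numpy.diff()` output are one point shorter than the data they come from.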
diff --git a/_episodes/03-matplotlib.md b/_episodes/03-matplotlib.md
@@ -40,6 +40,13 @@ import the `pyplot` module from `matplotlib` and use two of its functions to cre
 > data = numpy.reshape(data[:,2], [37,12]) 
 > ~~~
 > {: .language-python}
+>
+> ...or, if you saved the reshaped data into a file
+>
+> ~~~
+> import numpy
+> data = numpy.loadtxt(fname='reshaped_data.csv', delimiter=',')
+> ~~~
 {: .prereq}
 
 

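Review note: `reshaped_data.csv` is written in episode 02 with `delimiter=','`, and `numpy.loadtxt` splits on whitespace unless told otherwise, so the same delimiter is needed when reading the file back. The sketch below shows the round trip using an in-memory buffer in place of a real file; the array values and the `buffer` name are illustrative assumptions.

```python
import io
import numpy

# Small stand-in for the reshaped data (values are illustrative only).
reshaped = numpy.array([[3.788, 3.768, 4.774],
                        [3.666, 4.326, 3.522]])

# Write comma-separated text to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
numpy.savetxt(buffer, reshaped, delimiter=',')

# Rewind and read it back with the same delimiter that was used for writing.
buffer.seek(0)
loaded = numpy.loadtxt(buffer, delimiter=',')
print(loaded.shape)
```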
diff --git a/_episodes/04-lists.md b/_episodes/04-lists.md
@@ -23,6 +23,9 @@ list[2:9]), in the same way as strings and arrays."
 In the previous episode, we analyzed a single file of wave height data. However we might 
 need to process multiple files in future.
 
+There are four decadal CSV files for the 1980s, 1990s, 2000s, and 2010s. Before we can analyse these,
+we need to learn how to store an arbitrary number of items in a list.
+
 The natural first step is to collect the names of all the files that we have to process. In Python,
 a list is a way to store multiple values together. In this episode, we will learn how to store
 multiple values in a list as well as how to work with lists.
@@ -180,32 +183,34 @@ does not.
 > index operations shown in the image:
 >
 > ~~~
-> print([x[0]])
+> print(x[0])
 > ~~~
 > {: .language-python}
 >
 > ~~~
-> [['pepper', 'zucchini', 'onion']]
+> ['pepper', 'zucchini', 'onion']
 > ~~~
 > {: .output}
 >
 > ~~~
-> print(x[0])
+> print(x[0][0])
 > ~~~
 > {: .language-python}
 >
 > ~~~
-> ['pepper', 'zucchini', 'onion']
+> pepper
 > ~~~
 > {: .output}
 >
+> It's also possible to explicitly return a list, either in a print statement or to save in a variable:
+>
 > ~~~
-> print(x[0][0])
+> print([x[0]])
 > ~~~
 > {: .language-python}
 >
 > ~~~
-> 'pepper'
+> [['pepper', 'zucchini', 'onion']]
 > ~~~
 > {: .output}
 >
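Review note: the three indexing forms reordered in this hunk can be compared side by side. This is a sketch only: the first inner list matches the output shown in the diff, while the second inner list is a hypothetical filler so that `x` is a nested list.

```python
# Nested list: the first row matches the lesson output, the second is made up.
x = [['pepper', 'zucchini', 'onion'],
     ['cabbage', 'lettuce', 'garlic']]

first_row = x[0]        # the inner list itself
first_item = x[0][0]    # a single string from that inner list
wrapped_row = [x[0]]    # the inner list wrapped in a new outer list

print(first_row)
print(first_item)
print(wrapped_row)
```

Note that `print` shows a bare string without quotes, while strings inside a printed list keep their quotes.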

diff --git a/_episodes/05-loop.md b/_episodes/05-loop.md
@@ -384,7 +384,11 @@ so we should always use it when we can.
 > Suppose you have encoded a polynomial as a list of coefficients in
 > the following way: the first element is the constant term, the
 > second element is the coefficient of the linear term, the third is the
-> coefficient of the quadratic term, etc.
+> coefficient of the quadratic term, etc., where the polynomial is of the form
+>
+> ax^0 + bx^1 + cx^2
+>
+> (when writing polynomials mathematically, the x^0 is often omitted since it equals 1)
 >
 > ~~~
 > x = 5

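Review note: the coefficient encoding described in this hunk, where the list position is the power of x, can be sketched as a loop. The coefficient values below are hypothetical, and using `enumerate` is one possible approach rather than necessarily the one the lesson intends.

```python
x = 5
coefs = [2, 4, 3]  # hypothetical: 2*x^0 + 4*x^1 + 3*x^2

y = 0
# enumerate pairs each coefficient with its index, which is also its power of x.
for power, coef in enumerate(coefs):
    y = y + coef * x ** power

print(y)
```

With these values the terms are 2, 20 and 75.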
diff --git a/_episodes/06-files.md b/_episodes/06-files.md
@@ -86,7 +86,7 @@ for filename in filenames:
 {: .language-python}
 
 ~~~
-waves-00.csv
+waves_00.csv
 ~~~
 {: .output}
 
@@ -95,15 +95,15 @@ maximum and minimum waveheight in the 2000s.](
 ../fig/waves_loop_1.svg)
 
 ~~~
-waves-10s.csv
+waves_10s.csv
 ~~~
 {: .output}
 
 ![Output from the second iteration of the for loop. Three line graphs showing the average,
 maximum and minimum waveheight in the 2010s.](../fig/waves_loop_2.svg)
 
 ~~~
-waves-80s.csv
+waves_80s.csv
 ~~~
 {: .output}
 
@@ -168,13 +168,14 @@ Let's load `waves_90s.csv`:
 
 ~~~
 data = numpy.loadtxt(fname = "waves_90s.csv", delimiter=',')
+data = numpy.reshape(data[:,2], [10,12])
 ~~~
 {: .language-python}
 
 If we try and take the mean for the entire year, we'll see that there must be NaNs:
 
 ~~~
-numpy.mean(data[:,2])
+numpy.mean(data)
 ~~~
 {: .language-python}
 
@@ -183,7 +184,7 @@ nan
 ~~~
 {: .output}
 
-If we had only plotted the reshaped data, we would see white squares where there are NaNs in the data:
+If we plot the reshaped data, we will see white squares where there are NaNs in the data:
 
 ~~~
 number_of_rows = data.shape[0]

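Review note: the `nan` result discussed in this hunk arises because a single NaN propagates through `numpy.mean`. The sketch below uses a tiny made-up array; `numpy.isnan` and `numpy.nanmean` are standard NumPy functions, but whether the lesson uses them at this point is an assumption.

```python
import numpy

# Illustrative 2x3 array with one missing value (not the real wave data).
data = numpy.array([[1.0, 2.0, numpy.nan],
                    [4.0, 5.0, 6.0]])

plain_mean = numpy.mean(data)        # any NaN makes the whole mean NaN
nan_count = numpy.isnan(data).sum()  # count how many entries are NaN
nan_mean = numpy.nanmean(data)       # mean over the non-NaN entries only

print(plain_mean, nan_count, nan_mean)
```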
diff --git a/_episodes/07-cond.md b/_episodes/07-cond.md
@@ -153,10 +153,13 @@ Now that we've seen how conditionals work,
 we can use them to look for thresholds in our wave data.
 We are about to use functions provided by the `numpy` module again.
 Therefore, if you're working in a new Python session, make sure to load the
-module with:
+module, and load and reshape one of the data files:
 
 ~~~
 import numpy
+
+data = numpy.loadtxt(fname = "waves_80s.csv", delimiter=",")
+data = numpy.reshape(data[:,2], [10,12])
 ~~~
 {: .language-python}
 

diff --git a/_episodes/08-func.md b/_episodes/08-func.md
@@ -32,7 +32,7 @@ keypoints:
 ---
 
 At this point,
-we've written code to draw some graphs of our wave height data,
+we've written code to plot some interesting features of our wave data,
 loop over all our data files to quickly draw these plots for each of them,
 and have Python make decisions based on what it sees in our data.
 But, our code is getting pretty long and complicated;
@@ -204,14 +204,16 @@ temperature in Kelvin was: 373.15
 ## Tidying up
 
 Now that we know how to wrap bits of code up in functions,
-we can make our analysis easier to read and easier to reuse.
+we can make our wave analysis easier to read and easier to reuse.
 First, let's make a `visualize` function that generates our plots:
 
 ~~~
 def visualize(filename):
     data = numpy.loadtxt(fname=filename, delimiter=',')
     number_of_rows = data.shape[0] # total number of months
     number_of_years = number_of_rows // 12 # total number of years = number of months / number of months per year
+
+    # need to reshape the data for plotting
     reshaped_data = numpy.reshape(data[:,2], [number_of_years,12])
 
     fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
@@ -234,7 +236,7 @@ def visualize(filename):
 ~~~
 {: .language-python}
 
-and another function called `detect_problems` that checks for those systematics
+<!-- and another function called `detect_problems` that checks for those systematics
 we noticed:
 
 ~~~
@@ -251,26 +253,24 @@ def detect_problems(filename):
     else:
         print('Seems OK!')
 ~~~
-{: .language-python}
+{: .language-python} -->
 
-Wait! Didn't we forget to specify what both of these functions should return? Well, we didn't.
+Wait! Didn't we forget to specify what this function should return? Well, we didn't.
 In Python, functions are not required to include a `return` statement and can be used for
 the sole purpose of grouping together pieces of code that conceptually do one thing. In such cases,
-function names usually describe what they do, _e.g._ `visualize`, `detect_problems`.
+function names usually describe what they do, _e.g._ `visualize`.
 
 Notice that rather than jumbling this code together in one giant `for` loop,
-we can now read and reuse both ideas separately.
+we can now read and reuse the code.
 We can reproduce the previous analysis with a much simpler `for` loop:
 
 ~~~
 import glob
-
 filenames = sorted(glob.glob('waves_*.csv'))
 
 for filename in filenames[1:3]:
     print(filename)
     visualize(filename)
-    detect_problems(filename)
 ~~~
 {: .language-python}
 
@@ -316,8 +316,7 @@ That looks right,
 so let's try `offset_mean` on our real data:
 
 ~~~
-data = numpy.loadtxt(fname='wavesmonthly.csv', delimiter=',')
-reshaped_data = numpy.reshape(data[:,2], [37,12])
+reshaped_data = numpy.loadtxt(fname='reshaped_data.csv', delimiter=',')
 print(offset_mean(reshaped_data, 0))
 ~~~
 {: .language-python}
@@ -353,8 +352,8 @@ min, mean, and max of offset data are: -1.8875630630630629 1.960393809248024e-16
 {: .output}
 
 That seems almost right:
-the original mean was about 6.1,
-so the lower bound from zero is now about -6.1.
+the original mean was about 1.5,
+so the lower bound from zero is now about -1.9.
 The mean of the offset data isn't quite zero --- we'll explore why not in the challenges --- but
 it's pretty close.
 We can even go further and check that the standard deviation hasn't changed:
@@ -475,25 +474,31 @@ In fact,
 we can pass the filename to `loadtxt` without the `fname=`:
 
 ~~~
-numpy.loadtxt('wavesmonthly.csv', delimiter=',')
+numpy.loadtxt('reshaped_data.csv', delimiter=',')
 ~~~
 {: .language-python}
 
 ~~~
-array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
-       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
-       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+array([[3.788, 3.768, 4.774, 2.818, 2.734, 2.086, 2.066, 2.236, 3.322,
+        3.512, 4.348, 4.628],
+       [3.666, 4.326, 3.522, 3.18 , 1.954, 1.72 , 1.86 , 1.95 , 3.11 ,
+        3.78 , 3.474, 5.28 ],
+       [5.068, 4.954, 3.77 , 2.402, 2.166, 2.084, 2.246, 2.228, 2.634,
+        4.41 , 4.342, 3.28 ],
        ...,
-       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
-       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
-       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+       [4.27 , 4.09 , 3.696, 3.302, 2.502, 1.772, 2.016, 2.172, 3.034,
+        3.462, 3.856, 5.76 ],
+       [4.294, 3.794, 4.646, 3.212, 2.226, 1.558, 1.894, 2.092, 2.58 ,
+        3.87 , 3.108, 6.044],
+       [6.488, 5.954, 4.26 , 5.838, 4.882, 3.678, 3.308, 2.786, 2.71 ,
+        3.046, 4.622, 5.048]])
 ~~~
 {: .output}
 
 but we still need to say `delimiter=`:
 
 ~~~
-numpy.loadtxt('wavesmonthly.csv', ',')
+numpy.loadtxt('reshaped_data.csv', ',')
 ~~~
 {: .language-python}
 
@@ -655,7 +660,7 @@ and eight others that do.
 If we call the function like this:
 
 ~~~
-numpy.loadtxt('wavesmonthly.csv', ',')
+numpy.loadtxt('reshaped_data.csv', ',')
 ~~~
 {: .language-python}
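Review note: the positional call in these hunks is expected to fail because the second positional parameter of `numpy.loadtxt` is `dtype`, not `delimiter`. The sketch below contrasts the two call styles using an in-memory buffer instead of `reshaped_data.csv`; the sample text is illustrative, and the exact exception type may vary between NumPy versions, so it is caught broadly.

```python
import io
import numpy

csv_text = "3.788,3.768,4.774\n3.666,4.326,3.522\n"  # illustrative sample

# Keyword form: the comma is unambiguously the delimiter.
loaded = numpy.loadtxt(io.StringIO(csv_text), delimiter=',')
print(loaded.shape)

# Positional form: ',' lands in the dtype parameter instead.
try:
    numpy.loadtxt(io.StringIO(csv_text), ',')
    positional_failed = False
except Exception as error:
    positional_failed = True
    print(type(error).__name__)
```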