diff --git a/_episodes/02-numpy.md b/_episodes/02-numpy.md
index 2b8c3cd9a..62d90aad6 100644
--- a/_episodes/02-numpy.md
+++ b/_episodes/02-numpy.md
@@ -811,7 +811,7 @@ numpy.savetxt("reshaped_data.csv", reshaped_data, delimiter=',')
 >
 > The `numpy.diff()` function takes an array and returns the differences
 > between two successive values. Let's use it to examine the changes
-> each day across the first 6 months of waves in year 3 from our dataset.
+> each day across the first 6 months of waves in year 4 from our dataset.
 >
 > ~~~
 > year4 = reshaped_data[3, :]
diff --git a/_episodes/03-matplotlib.md b/_episodes/03-matplotlib.md
index 13d07b2a8..0be63c31d 100644
--- a/_episodes/03-matplotlib.md
+++ b/_episodes/03-matplotlib.md
@@ -40,6 +40,14 @@ import the `pyplot` module from `matplotlib` and use two of its functions to cre
 > data = numpy.reshape(data[:,2], [37,12])
 > ~~~
 > {: .language-python}
+>
+> ...or, if you saved the reshaped data to a file:
+>
+> ~~~
+> import numpy
+> data = numpy.loadtxt(fname='reshaped_data.csv', delimiter=',')
+> ~~~
+> {: .language-python}
 {: .prereq}


diff --git a/_episodes/04-lists.md b/_episodes/04-lists.md
index 8c1f343c1..79bcffdff 100644
--- a/_episodes/04-lists.md
+++ b/_episodes/04-lists.md
@@ -23,6 +23,9 @@ list[2:9]), in the same way as strings and arrays."

 In the previous episode, we analyzed a single file of wave height data.
 However we might need to process multiple files in future.
+There are four decadal CSV files for the 1980s, 1990s, 2000s, and 2010s. Before we can analyse these,
+we need to learn how to store an arbitrary number of items in a list.
+
 The natural first step is to collect the names of all the files that we have to process.
 In Python, a list is a way to store multiple values together.
 In this episode, we will learn how to store multiple values in a list as well as how to work with lists.
@@ -180,32 +183,34 @@ does not.
 > index operations shown in the image:
 >
 > ~~~
-> print([x[0]])
+> print(x[0])
 > ~~~
 > {: .language-python}
 >
 > ~~~
-> [['pepper', 'zucchini', 'onion']]
+> ['pepper', 'zucchini', 'onion']
 > ~~~
 > {: .output}
 >
 > ~~~
-> print(x[0])
+> print(x[0][0])
 > ~~~
 > {: .language-python}
 >
 > ~~~
-> ['pepper', 'zucchini', 'onion']
+> pepper
 > ~~~
 > {: .output}
 >
+> Wrapping an index expression in square brackets creates a new list containing the result, which you can print or save to a variable:
+>
 > ~~~
-> print(x[0][0])
+> print([x[0]])
 > ~~~
 > {: .language-python}
 >
 > ~~~
-> 'pepper'
+> [['pepper', 'zucchini', 'onion']]
 > ~~~
 > {: .output}
 >
diff --git a/_episodes/05-loop.md b/_episodes/05-loop.md
index 30bb30444..7968f3fdc 100644
--- a/_episodes/05-loop.md
+++ b/_episodes/05-loop.md
@@ -384,7 +384,11 @@ so we should always use it when we can.
 > Suppose you have encoded a polynomial as a list of coefficients in
 > the following way: the first element is the constant term, the
 > second element is the coefficient of the linear term, the third is the
-> coefficient of the quadratic term, etc.
+> coefficient of the quadratic term, etc., where the polynomial is of the form
+>
+> ax^0 + bx^1 + cx^2
+>
+> (when writing polynomials mathematically, x^0 is often omitted since it equals 1)
 >
 > ~~~
 > x = 5
diff --git a/_episodes/06-files.md b/_episodes/06-files.md
index 11d8da3c7..92d1a8409 100644
--- a/_episodes/06-files.md
+++ b/_episodes/06-files.md
@@ -86,7 +86,7 @@ for filename in filenames:
 {: .language-python}

 ~~~
-waves-00.csv
+waves_00.csv
 ~~~
 {: .output}

@@ -95,7 +95,7 @@ maximum and minimum waveheight in the 2000s.](
 ../fig/waves_loop_1.svg)

 ~~~
-waves-10s.csv
+waves_10s.csv
 ~~~
 {: .output}

@@ -103,7 +103,7 @@ waves-10s.csv
 maximum and minimum waveheight in the 2010s.](../fig/waves_loop_2.svg)

 ~~~
-waves-80s.csv
+waves_80s.csv
 ~~~
 {: .output}

@@ -168,13 +168,14 @@ Let's load `waves_90s.csv`:

 ~~~
 data = numpy.loadtxt(fname = "waves_90s.csv", delimiter=',')
+data = numpy.reshape(data[:,2], [10,12])
 ~~~
 {: .language-python}

 If we try and take the mean for the entire year, we'll see that there must be NaNs:

 ~~~
-numpy.mean(data[:,2])
+numpy.mean(data)
 ~~~
 {: .language-python}

@@ -183,7 +184,7 @@ nan
 ~~~
 {: .output}

-If we had only plotted the reshaped data, we would see white squares where there are NaNs in the data:
+If we plot the reshaped data, we will see white squares where there are NaNs in the data:

 ~~~
 number_of_rows = data.shape[0]
diff --git a/_episodes/07-cond.md b/_episodes/07-cond.md
index 31b889b5c..a1c9a0eb5 100644
--- a/_episodes/07-cond.md
+++ b/_episodes/07-cond.md
@@ -153,10 +153,13 @@ Now that we've seen how conditionals work,
 we can use them to look for thresholds in our wave data.
 We are about to use functions provided by the `numpy` module again.
 Therefore, if you're working in a new Python session, make sure to load the
-module with:
+module and then load and reshape one of the data files:

 ~~~
 import numpy
+
+data = numpy.loadtxt(fname = "waves_80s.csv", delimiter=",")
+data = numpy.reshape(data[:,2], [10,12])
 ~~~
 {: .language-python}

diff --git a/_episodes/08-func.md b/_episodes/08-func.md
index 09be4063a..03082998b 100644
--- a/_episodes/08-func.md
+++ b/_episodes/08-func.md
@@ -32,7 +32,7 @@ keypoints:
 ---

 At this point,
-we've written code to draw some graphs of our wave height data,
+we've written code to plot some interesting features of our wave data,
 loop over all our data files to quickly draw these plots for each of them,
 and have Python make decisions based on what it sees in our data.
 But, our code is getting pretty long and complicated;
@@ -204,7 +204,7 @@ temperature in Kelvin was: 373.15
 ## Tidying up

 Now that we know how to wrap bits of code up in functions,
-we can make our analysis easier to read and easier to reuse.
+we can make our wave data analysis easier to read and easier to reuse.
 First, let's make a `visualize` function that generates our plots:

 ~~~
@@ -212,6 +212,8 @@ def visualize(filename):
     data = numpy.loadtxt(fname=filename, delimiter=',')
     number_of_rows = data.shape[0] # total number of months
     number_of_years = number_of_rows // 12 # total number of years = number of months / number of months per year
+
+    # need to reshape the data for plotting
     reshaped_data = numpy.reshape(data[:,2], [number_of_years,12])

     fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
@@ -234,7 +236,7 @@ def visualize(filename):
 ~~~
 {: .language-python}

-and another function called `detect_problems` that checks for those systematics
+
-Wait! Didn't we forget to specify what both of these functions should return? Well, we didn't.
+Wait! Didn't we forget to specify what this function should return? Well, we didn't.
 In Python, functions are not required to include a `return` statement and can be used for
 the sole purpose of grouping together pieces of code that conceptually do one thing. In such cases,
-function names usually describe what they do, _e.g._ `visualize`, `detect_problems`.
+function names usually describe what they do, _e.g._ `visualize`.

 Notice that rather than jumbling this code together in one giant `for` loop,
-we can now read and reuse both ideas separately.
+we can now read and reuse the code.
 We can reproduce the previous analysis with a much simpler `for` loop:

 ~~~
 import glob
-
 filenames = sorted(glob.glob('waves_*.csv'))
 for filename in filenames[1:3]:
     print(filename)
     visualize(filename)
-    detect_problems(filename)
 ~~~
 {: .language-python}
@@ -316,8 +316,7 @@ That looks right, so let's try `offset_mean` on our real data:

 ~~~
-data = numpy.loadtxt(fname='wavesmonthly.csv', delimiter=',')
-reshaped_data = numpy.reshape(data[:,2], [37,12])
+reshaped_data = numpy.loadtxt(fname='reshaped_data.csv', delimiter=',')
 print(offset_mean(reshaped_data, 0))
 ~~~
 {: .language-python}
@@ -353,8 +352,8 @@ min, mean, and max of offset data are: -1.8875630630630629 1.960393809248024e-16
 {: .output}

 That seems almost right:
-the original mean was about 6.1,
-so the lower bound from zero is now about -6.1.
+the original minimum was about 1.9 below the original mean,
+so the lower bound from zero is now about -1.9.
 The mean of the offset data isn't quite zero --- we'll explore why not in the challenges --- but it's pretty close.
 We can even go further and check that the standard deviation hasn't changed:
@@ -475,25 +474,31 @@ In fact, we can pass the filename to `loadtxt` without the `fname=`:

 ~~~
-numpy.loadtxt('wavesmonthly.csv', delimiter=',')
+numpy.loadtxt('reshaped_data.csv', delimiter=',')
 ~~~
 {: .language-python}

 ~~~
-array([[ 0., 0., 1., ..., 3., 0., 0.],
-       [ 0., 1., 2., ..., 1., 0., 1.],
-       [ 0., 1., 1., ..., 2., 1., 1.],
+array([[3.788, 3.768, 4.774, 2.818, 2.734, 2.086, 2.066, 2.236, 3.322,
+        3.512, 4.348, 4.628],
+       [3.666, 4.326, 3.522, 3.18 , 1.954, 1.72 , 1.86 , 1.95 , 3.11 ,
+        3.78 , 3.474, 5.28 ],
+       [5.068, 4.954, 3.77 , 2.402, 2.166, 2.084, 2.246, 2.228, 2.634,
+        4.41 , 4.342, 3.28 ],
        ...,
-       [ 0., 1., 1., ..., 1., 1., 1.],
-       [ 0., 0., 0., ..., 0., 2., 0.],
-       [ 0., 0., 1., ..., 1., 1., 0.]])
+       [4.27 , 4.09 , 3.696, 3.302, 2.502, 1.772, 2.016, 2.172, 3.034,
+        3.462, 3.856, 5.76 ],
+       [4.294, 3.794, 4.646, 3.212, 2.226, 1.558, 1.894, 2.092, 2.58 ,
+        3.87 , 3.108, 6.044],
+       [6.488, 5.954, 4.26 , 5.838, 4.882, 3.678, 3.308, 2.786, 2.71 ,
+        3.046, 4.622, 5.048]])
 ~~~
 {: .output}

 but we still need to say `delimiter=`:

 ~~~
-numpy.loadtxt('wavesmonthly.csv', ',')
+numpy.loadtxt('reshaped_data.csv', ',')
 ~~~
 {: .language-python}
@@ -655,7 +660,7 @@ and eight others that do.
 If we call the function like this:

 ~~~
-numpy.loadtxt('wavesmonthly.csv', ',')
+numpy.loadtxt('reshaped_data.csv', ',')
 ~~~
 {: .language-python}
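The 02-numpy hunk header shows `reshaped_data.csv` being written with `numpy.savetxt(..., delimiter=',')`, so any later `numpy.loadtxt` of that file needs `delimiter=','` as well. The sketch below is one way to check that round trip locally before merging. It assumes a small made-up array (3 rows of 12 values, not the real wave data) and writes to a temporary directory so it never touches an existing `reshaped_data.csv`.

```python
import os
import tempfile

import numpy

# Made-up stand-in for the reshaped wave heights: 3 rows ("years") of
# 12 values ("months"). These are NOT the lesson's real data.
reshaped_data = numpy.arange(36, dtype=float).reshape(3, 12) / 10.0

load_error = None
with tempfile.TemporaryDirectory() as tmpdir:
    # Use a throwaway directory so no real reshaped_data.csv is overwritten.
    path = os.path.join(tmpdir, "reshaped_data.csv")

    # Write comma-separated values, matching the savetxt call in the numpy episode.
    numpy.savetxt(path, reshaped_data, delimiter=',')

    # Reading with the same delimiter recovers the 2-D array.
    loaded = numpy.loadtxt(fname=path, delimiter=',')
    print(loaded.shape)                           # prints (3, 12)
    print(numpy.allclose(loaded, reshaped_data))  # prints True

    # With the default delimiter (whitespace), each comma-separated line is
    # read as a single token that cannot be converted to a float.
    try:
        numpy.loadtxt(fname=path)
    except ValueError as err:
        load_error = err
        print("ValueError:", err)
```

The same rule covers every place the lessons reload this file: whichever delimiter `savetxt` wrote is the one `loadtxt` must be given.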