Merge branch 'edcarp-gh-pages' into gh-pages
colinsauze committed Dec 10, 2024
2 parents bca5470 + 791cc65 commit 337d6ba
Showing 7 changed files with 61 additions and 36 deletions.
2 changes: 1 addition & 1 deletion _episodes/02-numpy.md
Original file line number Diff line number Diff line change
@@ -811,7 +811,7 @@ numpy.savetxt("reshaped_data.csv", reshaped_data, delimiter=',')
>
> The `numpy.diff()` function takes an array and returns the differences
> between two successive values. Let's use it to examine the changes
> each day across the first 6 months of waves in year 3 from our dataset.
> each day across the first 6 months of waves in year 4 from our dataset.
>
> ~~~
> year4 = reshaped_data[3, :]
7 changes: 7 additions & 0 deletions _episodes/03-matplotlib.md
@@ -40,6 +40,13 @@ import the `pyplot` module from `matplotlib` and use two of its functions to cre
> data = numpy.reshape(data[:,2], [37,12])
> ~~~
> {: .language-python}
>
> ...or, if you saved the reshaped data into a file
>
> ~~~
> import numpy
> data = numpy.loadtxt(fname='reshaped_data.csv', delimiter=',')
> ~~~
{: .prereq}
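
As a hedged sketch of the save-then-load round trip: the array below is made-up stand-in data rather than the wave dataset, and the temporary file location is an assumption chosen so that no real files are touched. It illustrates that a file written with `delimiter=','` should be read back with the same delimiter.

```python
import os
import tempfile

import numpy

# Stand-in data (hypothetical values): 3 "years" by 12 "months".
reshaped_data = numpy.arange(36, dtype=float).reshape(3, 12)

# Write to a temporary directory so the sketch does not overwrite real files.
path = os.path.join(tempfile.mkdtemp(), "reshaped_data.csv")
numpy.savetxt(path, reshaped_data, delimiter=',')

# Read the file back using the same delimiter that was used when saving.
data = numpy.loadtxt(fname=path, delimiter=',')

print(data.shape)
print(numpy.array_equal(data, reshaped_data))
```

Because the file already holds the reshaped array, no further call to `numpy.reshape` should be needed after loading it this way.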
17 changes: 11 additions & 6 deletions _episodes/04-lists.md
@@ -23,6 +23,9 @@ list[2:9]), in the same way as strings and arrays."
In the previous episode, we analyzed a single file of wave height data. However, we might
need to process multiple files in the future.

There are four decadal CSV files for the 1980s, 1990s, 2000s, and 2010s. Before we can analyse these,
we need to learn how to store an arbitrary number of items in a list.

The natural first step is to collect the names of all the files that we have to process. In Python,
a list is a way to store multiple values together. In this episode, we will learn how to store
multiple values in a list as well as how to work with lists.
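
A minimal sketch of the idea, assuming the four decadal files carry the names that appear in the loop output elsewhere in this lesson (the names are used here only as strings, so the files do not need to exist):

```python
# File names assumed for illustration, one per decade.
filenames = ['waves_80s.csv', 'waves_90s.csv', 'waves_00.csv', 'waves_10s.csv']

number_of_files = len(filenames)   # how many items the list holds
first_file = filenames[0]          # indexing starts at 0
last_file = filenames[-1]          # negative indices count from the end
middle_files = filenames[1:3]      # slicing returns a new list

print(number_of_files)
print(first_file, last_file)
print(middle_files)
```

The list can hold any number of names, which is what makes it a convenient container for a set of files to loop over later.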
@@ -180,32 +183,34 @@ does not.
> index operations shown in the image:
>
> ~~~
> print([x[0]])
> print(x[0])
> ~~~
> {: .language-python}
>
> ~~~
> [['pepper', 'zucchini', 'onion']]
> ['pepper', 'zucchini', 'onion']
> ~~~
> {: .output}
>
> ~~~
> print(x[0])
> print(x[0][0])
> ~~~
> {: .language-python}
>
> ~~~
> ['pepper', 'zucchini', 'onion']
> pepper
> ~~~
> {: .output}
>
> It's also possible to explicitly return a list, either in a print statement or to save in a variable:
>
> ~~~
> print(x[0][0])
> print([x[0]])
> ~~~
> {: .language-python}
>
> ~~~
> 'pepper'
> [['pepper', 'zucchini', 'onion']]
> ~~~
> {: .output}
>
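
The three indexing forms above can be compared side by side. This is a small sketch; the second inner list is an assumption added only so that `x` is clearly a list of lists.

```python
# A list of lists; only the first inner list is taken from the example above.
x = [['pepper', 'zucchini', 'onion'],
     ['cabbage', 'lettuce', 'garlic']]

first_row = x[0]      # the first inner list
first_item = x[0][0]  # the first string inside the first inner list
wrapped = [x[0]]      # a new one-element list containing the first inner list

print(first_row)   # ['pepper', 'zucchini', 'onion']
print(first_item)  # pepper
print(wrapped)     # [['pepper', 'zucchini', 'onion']]
```

Note that `print` shows a bare string without quotes, whereas strings inside a printed list keep their quotes.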
6 changes: 5 additions & 1 deletion _episodes/05-loop.md
@@ -384,7 +384,11 @@ so we should always use it when we can.
> Suppose you have encoded a polynomial as a list of coefficients in
> the following way: the first element is the constant term, the
> second element is the coefficient of the linear term, the third is the
> coefficient of the quadratic term, etc.
> coefficient of the quadratic term, etc., where the polynomial is of the form
>
> ax^0 + bx^1 + cx^2
>
> (when writing polynomials mathematically, the x^0 is often omitted since it equals 1)
>
> ~~~
> x = 5
11 changes: 6 additions & 5 deletions _episodes/06-files.md
@@ -86,7 +86,7 @@ for filename in filenames:
{: .language-python}

~~~
waves-00.csv
waves_00.csv
~~~
{: .output}

@@ -95,15 +95,15 @@ maximum and minimum waveheight in the 2000s.](
../fig/waves_loop_1.svg)

~~~
waves-10s.csv
waves_10s.csv
~~~
{: .output}

![Output from the second iteration of the for loop. Three line graphs showing the average,
maximum and minimum waveheight in the 2010s.](../fig/waves_loop_2.svg)

~~~
waves-80s.csv
waves_80s.csv
~~~
{: .output}

@@ -168,13 +168,14 @@ Let's load `waves_90s.csv`:
~~~
data = numpy.loadtxt(fname = "waves_90s.csv", delimiter=',')
data = numpy.reshape(data[:,2], [10,12])
~~~
{: .language-python}
If we try and take the mean for the entire year, we'll see that there must be NaNs:
~~~
numpy.mean(data[:,2])
numpy.mean(data)
~~~
{: .language-python}
@@ -183,7 +184,7 @@ nan
~~~
{: .output}
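
To see why a single missing value turns the whole mean into `nan`, here is a small sketch on made-up numbers (not the wave data). `numpy.isnan` and `numpy.nanmean` are existing NumPy functions, but using them here is an illustrative assumption rather than the approach this lesson prescribes.

```python
import numpy

# Made-up 2 x 3 array with one missing value.
data = numpy.array([[1.0, 2.0, numpy.nan],
                    [4.0, 5.0, 6.0]])

overall_mean = numpy.mean(data)            # any NaN propagates into the result
nan_count = int(numpy.isnan(data).sum())   # count the missing values
mean_ignoring_nans = numpy.nanmean(data)   # mean of the non-NaN values only

print(overall_mean)
print(nan_count)
print(mean_ignoring_nans)
```

Counting the NaNs first is a quick way to confirm that missing values, rather than a loading error, explain the `nan` result.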
If we had only plotted the reshaped data, we would see white squares where there are NaNs in the data:
If we plot the reshaped data, we will see white squares where there are NaNs in the data:
~~~
number_of_rows = data.shape[0]
5 changes: 4 additions & 1 deletion _episodes/07-cond.md
@@ -153,10 +153,13 @@ Now that we've seen how conditionals work,
we can use them to look for thresholds in our wave data.
We are about to use functions provided by the `numpy` module again.
Therefore, if you're working in a new Python session, make sure to load the
module with:
module, and load and reshape one of the data files:

~~~
import numpy
data = numpy.loadtxt(fname = "waves_80s.csv", delimiter=",")
data = numpy.reshape(data[:,2], [10,12])
~~~
{: .language-python}
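
As a rough illustration of looking for a threshold with a conditional, the sketch below uses a small made-up array in place of the loaded file. The threshold value and the messages are assumptions for demonstration only.

```python
import numpy

# Made-up wave heights standing in for the reshaped data (years x months).
data = numpy.array([[1.2, 3.4, 2.8],
                    [0.9, 6.5, 2.1]])

threshold = 6.0  # hypothetical threshold, chosen only for this example

max_height = numpy.max(data)
if max_height > threshold:
    message = 'Maximum exceeds the threshold'
else:
    message = 'All values are within the threshold'

print(max_height)
print(message)
```

The same pattern should carry over to the real data once it has been loaded and reshaped as shown above.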

49 changes: 27 additions & 22 deletions _episodes/08-func.md
@@ -32,7 +32,7 @@ keypoints:
---

At this point,
we've written code to draw some graphs of our wave height data,
we've written code to plot some interesting features of our wave data,
loop over all our data files to quickly draw these plots for each of them,
and have Python make decisions based on what it sees in our data.
But, our code is getting pretty long and complicated;
@@ -204,14 +204,16 @@ temperature in Kelvin was: 373.15
## Tidying up

Now that we know how to wrap bits of code up in functions,
we can make our analysis easier to read and easier to reuse.
we can make our wave data analysis easier to read and easier to reuse.
First, let's make a `visualize` function that generates our plots:

~~~
def visualize(filename):
    data = numpy.loadtxt(fname=filename, delimiter=',')
    number_of_rows = data.shape[0] # total number of months
    number_of_years = number_of_rows // 12 # total number of years = number of months / number of months per year
    # need to reshape the data for plotting
    reshaped_data = numpy.reshape(data[:,2], [number_of_years,12])
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
@@ -234,7 +236,7 @@ def visualize(filename):
~~~
{: .language-python}

and another function called `detect_problems` that checks for those systematics
<!-- and another function called `detect_problems` that checks for those systematics
we noticed:
~~~
@@ -251,26 +253,24 @@ def detect_problems(filename):
else:
print('Seems OK!')
~~~
{: .language-python}
{: .language-python} -->

Wait! Didn't we forget to specify what both of these functions should return? Well, we didn't.
Wait! Didn't we forget to specify what this function should return? Well, we didn't.
In Python, functions are not required to include a `return` statement and can be used for
the sole purpose of grouping together pieces of code that conceptually do one thing. In such cases,
function names usually describe what they do, _e.g._ `visualize`, `detect_problems`.
function names usually describe what they do, _e.g._ `visualize`.
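
A quick way to check this behaviour: a function with no `return` statement still hands something back to the caller, namely `None`. The function name and the file name below are hypothetical and used only for illustration.

```python
def announce(filename):
    """Print a message; there is deliberately no return statement."""
    print('Processing', filename)


# Calling the function runs its body, and the call evaluates to None.
result = announce('waves_80s.csv')
print(result)
```

This is why such functions are normally called for their effect, such as drawing a plot, rather than assigned to a variable.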

Notice that rather than jumbling this code together in one giant `for` loop,
we can now read and reuse both ideas separately.
we can now read and reuse the code.
We can reproduce the previous analysis with a much simpler `for` loop:

~~~
import glob
filenames = sorted(glob.glob('waves_*.csv'))
for filename in filenames[1:3]:
    print(filename)
    visualize(filename)
    detect_problems(filename)
~~~
{: .language-python}
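
The file selection in this loop can be tried without the real data. The sketch below creates empty placeholder files in a temporary directory (an assumption made so the example is self-contained) and shows which names `sorted(glob.glob(...))` followed by the `[1:3]` slice would pick.

```python
import glob
import os
import tempfile

# Create empty placeholder files; the names mirror those used in this lesson.
workdir = tempfile.mkdtemp()
for name in ['waves_80s.csv', 'waves_90s.csv', 'waves_00.csv', 'waves_10s.csv']:
    open(os.path.join(workdir, name), 'w').close()

# glob returns matches in arbitrary order, so sort them for reproducibility.
filenames = sorted(glob.glob(os.path.join(workdir, 'waves_*.csv')))

# Slice [1:3] keeps the second and third names in sorted order.
selected = [os.path.basename(f) for f in filenames[1:3]]
print(selected)
```

Sorting is alphabetical, so `waves_00.csv` comes first and the slice picks the 2010s and 1980s files rather than the two earliest decades.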

@@ -316,8 +316,7 @@ That looks right,
so let's try `offset_mean` on our real data:

~~~
data = numpy.loadtxt(fname='wavesmonthly.csv', delimiter=',')
reshaped_data = numpy.reshape(data[:,2], [37,12])
reshaped_data = numpy.loadtxt(fname='reshaped_data.csv', delimiter=',')
print(offset_mean(reshaped_data, 0))
~~~
{: .language-python}
@@ -353,8 +352,8 @@ min, mean, and max of offset data are: -1.8875630630630629 1.960393809248024e-16
{: .output}

That seems almost right:
the original mean was about 6.1,
so the lower bound from zero is now about -6.1.
the original mean was about 1.5,
so the lower bound from zero is now about -1.9.
The mean of the offset data isn't quite zero --- we'll explore why not in the challenges --- but
it's pretty close.
We can even go further and check that the standard deviation hasn't changed:
@@ -475,25 +474,31 @@ In fact,
we can pass the filename to `loadtxt` without the `fname=`:

~~~
numpy.loadtxt('wavesmonthly.csv', delimiter=',')
numpy.loadtxt('reshaped_data.csv', delimiter=' ')
~~~
{: .language-python}

~~~
array([[ 0., 0., 1., ..., 3., 0., 0.],
[ 0., 1., 2., ..., 1., 0., 1.],
[ 0., 1., 1., ..., 2., 1., 1.],
array([[3.788, 3.768, 4.774, 2.818, 2.734, 2.086, 2.066, 2.236, 3.322,
3.512, 4.348, 4.628],
[3.666, 4.326, 3.522, 3.18 , 1.954, 1.72 , 1.86 , 1.95 , 3.11 ,
3.78 , 3.474, 5.28 ],
[5.068, 4.954, 3.77 , 2.402, 2.166, 2.084, 2.246, 2.228, 2.634,
4.41 , 4.342, 3.28 ],
...,
[ 0., 1., 1., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 0., 2., 0.],
[ 0., 0., 1., ..., 1., 1., 0.]])
[4.27 , 4.09 , 3.696, 3.302, 2.502, 1.772, 2.016, 2.172, 3.034,
3.462, 3.856, 5.76 ],
[4.294, 3.794, 4.646, 3.212, 2.226, 1.558, 1.894, 2.092, 2.58 ,
3.87 , 3.108, 6.044],
[6.488, 5.954, 4.26 , 5.838, 4.882, 3.678, 3.308, 2.786, 2.71 ,
3.046, 4.622, 5.048]])
~~~
{: .output}

but we still need to say `delimiter=`:

~~~
numpy.loadtxt('wavesmonthly.csv', ',')
numpy.loadtxt('reshaped_data.csv', ' ')
~~~
{: .language-python}

@@ -655,7 +660,7 @@ and eight others that do.
If we call the function like this:

~~~
numpy.loadtxt('wavesmonthly.csv', ',')
numpy.loadtxt('reshaped_data.csv', ',')
~~~
{: .language-python}

