# -*- coding: utf-8 -*-
"""CS3237 Lab 5.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1Zf8EiyoNDqN3J4ARgNV8S8mrzxD7IoX_
# CS3237 Lab 5 Introduction to Deep Learning
|Student Number |Name |
|:--------------|:---------------------|
| | Aryan Sarswat |
| | Anurag S. Chatterjee |
## 1. Introduction
We will achieve the following objectives in this lab:
1. An understanding of the practical limitations of using dense networks in complex tasks
2. Hands-on experience in building a deep learning neural network to solve a relatively complex task.
Each step may take a long time to run. You and your partner may want to work out how to do things simultaneously, but please do not miss out on any learning opportunities.
## 2. Submission Instructions
Please work together as a team of 2 to complete this lab. You will need to submit ONE copy of this notebook per team, but please fill in the names of both team members above. This lab is worth 55 marks.
**DO NOT SUBMIT MORE THAN ONE COPY OF THIS LAB!**
## 3. Creating a Dense Network for CIFAR-10
We will now begin building a neural network for the CIFAR-10 dataset. The CIFAR-10 dataset consists of 50,000 32x32x3 (32x32 pixels, RGB channels) training images and 10,000 testing images (also 32x32x3), divided into the following 10 categories:
1. Airplane
2. Automobile
3. Bird
4. Cat
5. Deer
6. Dog
7. Frog
8. Horse
9. Ship
10. Truck
In the first two parts of this lab we will create a classifier for the CIFAR-10 dataset.
### 3.1 Loading the Dataset
We begin by creating a Dense neural network for CIFAR-10. The code below shows how we load the CIFAR-10 dataset:
"""
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10
def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
    test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    ret_train_y = to_categorical(train_y, 10)
    ret_test_y = to_categorical(test_y, 10)
    return (train_x, ret_train_y), (test_x, ret_test_y)
(train_x, train_y), (test_x, test_y) = load_cifar10()
"""----
#### Question 1
Explain what the following two statements do, and where the number "3072" came from (2 MARKS):
```
train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
```
***3072 is 32×32×3, the number of values in each image (32×32 pixels, each with 3 colour channels). Since we are going to use an MLP, the features of each image must be a one-dimensional vector, hence the reshape from (N, 32, 32, 3) to (N, 3072).***
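A quick sanity check of where 3072 comes from (a minimal sketch, assuming the `load_cifar10` call above has already run):
```python
# Illustrative sanity check: each 32x32 RGB image unrolls into 3072 values.
print(32 * 32 * 3)    # 3072
print(train_x.shape)  # (50000, 3072) after the reshape
print(test_x.shape)   # (10000, 3072)
```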
*FOR GRADER: _______ / 2*
### 3.2 Building the MLP Classifier
In the code box below, create a new fully connected (dense) multilayer perceptron classifier for the CIFAR-10 dataset. To begin with, create a network with one hidden layer of 1024 neurons, using the SGD optimizer. You should output the training and validation accuracy at every epoch, and train for 50 epochs:
"""
"""
Write your code to build an MLP with one hidden layer of 1024 neurons,
with an SGD optimizer. Train for 50 epochs, and output the training and
validation accuracy at each epoch.
"""
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
# Create the neural network
nn = Sequential()
nn.add(Dense(1024, input_shape = (3072, ), activation = 'relu'))
nn.add(Dense(10, activation = 'softmax'))
# Create our optimizer
sgd = SGD(learning_rate=0.1, momentum=0.01, nesterov=False)
# 'Compile' the network to associate it with a loss function,
# an optimizer, and what metrics we want to track
nn.compile(loss='categorical_crossentropy', optimizer=sgd,
           metrics=['accuracy'])
nn.fit(train_x, train_y, shuffle=True, epochs=50,
       validation_data=(test_x, test_y))
nn.evaluate(test_x, test_y)
"""#### Question 2
Complete the following table on the design choices for your MLP
(3 MARKS):
| Hyperparameter | What I used | Why? |
|:---------------------|:------------|:----------------------|
| Optimizer | SGD | Specified in question |
| # of hidden layers | 1 | Specified in question |
| # of hidden neurons | 1024 | Specified in question |
| Hid layer activation | relu | Adds non-linearity to the model, allowing it to model more complex data |
| # of output neurons | 10 | There are 10 categories |
| Output activation | softmax | To get a probability for each class we are trying to predict |
| learning rate | 0.1 | Not so high that it would cause unstable updates |
| momentum | 0.01 | To stabilise gradient updates |
| decay | None | As the learning rate is not too high we do not need to decay it, and from experiments this learning rate appears stable |
| loss | categorical cross entropy | This is the loss used for multi-class classification |
*FOR GRADER:*<br>
*Table: ___ / 3* <br>
*Code: ___ / 5* <br>
**TOTAL: ____ / 8** <br>
#### Question 3:
What was your final training accuracy? Validation accuracy? Is there overfitting / underfitting? Explain your answer (5 MARKS)
***The final training accuracy is 67.72% and the validation accuracy is 46.69%. There is evidence of overfitting: the training accuracy is much higher than the validation accuracy, which means the model has memorised the training data. Furthermore, the validation accuracy fluctuated a lot during training.***
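One way to make the gap visible is to plot the accuracy curves from the History object that `fit` returns; a minimal sketch, assuming the training run above is repeated with its return value captured as `history`:
```python
# Illustrative sketch (re-runs the Section 3.2 fit): the History object's
# .history dict holds per-epoch training and validation metrics.
import matplotlib.pyplot as plt

history = nn.fit(train_x, train_y, shuffle=True, epochs=50,
                 validation_data=(test_x, test_y))

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```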
*FOR GRADER: ______ / 5*
### 3.3 Experimenting with the MLP
Cut and paste your code from Section 3.2 to the box below (you may need to rename your MLP). Experiment with the number of hidden layers, the number of neurons in each hidden layer, the optimization algorithm, etc. See [Keras Optimizers](https://keras.io/optimizers) for the types of optimizers and their parameters. **Train for 100 epochs.**
"""
"""
Cut and paste your code from Section 3.2 below, then modify it to get
much better results than what you had earlier. E.g. increase the number of
nodes in the hidden layer, increase the number of hidden layers,
change the optimizer, etc.
Train for 100 epochs.
"""
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
# Create the neural network
nn = Sequential()
nn.add(Dense(512, input_shape = (3072, ), activation='relu'))
nn.add(Dense(1024, activation = 'relu'))
nn.add(Dense(10, activation = 'softmax'))
sgd = SGD(learning_rate=0.01, momentum=0.1)
# 'Compile' the network to associate it with a loss function,
# an optimizer, and what metrics we want to track
nn.compile(loss='categorical_crossentropy', optimizer=sgd,
           metrics=['accuracy'])
nn.fit(train_x, train_y, shuffle=True, epochs=100,
       validation_data=(test_x, test_y))
nn.evaluate(test_x, test_y)
"""----
#### Question 4:
Complete the following table with your final design (you may add more rows for the # neurons (layer1) etc. to detail how many neurons you have in each hidden layer). Likewise you may replace the lr, momentum etc rows with parameters more appropriate to the optimizer that you have chosen. (3 MARKS)
| Hyperparameter | What I used | Why? |
|:---------------------|:------------|:----------------------|
| Optimizer | SGD | Other optimizers led to worse training and similar validation performance |
| # of hidden layers | 2 | Increase the complexity of the model |
| # neurons(layer1) | 512 | To incrementally increase the number of neurons in each layer |
| Hid layer1 activation| relu | Add in non-linearity |
| # neurons(layer2) | 1024 | Increase the number of neurons so it can model more complex behavior |
| Hid layer2 activation| relu | Add in non-linearity |
| # of output neurons | 10 | Number of categories |
| Output activation | softmax | To get a probability for each class we are trying to predict |
| learning rate | 0.01 | Lower than before: with more epochs we can lower the learning rate to further stabilise training |
| momentum | 0.1 | To stabilise gradient updates |
| loss | categorical cross entropy | This is the loss used for multi-class classification |
*FOR GRADER:* <br>
*TABLE: _____ / 3* <br>
*CODE: ______ / 5*<br>
***TOTAL: ______ / 8***
#### Question 5
What is the final training and validation accuracy that you obtained after 100 epochs? Is there considerable improvement over Section 3.2? Are there still signs of underfitting or overfitting? Explain your answer (5 MARKS)
***Final training accuracy: 98.81%. Final validation accuracy: 56.99%. There is still a strong sign of overfitting: the training accuracy is much higher than the validation accuracy, and the validation accuracy fluctuates during training. There is about an 8% improvement in validation accuracy over Section 3.2; the training accuracy improved far more, but the model has severely overfitted to the training set.***
*FOR GRADER: ______ / 5*
#### Question 6
Write a short reflection on the practical difficulties of using a dense MLP to classify images in the CIFAR-10 dataset. (3 MARKS)
***A dense MLP cannot correlate neighbouring pixels with each other, so it does not discern spatial patterns as easily as a convolutional neural network.***
*FOR GRADER: _______ /3*
----
## 4. Creating a CNN for the MNIST Dataset
In this section we will now create a convolutional neural network (CNN) to classify images in the MNIST dataset that we used in the previous lab. Let's go through each part to see how to do this.
### 4.1 Loading the MNIST Dataset
As always we will load the MNIST dataset, scale the inputs to between 0 and 1, and convert the Y labels to one-hot vectors. However unlike before we will not flatten the 28x28 image to a 784 element vector, since CNNs can inherently handle 2D data.
"""
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
def load_mnist():
    (train_x, train_y), (test_x, test_y) = mnist.load_data()
    train_x = train_x.reshape(train_x.shape[0], 28, 28, 1)
    test_x = test_x.reshape(test_x.shape[0], 28, 28, 1)
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)
    return (train_x, train_y), (test_x, test_y)
"""### 4.2 Building the CNN
We will now build the CNN. Unlike before we will create a function to produce the CNN. We will also look at how to save and load Keras models using "checkpoints", particularly "ModelCheckpoint" that saves the model each epoch.
Let's begin by creating the model. We call os.path.exists to see if a model file exists, and call "load_model" if it does. Otherwise we create a new model.
"""
# load_model loads a model from a hd5 file.
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import os
MODEL_NAME = 'mnist-cnn.hd5'
def buildmodel(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(5,5),
                         activation='relu',
                         input_shape=(28, 28, 1), padding='same')) # Question 7
        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))
        model.add(Flatten()) # Question 9
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.1))
        model.add(Dense(10, activation='softmax'))
    return model
"""----
#### Question 7
The first layer in our CNN is a 2D convolution kernel, shown here:
```
model.add(Conv2D(32, kernel_size=(5,5),
activation='relu',
input_shape=(28, 28, 1), padding='same')) # Question 7
```
Why is the input_shape set to (28, 28, 1)? What does this mean? What does "padding = 'same'" mean? (4 MARKS)
***This is because each input image is 28×28 pixels with only 1 channel (grayscale). padding='same' means the input is zero-padded so that the output of the convolution has the same spatial size (28×28) as the input.***
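To illustrate the effect of padding, here is a small sketch (not part of the graded model) comparing output shapes with and without it:
```python
# Illustrative sketch: a 5x5 convolution on a 28x28x1 input with padding='same'
# keeps the spatial size, while the default padding='valid' shrinks it.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D

same = Sequential([Input(shape=(28, 28, 1)),
                   Conv2D(32, kernel_size=(5, 5), padding='same')])
valid = Sequential([Input(shape=(28, 28, 1)),
                    Conv2D(32, kernel_size=(5, 5), padding='valid')])

print(same.output_shape)   # (None, 28, 28, 32) -- spatial size preserved
print(valid.output_shape)  # (None, 24, 24, 32) -- shrinks by kernel_size - 1
```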
*FOR GRADER: ______ / 4*
#### Question 8
The second layer is the MaxPooling2D layer shown below:
```
model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
```
What other types of pooling layers are available? What does 'strides = 2' mean? (3 MARKS)
***Other pooling layers include AveragePooling2D and the global variants GlobalMaxPooling2D and GlobalAveragePooling2D. strides=2 means the pooling window moves 2 pixels at each step, so the spatial dimensions are halved.***
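A short sketch of some alternative pooling layers available in Keras (illustrative only):
```python
# Illustrative sketch: other pooling layers provided by Keras.
from tensorflow.keras.layers import (AveragePooling2D, GlobalMaxPooling2D,
                                     GlobalAveragePooling2D)

# AveragePooling2D takes the mean of each 2x2 window instead of the maximum.
avg_pool = AveragePooling2D(pool_size=(2, 2), strides=2)

# Global pooling collapses each feature map to a single value per channel.
global_max = GlobalMaxPooling2D()
global_avg = GlobalAveragePooling2D()
```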
*FOR GRADER: _____ / 3*
#### Question 9
What does the "Flatten" layer here do? Why is it needed?
```
model.add(Flatten()) # Question 9
```
***The Flatten layer converts the tensor from (n, height, width, channels) into a one-dimensional vector per sample, of shape (n, height × width × channels). This is needed because the Dense (MLP) layers used for classification expect a flat feature vector.***
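A small sketch of what Flatten does to the tensor shape (the 4×4×64 feature-map size here is only an illustrative assumption):
```python
# Illustrative sketch: Flatten turns each (height, width, channels) feature map
# into a single vector so that Dense layers can consume it.
import numpy as np
from tensorflow.keras.layers import Flatten

feature_maps = np.zeros((1, 4, 4, 64), dtype='float32')  # (batch, h, w, channels)
flat = Flatten()(feature_maps)
print(flat.shape)  # (1, 1024) since 4 * 4 * 64 = 1024
```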
*FOR GRADER: ____ / 2*
----
### 4.3 Training the CNN
Let's now train the CNN. In this example we introduce the idea of a "callback", which is a routine that Keras calls at the end of each epoch. Specifically we look at two callbacks:
1. ModelCheckpoint: When called, Keras saves the model to the specified filename.
2. EarlyStopping: When called, Keras checks if it should stop the training prematurely.
Let's look at the code to see how training is done, and how callbacks are used.
"""
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
def train(model, train_x, train_y, epochs, test_x, test_y, model_name):
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10
    print("Starting training.")
    model.fit(x=train_x, y=train_y, batch_size=32,
              validation_data=(test_x, test_y), shuffle=True,
              epochs=epochs,
              callbacks=[savemodel, stopmodel])
    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f" % (acc, loss))
"""Notice that there isn't very much that is unusual going on; we compile the model with our loss function and optimizer, then call fit, and finally evaluate to look at the final accuracy for the test set. The only thing unusual is the "callbacks" parameter in the fit function call:
```
model.fit(x=train_x, y=train_y, batch_size=32,
validation_data=(test_x, test_y), shuffle=True,
epochs=epochs,
callbacks=[savemodel, stopmodel])
```
----
#### Question 10.
What do the min_delta and patience parameters do in the EarlyStopping callback, as shown below? (2 MARKS)
```
stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10
```
***min_delta is the minimum change in the monitored quantity (the validation loss by default) that counts as an improvement. patience is the number of epochs with no such improvement to wait before terminating training. In this case, training stops early once the validation loss has failed to improve by at least 0.001 for 10 consecutive epochs.***
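For reference, a hedged sketch of EarlyStopping with its monitored quantity made explicit (the extra arguments shown here are illustrative, not the settings used in this lab):
```python
# Illustrative sketch: by default EarlyStopping monitors 'val_loss';
# restore_best_weights rolls the model back to the best epoch seen.
from tensorflow.keras.callbacks import EarlyStopping

stop_early = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=10,
                           restore_best_weights=True)
```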
*FOR GRADER: ______ / 2*
---
### 4.4 Putting it together.
Now let's run the code and see how it goes (Note: To save time we are training for only 5 epochs; we should train much longer to get much better results):
"""
(train_x, train_y),(test_x, test_y) = load_mnist()
model = buildmodel(MODEL_NAME)
train(model, train_x, train_y, 5, test_x, test_y, MODEL_NAME)
"""----
#### Question 11.
Compare the relative advantages and disadvantages of CNN vs. the Dense MLP that you build in sections 3.2 and 3.3. What makes CNNs better (or worse)? (3 MARKS)
***CNNs are able to extract spatial relationships between groups of pixels by using filters, whereas MLPs are not good at this. A CNN can therefore learn to extract features such as edges and shapes, which help it decide what an image is made of. This is especially important because it is very difficult to define what an object is (e.g. a plane can come in many different sizes and shapes), so it is important to learn general representations that allow us to classify objects.***
*FOR GRADER: ______ / 3*
## 5. Creating a CNN for the CIFAR-10 Dataset
Now comes the fun part: Using the example above for creating a CNN for the MNIST dataset, now create a CNN in the box below for the CIFAR-10 dataset. At the end of each epoch save the model to a file called "cifar.hd5" (note: the .hd5 is added automatically for you).
---
#### Question 12.
Summarize your design in the table below (the actual coding cell comes after this):
| Hyperparameter | What I used | Why? |
|:---------------------|:------------|:----------------------|
| Optimizer | SGD | Worked the best amongst the various optimizers tried |
| Input shape | 32,32,3 | RGB Channels |
| First layer | Conv2D | To extract spatial features |
| Second layer | MaxPooling | To aggregate these features into higher level features |
| Add more layers | More Conv2D | To extract finer and different kinds of features |
| Dense layer | 1024 | To take the Conv2D features and relate the features (across channels) to each other |
| Dense layer | 10 | As there are 10 output categories |
*FOR GRADER:* <br>
*TABLE: ________ / 3* <br>
*CODE: _________/ 7* <br>
**TOTAL: _______ / 10** <br>
---
***TOTAL: _______ / 55***
"""
"""
Write your code for your CNN for the CIFAR-10 dataset here.
Note: train_x, train_y, test_x, test_y were changed when we called
load_mnist in the previous section. You will now need to call load_cifar10
again.
"""
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 32, 32, 3)
    test_x = test_x.reshape(test_x.shape[0], 32, 32, 3)
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    ret_train_y = to_categorical(train_y, 10)
    ret_test_y = to_categorical(test_y, 10)
    return (train_x, ret_train_y), (test_x, ret_test_y)
(train_x, train_y), (test_x, test_y) = load_cifar10()
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3),
                 activation='relu',
                 input_shape=(32, 32, 3), padding='same'))
model.add(MaxPooling2D(pool_size=(2,2), strides=2))
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=2))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(10, activation='softmax'))

# Save the model at the end of every epoch, and stop early if the validation
# loss stops improving.
savemodel = ModelCheckpoint("cifar")
stopmodel = EarlyStopping(min_delta=0.001, patience=10)

model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=train_x, y=train_y, batch_size=32,
          validation_data=(test_x, test_y), shuffle=True,
          epochs=50, callbacks=[savemodel, stopmodel])

print("Done. Now evaluating.")
loss, acc = model.evaluate(x=test_x, y=test_y)
print("Test accuracy: %3.2f, loss: %3.2f" % (acc, loss))